From the Perceptron to Adaline. Setting the foundations right | by Pan Cretan | Nov, 2023

November 28, 2023
in AI Technology



Setting the foundations right

Photo by Einar Storsul on Unsplash

Introduction

In a previous article I tried to explain the most basic binary classifier that has likely ever existed, Rosenblatt’s perceptron. Understanding this algorithm has educational value and can serve as a good introduction to elementary machine learning courses. It is an algorithm that can be coded from scratch in a single afternoon and can spark interest, a sense of achievement and the motivation to delve into more complex topics. Still, as an algorithm it leaves much to be desired, because convergence is only guaranteed when the classes are linearly separable, which is often not the case.

In this article we will continue the journey of mastering classification concepts. A natural evolution from Rosenblatt’s perceptron is the adaptive linear neuron classifier, or adaline as it is colloquially known. Moving from the perceptron to adaline is not a big leap. We simply need to change the step activation function to a linear one. This small change leads to a continuous loss function that can be robustly minimised. This allows us to introduce many useful concepts in machine learning, such as vectorisation and optimisation methods.

In future articles we will also cover further subtle changes to the activation and loss functions that will take us from adaline to logistic regression, which is already a useful algorithm in daily practice. All of the above algorithms are essentially single-layer neural networks and can be readily extended to multilayer ones. In this sense, this article takes the reader a step further through this evolution and builds the foundations to tackle more advanced concepts.

We will need some formulas. I used the online LaTeX equation editor to develop the LaTeX code for the equations and then the Chrome plugin Maths Equations Anywhere to render the equations into images. The only downside of this approach is that the LaTeX code is not stored in case you need to render it again. For this purpose I provide the list of equations at the end of this article. If you are not familiar with LaTeX this may have its own educational value. Getting the notation right is part of the journey in machine learning.

Adaptive linear neuron classifier (adaline)

So what is the adaline algorithm? Adaline is a binary classifier like the perceptron. A prediction is made by using a set of input values for the features [x₁, .. , xₘ], where m is the number of features. The input values are multiplied by the weights [w₁, .. , wₘ] and the bias is added to obtain the net input z = w₁x₁ + .. + wₘxₘ + b. The net input is passed to the linear activation function σ(z), whose output is then used to make a prediction using a step function as with the perceptron:
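With the class labels taken as 0 and 1, and using the 0.5 threshold discussed below, the prediction rule can be written as:

ŷ = 1 if σ(z) ≥ 0.5, and ŷ = 0 otherwise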

A key difference from the perceptron is that the linear activation function is used for learning the weights, whilst the step function is only used for making the prediction at the end. This sounds like a small thing, but it is of significant importance. The linear activation function is differentiable whilst the step function is not! The threshold 0.5 above is not written in stone. By adjusting the threshold we can regulate precision and recall according to our use case, i.e. based on the cost of false positives and false negatives.

In the case of adaline the linear activation function is simply the identity, i.e. σ(z) = z. The objective function (also known as loss function) that needs to be minimised in the training process is:

ℓ(w, b) = ∑ᵢ ℓ(zᵢ, yᵢ) = ∑ᵢ (σ(zᵢ) − yᵢ)²

where w are the weights and b is the bias. The summation is over all of the examples in the training set. In some implementations the loss function also includes a 1/2 coefficient for convenience. This cancels out once we take the gradients of the loss function with respect to the weights and bias and, as we will see below, has no effect other than scaling the learning rate by a factor of 2. In this article we do not use the 1/2 coefficient.

For each example, we compute the square difference between the calculated outcome and the true class label. Note that the input vector is understood to be a matrix with shape (1, m), i.e. as we will see later it is one row of our feature matrix x with shape (n, m).

The training is nothing other than an optimisation problem. We need to adjust the weights and bias so that the loss function is minimised. As with any minimisation problem we need to compute the gradients of the objective function with respect to the independent variables, which in our case are the weights and the bias. The partial derivative of the loss function with regard to the weight wⱼ is:
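Applying the chain rule to the squared errors, and using the fact that σ(z) = z, the derivative can be written as:

∂ℓ/∂wⱼ = ∑ᵢ 2 (σ(zᵢ) − yᵢ) xᵢⱼ
       = 2 xⱼᵀ (σ(z) − y)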

The last row introduces important matrix notation. The feature matrix x has shape (n, m) and we take the transpose of its column j, i.e. a matrix with shape (1, n). The true class labels y form a matrix with shape (n, 1). The net output of all samples z is also a matrix with shape (n, 1), which does not change after the activation, as the activation is understood to apply to each of its elements. The final result of the above formula is a scalar. Can you guess how we could express the gradients with respect to all weights using the matrix notation?
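Stacking the partial derivatives for all weights into a single column gives:

∂ℓ/∂w = 2 xᵀ (σ(z) − y)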
where the transpose of the feature matrix has shape (m, n). The end result of this operation is a matrix with shape (m, 1). This notation is important. Instead of using loops, we will be carrying out exactly this matrix multiplication with numpy. In the era of neural networks and GPUs, the ability to apply vectorisation is essential!

What about the gradient of the loss function with respect to the bias?
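Differentiating the same loss with respect to b, the inputs no longer appear and we are left with:

∂ℓ/∂b = ∑ᵢ 2 (σ(zᵢ) − yᵢ) = 2n · mean(σ(z) − y)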

where the mean of the vector σ(z) − y is taken over all n examples in the training set. Once more, computing the mean with numpy is a vectorised operation, i.e. the summation does not need to be implemented using a loop.
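Both gradients map directly onto vectorised numpy operations. The snippet below is a small sketch rather than the article’s own code, assuming a feature matrix X with shape (n, m), labels y with shape (n, 1), weights w with shape (m, 1) and a scalar bias b:

import numpy as np

def gradients(X, y, w, b):
    # net input for all n examples at once, shape (n, 1); the activation is the identity
    z = X @ w + b
    errors = z - y                 # sigma(z) - y, shape (n, 1)
    grad_w = 2 * X.T @ errors      # shape (m, 1), the matrix form derived above
    grad_b = 2 * errors.sum()      # scalar, equal to 2n times the mean of the errors
    return grad_w, grad_b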

Once we have the gradients we can employ the gradient descent optimisation method to minimise the loss. The weights and bias terms are iteratively updated using:
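In terms of the gradients derived above, the updates take the form:

w := w − η ∂ℓ/∂w
b := b − η ∂ℓ/∂b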

where η is a suitably chosen learning rate. Too small a value can delay convergence, whilst too high a value can prevent convergence altogether. Some experimentation is needed, as is generally the case with the parameters of machine learning algorithms.

In the above implementation we assume that the weights and bias are updated based on all examples at once. This is known as full batch gradient descent and is one extreme. The other extreme is to update the weights and bias after each training example, which is known as stochastic gradient descent (SGD). In reality there is also some middle ground, known as mini batch gradient descent, where the weights and bias are updated based on a subset of the examples. Convergence is typically reached faster this way, i.e. we do not need to run as many iterations over the whole training set, whilst vectorisation is still (at least partially) possible. If the training set is very large (or the model is very complex, as is nowadays the case with the transformers in NLP), full batch gradient descent may simply not be an option.
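As a small illustration (a sketch with illustrative names, not the article’s code), the batch size alone decides which variant we get: a batch size equal to the number of examples gives full batch gradient descent, a batch size of 1 gives SGD, and anything in between gives mini batch gradient descent:

import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    # shuffle once per pass over the training set, then yield consecutive slices
    indices = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]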

Alternative formulation and closed form solution

Before we proceed with the implementation of adaline in Python, we will make a quick digression. We could absorb the bias b into the weight vector as:
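Denoting the combined vector with a prime (a notation chosen here for convenience), and placing the bias first so that it pairs with the column of 1s prepended to the feature matrix below:

w′ = [b, w₁, .. , wₘ]ᵀ, a matrix with shape (m+1, 1)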

in which case the net output for all samples in the training set becomes:
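Using the same prime notation for the augmented feature matrix x′, the net output is simply:

z = x′w′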
meaning that the feature matrix has been prepended with a column filled with 1s, leading to a shape (n, m+1). The gradient with regard to the combined weight set becomes:
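Following the same pattern as the gradient with respect to the individual weights, this is:

∂ℓ/∂w′ = 2 x′ᵀ (σ(z) − y), a matrix with shape (m+1, 1)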
In principle we could derive a closed form solution given that at the minimum all gradients will be zero:
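Setting the gradient to zero and using σ(z) = z = x′w′ leads to the normal equations x′ᵀx′w′ = x′ᵀy, i.e.:

w′ = (x′ᵀx′)⁻¹ x′ᵀ y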
In reality the inverse of the matrix in the above equation may not exist because of singularities, or it cannot be computed sufficiently accurately. Hence, such a closed form solution is not used in practice, either in machine learning or in numerical methods in general. Still, it is useful to appreciate that adaline resembles linear regression and as such has a closed form solution.
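As a quick sanity check of this analogy with linear regression, the combined weights can be obtained with a standard least squares solver that avoids forming the inverse explicitly. The snippet below is a sketch on synthetic data with illustrative names, not part of the article’s code:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                              # feature matrix, shape (n, m)
y = (X @ np.array([[1.5], [-2.0]]) > 0.3).astype(float)    # labels in {0, 1}, shape (n, 1)

X_aug = np.hstack([np.ones((X.shape[0], 1)), X])           # prepend the column of 1s, shape (n, m+1)
w_combined, *_ = np.linalg.lstsq(X_aug, y, rcond=None)     # combined weights, shape (m+1, 1)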

Implementing adaline in Python

Our implementation will use mini batch gradient descent. However, the implementation is flexible and allows optimising the loss function using both stochastic gradient descent and full batch gradient descent as the two extremes. We will examine the convergence behaviour by varying the…
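As a rough sketch of what such an implementation could look like, the snippet below trains an adaline with mini batch gradient descent along the lines described above. It is not the original implementation; the class name, default parameters and the weight initialisation are illustrative choices:

import numpy as np

class Adaline:
    # Minimal adaline sketch: squared error loss, identity activation and mini batch
    # gradient descent (batch_size=1 gives SGD, batch_size=n gives full batch).

    def __init__(self, learning_rate=0.001, n_epochs=50, batch_size=10, seed=1):
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        self.batch_size = batch_size
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n, m = X.shape
        y = y.reshape(-1, 1)
        self.w_ = rng.normal(scale=0.01, size=(m, 1))   # small random initial weights
        self.b_ = 0.0
        self.losses_ = []
        for _ in range(self.n_epochs):
            indices = rng.permutation(n)                # shuffle before every epoch
            for start in range(0, n, self.batch_size):
                batch = indices[start:start + self.batch_size]
                errors = X[batch] @ self.w_ + self.b_ - y[batch]       # sigma(z) - y
                self.w_ -= self.learning_rate * 2 * X[batch].T @ errors
                self.b_ -= self.learning_rate * 2 * errors.sum()
            # track the full training loss after each epoch to monitor convergence
            self.losses_.append(float(((X @ self.w_ + self.b_ - y) ** 2).sum()))
        return self

    def predict(self, X):
        # linear activation followed by the step function with the 0.5 threshold
        return (X @ self.w_ + self.b_ >= 0.5).astype(int).ravel()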


