The Perceptron model forms the basis of every neural network. This is where it all began, eventually leading to the development of “Neural Networks” and “Deep Learning”, which are the buzzwords of today. In this article, I am going to show the mathematics behind the well-known Perceptron algorithm using a 2-input 2-output model. Although some linear algebra is required to understand the mechanics of the Perceptron, I have tried to keep the mathematics as simple as possible.
Before jumping to the brainstorming part, let’s go through some of the basic terms I will be using while designing the model. Here is a diagram of the “2-input 2-output Perceptron model”.
From the above diagram, we can easily see what the inputs, outputs, targets, and weights of a Perceptron are; however, let me briefly explain the learning rate, epoch, bias, activation function, and error.
The value of the learning rate decides how quickly the Perceptron learns; more precisely, it decides the rate at which the weights are updated (this part might seem confusing, but I will explain it in detail later on).
The number of epochs is how many times the model is trained on the same set of data so that the weights are finally adjusted such that our output meets the desired target.
The error measures the inaccuracy of the network as a function of the outputs and targets. The error plays a very important role in updating the weights.
The activation function is a mathematical function that describes the firing of the neuron in response to the weighted inputs; a common example is the threshold function.
Sometimes all the inputs going into the network might be zero, resulting in a zero output, which is not desirable. This can be fixed using a bias, or as I like to call it, a “pseudo input”, which prevents this from happening.
Inside the Perceptron
Basically, what’s happening inside the Perceptron is that we use weights, or linkages, to connect the input nodes and output nodes, as you can see in the above figure. These weights are initially just random values (e.g. 0.5) which we keep updating as we train the model, until the outputs finally match the desired target values. How is this done? To answer this question, let’s break down the algebra step by step. Once again, I will be using the above figure.
Set all of the weights (Wij) to small random numbers (which can be positive or negative).
For, say, ‘n’ iterations or epochs:
For each input vector [X1, X2]:
Compute the activation or output: Yj = f(Σi Wij * Xi), where f is the activation (threshold) function.
Updating the weights:
Wij = Wij + η * (Tj - Yj) * Xi
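As a concrete sketch of one pass through these steps (the 0.5 threshold, the learning rate, and the example values here are illustrative assumptions of mine, not taken from the figure):

```python
# One training step for the 2-input 2-output model (illustrative sketch).
eta = 0.25                           # learning rate
W = [[0.5, 0.5], [0.5, 0.5]]         # W[i][j] links input i to output j
x = [1, 0]                           # one input vector [X1, X2]
t = [1, 0]                           # desired targets [T1, T2]

# Activation: threshold on the weighted sum of the inputs (0.5 assumed).
y = [1 if sum(W[i][j] * x[i] for i in range(2)) > 0.5 else 0
     for j in range(2)]

# Weight update: Wij = Wij + eta * (Tj - Yj) * Xi
for i in range(2):
    for j in range(2):
        W[i][j] += eta * (t[j] - y[j]) * x[i]
```

Here both outputs fire 0 (the weighted sums are not above the threshold), so only the weight linking X1 to the first output grows, nudging that output toward its target of 1.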
Breaking the Algebra using Matrices
Updating the weights of the Perceptron Algorithm
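The same computation can be written compactly with matrices; a rough NumPy sketch of the algebra shown in the figures (the 0.5 threshold and the example values are illustrative assumptions):

```python
import numpy as np

eta = 0.25
W = np.full((2, 2), 0.5)       # weight matrix, W[i, j] links Xi to Yj
x = np.array([1, 0])           # input vector [X1, X2]
t = np.array([1, 0])           # target vector [T1, T2]

y = (x @ W > 0.5).astype(int)          # activations for both outputs at once
W += eta * np.outer(x, t - y)          # Wij += eta * (Tj - Yj) * Xi
```

The outer product builds the whole update matrix in one step, which is exactly the element-wise rule applied to every weight simultaneously.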
I hope the mathematics was simple enough to understand, but something is still missing. Let’s use the above algebra to write code implementing the Perceptron model and also see some examples.
I have used Python to code the Perceptron model.
The code given below is for a generalized Perceptron model, i.e., it can be used for any number of input and output nodes.
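Since the code embed itself is not reproduced here, a minimal sketch of such a generalized model might look like the following (the class name, the bias handling, and the 0 threshold are my own assumptions, not the article’s exact code):

```python
import random

class Perceptron:
    """A Perceptron with any number of input and output nodes (a sketch)."""

    def __init__(self, n_inputs, n_outputs, eta=0.25):
        self.eta = eta
        # n_inputs + 1 weight rows: the extra row is for the bias
        # ("pseudo input" fixed at 1).
        self.W = [[random.uniform(-0.5, 0.5) for _ in range(n_outputs)]
                  for _ in range(n_inputs + 1)]

    def predict(self, x):
        xb = list(x) + [1]                       # append the bias input
        return [1 if sum(self.W[i][j] * xb[i] for i in range(len(xb))) > 0
                else 0
                for j in range(len(self.W[0]))]

    def train(self, inputs, targets, epochs=100):
        for _ in range(epochs):
            for x, t in zip(inputs, targets):
                y = self.predict(x)
                xb = list(x) + [1]
                for i in range(len(xb)):
                    for j in range(len(y)):
                        # Wij = Wij + eta * (Tj - Yj) * Xi
                        self.W[i][j] += self.eta * (t[j] - y[j]) * xb[i]

# Example: train on the OR gate
random.seed(0)
p = Perceptron(2, 1)
p.train([[0, 0], [0, 1], [1, 0], [1, 1]], [[0], [1], [1], [1]])
```

The OR gate is linearly separable, so by the Perceptron convergence theorem the weights settle on a correct separating line within a handful of epochs.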
This is what a typical OR gate looks like with two inputs and target values as shown. Now, let’s use our Perceptron model to classify an OR gate using the inputs (input1, input2).
Well, our Perceptron model’s output matches the target values. That’s good. Now let’s see another example.
Using the Perceptron model to identify the XOR Gate.
Oh, what is happening! It seems that our Perceptron model cannot identify the XOR gate. What is the problem here? To answer this, let’s draw some plots.
Well, you can see that for the OR gate the points representing True can be linearly separated (green line) from the points representing False. Let’s try the same for the XOR gate.
But for the XOR gate, fate is not on our side. The True and False points are not linearly separable (no straight line can be drawn between them). This means that our Perceptron model is a form of linear classifier. Now the question arises: is there a solution to this? Of course there is. We need to transform the data to a higher dimension (what?). No need to panic; it just means that we need to add an extra input. You will get a better understanding once you see the plot.
Let’s add a third input (input3) and plot the same data again.
This is a 3-D view of the data points. Now imagine the corner of your room as the three axes, with two red balls (True) and two blue balls (False) lying in the exact positions shown above. If we take a thin plastic sheet, we can easily separate the blue from the red balls, i.e. the balls lying below the sheet are red and those above are blue.
Well, let’s see if the Perceptron model gets this idea.
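A self-contained sketch of this idea: the choice input3 = input1 * input2 is my assumption for the extra dimension, but with it the four XOR points become linearly separable, so a single-layer Perceptron (bias as a pseudo input, threshold at 0) can learn the gate.

```python
import random

random.seed(1)
eta = 0.25
w = [random.uniform(-0.5, 0.5) for _ in range(4)]    # 3 inputs + bias

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 1, 1, 0]                               # XOR

def predict(x1, x2):
    x = [x1, x2, x1 * x2, 1]                         # third input + bias
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

for _ in range(200):                                 # enough epochs to converge
    for (x1, x2), t in zip(inputs, targets):
        y = predict(x1, x2)
        for i, xi in enumerate([x1, x2, x1 * x2, 1]):
            w[i] += eta * (t - y) * xi               # Wi += eta * (T - Y) * Xi
```

In the lifted space a plane such as x1 + x2 - 2·x3 = 0.5 separates the classes, which is exactly the “plastic sheet” from the 3-D picture above.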
I think fate has finally returned to our side; it’s working. Transforming the data to a higher dimension has enabled the Perceptron model to classify the XOR gate. This is how you tackle the linearity problem. But choosing the third input is a critical task. Is there a solution to that? Maybe, but for now let’s stick to linear problems. Hope you enjoyed the article!