In my previous article, “Introduction to the perceptron algorithm”, we saw how a single-layer perceptron model can be used to classify an OR gate. But when the same model was used to classify an XOR gate, it failed miserably. The problem was linearity: if the classes are not linearly separable, a single-layer perceptron cannot classify them. However, adding an extra input feature to the existing data-set solved the problem of nonlinearity, and the perceptron model was able to classify the XOR gate. The extra feature changed the dimension of the data-set (from 2-D to 3-D), and in this higher dimension the classes were linearly separable. But now the question arises: how do we get this extra input feature? Or, to put it another way, can we transform our existing data-set in such a way that our perceptron can separate the classes? Let’s find out!
Radial Basis Function Network
To begin with, let’s start with a simple thought. Suppose you are sitting on a chair, reading a newspaper or some article on analyticsdefined, and suddenly you feel an insect crawling onto your feet. What would be your first reaction: to calmly look down and see what is crawling on your feet, or to start thumping your legs as hard as you can? Most of you would go for the latter. Well, this is how biology works: the part of the body closest to the insect (in this case the legs) reacts first, and only later, maybe, your hands or eyes. This is the intuition behind the RBF network: instead of all the neurons firing at once for a particular set of inputs, only the neuron that is close to that input will fire. This may seem a bit confusing, but you will get a better idea once you look at this figure.
The figure on the left shows what a typical RBF network looks like, with a set of input nodes (in this case 3), output nodes (in this case 2) and some weights acting as links between the input and output nodes. Basically, the structure is the same as a “Perceptron network”, but what sets it apart is that the neurons are represented in a weight space. The figure on the right shows the two neurons in a “weight space” (a space where the weights represent the axes). But what do we do with it? Well, these neurons act as nodal centers, and any set of input features that is close to one of them will cause that neuron to fire.
If you look at the figure carefully, each input consists of 3 features, so it can also be represented as a point in the weight space. So, any set of features that is close to one particular neuron (say the red one) will cause it to fire more than the other one (the pink neuron). Let’s look at another figure to make things clearer.
The above figure consists of four neurons (red, blue, yellow and pink) represented in a 2-D weight space. The circle surrounding each of them represents its area of proximity, and any input that falls inside this area will cause the neuron to fire. The closer the input is to the nodal center, the more strongly the neuron fires. There are also certain inputs which lie in the proximity area of two neurons (blue and yellow). This will cause both of them to fire at once.
How do we set this proximity limit? One thing we know for sure is that distance plays a major role here, i.e. as the distance of an input from a nodal center increases, the firing of that neuron decreases. Can we use distance as a measure to decide the proximity limit? Fortunately yes. There are a lot of functions with this property, but for now we will use the famous “Gaussian function”.
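A common way of writing the Gaussian function for an RBF node is shown below. The symbols are assumptions: x is the input vector, w is the nodal center, and sigma is the standard deviation that sets how wide the area of proximity is.

```latex
g(x, w, \sigma) = \exp\left( -\frac{\lVert x - w \rVert^{2}}{2\sigma^{2}} \right)
```

The value is 1 when the input sits exactly on the center and falls towards 0 as the input moves away from it.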
Here ||x – w|| is the distance measured from input features to the nodal centers. Using the first figure,
Having said all this, let’s look at the algorithm behind the radial basis function. It’s just a two-step approach.
Algorithm for the Radial Basis Function
- Position the RBF centers.
- Calculate the activations of the RBF nodes using the above Gaussian function.
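The two steps can be sketched roughly as follows. This is a minimal sketch, assuming the Gaussian takes the common form exp(-||x - w||^2 / (2 * sigma^2)); the function name `rbf_activations`, the hand-picked centers and the sample inputs are illustrative and not the article's original code.

```python
import numpy as np

def rbf_activations(X, centers, sigma):
    """Gaussian activation of every RBF node for every input row."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances ||x - w||, shape (n_samples, n_centers)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-dists**2 / (2.0 * sigma**2))

# Step 1: position the RBF centers (assumption: two hand-picked points)
centers = np.array([[0.0, 0.0], [1.0, 1.0]])

# Step 2: calculate the activations of the RBF nodes
X = np.array([[0.0, 0.0], [0.9, 1.0], [5.0, 5.0]])
acts = rbf_activations(X, centers, sigma=1.0)
print(acts)
```

Each row of `acts` belongs to one input and each column to one center. An input that sits on a center gets an activation of 1 for that node, and an input far from every center gets activations close to 0.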
To find the position of the RBF centers, we can randomly pick data points. We can also use an algorithm to find better nodal centers. Now, what sort of algorithm can we use? One answer is the “k-means algorithm”. K-means is a simple and efficient way to find good RBF centers.
You can find the code for the k-means algorithm at the following link: https://akabhishekrony16.github.io/k-means
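For a rough idea of what k-means does, here is a generic sketch of the standard procedure (assign each point to its nearest center, then move each center to the mean of its points). It is not the code from the link above; the function name `kmeans_centers`, its parameters and the toy data are assumptions made for illustration.

```python
import numpy as np

def kmeans_centers(X, k, n_iter=100, seed=0):
    """Return k cluster centers found by plain k-means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if it has none)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return centers

# Toy data (assumption): two small, well-separated groups of points
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
found = kmeans_centers(points, k=2)
print(found)
```

The returned centers can then be used as the nodal centers of the RBF neurons.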
Using the above algorithm, I have written the code for the RBF function in Python, and we will use this code to transform the XOR gate data.
Code for the RBF network
Transforming the XOR gate feature points
To select the standard deviation of the Gaussian function, use the formula sigma = d/sqrt(2M), where ‘d’ is the maximum distance between the RBF centers and ‘M’ is the number of RBF neurons.
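As a rough illustration of this formula, the sketch below computes sigma for four RBF centers and then transforms the XOR inputs. Placing the centers exactly on the four XOR input points is an assumption made for this example, and the helper name `rbf_width` is hypothetical.

```python
import numpy as np

def rbf_width(centers):
    """sigma = d / sqrt(2M), with d the largest distance between centers."""
    centers = np.asarray(centers, dtype=float)
    M = len(centers)
    diffs = centers[:, None, :] - centers[None, :, :]
    d = np.linalg.norm(diffs, axis=2).max()
    return d / np.sqrt(2 * M)

xor_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
centers = xor_inputs.copy()  # assumption: one center on each XOR point
sigma = rbf_width(centers)   # d = sqrt(2), M = 4, so sigma = sqrt(2)/sqrt(8)

# Gaussian activation of each of the four RBF neurons for each XOR input
dists = np.linalg.norm(xor_inputs[:, None, :] - centers[None, :, :], axis=2)
transformed = np.exp(-dists**2 / (2 * sigma**2))
print(sigma)
print(transformed)
```

With these centers, every XOR input is turned into four features, one per RBF neuron, and the largest feature is the one whose center coincides with the input.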
Our final data points after the transformation look like this. Now, why did I choose four RBF neurons? I leave it to you to find the answer (hint: look at the plot for the XOR gate).
OK, so now we have the data ready. What do we do next? Let’s feed this data to the perceptron model and see if it can classify the XOR gate now.
Code for the perceptron model
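A minimal sketch of this step is given below; it is not necessarily the code used to produce the article's results. It assumes four RBF centers placed on the XOR input points, sigma = 0.5 (the value d/sqrt(2M) gives for those centers), a step activation, and illustrative values for the learning rate and epoch limit. The function name `train_perceptron` is hypothetical.

```python
import numpy as np

xor_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 0])

# RBF transform (assumptions: centers on the XOR points, sigma = 0.5)
centers = xor_inputs.copy()
sigma = 0.5
dists = np.linalg.norm(xor_inputs[:, None, :] - centers[None, :, :], axis=2)
features = np.exp(-dists**2 / (2 * sigma**2))

def train_perceptron(X, y, lr=0.1, max_epochs=100):
    """Single-layer perceptron with a step activation and a bias term."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            pred = 1 if x_i @ weights + bias > 0 else 0
            update = lr * (y_i - pred)
            if update != 0:
                # Move the decision boundary towards the misclassified point
                weights += update * x_i
                bias += update
                errors += 1
        if errors == 0:
            break  # every point is classified correctly
    return weights, bias

weights, bias = train_perceptron(features, targets)
predictions = (features @ weights + bias > 0).astype(int)
print(predictions)  # [0 1 1 0]
```

The perceptron itself is unchanged from the single-layer model; only its inputs differ, since it now sees the four RBF activations instead of the two raw XOR inputs.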
Our output matches the target values perfectly, which seemed impossible for the perceptron model to do alone. But with the help of RBF, we made it possible. Hope you enjoyed the article!