When you think of neural networks, can you imagine how a computer simulates the human brain? Let me spoil it for you: it's not magic elves doing all the dirty work. In order to understand how a neural network is able to `learn` things, we will have to see how it works internally.
Let's summarize what you will be reading in the next few paragraphs. It all comes down to mathematical operations: we have input data (known as features), which is processed with weights (the values we end up learning) and a bias variable (which is used for optimization). We pass our data forward to the next layer of neurons through the interaction of the weights and the bias. Depending on the situation, these values may need to be normalized, so we apply an activation function, which maps the value to something friendlier and easier to process, such as anything between 0 and 1. Then we calculate the difference between the output and the expected value, which is the process of evaluating a cost function. Finally, we need to optimize the weights so the output gets as close as possible to the expected value; for that we look at how the cost behaves and modify the weights accordingly. This process of optimization is also known as backwards propagation. If that sounds very complex, don't worry. I will explain in detail what each part does and how it works.
Overview of all neural network components
As we can see from the summary, a neural network is composed of several components. We will look into the importance of variable initialization and what to keep in mind. We'll go over forward propagation, what it does and how. We will be looking at the following architecture and breaking it down into pieces. I hope that by looking at the picture and working down to its core elements, you will have a better understanding of what each segment is and how it ends up learning.
What is forward propagation?
Remember that machine learning is the process of defining a function that will best fit your data. So think of forward propagation as the processing of the input data to reach a prediction based on that data. If we're working on a classification model, the output will usually be a number indicating the probability that something belongs to a certain class. That said, once we have a trained model, we can use forward propagation to predict an actual output.
We can visualize the forward flow of a neural network as the process of taking the input features to the output layer.
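A single forward pass can be sketched in a few lines of NumPy. This is a minimal illustration, not a specific library's API: the names `forward`, `sigmoid`, and the layer sizes are my own choices, and the sigmoid activation (explained further below) is just one option.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, b):
    # One layer: weighted sum of the inputs plus a bias,
    # followed by an activation to keep the output in a friendly range
    z = W @ X + b
    return sigmoid(z)

# 5 features, 10 examples, 1 output neuron
X = np.random.randn(5, 10)
W = np.random.randn(1, 5)
b = np.zeros((1, 1))

y_hat = forward(X, W, b)
print(y_hat.shape)  # (1, 10) — one prediction per example
```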
What is variable initialization?
This process is usually done once when designing the neural network, and it can be pretty tricky at first. There are a few variables that need to be instantiated before running a neural network: the weights and the bias variable should be randomized when starting up the neural net. The reasoning behind random weight values is that each neuron will then compute a different value straight from the first run. If all neurons compute the same value for each feature, the network will either never learn anything or take a very long time to learn. The tricky part of initializing weights is that you will most likely be working with a matrix, which means the shape of the weights needs to be specified at this point. The size of the weights usually follows this pattern:
W(neurons in the layer, features)
For example, let's say we have 5 features and 10 examples, with a 1-layer network. That means our X will be a matrix of shape [5, 10] and our weights will be of size [1, 5]. On some occasions, you might need to transpose the matrix: a W of shape [1, 5] becomes a W of shape [5, 1].
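To make the shapes concrete, here is the example above in NumPy. The 0.01 scaling factor is a common but arbitrary choice I've added to keep the initial weights small; it is not from the original text.

```python
import numpy as np

n_features, n_examples = 5, 10

X = np.random.randn(n_features, n_examples)  # shape (5, 10)
W = np.random.randn(1, n_features) * 0.01    # shape (1, 5), small random values
b = np.zeros((1, 1))                         # bias can start at zero

print(W.shape)        # (1, 5)
print(W.T.shape)      # (5, 1) — transposing flips the dimensions
print((W @ X).shape)  # (1, 10) — one weighted sum per example
```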
What is an activation function?
An activation function is a function applied to the output of each neuron; it maps the value into a specific range that we can easily process. There are plenty of activation functions, such as sigmoid, tanh, ReLU, etc. Each one transforms our data into a range of values: for example, ReLU outputs 0 for negative inputs and passes positive inputs through with a slope of 1, giving values between 0 and infinity, while tanh gives us values between -1 and 1. Activation functions also introduce non-linearity into the model, allowing a more complex function to be learned.
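The three activations mentioned above can be written directly in NumPy — a sketch, not any particular framework's implementation:

```python
import numpy as np

def sigmoid(z):
    # Output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Output in (-1, 1)
    return np.tanh(z)

def relu(z):
    # 0 for negative inputs, identity (slope 1) for positive inputs
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # roughly [0.119, 0.5, 0.881]
print(tanh(z))     # roughly [-0.964, 0.0, 0.964]
print(relu(z))     # [0., 0., 2.]
```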
What is a cost function?
I like to think of the cost function as the comparison of what has been predicted with the actual output value. This lets the neural network know how close it is to finding the correct value. There are various methods for computing the cost of a function; one of the most common is the mean squared error.
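Mean squared error fits in one line of NumPy. The variable names here are illustrative:

```python
import numpy as np

def mse(y_pred, y_true):
    # Average of the squared differences between prediction and target
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8, 0.6])

print(round(mse(y_pred, y_true), 4))  # 0.0625 — lower means closer to the expected values
```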
The lower the value we get for the cost function, the closer we are to predicting the expected value. This can be a bit deceiving: we can get high accuracy on our training data, yet still classify badly when the model is used in the real world. The case just mentioned is known as overfitting the model.
Overfitting basically means that we have overtrained the model on our current training data: the model specializes in those exact examples instead of generalizing to new ones. To avoid overfitting, you can add more training data or apply regularization techniques. A good convention to follow is the 80/20 split, where 80% of the data is used for training and 20% for testing.
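An 80/20 split can be done by shuffling indices before slicing. This is a plain-NumPy sketch with stand-in data; libraries such as scikit-learn offer a ready-made `train_test_split` that does the same thing with more options.

```python
import numpy as np

data = np.arange(100)            # stand-in for 100 examples
rng = np.random.default_rng(42)  # seeded so the shuffle is reproducible
idx = rng.permutation(len(data))

split = int(0.8 * len(data))     # 80% train, 20% test
train, test = data[idx[:split]], data[idx[split:]]

print(len(train), len(test))     # 80 20
```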
What is backwards propagation?
This is the process of finding the correct modification to the weights, so that the model fits the data and predicts with higher accuracy. To find that modification, we need to find out how the cost function behaves at a given point and then deduce the optimal change to each weight. That slope is the derivative of the function with respect to the weight.
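For a single weight, this update rule — step against the derivative of the cost — can be shown with a toy example. This is my own sketch fitting y = 2x with mean squared error; the learning rate of 0.1 and the iteration count are arbitrary choices.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
y = 2.0 * X      # target: the true weight is 2

w = 0.0          # start from a bad guess
lr = 0.1         # learning rate: how big a step to take each update

for _ in range(100):
    y_pred = w * X
    # Derivative of the MSE cost with respect to w
    grad = np.mean(2 * (y_pred - y) * X)
    # Step against the slope to reduce the cost
    w -= lr * grad

print(round(w, 3))  # 2.0 — the weight has converged to the true value
```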