Let’s introduce the architecture that revolutionized everything: the Neural Network. Most of the modern AI and Deep Learning advancements are based on Neural Networks.
It is a remarkably simple model inspired by the functioning of the brain. Today, we will look into the simplest form of a Neural Network, known as a Feed-forward Neural Network (FNN for short).
So, what exactly is a neural network?
The fundamental unit of the FNN is the neuron. It’s not as complex as it may seem. You’ve actually seen it before... The neuron is essentially the perceptron! Typically, it is represented as follows:
The neuron consists of a weighted sum (linear combination) of the inputs:
$$y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_k x_k$$
where $w_0$ is a bias parameter, $x_i$ is the $i$-th input, and $w_i$ is the weight (multiplicative factor) associated with the $i$-th input.
The result $y$ is then transformed by passing it through a function $g(\cdot)$:
$$z = g(y)$$
The function $g(\cdot)$ is commonly referred to as the activation function.
$$y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_k x_k$$
$$z = g(y)$$
These equations completely define the concept of the Neuron within the context of the FNN.
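To make this concrete, here is a minimal sketch of a single neuron in plain NumPy (the weights, inputs, and the choice of ReLU as $g(\cdot)$ are arbitrary, just for illustration):

```python
import numpy as np

def neuron(x, w, w0, g):
    """Single neuron: weighted sum of the inputs plus bias, then activation."""
    y = w0 + np.dot(w, x)    # y = w0 + w1*x1 + ... + wk*xk
    return g(y)              # z = g(y)

# Example with made-up numbers and ReLU as the activation function g.
x = np.array([0.5, -1.0, 2.0])       # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])       # weights w1, w2, w3
w0 = 0.2                             # bias
relu = lambda y: np.maximum(0.0, y)

print(neuron(x, w, w0, relu))        # z = g(w0 + w·x)
```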
Activation Functions are usually non-linear functions used to transform the linear output of a neuron. They help increase the expressive power of our Neural Network.
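For illustration, here is a quick sketch of three widely used activation functions, sigmoid, tanh and ReLU, written in NumPy:

```python
import numpy as np

def sigmoid(y):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-y))

def tanh(y):
    # Squashes any real number into (-1, 1).
    return np.tanh(y)

def relu(y):
    # Keeps positive values, zeroes out negative ones.
    return np.maximum(0.0, y)

y = np.array([-2.0, 0.0, 2.0])
print(sigmoid(y), tanh(y), relu(y))
```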
Among these common choices (sigmoid, tanh, ReLU), ReLU is the most commonly used nowadays*.
*The ReLU function has numerous variations that are used instead to help mitigate some of its shortcomings.

Neurons are usually assembled in a structure called a layer.
An arbitrary number of neurons can be arranged side by side to create a layer with $m$ outputs, where $m$ is the number of neurons in the layer.
This kind of layer is sometimes called a Dense layer, Fully-connected layer or Linear layer.
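In code, a Dense layer boils down to a matrix-vector product followed by the activation. A minimal sketch (the shapes and random values below are arbitrary, chosen only for illustration):

```python
import numpy as np

def dense_layer(x, W, b, g):
    """Fully-connected layer: each row of W holds the weights of one neuron."""
    y = W @ x + b      # m weighted sums computed at once
    return g(y)        # activation applied element-wise

rng = np.random.default_rng(0)
k, m = 4, 3                          # k inputs, m neurons (= m outputs)
W = rng.normal(size=(m, k))          # one row of weights per neuron
b = rng.normal(size=m)               # one bias per neuron
x = rng.normal(size=k)

relu = lambda y: np.maximum(0.0, y)
print(dense_layer(x, W, b, relu))    # vector of m outputs
```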
Layers, in turn, can be stacked on top of each other to enhance the complexity of the final model and increase its expressive power.
The output of a layer will be the input for the next layer.
More specifically, each neuron of a layer receives as inputs all the outputs of the previous layer, hence the name Fully-connected layer.
N.B. Signals propagate only from the input to the output; there are no loops! That's why we call it a Feed-forward Neural Network!
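Putting the pieces together, here is a sketch of a tiny FNN in which the output of each layer becomes the input of the next, and signals only ever flow forward (layer sizes and random parameters are arbitrary):

```python
import numpy as np

relu = lambda y: np.maximum(0.0, y)

def dense_layer(x, W, b, g):
    return g(W @ x + b)

rng = np.random.default_rng(1)
sizes = [4, 8, 8, 2]   # input size, two hidden layers, output size

# One (W, b) pair per layer.
params = [(rng.normal(size=(m, k)), rng.normal(size=m))
          for k, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=sizes[0])
for W, b in params:
    x = dense_layer(x, W, b, relu)   # output of one layer is input of the next

print(x)   # final output of the network
```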
Given enough neurons and layers, an FNN can approximate virtually any function of interest (more precisely, any continuous function, to arbitrary precision). This result is known as the Universal Approximation Theorem.
If it weren't for activation functions, a neural network would just be a composition of linear (affine) functions.
A composition of linear functions is still a linear function.
Activation functions introduce non-linearity into the model, allowing it to represent functions that are non-linear.
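A quick numerical check of this claim: two stacked linear layers with no activation in between collapse into a single equivalent linear layer (random matrices, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x = rng.normal(size=3)

# Two linear layers stacked, with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...are equivalent to one single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))   # True
```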
Even though FNNs are conceptually simple, they can become quite big and complex very easily, which brings all sorts of problems: they can be hard to train and are prone to overfitting. Exercise maximum care!