1.1.4.2 Deep neural networks
Deep neural networks, or artificial neural networks (ANN), can be divided into three main parts:
The input layer: we feed the attributes to the nodes of the first layer.
The hidden layers: each node in a hidden layer takes as its input the weighted sum of the outputs of the previous layer, given by the expression:
\sum_{i=1}^{n} W_i X_i \quad (1.1)
The neuron then applies a function to this weighted sum; this function is called the activation function (Table 1.3):
f\left(\sum_{i=1}^{n} W_i X_i\right) \quad (1.2)
The output layer: the nodes of the output layer take their input from the last hidden layer and apply their own activation function.
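As a minimal sketch of this computation (illustrative only; the input values, weights, and the use of NumPy are assumptions, not taken from the cited source), a single neuron computes the weighted sum of expression (1.1) and passes it through an activation function as in (1.2):

```python
import numpy as np

def neuron_output(x, w, activation):
    """Weighted sum of the inputs (expression 1.1) followed by an
    activation function (expression 1.2)."""
    z = np.dot(w, x)          # sum_i W_i * X_i
    return activation(z)

# Made-up example values: three input attributes and one neuron.
x = np.array([0.5, -1.2, 3.0])   # attributes fed to the input layer
w = np.array([0.4, 0.1, -0.2])   # weights of the neuron
y = neuron_output(x, w, lambda z: max(0.0, z))   # ReLU activation
print(y)
```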
Definition 1.3
The error is a measure of how accurate the prediction is compared to the correct solution; the lower the error, the better the performance. [8]
Index | Concept | Explanation
1 | Sigmoid | f(x) = \frac{1}{1 + e^{-x}} \quad (1.3)
2 | ReLU | f(x) = \max(0, x) \quad (1.4)
3 | Leaky ReLU | f(x) = \max(0.1x, x) \quad (1.5)
4 | Softmax | \mathrm{softmax}(Z_i) = \frac{\exp(Z_i)}{\sum_j \exp(Z_j)} \quad (1.6)
5 | Hyperbolic Tangent (Tanh) | f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \quad (1.7)
Table 1.3: Activation Functions
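The activation functions of Table 1.3 could be implemented, for instance, as follows (a minimal NumPy sketch; the stability shift in softmax is a common convention added here, not part of the table):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                              # (1.3)

def relu(x):
    return np.maximum(0.0, x)                                    # (1.4)

def leaky_relu(x):
    return np.maximum(0.1 * x, x)                                # (1.5)

def softmax(z):
    e = np.exp(z - np.max(z))          # shift for numerical stability
    return e / np.sum(e)                                         # (1.6)

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))   # (1.7)
```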
· Note: the process of calculating the output of every layer and passing it to the next layer is called feed-forward, and it boils down to matrix multiplication.
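As a sketch of this feed-forward idea (a hypothetical two-layer network with made-up weight matrices; the layer sizes and NumPy usage are assumptions, not the document's own code), each layer is a matrix multiplication followed by an activation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, layers):
    """Propagate the input through every layer: each step is a
    matrix multiplication (the weighted sums) followed by an activation."""
    a = x
    for W, b, activation in layers:
        a = activation(W @ a + b)
    return a

# Made-up network: 3 inputs -> 4 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4), np.tanh),   # hidden layer
    (rng.normal(size=(2, 4)), np.zeros(2), sigmoid),   # output layer
]
print(feed_forward(np.array([0.5, -1.2, 3.0]), layers))
```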
1.1.4.3 Error functions
Error functions (Table 1.4) measure how far a prediction is from the right answer.
Index | Concept | Explanation
1 | Mean squared error (MSE) | it takes the average of the squared errors: E(W, b) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \quad (1.8)
2 | Mean absolute error (MAE) | it takes the average of the absolute values of the errors: E(W, b) = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i| \quad (1.9)
3 | Binary cross-entropy | it is mostly used in classification problems: E(W, b) = -\sum_{i=1}^{m} y_i \log(p_i) \quad (1.10)
Table 1.4: Error Functions
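For illustration, the error functions of Table 1.4 might be computed as below (a minimal NumPy sketch with made-up predictions and labels; the binary cross-entropy follows the simplified form of equation (1.10)):

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)      # (1.8)

def mae(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))     # (1.9)

def binary_cross_entropy(p, y_true):
    # p holds predicted probabilities, y_true the 0/1 labels   (1.10)
    return -np.sum(y_true * np.log(p))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_pred, y_true), mae(y_pred, y_true), binary_cross_entropy(y_pred, y_true))
```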
1.1.4.4 Optimization algorithms
In order to improve the performance of the network, we need to minimize the error and find the optimal weights; this process of framing a problem and trying to minimize a value is called optimization.
Definition 1.4
Optimization algorithms are a group of algorithms that use mathematical tools in order to optimize the weights and reach optimal performance in neural networks. [8]
Table 1.5 gives examples of optimization algorithms.
Index | Concept | Explanation
1 | Batch gradient descent (BGD) | in this algorithm we use batches of data to update the weights iteratively, descending the slope of the error curve until we reach the minimal error: \Delta W_i = -\alpha \frac{dE}{dW_i} \quad (1.11), \qquad W_{\text{next-step}} = W_{\text{current}} + \Delta W_i \quad (1.12)
2 | Stochastic gradient descent (SGD) | "stochastic" is just a fancy way to say random: this algorithm uses random instances of the data instead of the entire batch, which makes it faster than BGD; it is widely used in deep networks.
Table 1.5: Optimization Algorithms
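To illustrate the update rule of equations (1.11) and (1.12), the sketch below (a made-up one-dimensional linear model with squared error; all data, names, and the learning rate are assumptions) performs batch gradient descent, with the single-instance SGD variant indicated in a comment:

```python
import numpy as np

def gradient_descent_step(w, x, y, alpha):
    """One weight update for a linear model y ~ w*x with squared error:
    delta_w = -alpha * dE/dw  (1.11),  w_next = w_current + delta_w  (1.12)."""
    y_pred = w * x
    grad = np.mean(2.0 * (y_pred - y) * x)   # dE/dw averaged over the data
    return w + (-alpha * grad)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                      # made-up data generated with true weight 3.0
w = 0.0
for _ in range(50):
    # BGD: the gradient uses the whole batch of data.
    w = gradient_descent_step(w, x, y, alpha=0.1)
    # SGD would instead pick one random instance per update, e.g.:
    # i = rng.integers(len(x)); w = gradient_descent_step(w, x[i:i+1], y[i:i+1], alpha=0.1)
print(w)   # approaches the true weight 3.0
```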