## 5.4 Deep Neural Networks

### 5.4.1 Introduction

In the brain, a neuron's output serves as an input to other neurons.

We can mimic this by arranging artificial neurons into many interconnected layers.

### 5.4.2 Activation Functions

Each layer’s outputs should be transformed by some non-linear activation function. Otherwise, we’d end up with linear combinations of linear combinations, which are linear combinations themselves.
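To see why, consider composing two layers with weight matrices $W_1, W_2$ and bias vectors $b_1, b_2$, but no activations in between:

$$
W_2(W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2),
$$

which is again a single affine map of $x$, no matter how many such layers we stack.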

Example activation functions that can be used in hidden (inner) layers:

• relu – The rectified linear unit: $\psi(t)=\max(t, 0),$
• sigmoid – The logistic sigmoid: $\phi(t)=1 / (1 + \exp(-t)),$
• tanh – The hyperbolic tangent: $\mathrm{tanh}(t) = (\exp(t) - \exp(-t)) / (\exp(t) + \exp(-t)).$

In practice, there is not much difference between them, but some might be more convenient to handle numerically than others, depending on the implementation.
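The three functions above are easy to express in base R (sigmoid and relu are written out here for illustration; `tanh()` is built in):

```r
# Rectified linear unit: zeroes out negative inputs (pmax is vectorised max)
relu <- function(t) pmax(t, 0)

# Logistic sigmoid: squashes the real line into (0, 1)
sigmoid <- function(t) 1 / (1 + exp(-t))

t <- c(-2, -0.5, 0, 0.5, 2)
relu(t)     # 0.0 0.0 0.0 0.5 2.0
sigmoid(t)  # values in (0, 1), with sigmoid(0) == 0.5
tanh(t)     # values in (-1, 1), with tanh(0) == 0
```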

### 5.4.3 Example in R - 2 Layers

A 2-layer neural network with architecture 784-800-10 (784 inputs, one hidden layer of 800 units, 10 outputs):

```r
model <- keras_model_sequential()
layer_dense(model, units=800, activation='relu')
layer_dense(model, units=10,  activation='softmax')
compile(model, optimizer='sgd',
    loss='categorical_crossentropy')
fit(model, X_train2, Y_train2, epochs=5)

Y_pred2 <- predict(model, X_test2)
Y_pred <- apply(Y_pred2, 1, which.max)-1  # 1..10 -> 0..9
mean(Y_test == Y_pred)  # accuracy on the test set
## [1] 0.943
```

Performance metrics for each digit separately (one-vs-rest: Acc = accuracy, Prec = precision, Rec = recall, F = F-measure; TN, FN, FP, TP are the confusion-matrix counts):

| i | Acc | Prec | Rec | F | TN | FN | FP | TP |
|--:|-------:|----------:|----------:|----------:|-----:|---:|---:|----:|
| 0 | 0.9941 | 0.9591226 | 0.9816327 | 0.9702471 | 8979 | 18 | 41 | 962 |
| 1 | 0.9951 | 0.9729965 | 0.9841410 | 0.9785370 | 8834 | 18 | 31 | 1117 |
| 2 | 0.9871 | 0.9379243 | 0.9370155 | 0.9374697 | 8904 | 65 | 64 | 967 |
| 3 | 0.9862 | 0.9404040 | 0.9217822 | 0.9310000 | 8931 | 79 | 59 | 931 |
| 4 | 0.9890 | 0.9386318 | 0.9501018 | 0.9443320 | 8957 | 49 | 61 | 933 |
| 5 | 0.9862 | 0.9207589 | 0.9248879 | 0.9228188 | 9037 | 67 | 71 | 825 |
| 6 | 0.9898 | 0.9439834 | 0.9498956 | 0.9469303 | 8988 | 48 | 54 | 910 |
| 7 | 0.9877 | 0.9440628 | 0.9357977 | 0.9399121 | 8915 | 66 | 57 | 962 |
| 8 | 0.9858 | 0.9434968 | 0.9086242 | 0.9257322 | 8973 | 89 | 53 | 885 |
| 9 | 0.9850 | 0.9223206 | 0.9296333 | 0.9259625 | 8912 | 71 | 79 | 938 |
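All of these columns can be derived from the one-vs-rest confusion-matrix counts. A minimal sketch (the helper function and its interface are illustrative, not part of the original code):

```r
# Compute one-vs-rest classification metrics from confusion-matrix counts.
# (Illustrative helper, not from the original code.)
binary_metrics <- function(TP, FP, FN, TN) {
    Acc  <- (TP + TN) / (TP + TN + FP + FN)   # fraction classified correctly
    Prec <- TP / (TP + FP)                    # how many predicted positives are real
    Rec  <- TP / (TP + FN)                    # how many real positives were found
    F    <- 2 * Prec * Rec / (Prec + Rec)     # harmonic mean of Prec and Rec
    c(Acc=Acc, Prec=Prec, Rec=Rec, F=F)
}

# e.g., the row for digit 0 above:
round(binary_metrics(TP=962, FP=41, FN=18, TN=8979), 4)
##    Acc   Prec    Rec      F
## 0.9941 0.9591 0.9816 0.9702
```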

### 5.4.4 Example in R - 6 Layers

A 6-layer deep neural network with architecture 784-2500-2000-1500-1000-500-10 (784 inputs, five hidden layers, 10 outputs):

```r
model <- keras_model_sequential()
layer_dense(model, units=2500, activation='relu')
layer_dense(model, units=2000, activation='relu')
layer_dense(model, units=1500, activation='relu')
layer_dense(model, units=1000, activation='relu')
layer_dense(model, units=500,  activation='relu')
layer_dense(model, units=10,   activation='softmax')
compile(model, optimizer='sgd',
    loss='categorical_crossentropy')
fit(model, X_train2, Y_train2, epochs=5)

Y_pred2 <- predict(model, X_test2)
Y_pred <- apply(Y_pred2, 1, which.max)-1  # 1..10 -> 0..9
mean(Y_test == Y_pred)  # accuracy on the test set
## [1] 0.973
```

Performance metrics for each digit separately:

| i | Acc | Prec | Rec | F | TN | FN | FP | TP |
|--:|-------:|----------:|----------:|----------:|-----:|---:|---:|----:|
| 0 | 0.9963 | 0.9748238 | 0.9877551 | 0.9812468 | 8995 | 12 | 25 | 968 |
| 1 | 0.9969 | 0.9876325 | 0.9850220 | 0.9863255 | 8851 | 17 | 14 | 1118 |
| 2 | 0.9942 | 0.9831349 | 0.9602713 | 0.9715686 | 8951 | 41 | 17 | 991 |
| 3 | 0.9941 | 0.9567723 | 0.9861386 | 0.9712335 | 8945 | 14 | 45 | 996 |
| 4 | 0.9953 | 0.9804728 | 0.9714868 | 0.9759591 | 8999 | 28 | 19 | 954 |
| 5 | 0.9948 | 0.9677060 | 0.9742152 | 0.9709497 | 9079 | 23 | 29 | 869 |
| 6 | 0.9956 | 0.9750520 | 0.9791232 | 0.9770833 | 9018 | 20 | 24 | 938 |
| 7 | 0.9935 | 0.9868554 | 0.9494163 | 0.9677739 | 8959 | 52 | 13 | 976 |
| 8 | 0.9928 | 0.9659091 | 0.9599589 | 0.9629248 | 8993 | 39 | 33 | 935 |
| 9 | 0.9925 | 0.9507722 | 0.9762141 | 0.9633252 | 8940 | 24 | 51 | 985 |