We have discussed the multinomial logistic regression model as a generalisation of binary logistic regression.
It is, in turn, a special case of a feed-forward neural network.
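To make this connection concrete, below is a minimal sketch (assuming the keras package for R with a working TensorFlow backend; the input size of 784, i.e., flattened 28×28 images, is merely illustrative) of multinomial logistic regression expressed as a one-layer network: a single dense layer with the softmax activation, fitted by minimising the cross-entropy loss.

```r
library(keras)  # assumes keras and a TensorFlow backend are installed

# Multinomial logistic regression as a one-layer feed-forward network:
# a single dense layer mapping 784 inputs to 10 softmax class probabilities.
model <- keras_model_sequential() %>%
  layer_dense(units = 10, activation = "softmax", input_shape = 784)

model %>% compile(
  optimizer = "sgd",                  # plain stochastic gradient descent
  loss = "categorical_crossentropy",  # the multinomial log-loss
  metrics = "accuracy"
)
```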
There’s a lot of hype (again…) around deep neural networks in many applications, including computer vision, self-driving cars, natural language processing, and speech recognition.
Many different neural network architectures and types of units are being considered in theory and in practice, e.g.:
- convolutional neural networks apply a series of signal (e.g., image) transformations in their first layers; they might actually “discover” operations such as deskewing automatically (see the sketch after this list);
- recurrent neural networks maintain an internal state; in particular, long short-term memory (LSTM) networks are used for speech synthesis and time series prediction.
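For instance, here is a minimal sketch of a convolutional network in keras (the filter counts and kernel sizes are illustrative, and 28×28 grayscale inputs are assumed); the convolution and pooling layers implement the aforementioned image transformations:

```r
library(keras)

# A tiny convolutional network; layer sizes are illustrative only.
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%   # 28x28 grayscale images
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # downsample feature maps
  layer_flatten() %>%                             # vectorise for the output layer
  layer_dense(units = 10, activation = "softmax") # 10 class probabilities
```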
The main drawbacks of deep neural networks:
- training is very slow, especially for very deep architectures (days, even weeks);
- models are not explainable (black boxes) and are hard to debug;
- finding good architectures is more art than science (perhaps even more of a craft);
- sometimes using a deep neural network is just an excuse for being too lazy to do proper data cleansing and pre-processing.
There are many issues and challenges that will be tackled in more advanced AI/ML courses and books, such as (Goodfellow et al. 2016).
5.6.2 Beyond MNIST
The MNIST dataset is a classic, although its use in research is discouraged nowadays: the dataset is no longer considered challenging, as state-of-the-art classifiers can reach \(99.8\%\) accuracy.
See Zalando’s Fashion-MNIST (by Kashif Rasul & Han Xiao) at https://github.com/zalandoresearch/fashion-mnist for a modern replacement.
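Conveniently, the keras package ships a loader for this dataset; a quick sketch (the data are downloaded and cached on first use):

```r
library(keras)

fashion <- dataset_fashion_mnist()  # downloaded on the first call
X_train <- fashion$train$x / 255    # 60000 28x28 images, rescaled to [0, 1]
y_train <- fashion$train$y          # labels 0-9: t-shirt/top, trouser, ...
dim(X_train)
## [1] 60000    28    28
```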
5.6.3 Further Reading
Recommended further reading:
- the keras package tutorials, available at https://cran.r-project.org/web/packages/keras/index.html and https://keras.rstudio.com