This is a draft version (distributed in the hope that it will be useful) of the book Lightweight Machine Learning Classics with R by Marek Gagolewski.

Please submit any feature requests, remarks and bug fixes via the project site at GitHub or by email. Thanks!

Copyright (C) 2020, Marek Gagolewski. This material is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

You can access this book at:

Aims and Scope

Machine learning has numerous exciting real-world applications, including stock market prediction, speech recognition, computer-aided medical diagnosis, content and product recommendation, anomaly detection in security camera footage, game playing, autonomous vehicle operation and many others.

In this book we will take an unpretentious glance at the most fundamental algorithms that have stood the test of time and that still form the basis for the state-of-the-art solutions of modern-era AI, which is principally (big) data-driven. We will learn how to use the R language (R Development Core Team 2020) for implementing various stages of data processing and modelling activities. For a more in-depth treatment of R, refer to this book’s Appendices and, for instance, (Wickham & Grolemund 2017, Peng 2019, Venables et al. 2020).

We will provide solid underpinnings for further studies related to statistical learning, machine learning, data science, data analytics and artificial intelligence, including (Bishop 2006, Hastie et al. 2017, James et al. 2017). We will appreciate the vital role of mathematics as a commonly accepted language for formalising data-intense problems and communicating their solutions. The book is aimed at readers who are not yet fluent in university-level linear algebra, calculus and probability theory, such as first-year undergrads or those who have forgotten all the maths they have learned and need a gentle, non-invasive, yet rigorous enough, introduction to the topic. For a nice, machine learning-focused introduction to mathematics alone, see, e.g., (Deisenroth et al. 2020).

About Me

I’m currently a Senior Lecturer in Applied AI at Deakin University in Melbourne, Australia and an Associate Professor in Data Science at Warsaw University of Technology, Poland, where I teach various courses related to R and Python programming, algorithms, data science and machine learning. This book was also influenced by my teaching experience at Data Science Retreat in Berlin, Germany.

I’m an author of several R and Python packages, including stringi, which is among the top 20 most often downloaded R extensions. I’m also an author of more than 70 publications. My research interests include machine learning and optimisation algorithms, data aggregation and clustering, statistical modelling and scientific computing.


This book has been prepared with pandoc, Markdown and GitBook. R code chunks have been processed with knitr. A little help from bookdown, good ol’ Makefiles and shell scripts did the trick.

The following R packages are used or referred to in the text: bookdown, fastcluster, FNN, genie, ISLR, keras, knitr, Matrix, microbenchmark, pdist, recommenderlab, rpart, rpart.plot, scatterplot3d, stringi, tensorflow, titanic, vioplot.

While writing this book, I’ve been mostly listening to music featuring John Coltrane, Krzysztof Komeda, Henry Threadgill, Albert Ayler, Paco de Lucia and Tomatito.