Statistical Learning 2018

This is the main website for the Statistical Learning course in autumn 2018, part of the master's programme Statistical Science for the Life and Behavioural Sciences at Leiden University. Visit this page regularly for changes and updates.

Instructor:	Tim van Erven (tim@timvanerven.nl, for general questions)
Teaching assistant:	Dirk van der Hoeven (d.van.der.hoeven@math.leidenuniv.nl, for questions about the homework)

IMPORTANT: Make sure to enroll in the course on Blackboard for grades and course updates, and sign up for the exam (or resit) in uSis as soon as possible, and no later than ten calendar days before the exam (or resit) takes place. (Otherwise I cannot register your grade and you will not get credit.)

General Information

This course gives an overview of techniques to automatically learn the structure, patterns and regularities in complicated data, and to use these patterns to predict future data. Statistical learning is closely related to machine learning, an area within computer science, and many of its methods originated in computer science (pattern recognition, artificial intelligence). The course load is 6 ECTS. The e-prospectus contains a longer course description.

The entry requirements for this year are:

Familiarity with least squares linear regression
Ability to program in R or in Python

Lectures and Exercise Sessions

Lectures take place on Thursdays on the dates indicated in the Course Schedule below, in room 407/409 of the Snellius Building, Niels Bohrweg 1, Leiden.

During the first four weeks, course hours are 10h00-16h15; during the last four weeks, they are 11h00-15h15.

Examination Form

To pass the course, you must obtain a sufficient grade (5.5 or higher) on both of the following:

Homework Projects. We will hand out two homework assignments. The final homework grade is the average of the grades for the two assignments, without any rounding.
A written open-book examination: Wednesday 9 January 14.00-17.00, rooms 408, 407/9 and 412; resit: Friday 1 February 14.00-17.00, rooms 403 and 405. NB You may bring any information on paper to the exam, and it is recommended to bring the ESL book (see below). However, digital copies of the book are not allowed.

The final grade is the average of the final homework grade and the grade for the open-book examination. It will be rounded to half points, except for grades between 5 and 6, which will be rounded to whole points.
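The grading rule above can be illustrated with a small Python sketch. It is not an official grade calculator: the function names are hypothetical, and the sketch assumes that exact ties (such as 7.25) are rounded up, which this page does not specify.

```python
import math
from typing import Optional


def round_half_up(x: float, step: float) -> float:
    """Round x to the nearest multiple of step; exact ties go up (assumption)."""
    return math.floor(x / step + 0.5) * step


def final_grade(hw1: float, hw2: float, exam: float) -> Optional[float]:
    """Hypothetical helper: final grade, or None if the course is not passed."""
    # Final homework grade: plain average of the two assignments, no rounding.
    homework = (hw1 + hw2) / 2
    # Both components must be sufficient (5.5 or higher) to pass.
    if homework < 5.5 or exam < 5.5:
        return None
    raw = (homework + exam) / 2
    # Grades strictly between 5 and 6 are rounded to whole points...
    if 5 < raw < 6:
        return round_half_up(raw, 1.0)
    # ...all other grades are rounded to half points.
    return round_half_up(raw, 0.5)


print(final_grade(7.0, 8.0, 6.0))  # homework 7.5 and exam 6.0 average to 6.75, rounded to 7.0
print(final_grade(6.0, 6.0, 5.5))  # 5.75 lies between 5 and 6, rounded to 6.0
print(final_grade(8.0, 8.0, 5.0))  # exam below 5.5, so None
```

Because both components must be at least 5.5, a passing average between 5 and 6 is always at least 5.5, so under this sketch's tie rule it always rounds up to 6.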

To give an idea of the types of questions on the exam, here you can find the exam from 2014 and the exam from 2015. Each exam covers only a sample of the topics treated in class, so the topics on this year’s exam are likely to differ from those on previous exams!

Course Materials

The main book for the course is The Elements of Statistical Learning (ESL), 2nd edition, by Hastie, Tibshirani and Friedman, Springer-Verlag 2009. In addition, we will use selected parts from Ch.18 of Computer Age Statistical Inference: Algorithms, Evidence and Data Science (CASI) by Efron and Hastie, Cambridge University Press, 2016. Some supplementary material will also be provided, as listed in the Course Schedule.

You do not need to buy the CASI book. It can be downloaded for free from the link above.

The ESL book can also be downloaded for free from the above link, but you will need a paper copy for the final exam, which is open book! The standard edition is hardcover, but you may want to get the much cheaper softcover edition for €24.99. To get this offer, open this link from an eduroam connection. If that does not work, go here, sign in and use the search function to find the book. Then choose ‘View Online’ and follow the link to ‘SpringerLink Books Complete’.

About using Wikipedia and other online sources: trust Wikipedia as much as you would trust a fellow student who is also still learning. Some things are good, but other things are poorly explained or plain wrong, so always verify with a trusted source (a book or scientific paper). This holds doubly for any ‘data science’ blogs you might find online.

Course Schedule

Text in bold font indicates important changes made during the course. TBA=To Be Announced.

Date	Topics	Literature
Nov. 1: Introduction, Regression I	General introduction: statistical learning, supervised learning, regression and classification, incorporating nonlinearities by extending the features, overfitting, linear classifiers, nearest-neighbor classification, expected prediction error and Bayes-optimal prediction rule, curse of dimensionality. Interpretations of least squares as empirical risk minimization (ERM) and as maximum likelihood.	All of Chapter 1 and parts of Chapter 2 (Sections 2.1-2.5)
Nov. 8: Regression II: Model Selection	Model selection and overfitting: subset selection, shrinkage methods (ridge regression and lasso). Comparison of subset selection, ridge and lasso. Cross-validation.	Sections 3.1 and 3.2 up to 3.2.1. Sections 3.3 and 3.4 up to 3.4.3. Sections 7.10.1, 7.10.2. Optionally: 7.12
Nov. 15: Bayesian methods, Classification Part I	Bayesian methods in a nutshell: Bayesian marginal and predictive distribution, posterior, Laplace's rule of succession. Regression: Bayes MAP interpretation of ridge regression and lasso. Classification: Naive Bayes classifier, Naive Bayes and spam filtering.	Section 6.6.3. Optionally: Wikipedia on Naive Bayes [1, 2] (see Wikipedia caveat).
Nov. 22: Classification Part II	Linear Discriminant Analysis (LDA). Surrogate losses. Logistic regression. Discriminative vs. generative models: Naive Bayes versus Logistic Regression	Sections 4.1, 4.2, 4.3 (except 4.3.1, 4.3.2, 4.3.3), 4.4 (except 4.4.3). Additional literature: Andrew Y. Ng, Michael Jordan: On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes, NIPS 2001.
Nov. 29: Classification Part III	Discussion of homework 1 by Dirk van der Hoeven. Optimal separating hyperplanes, support vector machines (SVMs): the kernel trick, SVM learning as regularized hinge loss fitting	Sections 4.5.2, 12.2, 12.3.1, 12.3.2
Dec. 6: Classification Part IV	Classification and regression trees. Bagging, boosting (AdaBoost), boosting as forward stagewise additive modeling.	Sections 9.2, 8.7, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6 (in 10.6 only the part about classification)
Dec. 13: Unsupervised Learning, Constructing Features	Clustering: K-means, EM with Gaussian Mixtures. Basis Expansions and Regularization	Section 14.3 before 14.3.1; Sections 14.3.6, 14.3.7. Sections 5.1, 5.2 (except 5.2.3; in 5.2.1 you can skip the math at the end). NB The book gives the wrong definition of K-means in Section 14.3.6; additional material: correct definition of K-means.
Dec. 20: Optimization, Deep Learning	Stochastic Optimization. Neural networks, deep learning, gradient descent with backpropagation	From Ch. 18 of the CASI book: chapter intro, Sections 18.1, 18.2 (except accelerated gradient methods), Section 18.4 before 'Convolve Layer'. (The remainder of Section 18.4 is optional, but highly recommended.) Additional handout about stochastic optimization.

Homework Assignments

The homework assignments will be made available here. You are encouraged to discuss the assignments, but everyone has to perform their own experiments and write a report individually. NB These assignments will be a significant amount of work, so start early.

Homework	Data	Available	Deadline
Homework 1	housing data, description	Nov. 8	Nov. 25
Homework 2	car evaluation data, description	Nov. 26	Dec. 17

Material Used During Lectures

In case you missed (part of) a lecture, here are some of the slides and my personal handwritten notes, which I used to prepare the lectures and which should closely match what I wrote on the board. To catch up, it is recommended to study these before the next class.

Nov. 1

Handwritten lecture notes 1
Slides 1
Figures used from the book: 2.1-2.5, 2.11

Nov. 8

Handwritten lecture notes 2
Figures used from the book: 3.11

Nov. 15

Nov. 22

Handwritten lecture notes 4
Slides 4
Figures used from the book: 4.5

Nov. 29

Handwritten lecture notes 5
Figures used from the book: 2.5, 12.1, 12.3

Dec. 6

Handwritten lecture notes 6
Figures used from the book: 8.9, 8.10, 8.12, 9.2, 9.3, 10.1, 10.2, 10.3
AdaBoost: Algorithm 10.1 in the book

Dec. 13

Handwritten lecture notes 7
Slides 7
Animation of K-means in action
Correction of mistake in the book in definition of K-means
Figures used from the book: 12.2, 12.3, 5.1, 5.2
Code for some examples from the book by John Weatherwax (not one of the authors), including the example from Section 5.2.2.

Dec. 20

Handwritten lecture notes 8
Animations of gradient descent in action: one, two
Stochastic optimization handout
Figures used: Figure 1 from handout, Figure 18.8 from CASI book