I. Course description
II. Course textbooks
III. Course outline
IV. Course expectations
V. Grading information

Course description

The information explosion of the past few years has us drowning in data but often starved of knowledge. Many companies that gather huge amounts of electronic data have now begun applying data science principles and models to discover and extract pieces of information useful for making smart business decisions.

In this course we will study a variety of techniques for data mining and machine learning. Some emphasis will be given to approaches that scale to very large data sets, that remain relatively robust in the presence of a large number of predictors, and that handle heterogeneous or streaming data. We will mostly be using Python (specifically, Scikit-Learn), though a few problems in R will also be given.
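
To give a flavor of the toolkit, here is a minimal Scikit-Learn sketch of the basic fit-then-evaluate workflow; the dataset and model choice are purely illustrative, not taken from any assignment:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Load a small built-in dataset and hold out a test split.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit a classifier and report held-out accuracy.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))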

The central goal of this course is to convey an understanding of the pros and cons of different data-driven models, so that you can (i) make an informed decision about which approaches to consider when faced with real-life problems requiring predictive modeling, and (ii) apply models properly to real datasets so as to draw valid conclusions. This goal will be reinforced through both theory and hands-on experience.

More hands-on exercises are offered in the "Data Science Lab" course. Information about the course instructor and TA(s) is available on the Contact tab above.

PREREQUISITES:

This course is meant for junior/senior-level ECE and CS students. You MUST have taken (with a grade of C- or better):
EE 351K (or M 362K): Probability and Statistics
AND M 340L (or equivalent): Matrices/Linear Algebra
AND (CS 314, CS 314H, or EE 360C): Data Structures/Algorithms
Graduate students are not allowed in this class (integrated BS/MS students from ECE/CS are OK).

Textbooks

The material for the lectures is taken from a wide variety of sources. My notes will be available via Canvas.

For help with programming and concepts, “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by A. Geron (O’Reilly, 2017) is a great reference. The Scikit-Learn website also has reasonably good documentation.

Author: Max Kuhn and Kjell Johnson (KJ)
Title: Applied Predictive Modeling
Publisher: Springer
ISBN: 1461468485
Year: 2013
Notes: Available through Amazon, Springer and UT Co-op.

Author: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (JW)
Title: An Introduction to Statistical Learning with Applications in R
Publisher: Springer
Notes: The authors have kindly provided a free PDF version here, though it may be worth your while to get a hard copy as well. Relevant sections of these books are indicated next to the topics below. A more detailed listing of readings/videos/other resources is provided through Canvas. Topics without a reference to KJ or JW will be covered through lecture notes and a reading list of papers (also see the supplementary texts at the bottom).

Course Schedule and Topics

KJ and JW refer to the two texts above. HTF and B refer to two of the (more advanced) supplementary books and serve as references for the interested student.

1. Overview: the data mining process; model fitting and overfitting; decision theory; probability review; types of predictive analytics; local vs. global models; tackling Big Data; multiple linear regression. (3 lectures; KJ Ch 1-2; JW Ch 1-3; B Ch 1, 2.1-2.3; HTF Ch 1, 2.1-2.6)
Objective: Provide overview and context for this class.

2. Regression: generalized regression; bias-variance tradeoff and overfitting; model tuning; basis function expansion; dealing with a large number of features; ridge, lasso, and stagewise approaches; non-linear methods. (3 lectures; KJ Ch 4-6, 7.1, 7.4, 7.5; JW Ch 6; B Ch 3.1, 3.2; HTF Ch 2.7, 2.8, 3.1-3.4, 7.1-7.3, 11.1-11.8)
Objective: Learn to understand predictive models where the desired outcome is a numeric quantity.
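
As an illustrative (not assigned) sketch of the shrinkage methods above, the following compares ridge and lasso in Scikit-Learn on synthetic data; note how the L1 penalty zeroes out most coefficients:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge, Lasso

    # Synthetic problem: 50 features, only 5 of which actually matter.
    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           noise=10.0, random_state=0)

    ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients toward zero
    lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: sets many coefficients exactly to zero

    print("nonzero ridge coefficients:", np.sum(ridge.coef_ != 0))
    print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))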

3. Classification: Bayes decision theory; Naïve Bayes and Bayesian networks; (scaled) logistic regression; LDA; scaling decision trees to big data; kernel methods and Support Vector Machines (SVMs) for classification and regression; dealing with class imbalance. (6 lectures; KJ Ch 8.1, 11, 12, 13, 14.1, 14.2, 16; JW Ch 4, 8, 9; B Ch 4.1-4.3.4, 6.1, 6.2, 7.1, 14.4; HTF Ch 4, 7.10, 9.2, 12, 13.3)
Objective: Learn to build and evaluate predictive models where the desired outcome is a class label.
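
As a small illustration of handling class imbalance with a kernel SVM (synthetic data and illustrative parameter choices):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    # Synthetic two-class problem where one class is rare (about 10%).
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # class_weight="balanced" upweights errors on the rare class.
    clf = SVC(kernel="rbf", class_weight="balanced").fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))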

4. Ensemble Methods: model averaging; bagging and random forests; boosting and gradient boosting; Bag of Little Bootstraps. (2 lectures; KJ Ch 14.3-14.8; JW Ch 8.2; B Ch 14.2, 14.3; HTF Ch 8.7, 8.8, 10.1-10.7, 16)
Objective: Understand the benefits of combining multiple predictive models.
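
A brief illustrative comparison of bagging and boosting in Scikit-Learn (synthetic data; the parameter values are arbitrary):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)

    # Bagging (random forest) and boosting (gradient boosting) on the same data.
    for model in (RandomForestClassifier(n_estimators=100, random_state=0),
                  GradientBoostingClassifier(random_state=0)):
        scores = cross_val_score(model, X, y, cv=5)
        print(type(model).__name__, "mean CV accuracy:", scores.mean())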

5. Data Pre-Processing (cleaning, reduction, feature extraction, and visualization): data quality; curse of dimensionality; transformations; imputation; sampling; outlier detection; PCA. (3 lectures; KJ Ch 3; JW Ch 10.1, 10.2; B Ch 12.1; HTF Ch 14.5, 14.8)
Objective: Understand that good data quality is a prerequisite for effective models, and study some methods for improving data quality.
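
A small illustrative pipeline (assuming a recent Scikit-Learn with the sklearn.impute module) that chains imputation, standardization, and PCA:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    # Fake data with roughly 5% of the entries missing.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    X[rng.random(X.shape) < 0.05] = np.nan

    # Impute, standardize, then project onto the top two principal components.
    pipe = make_pipeline(SimpleImputer(strategy="median"),
                         StandardScaler(),
                         PCA(n_components=2))
    print(pipe.fit_transform(X).shape)  # (100, 2)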

6. Clustering and Co-clustering: k-means; hierarchical methods; graph partitioning; co-clustering; semi-supervised learning; market-basket applications. (3 lectures; JW Ch 10.3, 10.5; B Ch 9.1, 9.2; HTF Ch 13.1, 13.2, 14.3, 14.4)
Objective: Learn issues involved in unsupervised learning and the trade-offs among alternative approaches to clustering.
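
A minimal k-means illustration on synthetic blobs (the values here are arbitrary):

    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans

    # Three well-separated blobs; ask k-means for three clusters.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    print("cluster sizes:", [list(km.labels_).count(c) for c in range(3)])
    print("within-cluster sum of squares:", km.inertia_)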

7. Streaming Data Mining and Big Data Analytics: online learning (basic approaches); Winnow and voted perceptrons; stochastic gradient methods for large data sets; deep learning and TensorFlow. (3 lectures)
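
A small sketch of online learning with stochastic gradient descent, simulating a data stream with mini-batches (synthetic data; the batch size is illustrative):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=10000, random_state=0)
    clf = SGDClassifier(random_state=0)  # linear model fit by stochastic gradient descent

    # partial_fit must be told the full set of class labels up front.
    classes = np.unique(y)
    for start in range(0, len(X), 1000):  # simulate a stream of mini-batches
        batch = slice(start, start + 1000)
        clf.partial_fit(X[batch], y[batch], classes=classes)
    print("accuracy on the seen stream:", clf.score(X, y))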

8. Specialized Topics (coverage depends on time available and class interest): deep learning; recommender systems; customer-product affinity detection; Hadoop/Spark; Azure ML

9. Term Project Presentations and Discussion (4 classes)

10. Wildcards: A couple of classes may be used for invited talks by visiting experts.

Grading information

There will be no final exam. Quizzes will be held in class and will last 15-20 minutes; their objective is to review key concepts introduced in class. At the end of the course, you will get a score out of 100 based on the percentages stated above, and your final grade will be based solely on this score. Grading is primarily on a curve, i.e., relative to how the whole class performs; however, the entire curve may shift up or down a bit depending on how the class as a whole performs relative to past classes.

Grading is NOT based on absolute thresholds (e.g., 90+ = A).

Supplementary Texts

Author: Joel Grus
Title: Data Science from Scratch
Publisher: O'Reilly
Notes: Wes McKinney's Python for Data Analysis is also helpful.

MOOC: Andrew Ng's Coursera course has some very introductory material on linear algebra, e.g., multiplying a matrix by a vector.
Notes: https://class.coursera.org/ml-003/lecture

Author: Trevor Hastie, Robert Tibshirani, and Jerome Friedman (HTF)
Title: The Elements of Statistical Learning
Publisher: Springer (2nd edition)
ISBN: 0387848576
Notes: Available from Amazon (about $70, but well worth it), or download the PDF from http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Author: Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (TSK)
Title: Introduction to Data Mining
Publisher: Addison-Wesley (2005)
ISBN: 0-321-32136-7
Notes: Some chapters are downloadable from the book's website.

Author: Christopher M. Bishop (B)
Title: Pattern Recognition and Machine Learning
Publisher: Springer
ISBN: 0387310738
Notes: http://research.microsoft.com/en-us/um/people/cmbishop/prml/

Author: Kevin Murphy
Title: Machine Learning: A Probabilistic Perspective
Publisher: MIT Press
ISBN: 0262018020
Notes: Covers a very wide range of topics. Lots of examples in MATLAB, with source code access.

Disabilities statement: "The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. For more information, contact the Office of the Dean of Students at 471-6259, 471-4641 TTY."

ACADEMIC DISHONESTY, UT Honor Code etc.: