EE 381V: (ADVANCED DATA MINING)

Theme: Big Data Analytics for Healthcare (BDAH)
TTh 11am-12:30pm, PAR (Parlin Hall) 201


Instructor: 
Joydeep Ghosh, jghosh@utexas.edu
Office: POB 3.118, 471-8980; Office Hrs: Mon, Wed  10:30am-noon; Other times by appointment only.
TA:
Tyler Folkman, tfolkman@cs.utexas.edu
Office: ACA 112; Office Hrs: Tu, Th 12:30-2pm

PREREQUISITES:

1. Completed a graduate-level course in Data Mining or Machine Learning, OR

2. Enrolled in the graduate program in SDS, OR

3. Written consent of instructor. There will be a couple of slots for students who have a background in health informatics/health-IT. If you are interested, please send me an email with the subject line BDAH. The email should include your CV as well as a paragraph on why you want to take this course and why you are suitable for it.

 

Working knowledge of Python or R will be assumed.

 

The nationwide move to electronic health records (EHRs), increased instrumentation of medical equipment, the emergence of a wide variety of body sensors and other mobile-health devices, and new government requirements for ensuring quality of healthcare delivery have all contributed to an explosion of healthcare-related data in the recent past. Such data pose several unique challenges stemming from heterogeneity of data sources, data quality issues including missing or incorrect records, sparse time series, non-standard terminologies and data acquisition practices, cryptic clinical notes, privacy constraints, etc. However, such large datasets have the potential to yield unprecedented insights that can lead to substantially better personalized and population health delivery at lower costs. The disruptive potential of big data analytics for healthcare is clearly reflected in recent startup activity.

This course will provide hands-on experience in analyzing large, complex health datasets. A variety of prediction problems (mortality, readmission, disease progression, costs, etc.) and data-driven phenotyping methods will be studied. We will primarily concentrate on MIMIC II (ICU data), DE-SynPUF (Medicare claims), and, if available, EHR data from hospitals. Related papers will be studied and there will be several talks by domain experts as well. More details will be available by late August. Students will need to take online training modules on proper handling of health data, and may need to sign additional user agreements/NDAs depending on the data sources used.

Syllabus 


Textbook 

A reading list of contemporary papers will be provided through Canvas.

Grading
50%     Final project (groups of 2-4): project outline (5%) + short presentation (10%) + paper (35%) due last class day.
10%     Class presentation based on technical paper(s).
30%     Minor projects/Homeworks
10%     Participation in class
There will be no final exam.