EE 381V: ADVANCED DATA MINING
Theme: Big Data Analytics for Healthcare (BDAH)
TTh 11am-12:30pm, PAR (Parlin Hall) 201
Instructor: Joydeep Ghosh, jghosh@utexas.edu
Office: POB 3.118, 471-8980
Office Hrs: Mon, Wed 10:30am-noon; Other times by appointment only.
TA: Tyler Folkman, tfolkman@cs.utexas.edu; Office Hrs: Tu, Th 12:30-2pm, ACA 112
PREREQUISITES:
1. Completed a graduate-level course in data mining or machine learning, OR
2. Enrolled in the graduate program in SDS, OR
3. Written consent of instructor.
There will be a couple of slots for students who have a background in
health informatics/health-IT. If you are interested, please send me an
email with the subject line BDAH. The email should include your CV as well as a paragraph on why you want to
take this course and why you are suitable for it.
Working knowledge of Python or R will be assumed.
The nationwide move to electronic health records (EHRs), increased instrumentation of medical
equipment, the emergence of a wide variety of body sensors and other
mobile-health devices, and new government requirements for ensuring quality of
healthcare delivery have all contributed to an explosion of healthcare-related
data in the recent past. Such data pose several unique challenges stemming from
heterogeneity of data sources, data quality issues including missing or
incorrect records, sparse time series, non-standard terminologies and data
acquisition practices, cryptic clinical notes, privacy constraints, etc.
However, such large datasets have the potential to yield unprecedented insights
that can lead to substantially better personalized and population health
delivery at lower costs. The disruptive potential of big data analytics for
healthcare is clearly reflected in recent startup activity.
This course will provide hands-on experience in analyzing large, complex health datasets. A
variety of prediction problems (mortality, readmission, disease progression, costs,
etc.) and data-driven phenotyping methods will be studied. We will primarily
concentrate on MIMIC II (ICU data), DE-SynPUF (Medicare claims), and EHR data from
hospitals, if available. Related papers will be studied and there will be
several talks by domain experts as well.
More details will be available by late August.
Students will need to take online training modules on proper handling of health
data, and may need to sign additional user agreements/NDAs depending on the
data sources used.
Syllabus
Textbook
A reading list of contemporary papers will be provided through Canvas.
Grading
5+10+35%: Final project (groups of 2-4): project outline + short presentation + paper due last class day.
10%: Class presentation based on technical paper(s).
30%: Minor projects/homeworks.
10%: Participation in class.
There will be no final exam.