IDEAL @ UT

Scalable Clustering of Complex Data Types

Award Abstract

This proposal shall address three key issues pertaining to the clustering of large, complex datasets. First, a unifying view of model-based clustering will be developed to form a theoretical basis for understanding and comparing a wide range of existing clustering algorithms for complex data. This view will be systematically explored to develop improved algorithms for specific applications. Second, complexity arising from domain constraints on balancing, and from the need to handle incrementally acquired non-stationary data such as newsfeeds, shall be addressed via adaptive clustering techniques. Finally, methods for obtaining a single consensus solution given multiple clustering results shall be investigated. Such methods will facilitate distributed data mining under severe restrictions on data sharing due to privacy and other constraints. Benchmarks for the proposed research areas shall be developed and made available to the research community. Broader impacts of this project also include the design of outreach modules that illustrate data analysis issues and capabilities to high school students using case studies resulting from the proposed work.

Sponsored by NSF under grant IIS-0307792

Clustering Papers by our group