Extraction and Interpretation of Information from Large-Scale Hyperspectral Data for Mapping and Monitoring Wetland Ecosystems

Abstract

Project Summary

New space-based hyperspectral sensors are acquiring enormous quantities of high-dimensional data at very high spectral resolution. Each resulting dataset requires several GBytes of storage and provides far superior characterization of the chemistry-based spectral response of remotely sensed areas. Unfortunately, current analysis tools are woefully inadequate for fully utilizing such rich information to map and monitor changes in ground cover over extended regions. The overall objective of this project is to develop a comprehensive system that can efficiently and intelligently extract, analyze and manage very large hyperspectral datasets used for classifying a large variety of land covers in environmentally sensitive ecosystems. Scalability to large areas will be achieved primarily by adapting models developed for one area to new regions (with somewhat different characteristics), using knowledge transfer and reuse mechanisms and only small amounts of additional labeled data. In parallel, we will build a knowledge repository of features and classes that can quickly narrow down both the sensor data that are most relevant and the class types and hierarchies that are likely to be pertinent to a given area. This will substantially reduce data storage requirements and processing time. Both knowledge transfer and domain knowledge construction will be facilitated by our recently developed binary hierarchical classifier (BHC) for multiclass problems. The BHC recursively partitions the set of classes into two groups based on their natural affinities, while simultaneously discovering the feature space best suited to discriminating between the two groups at each partition. A key feature of this technique is the creation of valuable domain knowledge about class hierarchies and relevant features, which will be exploited in this proposal to scale to extremely large datasets from multiple areas.
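The recursive partitioning idea behind the BHC can be illustrated with a small sketch. This is not the project's algorithm: it swaps in a simple heuristic (seed each split with the two most dissimilar class means, then assign every class to the nearer seed) and uses the single band with the largest gap between group centroids as a crude stand-in for the per-node feature space. The names `build_hierarchy`, `leaves`, and the dictionary layout are hypothetical.

```python
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two spectra."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Band-wise average of a list of spectra."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_hierarchy(class_means):
    """Recursively split classes into two groups (illustrative heuristic only).

    class_means: dict mapping class label -> mean spectrum (list of floats).
    Returns a label for a leaf, or a dict with keys "band", "left", "right".
    """
    labels = sorted(class_means)
    if len(labels) == 1:
        return labels[0]

    # Assumption: seed the split with the two most dissimilar classes.
    seed_a, seed_b = max(
        combinations(labels, 2),
        key=lambda p: sq_dist(class_means[p[0]], class_means[p[1]]),
    )
    left, right = {}, {}
    for lab in labels:
        m = class_means[lab]
        if sq_dist(m, class_means[seed_a]) <= sq_dist(m, class_means[seed_b]):
            left[lab] = m
        else:
            right[lab] = m
    if not right:  # all means identical: force a non-empty split
        right[seed_b] = left.pop(seed_b)

    # Crude stand-in for a node-specific feature space: the one band whose
    # group centroids differ the most.
    mu_l = centroid(list(left.values()))
    mu_r = centroid(list(right.values()))
    band = max(range(len(mu_l)), key=lambda b: abs(mu_l[b] - mu_r[b]))

    return {"band": band,
            "left": build_hierarchy(left),
            "right": build_hierarchy(right)}


def leaves(node):
    """List the class labels under a node of the hierarchy."""
    if not isinstance(node, dict):
        return [node]
    return leaves(node["left"]) + leaves(node["right"])
```

In a tree of this shape, each internal node records which classes were grouped together and which feature separated them, which is the kind of reusable domain knowledge the summary refers to.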

Concurrently, single-area analysis capabilities will be substantially enhanced by two approaches that jointly address the problem of having little labeled data and a large number of potential classes. The first adaptively combines highly correlated, adjacent spectral bands to reduce the feature space by an amount commensurate with the availability of labeled examples, and incorporates this method within multiple-category classification systems such as the BHC and error-correcting output codes. It forms the foundation for developing simple, adaptive soft-sensors that can perform band combinations on-the-fly onboard and reduce the amount of data that must be downlinked for storage and processing. The second adapts semi-supervised learning and active learning concepts to hyperspectral data analysis within a distributed data mining framework.
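As a rough illustration of the first approach, the sketch below greedily averages the most correlated pair of adjacent band groups until a target number of features remains. It is a minimal sketch under stated assumptions, not the project's method: the merge criterion (Pearson correlation across pixels), the function names, and the rule of thumb tying the feature count to the number of labeled samples are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def target_features(n_labeled, samples_per_feature=10):
    """Hypothetical rule of thumb: allow one feature per few labeled samples."""
    return max(1, n_labeled // samples_per_feature)


def merge_adjacent_bands(pixels, max_features):
    """Greedily merge adjacent bands until at most max_features groups remain.

    pixels: list of spectra, each a list of band values of equal length.
    Returns (groups, reduced): groups is a list of lists of band indices,
    reduced holds one averaged value per group for every pixel.
    """
    max_features = max(1, max_features)
    n_bands = len(pixels[0])
    groups = [[b] for b in range(n_bands)]

    def group_signal(group):
        # Per-pixel average over the bands in this group.
        return [mean(p[b] for b in group) for p in pixels]

    signals = [group_signal(g) for g in groups]
    while len(groups) > max_features:
        # Pick the adjacent pair of groups whose signals correlate most.
        i = max(range(len(groups) - 1),
                key=lambda k: pearson(signals[k], signals[k + 1]))
        groups[i] = groups[i] + groups[i + 1]
        del groups[i + 1]
        signals[i] = group_signal(groups[i])
        del signals[i + 1]

    reduced = [[s[k] for s in signals] for k in range(len(pixels))]
    return groups, reduced
```

Because only neighboring bands are merged, each resulting feature still corresponds to a contiguous wavelength interval, which is what would make such a combination simple enough to apply onboard a sensor.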

Broader Impacts

The proposed research is an interdisciplinary effort that applies data mining approaches to an emerging large-scale problem domain which demands tight interaction between data acquisition and processing/analysis. This domain poses several notable challenges: high-dimensional, spectrally correlated data with tens of classes, natural class hierarchies, class characteristics and mixing proportions that change substantially both in space and time, limited training data but extremely large amounts of unlabeled areas, and multi-resolution properties. Although the project focuses on analysis of hyperspectral data for mapping wetland ecosystems, the research results will also be useful for a wider range of data mining applications that exhibit some of these characteristics. Further, knowledge transfer across domains is considered a key technology for rapidly adapting existing solutions to somewhat different but related problems, thus substantially increasing the utility of existing point solutions. The overall framework and analysis capability will also provide a prototype for management and analysis of global datasets acquired by future constellations of spaceborne multispectral and hyperspectral sensors.

This project will foster interdisciplinary training of students, as it demands close interaction with an applications-oriented user community and with domain experts. The visual nature of results from analysis of remotely sensed data makes them a powerful means of introducing the general population to issues of broad concern, such as the impact of global warming and disaster management. Several modes of community outreach are identified in this proposal.

People

PI: Dr. Joydeep Ghosh

Schlumberger Centennial Chair Professor, Dept. of Electrical & Computer Engineering, The University of Texas at Austin

Director, IDEAL (Intelligent Data Exploration and Analysis Lab), formerly called Lab for Artificial Neural Systems (LANS)

Co-PI: Dr. Melba Crawford

Professor, Dept. of Civil Engineering, Purdue University

Assistant Dean, Engineering for Interdisciplinary Research

Director, Laboratory for Applications of Remote Sensing

PhD Students:

Suju Rajan, Dept. of Electrical & Computer Engineering, The University of Texas at Austin

Yangchi Chen, Dept. of Operations Research & Industrial Engineering, The University of Texas at Austin

Jisoo Ham, Dept. of Operations Research & Industrial Engineering, The University of Texas at Austin

Publications

Journal Papers

J. T. Morgan, A. Henneguelle, J. Ham, J. Ghosh and M.M. Crawford, “Adaptive Feature Spaces for Land Cover Classification with Limited Ground Truth Data”, in Intl. Jl. of Pattern Recognition and Artificial Intelligence, Spl. Issue on Fusion of Multiple Classifiers, J. Kittler and F. Roli (Eds.), 18(5), Aug 2004, pp. 777-800. pdf

J. Ham, Y. Chen, M. Crawford, and J. Ghosh, "Investigation of the Random Forest Framework for Classification of Hyperspectral Data," IEEE Trans. on Geoscience and Remote Sensing, vol. 43, no. 3, pp. 492-501, March 2005. pdf

S. Rajan, J. Ghosh, and M.M. Crawford, "Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data," IEEE Trans. on Geoscience and Remote Sensing, vol. 44, no. 11, pp. 3408-3417, 2006. pdf

Conference Papers

M.M. Crawford, J. Ham, Y. Chen, and J. Ghosh, "Random Forests of Binary Hierarchical Classifiers for Analysis of Hyperspectral Data," Proc. IEEE Workshop on Advances in Techniques for Analysis of Remotely Sensed Data, Goddard Space Flight Center, Greenbelt, MD, Oct 27-28, (publication via CD), 2003. pdf

A. Henneguelle, J. Ghosh, and M.M. Crawford, "Polyline Feature Extraction for Land Cover Classification Using Hyperspectral Data," Proc. IICAI-2003, Hyderabad, India, Dec. 18-20, (publication via CD), 2003.

Y. Chen, M.M. Crawford, and J. Ghosh, "Integrating Support Vector Machines in a Hierarchical Output Decomposition Framework," Proc. 2004 International Geoscience and Remote Sensing Symposium, Anchorage, Alaska, Sept 20-24, 949-953, 2004.

S. Rajan and J. Ghosh, "An Empirical Comparison of Hierarchical vs. Two-Level Approaches to Multiclass Problems", in Multiple Classifier Systems, F. Roli, J. Kittler and T. Windeatt (Eds), LNCS 3077, Springer, pp. 283-292. pdf

Y. Chen, M.M. Crawford, and J. Ghosh, "Applying Nonlinear Manifold Learning to Hyperspectral Data for Land Cover Classification", International Geoscience and Remote Sensing Symposium, July 2005. pdf

S. Rajan and J. Ghosh, "Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data", in Multiple Classifier Systems, N.C. Oza, R. Polikar, J. Kittler and F. Roli (Eds), LNCS 3541, Springer, pp. 417-428, 2005. pdf

S. Rajan, J. Ghosh, and M.M. Crawford, "An Active Learning Approach to Knowledge Transfer for Hyperspectral Data Analysis", Accepted, IEEE International Geoscience and Remote Sensing Symposium, Colorado, August 2006. pdf

Y. Chen, M.M. Crawford, and J. Ghosh, "Improved Nonlinear Manifold Learning for Land Cover Classification via Intelligent Landmark Selection", 2006 International Geoscience and Remote Sensing Symposium, Denver, Colorado, USA, July 31-August 4, 2006. pdf

Data and Software

Data and classification codes can be downloaded from this website. link

Dates for the start / end of project

8/1/03 to 7/31/06

This page was last updated on 11 Jan, 2007