Mining Database Structures and Linkages across Multiple Databases

 

Contact Information

 

Jiawei Han,  PI
Department of Computer Science
University of Illinois, Urbana-Champaign
201 N. Goodwin Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903,   Fax: (217) 265-6494

E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj

 

List of Supported Students, Staff, and Collaborators

 

1.      Jiawei Han, PI

2.      Xiaoxin Yin, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

3.      Philip S. Yu, manager of the Software Tools and Techniques group, IBM Thomas J. Watson Research Center

4.      Jiong Yang, Schroeder Assistant Professor, Electrical Engineering and Computer Science Department, Case Western Reserve University

 

Project Summary
 

Most structured information in the world is stored in relational databases. The relations in a database are interconnected according to the database schema created during database design, and the linkages between relations indicate semantic relationships between different objects. The structural information and linkages in relational databases provide a rich source of information for data mining. Unfortunately, most data mining techniques today can only be applied to data stored in single "flat" tables. The scope of this project includes a variety of tasks in data mining and knowledge discovery from relational databases. It focuses on discovering structural information and linkages from databases, and on using such information in tasks such as classification, clustering, and outlier detection. Our methodology includes designing efficient and scalable methods for exploring multi-relational data, and using such methods to discover inherent properties and linkages among such data.

    This study will contribute to the development of principles and new approaches in knowledge discovery in multi-relational data, which are of essential importance in a variety of strategic applications including financial decision support, customer-relationship analysis, and bioinformatics.
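The contrast drawn in the summary between a single "flat" table and linked relations can be illustrated with a minimal sketch. It is not the project's software: the toy schema (a target relation of loans linked to orders through account_id), the function names, and the data are all hypothetical. It assumes that, rather than materializing a join, one can attach the IDs of target tuples to the rows of a linked relation and then count how many distinct target tuples of each class satisfy a condition on that relation.

```python
from collections import defaultdict

# Hypothetical toy schema: `loans` is the target relation (it carries the
# class label); `orders` is linked to it through the shared key account_id.
loans = [
    {"loan_id": 1, "account_id": 10, "label": "+"},
    {"loan_id": 2, "account_id": 10, "label": "+"},
    {"loan_id": 3, "account_id": 20, "label": "-"},
    {"loan_id": 4, "account_id": 30, "label": "-"},
]
orders = [
    {"order_id": 100, "account_id": 10, "kind": "household"},
    {"order_id": 101, "account_id": 10, "kind": "insurance"},
    {"order_id": 102, "account_id": 20, "kind": "insurance"},
    {"order_id": 103, "account_id": 30, "kind": "household"},
]

def propagate_ids(targets, linked, key):
    """Pair each linked row with the set of target IDs joinable via `key`."""
    ids_by_key = defaultdict(set)
    for t in targets:
        ids_by_key[t[key]].add(t["loan_id"])
    return [(row, ids_by_key.get(row[key], set())) for row in linked]

def class_counts(targets, propagated, predicate):
    """Count distinct target tuples per label linked to a row matching `predicate`."""
    label_of = {t["loan_id"]: t["label"] for t in targets}
    covered = set()
    for row, ids in propagated:
        if predicate(row):
            covered |= ids
    counts = defaultdict(int)
    for loan_id in covered:
        counts[label_of[loan_id]] += 1
    return dict(counts)

propagated = propagate_ids(loans, orders, "account_id")
household = class_counts(loans, propagated, lambda r: r["kind"] == "household")
# Number of rows a flattened loans-orders join would contain.
flat_rows = sum(len(ids) for _, ids in propagated)

print(household)   # class distribution of loans linked to a household order
print(flat_rows)   # size of the equivalent flat join
```

In this toy data the flattened join has more rows than there are loans, because each loan is repeated once per linked order. Counting distinct target IDs avoids that duplication, which is one reason linkage-aware methods can differ from simply flattening the database.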

 

Publications and Products

1.      X. Li, J. Han, X. Yin, and D. Xin, Mining Evolving Customer-Product Relationships in Multi-Dimensional Space, Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.

2.      X. Yan, X. J. Zhou, J. Han, Mining Closed Relational Graphs with Connectivity Constraints, Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.

3.      X. Yin, J. Han, J. Yang, and P. S. Yu, CrossMine: Efficient Classification across Multiple Database Relations, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004.

4.      X. Yin and J. Han, CPAR: Classification based on Predictive Association Rules, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.

Project Impact

 

1.      Research Progress: A set of new algorithms and methods (as well as software packages) have been developed for mining multi-relational databases. Many of these methods can be used by industry and other agencies.

 

2.      Education: Parts of this research are used in a Data Mining graduate course taught at the University of Illinois at Urbana-Champaign.

 

3.      Collaborations: For this project we have established a collaboration with the IBM T.J. Watson Research Center. Through this collaboration we expect to gain access to real datasets and applications and to produce more research results.

 

Current and Future Activities

The following are some of the highlights of our ongoing work. Please refer to the Publications and Products section for related references.

1.      Development of efficient and scalable multi-relational clustering approaches, based on our CrossMine work published at ICDE'04.

2.      Development of efficient and accurate record linkage approaches based on multi-relational data.

3.      Further development of efficient and accurate methods for multi-relational classification, based on our CrossMine work published at ICDE'04.

Area Background

Multi-relational data mining is a new topic proposed a few years ago. It is related to Inductive Logic Programming, which aims at finding hypotheses by induction from knowledge that may be represented in relational form. Multi-relational data mining explores a much broader scope in both methodologies and applications, covering data mining tasks such as classification, clustering, outlier detection, and temporal analysis.

Area References

         [1] H. Blockeel, L. De Raedt, and J. Ramon. Top-down induction of logical decision trees. In Proc. 1998 Int. Conf. Machine Learning, Madison, WI, Aug. 1998.

         [2] S. Dzeroski and N. Lavrac (editors). Relational Data Mining. Springer, Berlin, 2001.

         [3] N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.

         [4] S. Muggleton. Inductive Logic Programming. Academic Press, New York, NY, 1992.

         [5] S. Muggleton and C. Feng. Efficient induction of logic programs. In Proc. 1990 Conf. Algorithmic Learning Theory, Tokyo, Japan, 1990.

         [6] J. Neville, D. Jensen, L. Friedland, and M. Hay. Learning relational probability trees. In Proc. 2003 Int. Conf. Knowledge Discovery and Data Mining, Washington, DC, 2003.

         [7] J. R. Quinlan and R. M. Cameron-Jones. FOIL: A midterm report. In Proc. 1993 European Conf. Machine Learning, Vienna, Austria, 1993.

         [8] B. Taskar, E. Segal, and D. Koller. Probabilistic classification and clustering in relational data. In Proc. 2001 Int. Joint Conf. Artificial Intelligence, Seattle, WA, 2001.

Potential Related Projects

The project is closely related to many research projects on knowledge discovery in databases and their applications, such as homeland security and bioinformatics.

Project Web site URL:  http://www.cs.uiuc.edu/~hanj/projs/dbmine.html

Online software:  Online software related to this project can be downloaded at Software Downloads

Online resources:  Research publications related to this project can be downloaded at Selected Publications