E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj
List of Supported Students and Staff
§ Jiawei Han, PI (1 summer month per year)
§ Xifeng Yan, Ph.D. student, Department of Computer Science
§ Deng Cai, Ph.D. student, Department of Computer Science
§ Jiong Yang, research fellow, Department of Computer Science
§ Jianyong Wang, research fellow, Department of Computer Science
Project Summary
This project performs a systematic
investigation of the principles, algorithms, and applications of scalable
sequential and structured pattern mining, which covers the following issues: (1)
development of highly scalable mining algorithms, including mining
max-patterns, closed patterns and top-k patterns; (2) investigation of highly
flexible mining methodologies, including mining of multi-dimensional
multi-level sequential and structured patterns and constraint-based mining; (3)
extension of the scope to cover sequential or structured pattern-based
clustering; and (4) application of multi-dimensional, multi-level sequential or
structured pattern mining for intrusion detection, Web mining, and other
important applications. This will lead to a set of efficient, scalable, and
flexible sequential and structured pattern mining methods for scientific and
industrial applications.
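The pattern types named in item (1) can be illustrated with a small sketch. The code below is a brute-force illustration on itemsets only, not one of the scalable algorithms developed in this project; the function names and the toy transactions are assumptions made for this example. It shows how closed patterns (no proper superset with the same support), max-patterns (no frequent proper superset), and top-k closed patterns relate to the full set of frequent patterns.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Brute-force level-wise enumeration; suitable for toy data only.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return result

def closed_patterns(freq):
    """Keep patterns that have no proper superset with the same support."""
    return {p: s for p, s in freq.items()
            if not any(p < q and s == sq for q, sq in freq.items())}

def max_patterns(freq):
    """Keep patterns that have no frequent proper superset."""
    return {p: s for p, s in freq.items() if not any(p < q for q in freq)}

def top_k_closed(freq, k):
    """Return the k closed patterns with the highest support."""
    closed = closed_patterns(freq)
    return sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0])))[:k]

# Hypothetical toy database of four transactions over items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
freq = frequent_itemsets(transactions, min_support=2)
```

On this toy input, five itemsets are frequent, three of them are closed, and two are maximal, which is why closed and max-patterns are attractive as compact summaries of the frequent pattern set.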
Publications and Products
Journal articles (including accepted)
1. Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P. Midkiff, “SOBER: Statistical Model-based Fault Localization”, IEEE Transactions on Software Engineering, (accepted with minor revisions), April 2006.
2. Xifeng Yan, Feida Zhu, Philip S. Yu, and Jiawei Han, “Feature-based Substructure Similarity Search”, ACM Transactions on Database Systems, accepted for publication, April 2006.
3. F. Pan, K. Kamath, K. Zhang, S. Pulapura, A. Achar, J. Nunez-Iglesias, Y. Huang, X. Yan, J. Han, H. Hu, M. Xu, X. J. Zhou, “Integrative Array Analyzer: A software package for analysis of cross-platform and cross-species microarray data”, Bioinformatics, 2006.
5. J. Wang, J. Han, and J. Pei, “Closed Constrained-Gradient Mining in Retail Databases”, IEEE Transactions on Knowledge and Data Engineering, 18(6): 764-769, 2006.
6. X. Yin, J. Han, J. Yang and P. S. Yu, “Efficient Classification across Multiple Database Relations: A CrossMine Approach”, IEEE Transactions on Knowledge and Data Engineering, 18(6): 770-783, 2006.
7. Charu Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu, “A Framework for On-Demand Classification of Evolving Data Streams”, IEEE Transactions on Knowledge and Data Engineering, 18(5): 577-589, 2006.
8. Deng Cai, Xiaofei He, Jiawei Han and Hong-Jiang Zhang, “Orthogonal Laplacianfaces for Face Recognition”, IEEE Transactions on Image Processing, (accepted), March 2006.
9. Hwanjo Yu, Jiong Yang, Jiawei Han, and Xiaolei Li, “Making SVM Scalable to Large Data Sets Using Hierarchical Indexing”, Data Mining and Knowledge Discovery, 11(3): 295-321, 2005.
10. D. Xin, J. Han, X. Yan and H. Cheng, “On Compressing Frequent Patterns”, Knowledge and Data
Engineering, (Special issue on Intelligent Data Mining), accepted in Nov. 2005.
11. Jiawei Han, Yixin Chen, Guozhu Dong,
12. Xifeng Yan, Philip Yu, and Jiawei Han, “Graph
Indexing Based on Discriminative Frequent Structure Analysis”, ACM Transactions
on Database Systems, 30(4): 960-993 (2005).
13. Deng Cai, Xiaofei He and Jiawei Han, “Document Clustering Using Locality
Preserving Indexing”, IEEE Transactions on Knowledge and Data
Engineering, 17(12):1624-1637, 2005.
14. C. Aggarwal, J. Han, J.
Wang, and P. S. Yu, “On Efficient
Algorithms for High Dimensional Projected Clustering of Data Streams”,
Data Mining and Knowledge Discovery,
10:251-272, 2005.
15. Petre Tzvetkov, Xifeng Yan, Jiawei
Han, “TSP: Mining top-k closed sequential
patterns”, Knowledge and Information Systems (KAIS), 7(4): 438-457,
2005.
16. J. Wang, J. Han, Y. Lu, and P. Tzvetkov,
“TFP: An
Efficient Algorithm for Mining Top-K Frequent Closed Itemsets”,
IEEE Transactions on Knowledge and Data Engineering, 17(5):652-664, 2005.
17. K. Wang, Y. Jiang, J. X. Yu, G. Dong, and J. Han,
“Divide-and-Approximate: A Novel
Constraint Push Strategy for Iceberg Cube Mining”, IEEE Transactions on Knowledge and Data
Engineering, 17(3):354-368, 2005.
18. J. Han, J. Pei, and X. Yan,
“From Sequential Pattern Mining to
Structured Pattern Mining: A Pattern-Growth Approach,” Journal of Computer Science and Technology,
19(3):257-279, 2004.
19. J. Han, J. Pei, Y. Yin and R. Mao, “Mining Frequent Patterns without Candidate
Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
20. H. Yu, J. Han, K. C.-C. Chang, “PEBL: Web Page Classification Without Negative Examples,” IEEE Transactions on Knowledge and Data Engineering (Special Issue on Mining and Searching the Web), 16(1):70-81, 2004.
21. G. Dong, J. Han, J. Lam, J. Pei, K. Wang, and W. Zou, “Mining
Constrained Gradients in Multi-Dimensional Databases,” IEEE Transactions on Knowledge and Data
Engineering, 16(5):922-938, 2004.
22. J. Pei, G. Dong, W. Zou, and
J. Han, “Mining Condensed Frequent
Pattern Bases,” Knowledge and
Information Systems, 2004.
23. J. Pei, J. Han, and L. V. S. Lakshmanan,
“Pushing Convertible Constraints in
Frequent Itemset Mining,” Data
Mining and Knowledge Discovery, 8(3):227-252, 2004.
24. J. Pei, J. Han, B. Mortazavi-Asl,
J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu,
“Mining Sequential Patterns by
Pattern-Growth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data
Engineering, 16(11):1424-1440, 2004.
25. A. Doan, Y. Lu, Y. Lee, and J. Han, “Profile-Based Object Matching for Information Integration,” IEEE Intelligent Systems, 18(5):54-59, 2003.
26. Y. Lu and J. Han, “Cancer classification using gene expression data”, Information Systems (Special Issue on Data Management in Bioinformatics), 28(4): 243-268, 2003.
27. K. H. Tung, H. Lu, J. Han, and L. Feng, “Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules,” IEEE Transactions on Knowledge and Data Engineering, 15(1): 43-56, 2003.
28. K. Wang, Y. He and J. Han, “Pushing Support Constraints into Association Mining”, IEEE Transactions on Knowledge and Data Engineering, 15(3): 642-658, 2003.
29. J. Han and K. C.-C. Chang, “Data Mining for Web Intelligence”, COMPUTER, 35(11):64-70, 2002.
30. R. Ng and J. Han, “CLARANS: A Method for Clustering Objects for Spatial Data Mining”, IEEE Transactions on Knowledge and Data Engineering, 14(5): 1003-1016, 2002.
31. L. Feng, J. X. Yu, H. Lu, and J. Han, “A Template Model for Multidimensional Inter-transactional Association Rules”, The VLDB Journal, 11(2):153-175, 2002.
32. H. Caldas, L. Soibelman, and J. Han, “Automated Classification of Construction Project”, Journal of Computing in Civil Engineering, 16(4):234-243, 2002.
33. J. Han, R. B. Altman, V. Kumar, H. Mannila and D. Pregibon, “Emerging Scientific Applications in Data Mining”, Communications of the ACM, 45(8):54-58, 2002.
Book or Book Chapters
38. P. Bajcsy, J. Han, L. Liu, J. Yang, “Survey of Bio-Data Analysis from Data Mining Perspective,” Chapter 2 of D. Shasha, et al. (eds.), Data Mining in Bioinformatics, Springer Verlag, 2005, pp. 9-39.
39. H. Yu, A. Doan, and J. Han, “Mining for Information Discovery on the Web: Overview and Illustrative Research,” N. Zhong and J. Liu (eds.), Intelligent Technologies for Information Analysis, Springer Verlag, 2004, pp. 131-163.
40. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities”, H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, AAAI/MIT Press, 2004, pp. 105-124.
Refereed Conference Publications (Refereed Workshop Publications are omitted due to limited space)
64. C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff,
“SOBER: Statistical
Model-based Bug Localization”, Proc. 2005 ACM SIGSOFT Symp. on the Foundations of
Software Engineering (FSE 2005),
65. D. Xin, J. Han, X. Yan and H. Cheng, “Mining Compressed
Frequent-Pattern Sets”, Proc. 2005 Int. Conf. on Very Large Data
Bases (VLDB'05),
66. X. Yan, H. Cheng, J. Han,
and D. Xin, “Summarizing Itemset Patterns: A Profile-Based Approach”,
Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05),
67. X. Yan, X. J. Zhou, and J.
Han, “Mining
Closed Relational Graphs with Connectivity Constraints”, Proc. 2005
Int. Conf. on Knowledge Discovery and Data Mining (KDD'05),
68. X. Yin, J. Han, and P.S. Yu, “Cross-Relational
Clustering with User's Guidance”, Proc. 2005 Int. Conf. on Knowledge
Discovery and Data Mining (KDD'05),
69. S. Cong, J. Han, and D.
70. D. Cai and X. He. “Orthogonal Locality
Preserving Indexing”, Proc. 2005 Int. Conf. on Research and
Development in Information Retrieval (SIGIR'05), Salvador, Brazil, Aug. 2005.
71. X. Yin, J. Han, and J. Yang, “Searching for Related
Objects in Relational Databases”, Proc. 2005 Int. Conf. on Scientific
and Statistical Database Management (SSDBM'05),
72. H. Hu, X. Yan, Y. Huang, J. Han and X. J. Zhou, “Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery”, Proc. 2005 Int. Conf. on Intelligent Systems for Molecular Biology (ISMB 2005), Ann Arbor, MI, June 2005.
73. X. Yan, P. S. Yu, and J.
Han, “Substructure
Similarity Search in Graph Databases”, Proc. 2005 ACM-SIGMOD Int.
Conf. on Management of Data (SIGMOD'05), Baltimore, Maryland, June 2005, pp. 766-777.
74. C. Liu, X. Yan, H. Yu, J.
Han, and P. S. Yu, “Mining Behavior
Graphs for Backtrace of Noncrashing
Bugs”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport
Beach, CA, April 2005.
75. H. Cheng, X. Yan, and J.
Han, “SeqIndex: Indexing Sequences by Sequential Pattern Analysis”,
Proc. 2005
76. X. Li, J. Han, X. Yin, and D. Xin,
“Mining
Evolving Customer-Product Relationships in Multi-Dimensional Space”,
Proc. 2005 Int. Conf. on Data Engineering (ICDE'05),
77. X. Yan, X. J. Zhou, J. Han,
“Mining Closed
Relational Graphs with Connectivity Constraints”, Proc. 2005 Int.
Conf. on Data Engineering (ICDE'05),
78. W. Jin, J. Han, and M. Ester, “Mining Thick Skylines over Large Databases”, Proc. 2004 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’04), Pisa, Italy, Sept. 2004, pp. 255-266.
79. C. Aggarwal, J. Han,
J. Wang, and P. S. Yu, “A Framework for
Projected Clustering of High Dimensional Data Streams”, Proc. 2004
Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004.
80. X. Li, J. Han, and H. Gonzalez, “High-Dimensional OLAP: A
Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very Large
Data Bases (VLDB'04),
81. C. Aggarwal, J. Han, J.
Wang, and P. S. Yu, “On Demand
Classification of Data Streams”, Proc. 2004 Int. Conf. on Knowledge
Discovery and Data Mining (KDD'04),
82. H. Cheng, X. Yan, and J. Han,
“IncSpan: Incremental
Mining of Sequential Patterns in Large Database”, Proc. 2004
Int. Conf. on Knowledge Discovery and Data Mining (KDD'04),
83. B. He, K.C.-C. Chang, and J. Han, “Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining
Approach”, Proc. 2004 Int.
Conf. on Knowledge Discovery and Data Mining (KDD'04),
84. Y. Li, J. Han, and
J. Yang, “Clustering Moving
Objects”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data
Mining (KDD'04), Seattle, WA, Aug. 2004.
85. Wu, M. Garland, and J. Han, “Mining
Scale-Free Networks using Geodesic Clustering”, Proc. 2004 Int. Conf.
on Knowledge Discovery and Data Mining (KDD'04),
86. J. Pei, J. Han, B.
Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U.
Dayal, and M.-C. Hsu, “Mining Sequential
Patterns by Pattern-Growth: The PrefixSpan Approach”,
IEEE Transactions on Knowledge and Data Engineering, 16(10), 2004.
87. Z. Shao, J. Han, and D. Xin, “MM-Cubing:
Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004
Int. Conf. on Scientific and Statistical Database Management (SSDBM'04),
88. Y. Li, J. Yang, and J. Han, “Continuous K-Nearest
Neighbor Search for Moving Objects”, Proc. 2004 Int. Conf. on
Scientific and Statistical Database Management (SSDBM'04), Santorini
Island, Greece, June 2004.
89. J. Han, J. Pei, Y. Yin and R. Mao, “Mining Frequent
Patterns without Candidate Generation: A Frequent-Pattern Tree Approach”,
Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
90. X. Yan, P. S. Yu, and J.
Han, “Graph
Indexing: A Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD
Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004.
91. Y. D. Cai, D. Clutter, G. Pape, J. Han, M. Welge, and L. Auvil, “MAIDS: Mining
Alarming Incidents from Data Streams”, (system demonstration), Proc.
2004 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'04), Paris, France, June
2004.
92. W.-Y. Kim, Y.-K. Lee, and J. Han, “CCMine:
Efficient Mining of Confidence-Closed Correlated Patterns”, Proc.
2004 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'04),
93. H. Yu, J. Han, K. C.-C. Chang, “PEBL: Web Page Classification Without Negative Examples”, IEEE Transactions on Knowledge and Data Engineering (Special Issue on Mining and Searching the Web), 16(1): 70-81, 2004.
94. G. Dong, J. Han, J. Lam, J. Pei, K. Wang, and W. Zou, “Mining Constrained Gradients in Multi-Dimensional Databases”, IEEE Transactions on Knowledge and Data Engineering, 16(6), 2004.
95. X. Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple
Database Relations”, Proc. 2004 Int. Conf. on Data Engineering
(ICDE'04),
96. J. Wang and J. Han, “BIDE: Efficient Mining
of Frequent Closed Sequences”, Proc. 2004 Int. Conf. on Data
Engineering (ICDE'04),
97. P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K Closed Sequential Patterns”,
Proc. 2003 Int. Conf. on Data Mining
(ICDM'03),
98. Y.-K. Lee, W.-Y. Kim, Y. D. Cai, and J. Han, “CoMine: Efficient Mining of Correlated Patterns”, Proc. 2003 Int. Conf. on Data Mining (ICDM'03),
99. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “A Framework for Clustering Evolving Data Streams”, Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB'03),
100. D. Xin, J. Han, X. Li,
and B. W. Wah, “Star-Cubing:
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration”,
Proc. 2003 Int. Conf. on Very Large Data
Bases (VLDB'03),
108. S.-J. Ko and J. Han, “Mining the Typical Preference of Collaborative User Group”, Proc. 2003 Int. Conf. on Conceptual Modeling (ER'03),
109. H. Yu, J. Yang, W. Wang, and J. Han, “Discovering Compact and Highly Discriminative Features or Feature Combinations of Drug Activities Using Support Vector Machine”, Proc. 2003 IEEE Computer Society Bioinformatics Conf. (CSB'03), Stanford,
110. J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K Frequent Closed Patterns without Minimum Support”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02),
111. J. Pei, G. Dong, W. Zou, and J. Han, “On Computing Condensed Frequent Pattern Bases”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02),
112. X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02),
113. H. Yu, K. C. C. Chang, and J. Han, “Heterogeneous Learner for Web Page Classification”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02),
114. J. Pei, J. Han, and W. Wang, “Mining Sequential Patterns with Constraints in Large Databases”, Proc. 2002 Int. Conf. on Information and Knowledge Management (CIKM'02),
115. L. V. S. Lakshmanan, J. Pei, and J. Han, “Quotient Cube: How to Summarize the Semantics of a Data Cube”, Proc. 2002 Int. Conf. on Very Large Data Bases (VLDB'02),
116. Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, “Multi-Dimensional Regression Analysis of Time-Series Data Streams”, Proc. 2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002, pp. 323-334.
Project Impact
o Research Progress: A set of scalable algorithms and methods (as well as a set of software packages) has been developed for mining various kinds of patterns and knowledge (including frequent patterns, sequential patterns, structured patterns, classification, and clustering) in large databases. Many of these methods, with further development, can be used by industry and other agencies for scalable data mining applications.
o Training: Three Ph.D. students (Deng Cai, Xifeng Yan, Ying Lu) and two Research Associates (Drs. Jiong Yang and Jianyong Wang) are partially supported by the project. In the meantime, our whole data mining group (with 13 Ph.D. students) has benefited greatly from the support of this research project.
o Education: Parts of this research are used in the Data Mining courses (CS412 and CS512) taught at the
o Collaborations: For this project we have established collaborations with
Current and Future Activities
The following are some of the highlights of our ongoing work. Please refer to the Publications and Products section for related references.
§ High-dimensional and scalable data analysis techniques: KDD’03 (CB-SVM), VLDB’04, ICDE’06 (C-Cubing).
§ Efficient and effective methods for mining sequential patterns: SDM’03 (CloSpan), ICDE’04 (BIDE), and KDD’04 (IncSpan).
§ Pattern compression methods for sequential and graph patterns: KDD’05 and VLDB’05.
§ Efficient methods for mining graph and structured patterns: KDD’03 (CloseGraph), SIGMOD’04 (gIndex), SDM’05 (SeqIndex), and SIGMOD’05 (Grafil).
§ Efficient and scalable methods for classification, clustering, and link analysis across multiple database relations: CrossMine (ICDE’04), CrossClus (KDD’05), LinkClus (VLDB’06).
§ Warehousing and mining RFID and sensor databases: ICDE’06 (RFID Warehousing) and VLDB’06 (RFID FlowCube).
Area Background
Mining frequent patterns, sequential patterns, and structured patterns efficiently in large databases has been an important theme in data mining, with many applications, and there has been a great deal of research activity in this direction. Our work builds upon previous studies on scalable data mining methods, especially frequent and sequential pattern mining algorithms, and explores their further extensions and applications.
Area References
§ [1] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules, VLDB 1994.
§ [2] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
§ [3] J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann Publishers, 2006.
§ [4] J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation, SIGMOD 2000.
§ [5] D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining, MIT Press, 2001.
§ [6] G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams, VLDB 2002.
§ [7] X. Yan and J. Han. CloseGraph: Mining Closed Frequent Graph Patterns, KDD 2003.
§ [8] X. Yan, P. S. Yu, and J. Han. Graph Indexing: A Frequent Structure-based Approach, SIGMOD 2004.
Potential Related Projects
The project is closely related to many research projects on knowledge discovery in databases and their applications, such as homeland security, bioinformatics, etc.
Project Web site URL: http://www.cs.uiuc.edu/~hanj/projs/patternmine.htm
Online software: Online software related to this project can be downloaded at www.illimine.cs.uiuc.edu
Online resources: Research publications related to this project can be downloaded at Selected Publications