Mining Sequential and Structured Patterns: Scalability, Flexibility, Extensibility and Applicability

National Science Foundation Award Number: IIS-0209199 (Aug. 2002-July 2006)

Contact Information

Jiawei Han, PI
Department of Computer Science
University of Illinois, Urbana-Champaign
1304 West Springfield Ave., Urbana, Illinois 61801, U.S.A.
Office: (217) 333-6903, Fax: (217) 265-6494

E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj

List of Supported Students and Staff

§ Jiawei Han, PI (1 summer month per year)

§ Xifeng Yan, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

§ Deng Cai, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

§ Jiong Yang, research fellow, Department of Computer Science, University of Illinois at Urbana-Champaign

§ Jianyong Wang, research fellow, Department of Computer Science, University of Illinois at Urbana-Champaign

Project Award Information

Award Number: IDM IIS-0209199
Duration: Aug. 2002-July 2006
Title: Mining Sequential and Structured Patterns: Scalability, Flexibility, Extensibility and Applicability
Keywords: data mining, frequent pattern, sequential pattern, structured pattern, scalable mining algorithms, data mining applications.

Project Summary

This project performs a systematic investigation of the principles, algorithms, and applications of scalable sequential and structured pattern mining. It covers the following issues: (1) development of highly scalable mining algorithms, including mining of max-patterns, closed patterns, and top-k patterns; (2) investigation of highly flexible mining methodologies, including mining of multi-dimensional, multi-level sequential and structured patterns and constraint-based mining; (3) extension of the scope to cover sequential or structured pattern-based clustering; and (4) application of multi-dimensional, multi-level sequential or structured pattern mining to intrusion detection, Web mining, and other important applications. This will lead to a set of efficient, scalable, and flexible sequential and structured pattern mining methods for scientific and industrial applications.
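Item (1) above distinguishes several kinds of patterns. As a minimal illustration of how closed patterns and max-patterns (here, maximal frequent itemsets) relate to plain frequent patterns, the following Python sketch enumerates every frequent itemset of a toy transaction database by brute force and then filters the result. It is not one of the project's scalable algorithms; the function names, the sample data, and the absolute support threshold `min_sup` are hypothetical choices made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Map each itemset whose support (number of transactions containing it)
    is at least min_sup to that support.

    Brute force over all subsets of the item universe: exponential, so it is
    suitable only for tiny illustrative inputs.
    """
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_sup:
                freq[candidate] = support
    return freq


def closed_itemsets(freq):
    """Keep itemsets that have no proper superset with the same support.

    Checking only frequent supersets suffices: a superset with equal
    support would itself be frequent.
    """
    return {s: c for s, c in freq.items()
            if not any(s < t and freq[t] == c for t in freq)}


def max_itemsets(freq):
    """Keep itemsets that have no frequent proper superset."""
    return {s: c for s, c in freq.items()
            if not any(s < t for t in freq)}


def show(patterns):
    """Format patterns as 'items:support', shortest itemsets first."""
    ordered = sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
    return ", ".join("".join(sorted(s)) + ":" + str(c) for s, c in ordered)


if __name__ == "__main__":
    # Hypothetical toy database: each transaction is a set of single-letter items.
    db = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("a")]
    freq = frequent_itemsets(db, min_sup=2)
    print("frequent:", show(freq))
    print("closed:  ", show(closed_itemsets(freq)))
    print("maximal: ", show(max_itemsets(freq)))
```

With `min_sup=2`, all seven non-empty itemsets of the sample database are frequent, but only {a}, {a, b}, and {a, b, c} are closed and only {a, b, c} is maximal. Closed patterns still determine the support of every frequent pattern, whereas max-patterns keep only the boundary of the frequent set, which is why the two representations trade compactness against information.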

Publications and Products

Journal Articles (including accepted papers)

1. Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P. Midkiff, “SOBER: Statistical Model-based Fault Localization”, IEEE Transactions on Software Engineering (accepted with minor revisions), April 2006.

2. Xifeng Yan, Feida Zhu, Philip S. Yu, and Jiawei Han, “Feature-based Substructure Similarity Search”, ACM Transactions on Database Systems, accepted for publication, April 2006.

3. F. Pan, K. Kamath, K. Zhang, S. Pulapura, A. Achar, J. Nunez-Iglesias, Y. Huang, X. Yan, J. Han, H. Hu, M. Xu, and X. J. Zhou, “Integrative Array Analyzer: A software package for analysis of cross-platform and cross-species microarray data”, Bioinformatics, 2006.

5. J. Wang, J. Han, and J. Pei, “Closed Constrained-Gradient Mining in Retail Databases”, IEEE Transactions on Knowledge and Data Engineering, 18(6): 764-769, 2006.

6. X. Yin, J. Han, J. Yang, and P. S. Yu, “Efficient Classification across Multiple Database Relations: A CrossMine Approach”, IEEE Transactions on Knowledge and Data Engineering, 18(6): 770-783, 2006.

7. Charu Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu, “A Framework for On-Demand Classification of Evolving Data Streams”, IEEE Transactions on Knowledge and Data Engineering, 18(5): 577-589, 2006.

8. Deng Cai, Xiaofei He, Jiawei Han and Hong-Jiang Zhang, “Orthogonal Laplacianfaces for Face Recognition”, IEEE Transactions on Image Processing, (accepted), March 2006.

9. Hwanjo Yu, Jiong Yang, Jiawei Han, and Xiaolei Li, “Making SVM Scalable to Large Data Sets Using Hierarchical Indexing”, Data Mining and Knowledge Discovery, 11(3): 295-321, 2005.

10. D. Xin, J. Han, X. Yan, and H. Cheng, “On Compressing Frequent Patterns”, Data and Knowledge Engineering (Special Issue on Intelligent Data Mining), accepted in Nov. 2005.

11. Jiawei Han, Yixin Chen, Guozhu Dong, Jian Pei, Benjamin W. Wah, Jianyong Wang, and Y. Dora Cai, “Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams”, Distributed and Parallel Databases, 18(2): 173-197, 2005.

12. Xifeng Yan, Philip Yu, and Jiawei Han, “Graph Indexing Based on Discriminative Frequent Structure Analysis”, ACM Transactions on Database Systems, 30(4): 960-993, 2005.

13. Deng Cai, Xiaofei He and Jiawei Han, “Document Clustering Using Locality Preserving Indexing”, IEEE Transactions on Knowledge and Data Engineering, 17(12):1624-1637, 2005.

14. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “On Efficient Algorithms for High Dimensional Projected Clustering of Data Streams”, Data Mining and Knowledge Discovery, 10:251-272, 2005.

15. Petre Tzvetkov, Xifeng Yan, Jiawei Han, “TSP: Mining top-k closed sequential patterns”, Knowledge and Information Systems (KAIS), 7(4): 438-457, 2005.

16. J. Wang, J. Han, Y. Lu, and P. Tzvetkov, “TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets”, IEEE Transactions on Knowledge and Data Engineering, 17(5):652-664, 2005.

17. K. Wang, Y. Jiang, J. X. Yu, G. Dong, and J. Han, “Divide-and-Approximate: A Novel Constraint Push Strategy for Iceberg Cube Mining”, IEEE Transactions on Knowledge and Data Engineering, 17(3):354-368, 2005.

18. J. Han, J. Pei, and X. Yan, “From Sequential Pattern Mining to Structured Pattern Mining: A Pattern-Growth Approach,” Journal of Computer Science and Technology, 19(3):257-279, 2004.

19. J. Han, J. Pei, Y. Yin and R. Mao, “Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery, 8(1):53-87, 2004.

20. H. Yu, J. Han, and K. C.-C. Chang, “PEBL: Web Page Classification Without Negative Examples”, IEEE Transactions on Knowledge and Data Engineering (Special Issue on Mining and Searching the Web), 16(1):70-81, 2004.

21. G. Dong, J. Han, J. Lam, J. Pei, K. Wang, and W. Zou, “Mining Constrained Gradients in Multi-Dimensional Databases,” IEEE Transactions on Knowledge and Data Engineering, 16(5):922-938, 2004.

22. J. Pei, G. Dong, W. Zou, and J. Han, “Mining Condensed Frequent Pattern Bases,” Knowledge and Information Systems, 2004.

23. J. Pei, J. Han, and L. V. S. Lakshmanan, “Pushing Convertible Constraints in Frequent Itemset Mining,” Data Mining and Knowledge Discovery, 8(3):227-252, 2004.

24. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data Engineering, 16(11):1424-1440, 2004.

25. A. Doan, Y. Lu, Y. Lee, and J. Han, “Profile-Based Object Matching for Information Integration”, IEEE Intelligent Systems, 18(5):54-59, 2003.

26. Y. Lu and J. Han, “Cancer classification using gene expression data”, Information Systems (Special Issue on Data Management in Bioinformatics), 28(4): 243-268, 2003.

27. A. K. H. Tung, H. Lu, J. Han, and L. Feng, “Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules”, IEEE Transactions on Knowledge and Data Engineering, 15(1): 43-56, 2003.

28. K. Wang, Y. He, and J. Han, “Pushing Support Constraints into Association Mining”, IEEE Transactions on Knowledge and Data Engineering, 15(3): 642-658, 2003.

29. J. Han and K. C.-C. Chang, “Data Mining for Web Intelligence”, Computer, 35(11):64-70, 2002.

30. R. Ng and J. Han, “CLARANS: A Method for Clustering Objects for Spatial Data Mining”, IEEE Transactions on Knowledge and Data Engineering, 14(5): 1003-1016, 2002.

31. L. Feng, J. X. Yu, H. Lu, and J. Han, “A Template Model for Multidimensional Inter-Transactional Association Rules”, The VLDB Journal, 11(2):153-175, 2002.

32. C. H. Caldas, L. Soibelman, and J. Han, “Automated Classification of Construction Project Documents”, Journal of Computing in Civil Engineering, 16(4):234-243, 2002.

33. J. Han, R. B. Altman, V. Kumar, H. Mannila, and D. Pregibon, “Emerging Scientific Applications in Data Mining”, Communications of the ACM, 45(8):54-58, 2002.

Books and Book Chapters

34. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques (Foreword by Jim Gray), 2nd ed., Morgan Kaufmann, 2006.

35. Jiawei Han, Benjamin W. Wah, Vijay Raghavan, Xindong Wu, and Rajeev Rastogi (eds.), Proceedings of the Fifth Int. Conf. on Data Mining (ICDM-2005), Houston, Texas, Nov. 27-30, 2005, IEEE Computer Society, New York, 2005 (846 + xxvii pages).

36. J. Yang, X. Yan, J. Han, and W. Wang, “Discovering Evolutionary Classifier over High Speed Non-Static Stream”, in S. Bandyopadhyay et al. (eds.), Advanced Methods for Knowledge Discovery from Complex Data, Springer Verlag, 2005.

37. J. Han, J. Pei, and X. Yan, “Sequential Pattern Mining by Pattern-Growth: Principles and Extensions”, in W. W. Chu and T. Y. Lin (eds.), Recent Advances in Data Mining and Granular Computing (Mathematical Aspects of Knowledge Discovery), Springer Verlag, 2005.

38. P. Bajcsy, J. Han, L. Liu, and J. Yang, “Survey of Bio-Data Analysis from Data Mining Perspective”, Chapter 2 of D. Shasha et al. (eds.), Data Mining in Bioinformatics, Springer Verlag, 2005, pp. 9-39.

39. H. Yu, A. Doan, and J. Han, “Mining for Information Discovery on the Web: Overview and Illustrative Research”, in N. Zhong and J. Liu (eds.), Intelligent Technologies for Information Analysis, Springer Verlag, 2004, pp. 131-163.

40. C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities”, in H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, AAAI/MIT Press, 2004, pp. 105-124.

Refereed Conference Publications (Refereed Workshop Publications are omitted due to limited space)

41. Xiaoxin Yin, Jiawei Han, and Philip Yu, “LinkClus: Efficient Clustering via Heterogeneous Semantic Links”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

42. Hector Gonzalez, Jiawei Han, and Xiaolei Li, “FlowCube: Constructing RFID FlowCubes for Multi-Dimensional Analysis of Commodity Flows”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

43. Dong Xin, Chen Chen, and Jiawei Han, “Towards Robust Indexing for Ranked Queries”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

44. Dong Xin, Jiawei Han, Hong Cheng, and Xiaolei Li, “Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

45. Dong Xin, Hong Cheng, Xifeng Yan, and Jiawei Han, “Extracting Redundancy-Aware Top-K Patterns”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

46. Qiaozhu Mei, Dong Xin, Hong Cheng, ChengXiang Zhai, and Jiawei Han, “Generating Semantic Annotations for Frequent Patterns with Context Analysis”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

47. Chao Liu, Chen Chen, Jiawei Han, and Philip Yu, “GPLAG: Detection of Software Plagiarism by Procedure Dependency Graph Analysis”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

48. Dong Xin, Xuehua Shen, Qiaozhu Mei, and Jiawei Han, “Discovering Interesting Patterns Through User's Interactive Feedback”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

49. Deng Cai, Xiaofei He, and Jiawei Han, “Tensor Space Model for Document Analysis”, in Proc. 2006 Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR'06), Seattle, WA, Aug. 2006.

50. Hongyan Liu, Ying Lu, Jiawei Han, and Jun He, “Error-Adaptive and Time-Aware Maintenance of Frequency Counts over Data Streams”, in Proc. 2006 Int. Conf. on Web-Age Information Management (WAIM'06), Hong Kong, China, June 2006.

51. Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, and Dong Xin, “Ranking Objects Based on Relationships”, in Proc. 2006 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'06), Chicago, IL, June 2006.

52. Xiaolei Li, Jiawei Han, and Sangkyum Kim, “Motion-Alert: Automatic Anomaly Detection in Massive Moving Objects”, in Proc. 2006 IEEE Int. Conf. on Intelligence and Security Informatics (ISI'06), San Diego, CA, May 2006.

53. Wen Jin, Anthony K. H. Tung, Jiawei Han, and Wei Wang, “Ranking Outliers Using Symmetric Neighborhood Relationship”, in Proc. 2006 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'06), Singapore, April 2006.

54. Hongyan Liu, Jiawei Han, Dong Xin, and Zheng Shao, “Mining Interesting Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach”, in Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.

55. Chao Liu, Xifeng Yan, and Jiawei Han, “Mining Control Flow Abnormality for Logic Error Isolation”, in Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.

56. Charu Aggarwal, Chen Chen, and Jiawei Han, “On the Inverse Classification Problem and its Applications”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

57. Hector Gonzalez, Jiawei Han, Xiaolei Li, and Diego Klabjan, “Warehousing and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

58. Hongyan Liu, Jiawei Han, Dong Xin, and Zheng Shao, “Top-Down Mining of Interesting Patterns from Very High Dimensional Data”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

59. Dong Xin, Jiawei Han, Zheng Shao, and Hongyan Liu, “C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

60. Xifeng Yan, Feida Zhu, Jiawei Han, and Philip Yu, “Searching Substructures with Superimposed Distance”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

61. Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, and Jiawei Han, “Community Mining from Multi-Relational Networks”, in Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct. 2005.

62. Wen Jin, Martin Ester, and Jiawei Han, “Efficient Processing of Ranked Queries with Sweeping Selection”, in Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct. 2005.

63. Xiaoxin Yin and Jiawei Han, “Efficient Classification from Multiple Heterogeneous Databases”, in Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct. 2005.

64. C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff, “SOBER: Statistical Model-based Bug Localization”, Proc. 2005 ACM SIGSOFT Symp. on the Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.

65. D. Xin, J. Han, X. Yan and H. Cheng, “Mining Compressed Frequent-Pattern Sets”, Proc. 2005 Int. Conf. on Very Large Data Bases (VLDB'05), Trondheim, Norway, Aug. 2005.

66. X. Yan, H. Cheng, J. Han, and D. Xin, “Summarizing Itemset Patterns: A Profile-Based Approach”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

67. X. Yan, X. J. Zhou, and J. Han, “Mining Closed Relational Graphs with Connectivity Constraints”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

68. X. Yin, J. Han, and P.S. Yu, “Cross-Relational Clustering with User's Guidance”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

69. S. Cong, J. Han, and D. Padua, “Parallel Mining of Closed Sequential Patterns”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

70. D. Cai and X. He, “Orthogonal Locality Preserving Indexing”, Proc. 2005 Int. Conf. on Research and Development in Information Retrieval (SIGIR'05), Salvador, Brazil, Aug. 2005.

71. X. Yin, J. Han, and J. Yang, “Searching for Related Objects in Relational Databases”, Proc. 2005 Int. Conf. on Scientific and Statistical Database Management (SSDBM'05), Santa Barbara, CA, June 2005.

72. H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou, “Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery”, Proc. 2005 Int. Conf. on Intelligent Systems for Molecular Biology (ISMB 2005), Ann Arbor, MI, June 2005.

73. X. Yan, P. S. Yu, and J. Han, “Substructure Similarity Search in Graph Databases”, Proc. 2005 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'05), Baltimore, Maryland, June 2005, pp. 766-777.

74. C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, “Mining Behavior Graphs for Backtrace of Noncrashing Bugs”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.

75. H. Cheng, X. Yan, and J. Han, “SeqIndex: Indexing Sequences by Sequential Pattern Analysis”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.

76. X. Li, J. Han, X. Yin, and D. Xin, “Mining Evolving Customer-Product Relationships in Multi-Dimensional Space”, Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005, pp. 580-581.

77. X. Yan, X. J. Zhou, J. Han, “Mining Closed Relational Graphs with Connectivity Constraints”, Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005, pp. 357-358.

78. W. Jin, J. Han, and M. Ester, “Mining Thick Skylines over Large Databases”, Proc. 2004 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’04), Pisa, Italy, Sept. 2004, pp. 255-266.

79. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “A Framework for Projected Clustering of High Dimensional Data Streams”, Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004.

80. X. Li, J. Han, and H. Gonzalez, “High-Dimensional OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004.

81. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “On Demand Classification of Data Streams”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

82. H. Cheng, X. Yan, and J. Han, “IncSpan: Incremental Mining of Sequential Patterns in Large Database”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

83. B. He, K.C.-C. Chang, and J. Han, “Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

84. Y. Li, J. Han, and J. Yang, “Clustering Moving Objects”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

85. A. Y. Wu, M. Garland, and J. Han, “Mining Scale-Free Networks using Geodesic Clustering”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

86. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach”, IEEE Transactions on Knowledge and Data Engineering, 16(10), 2004.

87. Z. Shao, J. Han, and D. Xin, “MM-Cubing: Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004.

88. Y. Li, J. Yang, and J. Han, “Continuous K-Nearest Neighbor Search for Moving Objects”, Proc. 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004.


90. X. Yan, P. S. Yu, and J. Han, “Graph Indexing: A Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004.

91. Y. D. Cai, D. Clutter, G. Pape, J. Han, M. Welge, and L. Auvil, “MAIDS: Mining Alarming Incidents from Data Streams” (system demonstration), Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004.

92. W.-Y. Kim, Y.-K. Lee, and J. Han, “CCMine: Efficient Mining of Confidence-Closed Correlated Patterns”, Proc. 2004 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'04), Sydney, Australia, May 2004.


94. G. Dong, J. Han, J. Lam, J. Pei, K. Wang, and W. Zou, “Mining Constrained Gradients in Multi-Dimensional Databases”, IEEE Transactions on Knowledge and Data Engineering, 16(6), 2004.

95. X. Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple Database Relations”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004.

96. J. Wang and J. Han, “BIDE: Efficient Mining of Frequent Closed Sequences”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004.

97. P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K Closed Sequential Patterns”, Proc. 2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL, Nov. 2003.

98. Y.-K. Lee, W.-Y. Kim, Y. D. Cai, and J. Han, “CoMine: Efficient Mining of Correlated Patterns”, Proc. 2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL, Nov. 2003.

99. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “A Framework for Clustering Evolving Data Streams”, Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.

100. D. Xin, J. Han, X. Li, and B. W. Wah, “Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration”, Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.

101. X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.

102. H. Yu, J. Yang, and J. Han, “Classifying Large Data Sets Using SVM with Hierarchical Clusters”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.

103. J. Wang, J. Han, and J. Pei, “CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.

104. H. Wang, W. Fan, P. S. Yu, and J. Han, “Mining Concept-Drifting Data Streams using Ensemble Classifiers”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.

105. X. Yin and J. Han, “CPAR: Classification based on Predictive Association Rules”, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.

106. X. Yan, J. Han, and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets”, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.

107. K. Wang, Y. Jiang, J. X. Yu, G. Dong, and J. Han, “Pushing Aggregate Constraints by Divide-and-Approximate”, Proc. 2003 IEEE Int. Conf. on Data Engineering (ICDE'03), Bangalore, India, 2003.

108. S.-J. Ko and J. Han, “Mining the Typical Preference of Collaborative User Group”, Proc. 2003 Int. Conf. on Conceptual Modeling (ER'03), Skokie, IL, Oct. 2003.

109. H. Yu, J. Yang, W. Wang, and J. Han, “Discovering Compact and Highly Discriminative Features or Feature Combinations of Drug Activities Using Support Vector Machine”, Proc. 2003 IEEE Computer Society Bioinformatics Conf. (CSB'03), Stanford, California, Aug. 2003.

110. J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K Frequent Closed Patterns without Minimum Support”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002, pp. 211-218.

111. J. Pei, G. Dong, W. Zou, and J. Han, “On Computing Condensed Frequent Pattern Bases”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002, pp. 378-385.

112. X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002, pp. 721-724.

113. H. Yu, K. C.-C. Chang, and J. Han, “Heterogeneous Learner for Web Page Classification”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002, pp. 538-545.

114. J. Pei, J. Han, and W. Wang, “Mining Sequential Patterns with Constraints in Large Databases”, Proc. 2002 Int. Conf. on Information and Knowledge Management (CIKM'02), McLean, VA, Nov. 2002, pp. 18-25.

115. L. V. S. Lakshmanan, J. Pei, and J. Han, “Quotient Cube: How to Summarize the Semantics of a Data Cube”, Proc. 2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002, pp. 778-789.

116. Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, “Multi-Dimensional Regression Analysis of Time-Series Data Streams”, Proc. 2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002, pp. 323-334.

Project Impact

o Research Progress: A set of scalable algorithms and methods (as well as a set of software packages) has been developed for mining various kinds of patterns and knowledge (including frequent patterns, sequential patterns, structured patterns, classification models, and clusters) in large databases. With further development, many of these methods can be used by industry and other agencies for scalable data mining applications.

o Training: Three Ph.D. students (Deng Cai, Xifeng Yan, Ying Lu) and two Research Associates (Drs. Jiong Yang and Jianyong Wang) are partially supported by the project. In addition, our entire data mining group (13 Ph.D. students) has benefited greatly from the support of this research project.

o Education: Parts of this research are used in the data mining courses CS412 and CS512 taught at the University of Illinois at Urbana-Champaign (2002-2006).

o Collaborations: Through this project we have established collaborations with IBM T.J. Watson Research Center, Microsoft Research, Google, Intel, and NCSA (National Center for Supercomputing Applications). Through these collaborations we expect to gain access to real datasets and applications and to produce further research results.

Current and Future Activities

The following are some of the highlights of our ongoing work. Please refer to the Publications and Products section for related references.

§ High-dimensional and scalable data analysis techniques: KDD’03 (CB-SVM), VLDB’04, and ICDE’06 (C-Cubing).

§ Efficient and effective methods for mining sequential patterns: SDM’03 (CloSpan), ICDE’04 (BIDE), and KDD’04 (IncSpan).

§ Pattern compression methods for sequential and graph patterns: KDD’05 and VLDB’05.

§ Efficient methods for mining graph and structured patterns: KDD’03 (CloseGraph), SIGMOD’04 (gIndex), SDM’05 (SeqIndex), and SIGMOD’05 (Grafil).

§ Efficient and scalable methods for classification, clustering, and link analysis across multiple database relations: ICDE’04 (CrossMine), KDD’05 (CrossClus), and VLDB’06 (LinkClus).

§ Warehousing and mining RFID and sensor databases: ICDE’06 (RFID Warehousing) and VLDB’06 (RFID FlowCube).

Area Background

Mining frequent patterns, sequential patterns, and structured patterns efficiently in large databases has been an important theme in data mining, with many applications, and has attracted a great deal of research. Our work builds upon previous studies of scalable data mining methods, especially frequent and sequential pattern mining algorithms, and explores their further extensions and applications.

Area References

§ [1] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. VLDB 1994.

§ [2] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.). Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.

§ [3] J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann Publishers, 2006.

§ [4] J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. SIGMOD 2000.

§ [5] D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001.

§ [6] G. S. Manku and R. Motwani. Approximate Frequency Counts over Data Streams. VLDB 2002.

§ [7] X. Yan and J. Han. CloseGraph: Mining Closed Frequent Graph Patterns. KDD 2003.

§ [8] X. Yan, P. S. Yu, and J. Han. Graph Indexing: A Frequent Structure-based Approach. SIGMOD 2004.

Potential Related Projects

The project is closely related to many research projects on knowledge discovery in databases and its applications, such as homeland security and bioinformatics.

Project Web site URL: http://www.cs.uiuc.edu/~hanj/projs/patternmine.htm

Online software: Online software related to this project can be downloaded at www.illimine.cs.uiuc.edu

Online resources: Research publications related to this project can be downloaded from the Selected Publications page.