Mining Dynamics of Data Streams in Multi-Dimensional Space
National Science Foundation Award Number: IIS-0308215 (Sept. 2003-August 2006)
Contact Information
Jiawei Han, PI
Department of Computer Science
University of Illinois, Urbana-Champaign
1304 West Springfield Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903, Fax: (217) 265-6494
E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj
List of Supported Students and Staff
- Xifeng Yan, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign
- Ying Lu, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign
- Jianyong Wang, research fellow, Department of Computer Science, University of Illinois at Urbana-Champaign
- Jiong Yang, research fellow, Department of Computer Science, University of Illinois at Urbana-Champaign
Project Award Information
- Award Number: IIS-0308215
- Duration: Sept. 2003-August 2006
- Title: Mining Dynamics of Data Streams in Multi-Dimensional Space
- Keywords: Stream data mining, stream data processing, scalable algorithms, online mining algorithms, data mining applications
Project Summary
Stream data processing and mining represent an important, emerging class of
data-intensive applications in which data flows in and out dynamically, in huge
(possibly infinite) volumes, amenable only to single-scan algorithms, yet often
demanding fast or even real-time responses. Based on the observation that the
majority of stream data resides at the primitive abstraction level, whereas the
most interesting patterns may need to be discovered at higher levels of
abstraction in multi-dimensional space, we study the issues in stream data
mining and develop effective, efficient, and scalable methods for mining the
dynamics of data streams in multi-dimensional space. The scope of our study
includes the discovery of changes, trends, and evolution of characteristics,
clusters, classification models, and frequent patterns in data streams. Our
methodology is to capture sufficient statistical, compact, and aggregate
information in concise data structures to facilitate the efficient processing
of both continuous and ad hoc stream mining queries. Several strategically
important applications will be explored, including network intrusion detection,
telecommunication and Web data flow analysis, and financial data flow analysis.
The study will contribute to the development of principles and new methods for
real-time data mining systems and promote their strategically important
applications, including timely discovery of terrorist or criminal activities
for homeland security protection, intrusion detection, and multi-dimensional
analysis of data-intensive, fast-changing events. The research results will be
published in a timely manner for wide dissemination, industry adoption, and the
education of a new generation of information technology students and workers.
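As a rough illustration of the methodology described in the summary (and not the project's actual algorithms or data structures), the following minimal Python sketch processes a stream in a single scan, rolls each primitive-level record up to a coarser abstraction level, and keeps only a constant-size aggregate (count, sum, sum of squares) per multi-dimensional cell. The record layout, the `CITY_TO_REGION` mapping, the class names, and the hour-level time granularity are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt

# Hypothetical concept hierarchy: city (primitive level) -> region (higher level).
CITY_TO_REGION = {"Urbana": "Midwest", "Chicago": "Midwest", "Boston": "Northeast"}


@dataclass
class CellSummary:
    """Constant-size aggregate kept for one (region, hour) cell."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0

    def std(self) -> float:
        # Population standard deviation derived from the stored sums.
        if not self.count:
            return 0.0
        variance = self.total_sq / self.count - self.mean() ** 2
        return sqrt(max(variance, 0.0))


class StreamSummary:
    """Single-scan summary: each record is seen once and then discarded."""

    def __init__(self) -> None:
        self.cells = defaultdict(CellSummary)

    def observe(self, second: int, city: str, value: float) -> None:
        region = CITY_TO_REGION.get(city, "Other")  # roll up the location dimension
        hour = second // 3600                        # roll up the time dimension
        self.cells[(region, hour)].add(value)

    def query(self, region: str, hour: int) -> CellSummary:
        """Ad hoc query answered from the summary, not from the raw stream."""
        return self.cells.get((region, hour), CellSummary())


# Assumed toy stream of (second, city, value) records.
summary = StreamSummary()
for record in [(10, "Urbana", 2.0), (20, "Chicago", 4.0), (3700, "Boston", 6.0)]:
    summary.observe(*record)

cell = summary.query("Midwest", 0)
print(cell.count, cell.mean(), cell.std())  # prints: 2 3.0 1.0
```

Memory use here grows with the number of distinct high-level cells rather than with the length of the stream, which is the property the summary's "concise data structures" are meant to provide; a real system would also need a policy for aging out old cells.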
Publications and Products:
Journal articles (including accepted)
1. Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, and Benjamin W. Wah, “Computing Iceberg Cubes by Top-Down and
Bottom-Up Integration: The StarCubing
Approach”, IEEE Transactions on Knowledge and Data Engineering,
(accepted), Aug. 2006
2. Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P. Midkiff,
“SOBER: Statistical Model-based Fault Localization”, IEEE Transactions on
Software Engineering, (accepted), Aug. 2006.
3. Xifeng Yan, Feida Zhu, Philip S. Yu, and Jiawei Han, “Feature-based
Substructure Similarity Search”, ACM Transactions on Database Systems,
accepted for publication, April 2006.
4. F. Pan, K. Kamath, K. Zhang, S. Pulapura, A. Achar, J. Nunez-Iglesias, Y. Huang, X. Yan, J. Han, H. Hu, M. Xu, X. J. Zhou,
“Integrative Array Analyzer: A software package for analysis of cross-platform and cross-species microarray data”, Bioinformatics, 2006.
5. J. Wang, J. Han, and J. Pei, “Closed Constrained-Gradient Mining in Retail
Databases”, IEEE Transactions on Knowledge and Data Engineering,
18(6): 764-769, 2006.
6. X. Yin, J. Han, J. Yang and P. S. Yu, “Efficient Classification
across Multiple Database Relations: A CrossMine Approach”, IEEE
Transactions on Knowledge and Data Engineering, 18(6): 770-783, 2006.
7. Charu Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu,
“A Framework for On-Demand Classification of Evolving Data Streams”,
IEEE Transactions on Knowledge and Data Engineering, 18(5):577-589, 2006.
8. Deng Cai, Xiaofei He, Jiawei Han and
Hong-Jiang Zhang, “Orthogonal Laplacianfaces for Face Recognition”, IEEE
Transactions on Image Processing, (accepted), March 2006.
9. Hwanjo Yu, Jiong
Yang, Jiawei Han, and Xiaolei
Li, “Making SVM Scalable to Large
Data Sets Using Hierarchical Indexing”, Data Mining and Knowledge
Discovery, 11(3): 295-321, 2005.
10. D. Xin, J.
Han, X. Yan and H. Cheng, “On Compressing Frequent Patterns”, Knowledge and Data Engineering,
(Special issue on Intelligent Data Mining), accepted in Nov. 2005.
11. Jiawei Han, Yixin
Chen, Guozhu Dong, Jian
Pei, Benjamin W. Wah, Jianyong
Wang, and Y. Dora Cai, “Stream Cube: An Architecture for Multi-Dimensional Analysis of Data
Streams”, Distributed and Parallel Databases, 18(2): 173-197, 2005.
12. Xifeng Yan,
Philip Yu, and Jiawei Han, “Graph Indexing Based on Discriminative
Frequent Structure Analysis”, ACM Transactions on Database Systems,
30(4): 960-993 (2005).
13. Deng Cai, Xiaofei He and Jiawei Han,
“Document Clustering Using Locality
Preserving Indexing”, IEEE Transactions on Knowledge and Data
Engineering, 17(12):1624-1637, 2005.
14. C. Aggarwal,
J. Han, J. Wang, and P. S. Yu, “On
Efficient Algorithms for High Dimensional Projected Clustering of Data Streams”,
Data Mining and Knowledge Discovery,
10:251-272, 2005.
15. J. Wang, J. Han, Y. Lu, and P. Tzvetkov, “TFP: An Efficient
Algorithm for Mining Top-K Frequent Closed Itemsets”,
IEEE Transactions on Knowledge and Data Engineering, 17(5):652-664, 2005.
16. K. Wang, Y. Jiang, J. X. Yu, G.
Dong, and J. Han, “Divide-and-Approximate:
A Novel Constraint Push Strategy for Iceberg Cube Mining”, IEEE Transactions on Knowledge and Data
Engineering, 17(3):354-368, 2005.
17. J. Han, J. Pei, and X. Yan, “From
Sequential Pattern Mining to Structured Pattern Mining: A Pattern-Growth
Approach,” Journal of Computer
Science and Technology, 19(3):257-279, 2004.
18. J. Han, J. Pei, Y. Yin and R. Mao,
“Mining Frequent Patterns without
Candidate Generation: A Frequent-Pattern Tree Approach,” Data Mining and Knowledge Discovery,
8(1):53-87, 2004.
19. H. Yu, J. Han, K. C.-C. Chang, “PEBL: Web Page Classification Without Negative Examples,” IEEE Transactions on Knowledge and Data
Engineering (Special Issue on Mining and Searching the Web), 16(1):70-81,
2004.
20. G. Dong, J. Han, J. Lam, J. Pei, K.
Wang, and W. Zou, “Mining Constrained Gradients in Multi-Dimensional Databases,”
IEEE Transactions on Knowledge and Data
Engineering, 16(5):922-938, 2004.
21. J. Pei, G. Dong, W. Zou, and J. Han, “Mining
Condensed Frequent Pattern Bases,” Knowledge and Information Systems, 2004.
22. J. Pei, J. Han, and L. V. S. Lakshmanan, “Pushing
Convertible Constraints in Frequent Itemset Mining,” Data
Mining and Knowledge Discovery, 8(3):227-252, 2004.
23. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, “Mining Sequential Patterns by Pattern-Growth: The PrefixSpan
Approach,” IEEE Transactions on
Knowledge and Data Engineering, 16(11):1424-1440, 2004.
24. A. Doan, Y. Lu, Y. Lee, and J. Han, “Profile-Based Object Matching for
Information Integration,” IEEE Intelligent Systems, 18(5):54-59, 2003.
25. Y. Lu and J. Han, “Cancer classification using gene expression data”,
Information Systems (Special Issue on Data Management in Bioinformatics),
28(4): 243-268, 2003.
Book and Book Chapters
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
(Foreword by Jim Gray), 2nd ed., Morgan Kaufmann, 2006.
- Jiawei
Han, Hector Gonzalez, Xiaolei Li, and Diego Klabjan, “Warehousing and Mining Massive RFID
Data Sets”, in Xue Li, Osmar R. Zaiane, Zhanhuai Li (eds.),
Proc. 2006 Int. Conf. Advanced Data Mining and Applications (ADMA'06), Xi'An, China, August 2006, pp. 1-18. (Lecture Notes in
Computer Science, Vol. 4093, Springer Berlin/Heidelberg, 2006).
- X. Yin, J. Han, J. Yang and P. S. Yu, “CrossMine: Efficient
Classification across Multiple Database Relations”, in
Jean-Francois Boulicaut, Luc de Raedt, and Heikki Mannila (eds.), Constraint-Based Mining and Inductive
Databases, Springer-Verlag LNAI vol. 3848, pp. 172-195, 2006.
- Jiawei Han, Benjamin W. Wah, Vijay Raghavan, Xindong Wu, and Rajeev
Rastogi (eds.), Proceedings of the Fifth Int. Conf. on Data Mining (ICDM-2005),
(Houston, Texas, Nov. 27-30, 2005) IEEE Computer Society, New York, 2005.
(846 + xxvii pages).
- J. Yang, X. Yan, J. Han, and W. Wang, “Discovering
Evolutionary Classifier over High Speed Non-Static Stream”, in
S. Bandyopadhyay et al. (eds.), Advanced Methods for Knowledge Discovery
from Complex Data, Springer Verlag, 2005.
- J. Han, J.
Pei, and X. Yan, “Sequential Pattern
Mining by Pattern-Growth: Principles and Extensions”, in W. W.
Chu and T. Y. Lin (eds.), Recent Advances in Data Mining and Granular
Computing (Mathematical Aspects of Knowledge Discovery), Springer Verlag, 2005.
- P. Bajcsy, J. Han, L. Liu, J. Yang, “Survey of Bio-Data Analysis from
Data Mining Perspective,” Chapter 2 of D. Shasha, et al. (eds.), Data Mining
in Bioinformatics, Springer Verlag, 2005, pp. 9-39.
- H. Yu, A. Doan, and J. Han, “Mining for Information Discovery on the Web:
Overview and Illustrative Research,” in N. Zhong and J. Liu (eds.),
Intelligent Technologies for Information Analysis, Springer Verlag, 2004,
pp. 131-163.
- C. Giannella, J. Han, J. Pei, X. Yan and P. S. Yu, “Mining Frequent Patterns
in Data Streams at Multiple Time Granularities”, in H. Kargupta, A. Joshi,
K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining,
AAAI/MIT Press, 2004, pp. 105-124.
Refereed Conference Publications (Refereed Workshop Publications
are omitted due to limited space)
- Chao Liu and Jiawei Han, “Failure Proximity: A
Fault Localization-Based Approach”, Proc. 14th ACM SIGSOFT
Symposium on the Foundations of Software Engineering (FSE'06), Portland, OR,
Nov. 2006.
- Hector Gonzalez, Jiawei Han, and Xiaolei Li,
“Mining
Compressed Commodity Workflows From Massive RFID Data Sets”, in
Proc. 2006 Int. Conf. on Information and Knowledge Management (CIKM'06),
Arlington, VA, Nov. 2006.
- Xiaoxin Yin, Jiawei
Han, and Philip Yu, “LinkClus: Efficient Clustering via Heterogeneous
Semantic Links”, in Proc. 2006 Int. Conf. on Very Large Data
Bases (VLDB'06), Seoul,
Korea,
Sept. 2006.
- Hector Gonzalez, Jiawei Han, and Xiaolei Li,
“FlowCube: Constructing RFID FlowCubes for Multi-Dimensional Analysis of
Commodity Flows”, in Proc. 2006 Int. Conf. on Very Large Data
Bases (VLDB'06), Seoul, Korea, Sept. 2006.
- Dong Xin,
Chen Chen, and Jiawei
Han, “Towards
Robust Indexing for Ranked Queries”, in Proc. 2006 Int. Conf. on
Very Large Data Bases (VLDB'06), Seoul,
Korea,
Sept. 2006.
- Dong Xin,
Jiawei Han, Hong Cheng, and Xiaolei
Li, “Answering
Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach”,
in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea,
Sept. 2006.
- Dong Xin,
Hong Cheng, Xifeng Yan,
and Jiawei Han, “Extracting
Redundancy-Aware Top-K Patterns”, in Proc. 2006 ACM SIGKDD Int.
Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA,
Aug. 2006.
- Qiaozhu Mei, Dong Xin,
Hong Cheng, ChengXiang Zhai,
and Jiawei Han, “Generating
Semantic Annotations for Frequent Patterns with Context Analysis”,
in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'06), Philadelphia, PA, Aug. 2006.
- Chao Liu, Chen Chen, Jiawei Han, and Philip
Yu, “GPLAG:
Detection of Software Plagiarism by Procedure Dependency Graph Analysis”,
in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'06), Philadelphia,
PA, Aug. 2006.
- Dong Xin,
Xuehua Shen, Qiaozhu Mei, and Jiawei Han,
“Discovering
Interesting Patterns Through User's Interactive Feedback”, in
Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'06), Philadelphia, PA, Aug. 2006.
- Deng Cai,
Xiaofei He and Jiawei
Han, “Tensor
Space Model for Document Analysis”, in Proc. 2006 Int. ACM SIGIR
Conf. on Research & Development on Information Retrieval (SIGIR'06),
Seattle, WA, Aug. 2006.
- Hongyan Liu, Ying Lu, Jiawei Han, and Jun He, “Error-Adaptive
and Time-Aware Maintenance of Frequency Counts over Data Streams”,
in Proc. 2006 Int. Conf. on Web-Age Information Management (WAIM'06), Hong Kong, China, June, 2006.
- Kaushik Chakrabarti,
Venkatesh Ganti, Jiawei Han, and Dong Xin,
“Ranking
Objects Based on Relationships”, in Proc. 2006 ACM SIGMOD Int.
Conf. on Management of Data (SIGMOD'06), Chicago, IL, June 2006.
- Xiaolei Li, Jiawei
Han, and Sangkyum Kim, “Motion-Alert:
Automatic Anomaly Detection in Massive Moving Objects”, Proc.
2006 IEEE Int. Conf. on Intelligence and Security Informatics (ISI'06),
San Diego, CA, May 2006.
- Wen Jin, Anthony K. H. Tung, Jiawei Han, and Wei Wang, “Ranking
Outliers Using Symmetric Neighborhood Relationship,” in Proc.
2006 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'06),
Singapore, April 2006.
- Hongyan Liu, Jiawei
Han, Dong Xin, and Zheng
Shao, “Mining
Interesting Patterns from Very High Dimensional Data: A Top-Down Row
Enumeration Approach,” in Proc. 2006 SIAM
Int. Conf. on Data Mining (SDM'06), Bethesda,
MD, April 2006.
- Chao Liu, Xifeng
Yan, and Jiawei Han,
“Mining
Control Flow Abnormality for Logic Error Isolation,” in Proc.
2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
- Charu Aggarwal,
Chen Chen, and Jiawei
Han, “On
the Inverse Classification Problem and its Applications”, in
Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia,
April 2006.
- Hector Gonzalez, Jiawei
Han, Xiaolei Li, and Diego Klabjan,
“Warehousing
and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int.
Conf. on Data Engineering (ICDE'06), Atlanta,
Georgia,
April 2006.
- Hongyan Liu, Jiawei
Han, Dong Xin, and Zheng
Shao, “Top-Down Mining
of Interesting Patterns from Very High Dimensional Data”, in
Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia,
April 2006.
- Dong Xin,
Jiawei Han, Zheng Shao, and Hongyan Liu,
“C-Cubing:
Efficient Computation of Closed Cubes by Aggregation-Based Checking”,
in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia,
April 2006.
- Xifeng Yan,
Feida Zhu, Jiawei Han,
and Philip Yu, “Searching
Substructures with Superimposed Distance”, in Proc. 2006 Int.
Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.
- Deng Cai,
Zheng Shao, Xiaofei He, Xifeng Yan, Jiawei Han, “Community
Mining from Multi-Relational Networks”, in Proc. 2005 European
Conf. on Principles and Practice of Knowledge Discovery in Databases
(PKDD'05), Porto, Portugal, Oct., 2005.
- Wen Jin, Martin Ester and Jiawei Han, “Efficient
Processing of Ranked Queries with Sweeping Selection”, in Proc. 2005 European Conf. on
Principles and Practice of Knowledge Discovery in Databases (PKDD'05),
Porto, Portugal, Oct., 2005.
- Xiaoxin Yin and Jiawei
Han, “Efficient
Classification from Multiple Heterogeneous Databases”, in Proc.
2005 European Conf. on Principles and Practice of Knowledge Discovery in
Databases (PKDD'05), Porto, Portugal, Oct., 2005.
- C. Liu, X. Yan,
L. Fei, J. Han, and S. Midkiff,
“SOBER:
Statistical Model-based Bug Localization”, Proc. 2005 ACM
SIGSOFT Symp. on the
Foundations of Software Engineering (FSE 2005), Lisbon, Portugal,
Sept. 2005.
- D. Xin,
J. Han, X. Yan and H. Cheng, “Mining Compressed
Frequent-Pattern Sets”, Proc. 2005 Int. Conf. on Very Large Data
Bases (VLDB'05), Trondheim,
Norway,
Aug. 2005.
- X. Yan,
H. Cheng, J. Han, and D. Xin, “Summarizing Itemset Patterns: A Profile-Based Approach”,
Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL,
Aug. 2005.
- X. Yan,
X. J. Zhou, and J. Han, “Mining Closed
Relational Graphs with Connectivity Constraints”, Proc. 2005
Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL,
Aug. 2005.
- X. Yin, J. Han, and P.S. Yu,
“Cross-Relational
Clustering with User's Guidance”, Proc. 2005 Int. Conf. on
Knowledge Discovery and Data Mining (KDD'05), Chicago, IL,
Aug. 2005.
- S. Cong, J. Han, and D. Padua, “Parallel Mining
of Closed Sequential Patterns”, Proc. 2005 Int. Conf. on
Knowledge Discovery and Data Mining (KDD'05), Chicago, IL,
Aug. 2005.
- D. Cai and X. He, “Orthogonal Locality
Preserving Indexing”, Proc. 2005 Int. Conf. on Research and
Development in Information Retrieval (SIGIR'05), Salvador, Brazil, Aug. 2005.
- X. Yin, J. Han, and J. Yang,
“Searching
for Related Objects in Relational Databases”, Proc. 2005 Int.
Conf. on Scientific and Statistical Database Management (SSDBM'05), Santa Barbara, CA,
June 2005.
- H. Hu,
X. Yan, Yu, J. Han and X. J. Zhou, “Mining Coherent Dense Subgraphs across Massive Biological Networks for
Functional Discovery”, Proc. 2005 Int. Conf. on Intelligent
Systems for Molecular Biology (ISMB 2005), Ann Arbor, MI, June 2005.
- X. Yan,
P. S. Yu, and J. Han, “Substructure
Similarity Search in Graph Databases”, Proc. 2005 ACM-SIGMOD
Int. Conf. on Management of Data (SIGMOD'05), Baltimore, Maryland, June
2005, pp. 766-777.
- C. Liu, X. Yan,
H. Yu, J. Han, and P. S. Yu, “Mining Behavior
Graphs for Backtrace of Noncrashing
Bugs”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05),
Newport Beach, CA, April 2005.
- H. Cheng, X. Yan, and J. Han, “SeqIndex: Indexing Sequences by Sequential Pattern
Analysis”, Proc. 2005 SIAM
Int. Conf. on Data Mining (SDM'05), Newport
Beach, CA, April
2005.
- X. Li, J. Han, X. Yin, and D. Xin, “Mining Evolving
Customer-Product Relationships in Multi-Dimensional Space”,
Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan,
April 2005, pp. 580-581.
- X. Yan,
X. J. Zhou, J. Han, “Mining Closed
Relational Graphs with Connectivity Constraints”, Proc. 2005
Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan,
April 2005, pp. 357-358.
- W. Jin, J. Han, and M. Ester,
“Mining Thick Skylines over Large Databases”, Proc. 2004 European Conf.
on Principles and Practice of Knowledge Discovery in
Databases (PKDD'04), Pisa, Italy, Sept. 2004, pp. 255-266.
- C. Aggarwal, J. Han,
J. Wang, and P. S. Yu, “A
Framework for Projected Clustering of High Dimensional Data Streams”,
Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada,
Aug. 2004.
- X. Li, J. Han, and H. Gonzalez,
“High-Dimensional
OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very
Large Data Bases (VLDB'04), Toronto,
Canada,
Aug. 2004.
- C. Aggarwal,
J. Han, J. Wang, and P. S. Yu, “On Demand
Classification of Data Streams”, Proc. 2004 Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA,
Aug. 2004.
- H. Cheng, X. Yan, and J. Han, “IncSpan: Incremental
Mining of Sequential Patterns in Large Database”, Proc.
2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA,
Aug. 2004.
- B. He, K.C.-C. Chang, and J.
Han, “Discovering
Complex Matchings across Web Query Interfaces: A
Correlation Mining Approach”, Proc. 2004 Int. Conf. on Knowledge Discovery
and Data Mining (KDD'04), Seattle,
WA, Aug. 2004.
- Y. Li, J. Han, and J. Yang, “Clustering
Moving Objects”, Proc. 2004 Int. Conf. on Knowledge Discovery
and Data Mining (KDD'04), Seattle, WA, Aug. 2004.
- A. Y. Wu, M. Garland, and J. Han,
“Mining Scale-Free Networks using Geodesic Clustering”, Proc. 2004 Int.
Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA,
Aug. 2004.
- J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and
M.-C. Hsu, “Mining Sequential
Patterns by Pattern-Growth: The PrefixSpan
Approach”, IEEE Transactions on Knowledge and Data Engineering,
16(10), 2004.
- Z. Shao,
J. Han, and D. Xin, “MM-Cubing:
Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc.
2004 Int. Conf. on Scientific and Statistical Database Management
(SSDBM'04), Santorini
Island, Greece,
June 2004.
- Y. Li, J. Yang, and J. Han,
“Continuous
K-Nearest Neighbor Search for Moving Objects”, Proc. 2004 Int.
Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004.
- J. Han, J. Pei, Y. Yin and R.
Mao, “Mining
Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree
Approach”, Data Mining and Knowledge Discovery, 8(1):53-87,
2004.
- X. Yan,
P. S. Yu, and J. Han, “Graph
Indexing: A Frequent Structure-based Approach”, Proc. 2004
ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France,
June 2004.
- Y. D. Cai,
D. Clutter, G. Pape, J. Han, M. Welge, and L. Auvil, “MAIDS: Mining
Alarming Incidents from Data Streams”, (system demonstration),
Proc. 2004 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'04), Paris,
France, June 2004.
- W.-Y. Kim, Y.-K. Lee, and J.
Han, “CCMine: Efficient Mining of Confidence-Closed
Correlated Patterns”, Proc. 2004 Pacific-Asia Conf. on Knowledge
Discovery and Data Mining (PAKDD'04), Sydney, Australia,
May 2004.
- H. Yu, J. Han, K. C.-C. Chang,
“PEBL: Web Page Classification Without Negative Examples”, IEEE
Transactions on Knowledge and Data Engineering (Special Issue on
Mining and Searching the Web), 16(1): 70-81, 2004.
- G. Dong, J. Han, J. Lam, J.
Pei, K. Wang, and W. Zou, “Mining Constrained Gradients in Multi-Dimensional
Databases”, IEEE Transactions on Knowledge and Data Engineering,
16(6), 2004.
- X. Yin, J. Han, J. Yang, and P.
S. Yu, “CrossMine: Efficient Classification across Multiple
Database Relations”, Proc. 2004 Int. Conf. on Data Engineering
(ICDE'04), Boston, MA, March 2004.
- J. Wang and J. Han, “BIDE: Efficient
Mining of Frequent Closed Sequences”, Proc. 2004 Int. Conf. on
Data Engineering (ICDE'04), Boston,
MA, March 2004.
- P. Tzvetkov,
X. Yan, and J. Han, “TSP: Mining Top-K
Closed Sequential Patterns”, Proc. 2003 Int. Conf. on Data Mining
(ICDM'03), Melbourne, FL, Nov. 2003.
- Y.-K. Lee, W.-Y. Kim, Y. D. Cai, and J. Han, “CoMine: Efficient Mining of Correlated Patterns”, Proc. 2003 Int. Conf. on Data Mining
(ICDM'03), Melbourne, FL, Nov. 2003.
- C. Aggarwal, J. Han, J. Wang, and P. S. Yu,
“A Framework for Clustering Evolving Data Streams”, Proc.
2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.
- D. Xin,
J. Han, X. Li, and B. W. Wah, “Star-Cubing:
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration”,
Proc. 2003 Int. Conf. on Very Large
Data Bases (VLDB'03), Berlin,
Germany,
Sept. 2003.
- X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns”,
Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), Washington, D.C., Aug. 2003.
- H. Yu, J. Yang, and J. Han,
“Classifying
Large Data Sets Using SVM with Hierarchical Clusters”, Proc.
2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), Washington, D.C., Aug. 2003.
- J. Wang, J. Han, and J. Pei,
“CLOSET+:
Searching for the Best Strategies for Mining Frequent Closed Itemsets”, Proc. 2003 ACM SIGKDD Int. Conf.
on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C.,
Aug. 2003.
- H. Wang, W. Fan, P. S. Yu, and
J. Han, “Mining
Concept-Drifting Data Streams using Ensemble Classifiers”, Proc.
2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), Washington, D.C., Aug. 2003.
- S.-J. Ko and J. Han, “Mining the Typical Preference
of Collaborative User Group”, Proc. 2003 Int. Conf. on Conceptual
Modeling (ER'03), Skokie, IL, Oct. 2003.
- H. Yu, J. Yang, W. Wang, and J.
Han, “Discovering Compact and Highly Discriminative Features or Feature
Combinations of Drug Activities Using Support Vector Machine”, Proc. 2003
IEEE Computer Society Bioinformatics Conf. (CSB'03), Stanford, California,
Aug. 2003.
Project Impact
- Education: Parts of the new research results are used in the Data
Mining courses (CS412, CS512) taught to both undergraduate and graduate
students in the Department of Computer Science at the University of Illinois
at Urbana-Champaign. Moreover, the research results will be published in a
timely manner in international conferences and journals and distributed
worldwide for education and research. The new progress will also
be integrated into the new edition of my data mining textbook and other
research collections.
- Collaborations: For this project we have established collaborations
with IBM T.J. Watson Research Center, Microsoft Research, and NCSA (National
Center for Supercomputing Applications). Through such collaborations we
expect to gain access to real datasets and applications and to produce more
research results.
Current and Future Activities
The following are some of the highlights of our ongoing work. Please refer to
the Publications and Products section for related references.
- Development of multi-dimensional stream data analysis techniques
- Development of efficient and effective methods for mining sequential patterns in data streams
- Development of efficient methods for mining frequent patterns
- Development of efficient and scalable methods for classification in evolving data streams
- Development of efficient and scalable methods for clustering evolving data streams
Area Background
This project builds on previous work on data mining, stream data/query
processing, and stream data mining. Many research papers have been published
on these themes. Several textbooks provide good overviews of data
mining principles and algorithms, including (Han and Kamber,
2006), (Hand, Mannila, and Smyth, 2001), and (Hastie, Tibshirani,
and Friedman, 2001). For stream data processing, (Babcock et al. 2002) gives
a comprehensive survey. There have been a few previous studies on stream data
mining, including (Aggarwal et al. 2003), (Chen et al. 2002), (Hulten et al.
2001), (Domingos and Hulten 2000), (Manku and Motwani 2002),
(O'Callaghan et al. 2002), and (Wang et al. 2003).
Area References
[1] C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for Clustering Evolving Data Streams. VLDB 2003.
[2] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. PODS 2002.
[3] Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. VLDB 2002.
[4] A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi. Processing Complex Aggregate Queries over Data Streams. SIGMOD 2002.
[5] M. Greenwald and S. Khanna. Space-Efficient Online Computation of Quantile Summaries. SIGMOD 2001.
[6] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.
[7] D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001.
[8] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
[9] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. KDD 2001.
[10] G. Manku and R. Motwani. Approximate frequency counts over data streams. VLDB 2002.
[11] L. O'Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. High-performance clustering of streams and large data sets. ICDE 2002.
[12] H. Wang, W. Fan, P. S. Yu, and J. Han. Mining Concept-Drifting Data Streams using Ensemble Classifiers. KDD 2003.
Potential Related Projects
This project is related to most data mining and stream data processing
projects. In particular, it is related to the PI's NSF IIS 020-9199 (Mining
Sequential and Structured Patterns: Scalability, Flexibility, Extensibility
and Applicability) and the PI's ONR project, MAIDS (Mining Alarming Incidents
in Data Streams). We wish to collaborate or exchange research ideas with
research projects related to knowledge discovery in databases, stream data
processing, and their applications, such as homeland security and computer
networks.
Project Web site URL: http://www.cs.uiuc.edu/~hanj/projs/streamine.htm
Online software: Online software related to this project can be downloaded
at www.illimine.cs.uiuc.edu
Online resources: Research publications related to this project can be
downloaded from the Selected Publications page
(http://www.cs.uiuc.edu/~hanj/pubs/pub.htm)