NSF/BDI: Collaborative Research: Endowing Biological Databases with Analytical Power: Indexing, Querying, and Mining of Complex Biological Structures

National Science Foundation Award Number: NSF BDI 05-15813 (September 15, 2005―August 31, 2008)

 

Contact Information

 

Jiawei Han,  PI
Department of Computer Science
University of Illinois, Urbana-Champaign
1304 West Springfield Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903,   Fax: (217) 265-6494

E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj

 

List of Supported Students and Staff

 

  • Dong Xin, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign
  • Deng Cai, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

Project Award Information

  • Award Number: NSF BDI 05-15813  
  • Duration: July 15, 2005―June 30, 2008
  • Title: NSF/BDI: Collaborative Research: Endowing Biological Databases with Analytical Power: Indexing, Querying, and Mining of Complex Biological Structures.
  • Keywords:  Biological databases, biological data mining, scalable mining algorithms, data mining applications

Project Summary

We propose in-depth research and development of new, powerful, and scalable indexing, query processing, and data mining methods for constructing scalable, efficient, analysis-based, heterogeneous biological database systems. Our study will work on a set of typical genomic and biological databases and focus on (1) development of efficient and scalable methods for indexing and accessing complex biological structures, with emphasis on a) mining structural patterns in large multi-graphs, b) mining dense recurrent graphs/networks, c) discriminative feature-based indexing of biological structures, and d) similarity search on biological structures, and (2)   The project investigates efficient and effective approaches to the implementation of this system. The project also strives to ensure that the developed technology will enable the development of more advanced analytical biological database systems for broad applications.

Publications and Products:

Journal articles (including accepted)

1.       Jian Pei, Jiawei Han, Hongjun Lu, Shojiro Nishio, Shiwei Tang, and Dongqing Yang, “H-Mine: Fast and Space-Preserving Frequent Pattern Mining in Large Databases”, IIE Transactions, 39:593-605, 2007.

2.       Chulyun Kim, Sangkyum Kim, Russell Dorer, Dan Xie, Jiawei Han, and Sheng Zhong, “TagSmart: Analysis and Visualization for Yeast Mutant Fitness Data Measured by Tag Microarrays”, BMC Bioinformatics, 8:128, April 2007. (http://www.biomedcentral.com/1471-2105/8/128)

3.       Jian Pei, Jiawei Han, and Wei Wang, “Constraint-based sequential pattern mining: the pattern-growth methods”, Journal of Intelligent Information Systems, 28(2):133-160, 2007.

4.       Jiawei Han, Hong Cheng, Dong Xin, and Xifeng Yan, “Frequent Pattern Mining: Current Status and Future Directions”, Data Mining and Knowledge Discovery, 14, 2007. (Online version published on January 27, 2007, DOI 10.1007/s10618-006-0059-1 SpringerLink).

5.       Dong Xin, Jiawei Han, Xifeng Yan and Hong Cheng, “On Compressing Frequent Patterns”, Data and Knowledge Engineering (Special issue on Intelligent Data Mining), 60(1): 5-29, 2007.

6.       Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, and Benjamin W. Wah, “Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach”, IEEE Transactions on Knowledge and Data Engineering, 19(1): 111-126, 2007.

7.       Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, and Samuel P. Midkiff, “Statistical Debugging: A Hypothesis Testing-based Approach”, IEEE Transactions on Software Engineering, 32(10):831-848, 2006.

8.       Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei, Benjamin W. Wah, and Jianyong Wang, “Regression Cubes with Lossless Compression and Aggregation”, IEEE Transactions on Knowledge and Data Engineering, 18(12): 1585-1599, 2006.

9.       Xifeng Yan, Feida Zhu, Philip S. Yu, and Jiawei Han, “Feature-based Substructure Similarity Search”, ACM Transactions on Database Systems, 31(4): 1418-1453, 2006.

10.   Deng Cai, Xiaofei He, Jiawei Han and Hong-Jiang Zhang, “Orthogonal Laplacianfaces for Face Recognition”, IEEE Transactions on Image Processing, 15(11): 3608-3614, 2006.

11.   F. Pan, K. Kamath, K. Zhang, S. Pulapura, A. Achar, J. Nunez-Iglesias, Y. Huang, X. Yan, J. Han, H. Hu, M. Xu, X. J. Zhou. “Integrative Array Analyzer: A software package for analysis of cross-platform and cross-species microarray data”, Bioinformatics, 22(13): 1665-1667, 2006.

12.   J. Wang, J. Han, and J. Pei, “Closed Constrained-Gradient Mining in Retail Databases”, IEEE Transactions on Knowledge and Data Engineering, 18(6): 764-769, 2006.

13.   X. Yin, J. Han, J. Yang and P. S. Yu, “Efficient Classification across Multiple Database Relations: A CrossMine Approach”, IEEE Transactions on Knowledge and Data Engineering, 18(6): 770-783, 2006.

14.   Charu Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu, “A Framework for On-Demand Classification of Evolving Data Streams”, IEEE Transactions on Knowledge and Data Engineering, 18(5):577-589, 2006.

15.   Hwanjo Yu, Jiong Yang, Jiawei Han, and Xiaolei Li, “Making SVM Scalable to Large Data Sets Using Hierarchical Indexing”, Data Mining and Knowledge Discovery, 11(3): 295-321, 2005.

16.   Jiawei Han, Yixin Chen, Guozhu Dong, Jian Pei, Benjamin W. Wah, Jianyong Wang, and Y. Dora Cai, “Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams”, Distributed and Parallel Databases, 18(2): 173-197, 2005.

17.   Xifeng Yan, Philip Yu, and Jiawei Han, “Graph Indexing Based on Discriminative Frequent Structure Analysis”, ACM Transactions on Database Systems, 30(4): 960-993, 2005.

18.   Deng Cai, Xiaofei He and Jiawei Han, “Document Clustering Using Locality Preserving Indexing”, IEEE Transactions on Knowledge and Data Engineering, 17(12):1624-1637, 2005.

19.   C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “On Efficient Algorithms for High Dimensional Projected Clustering of Data Streams”, Data Mining and Knowledge Discovery, 10:251-272, 2005.

20.   Petre Tzvetkov, Xifeng Yan, and Jiawei Han, “TSP: Mining Top-K Closed Sequential Patterns”, Knowledge and Information Systems, 7(4): 438-457, 2005.

21.   J. Wang, J. Han, Y. Lu, and P. Tzvetkov, “TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets”, IEEE Transactions on Knowledge and Data Engineering, 17(5):652-664, 2005.

22.   K. Wang, Y. Jiang, J. X. Yu, G. Dong, and J. Han, “Divide-and-Approximate: A Novel Constraint Push Strategy for Iceberg Cube Mining”, IEEE Transactions on Knowledge and Data Engineering, 17(3):354-368, 2005.

Books and Book Chapters

  1. Xifeng Yan and Jiawei Han, “Discovery of Frequent Substructures”, in D. Cook and L. Holder (ed.), Mining Graph Data, John Wiley & Sons, pp. 99-115, 2007.
  2. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, (Foreword by Jim Gray), 2nd ed., Morgan Kaufmann, 2006.
  3. Charu C. Aggarwal, Jiawei Han, Jianyong Wang and Philip S. Yu, “On Clustering Massive Data Streams: A Summarization Paradigm”, in C. C. Aggarwal (ed.), Data Streams: Models and Algorithms, Kluwer Academic Publishers, pp. 9-38, 2006.
  4. Jiawei Han, Y. Dora Cai, Yixin Chen, Guozhu Dong, Jian Pei, Benjamin W. Wah, and Jianyong Wang, “Multi-Dimensional Analysis of Data Streams Using Stream Cubes”, in C. C. Aggarwal (ed.), Data Streams: Models and Algorithms, Kluwer Academic Publishers, pp. 103-126, 2006.
  5. Jiawei Han, Hector Gonzalez, Xiaolei Li, and Diego Klabjan, “Warehousing and Mining Massive RFID Data Sets” (an invited paper), in Xue Li, Osmar R. Zaiane, Zhanhuai Li (eds.), Proc. 2006 Int. Conf. Advanced Data Mining and Applications (ADMA'06), Xi'An, China, August 2006, pp. 1-18. (Lecture Notes in Computer Science, Vol. 4093, Springer Berlin/Heidelberg, 2006).
  6. X. Yin, J. Han, J. Yang and P. S. Yu, “CrossMine: Efficient Classification across Multiple Database Relations”, in Jean-Francois Boulicaut, Luc de Raedt, and Heikki Mannila (eds.), Constraint-Based Mining and Inductive Databases, Springer-Verlag LNAI vol. 3848, pp. 172-195, 2006.
  7. Jiawei Han, Benjamin W. Wah, Vijay Raghavan, Xindong Wu, and Rajeev Rastogi (eds.), Proceedings of the Fifth Int. Conf. on Data Mining (ICDM-2005), (Houston, Texas, Nov. 27--30, 2005) IEEE Computer Society, New York, 2005. (846 + xxvii pages).
  8. J. Yang, X. Yan, J. Han, and W. Wang, “Discovering Evolutionary Classifier over High Speed Non-Static Stream”, in S. Bandyopadhyay et al. (eds.),  Advanced Methods for Knowledge Discovery from Complex Data, Springer Verlag, 2005.
  9. J. Han, J. Pei, and X. Yan, “Sequential Pattern Mining by Pattern-Growth: Principles and Extensions”, in W. W. Chu and T. Y. Lin (eds.), Recent Advances in Data Mining and Granular Computing (Mathematical Aspects of Knowledge Discovery), Springer Verlag, 2005.

 

Refereed Conference Publications (Refereed Workshop Publications are omitted due to limited space)

 

1.       Chao Liu, Xiangyu Zhang, Jiawei Han, Yu Zhang and Bharat K. Bhargava, “Failure Indexing: A Dynamic Slicing Based Approach”, in Proc. 2007 IEEE Int. Conf. on Software Maintenance (ICSM'07), Paris, France, Oct. 2007.

2.       Deng Cai, Xiaofei He, and Jiawei Han, “A Unified Subspace Learning Framework for Content-Based Image Retrieval”, in Proc. 2007 Int. Conf. on ACM Multimedia (ACM-MM'07), Augsburg, Germany, Sept. 2007.

3.       Tianyi Wu, Yuguo Chen and Jiawei Han, “Association Mining in Large Databases: A Re-Examination of Its Measures”, in Proc. 2007 Int. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'07), Warsaw, Poland, Sept. 2007.

4.       Chen Chen, Xifeng Yan, Philip S. Yu, Jiawei Han, DongQing Zhang, and Xiaohui Gu, “Towards Graph Containment Search and Indexing”, in Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB'07), Vienna, Austria, Sept. 2007.

5.       Hector Gonzalez, Jiawei Han, Xiaolei Li, Margaret Myslinska, and John Paul Sondag, “Adaptive Fastest Path Computation on a Road Network: A Traffic Mining Approach”, in Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB'07), Vienna, Austria, Sept. 2007.

6.       Xiaolei Li and Jiawei Han, “Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data”, in Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB'07), Vienna, Austria, Sept. 2007.

7.       Tianyi Wu, Xiaolei Li, Dong Xin, Jiawei Han, Jacob Lee, and Ricardo Redder, “DataScope: Viewing Database Contents in Google Maps' Way”, in Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB'07), Vienna, Austria, Sept. 2007 (system demo).

8.       Xiaoxin Yin, Jiawei Han, and Philip S. Yu, “Truth Discovery with Multiple Conflicting Information Providers on the Web”, in Proc. 2007 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'07), San Jose, CA, Aug. 2007.

9.       Xiaolei Li, Jiawei Han, Jae-Gil Lee, and Hector Gonzalez, “Traffic Density-based Discovery of Hot Routes in Road Networks”, in Proc. 2007 Int. Symp. on Spatial and Temporal Databases (SSTD'07), Boston, MA, July 2007.

10.   Deng Cai, Xiaofei He and Jiawei Han, “Isometric Projection”, in Proc. 2007 AAAI Conf. on Artificial Intelligence (AAAI-07), Vancouver, B. C., Canada, July 2007.

11.   Wen Jin, Anthony K.H. Tung, Martin Ester, and Jiawei Han, “On Efficient Processing of Subspace Skyline Queries on High Dimensional Data”, in Proc. 2007 Int. Conf. on Scientific and Statistical Database Management (SSDBM'07), Banff, Canada, July 2007.

12.   Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei Han, and Thomas Huang, “Learning a Spatially Smooth Subspace for Face Recognition”, in Proc. 2007 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR'07), Minneapolis, MN, June 2007.

13.   Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang, “Trajectory Clustering: A Partition-and-Group Framework”, in Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007.

14.   Dong Xin, Jiawei Han, and Kevin C.-C. Chang, “Progressive and Selective Merge: Computing Top-K with Ad-hoc Ranking Functions”, in Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007.

15.   Feida Zhu, Xifeng Yan, Jiawei Han, and Philip S. Yu, “gPrune: A Constraint Pushing Framework for Graph Pattern Mining”, in Proc. 2007 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'07), Nanjing, China, May 2007. (Best Student Paper Award)

16.   Jiawei Han, Hong Cheng, Dong Xin, and Xifeng Yan, “Frequent Pattern Mining: Current Status and Future Directions”, Data Mining and Knowledge Discovery, 14(1), 2007. (Online version published on January 27, 2007, DOI 10.1007/s10618-006-0059-1 SpringerLink).

17.   Jing Gao, Wei Fan, and Jiawei Han, “A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions”, in Proc. 2007 SIAM Int. Conf. on Data Mining (SDM'07), Minneapolis, MN, April 2007.

18.   Xiaolei Li, Jiawei Han, Sangkyum Kim, and Hector Gonzalez, “ROAM: Rule- and Motif-Based Anomaly Detection in Massive Moving Object Data Sets”, in Proc. 2007 SIAM Int. Conf. on Data Mining (SDM'07), Minneapolis, MN, April 2007.  (One of “Best of SDM’07”)

19.   Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu, “Discriminative Frequent Pattern Analysis for Effective Classification”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007.

20.   Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hong Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007. (Best Student Paper Award)

21.   Hector Gonzalez, Jiawei Han, and Xuehua Shen, “Cost-conscious Cleaning of Massive RFID Data Sets”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007.

22.   Xiaoxin Yin, Jiawei Han, and Philip S. Yu, “Object Distinction: Distinguishing Objects with Identical Names by Link Analysis”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007.

23.   Wen Jin, Martin Ester, Zengjian Hu, and Jiawei Han, “The Multi-Relational Skyline Operator”, in Proc. 2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007.

24.   Deng Cai, Xiaofei He, Kun Zhou, Jiawei Han and Hujun Bao, “Locality Sensitive Discriminant Analysis”, in Proc. 2007 Int. Joint Conf. on Artificial Intelligence (IJCAI'07), Hyderabad, India, Jan. 2007.

25.   Chao Liu, Zeng Lian, and Jiawei Han, “How Bayesians Debug?”, in Proc. 2006 Int. Conf. on Data Mining (ICDM'06), Hong Kong, China, Dec. 2006.

26.   Hong Cheng, Philip S. Yu, and Jiawei Han, “AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery”, in Proc. 2006 Int. Conf. on Data Mining (ICDM'06), Hong Kong, China, Dec. 2006.

27.   Chao Liu and Jiawei Han, “Failure Proximity: A Fault Localization-Based Approach”, in Proc. 14th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE'06), Portland, OR, Nov. 2006.

28.   Hector Gonzalez, Jiawei Han, and Xiaolei Li, “Mining Compressed Commodity Workflows From Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Information and Knowledge Management (CIKM'06), Arlington, VA, Nov. 2006.

29.   Xiaoxin Yin, Jiawei Han, and Philip Yu, “LinkClus: Efficient Clustering via Heterogeneous Semantic Links”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

30.   Hector Gonzalez, Jiawei Han, and Xiaolei Li, “FlowCube: Constructing RFID FlowCubes for Multi-Dimensional Analysis of Commodity Flows”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

31.   Dong Xin, Chen Chen, and Jiawei Han, “Towards Robust Indexing for Ranked Queries”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

32.   Dong Xin, Jiawei Han, Hong Cheng, and Xiaolei Li, “Answering Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach”, in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06), Seoul, Korea, Sept. 2006.

33.   Dong Xin, Hong Cheng, Xifeng Yan, and Jiawei Han, “Extracting Redundancy-Aware Top-K Patterns”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

34.   Qiaozhu Mei, Dong Xin, Hong Cheng, ChengXiang Zhai, and Jiawei Han, “Generating Semantic Annotations for Frequent Patterns with Context Analysis”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006. (Best Student Paper Runner-Up Award)

35.   Chao Liu, Chen Chen, Jiawei Han, and Philip Yu, “GPLAG: Detection of Software Plagiarism by Procedure Dependency Graph Analysis”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

36.   Dong Xin, Xuehua Shen, Qiaozhu Mei, and Jiawei Han, “Discovering Interesting Patterns Through User's Interactive Feedback”, in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.

37.   Deng Cai, Xiaofei He and Jiawei Han, “Tensor Space Model for Document Analysis”, in Proc. 2006 Int. ACM SIGIR Conf. on Research & Development on Information Retrieval (SIGIR'06), Seattle, WA, Aug. 2006.

38.   Hongyan Liu, Ying Lu, Jiawei Han, and Jun He, “Error-Adaptive and Time-Aware Maintenance of Frequency Counts over Data Streams”, in Proc. 2006 Int. Conf. on Web-Age Information Management (WAIM'06), Hong Kong, China, June, 2006.

39.   Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, and Dong Xin, “Ranking Objects Based on Relationships”, in Proc. 2006 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'06), Chicago, IL, June 2006.

40.   Xiaolei Li, Jiawei Han, and Sangkyum Kim, “Motion-Alert: Automatic Anomaly Detection in Massive Moving Objects”, Proc. 2006 IEEE Int. Conf. on Intelligence and Security Informatics (ISI'06), San Diego, CA, May 2006.

41.   Wen Jin, Anthony K. H. Tung, Jiawei Han, and Wei Wang, “Ranking Outliers Using Symmetric Neighborhood Relationship,” in Proc. 2006 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'06), Singapore, April 2006.

42.   Hongyan Liu, Jiawei Han, Dong Xin, and Zheng Shao, “Mining Interesting Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach,” in Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006. (One of “Best of SDM’06”)

43.   Chao Liu, Xifeng Yan, and Jiawei Han, “Mining Control Flow Abnormality for Logic Error Isolation,” in Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.

44.   Charu Aggarwal, Chen Chen, and Jiawei Han, “On the Inverse Classification Problem and its Applications”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

45.   Hector Gonzalez,  Jiawei Han, Xiaolei Li, and Diego Klabjan, “Warehousing and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006. (Best Student Paper Award)

46.   Hongyan Liu, Jiawei Han, Dong Xin, and Zheng Shao, “Top-Down Mining of Interesting Patterns from Very High Dimensional Data”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

47.   Dong Xin, Jiawei Han, Zheng Shao, and Hongyan Liu, “C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

48.   Xifeng Yan, Feida Zhu, Jiawei Han, and Philip Yu, “Searching Substructures with Superimposed Distance”, in Proc. 2006 Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.

49.   Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, Jiawei Han, “Community Mining from Multi-Relational Networks”, in Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct., 2005.

50.   Wen Jin, Martin Ester and Jiawei Han, “Efficient Processing of Ranked Queries with Sweeping Selection”,  in Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct., 2005.

51.   Xiaoxin Yin and Jiawei Han, “Efficient Classification from Multiple Heterogeneous Databases”, in Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct., 2005.

52.   C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff, “SOBER: Statistical Model-based Bug Localization”, Proc. 2005 ACM SIGSOFT Symp. on the Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.

53.   D. Xin, J. Han, X. Yan and H. Cheng, “Mining Compressed Frequent-Pattern Sets”, Proc. 2005 Int. Conf. on Very Large Data Bases (VLDB'05), Trondheim, Norway, Aug. 2005.

54.   X. Yan, H. Cheng, J. Han, and D. Xin, “Summarizing Itemset Patterns: A Profile-Based Approach”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005. (Best Student Paper Runner-Up Award)

55.   X. Yan, X. J. Zhou, and J. Han, “Mining Closed Relational Graphs with Connectivity Constraints”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

56.   X. Yin, J. Han, and P.S. Yu, “Cross-Relational Clustering with User's Guidance”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

57.   S. Cong, J. Han, and D. Padua, “Parallel Mining of Closed Sequential Patterns”, Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05), Chicago, IL, Aug. 2005.

58.   D. Cai and X. He. “Orthogonal Locality Preserving Indexing”, Proc. 2005 Int. Conf. on Research and Development in Information Retrieval (SIGIR'05), Salvador, Brazil, Aug. 2005.

 

Project Impact

 

  • Education: Parts of the new research results are used in the Data Mining courses (CS412, CS512) taught to undergraduate and graduate students in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Moreover, the research results have been, and will continue to be, published promptly in international conferences and journals and distributed worldwide for education and research. The new progress will also be integrated into the new edition of the P.I.'s data mining textbook and other research collections.

  • Collaborations: For this project we have established collaborations with the Department of Computational and Molecular Biology of the University of Southern California, IBM T.J. Watson Research Center, Microsoft Research, Boeing, Intel, Google, and NCSA (National Center for Supercomputing Applications). Through these collaborations we expect to gain access to real datasets and applications and to produce more research results.

 

Current and Future Activities

The following are some highlights of our ongoing work. Please refer to the Publications and Products section for related references.

  • Development of efficient and scalable mechanisms for mining biological networks: (based on) ISMB’05.
  • Development of multi-dimensional stream data analysis techniques: VLDB’04, VLDB’06.
  • Development of efficient methods for mining frequent, sequential and structured patterns: TODS’05, TODS’06, ICDE’06 (C-Cubing).

Area Background

 

This project builds on previous work on data mining, stream data/query processing, and moving object databases. Many research papers have been published on these themes. Several textbooks provide good overviews of data mining principles and algorithms, including (Han and Kamber, 2006), (Hand, Mannila, and Smyth, 2001), and (Hastie, Tibshirani, and Friedman, 2001), and of bioinformatics, such as (Durbin et al., 1998), (Pevzner, 2000), and (Waterman, 1995).

 

Area References

 

1.       Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang,  Multi-Dimensional Regression Analysis of Time-Series Data Streams, VLDB 2002.

2.       Y. Cheng and G. Church, Biclustering of Expression Data, Proc. 2000 Int. Conf. on Intelligent Systems for Molecular Biology (ISMB'00), 2000.

3.       J. Cohen. Bioinformatics: An Introduction for Computer Scientists. ACM Computing Surveys, 36(2):122-158, 2004.

4.       R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probability Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.

5.       H. Hu, X. Yan, Y. Huang, J. Han, X. J. Zhou: Mining coherent dense subgraphs across massive biological networks for functional discovery. ISMB, 2005.

6.       J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd ed.,  Morgan Kaufmann, 2006.

7.       D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining, MIT Press, 2001.

8.       T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.

9.       P. A. Pevzner. Computational Molecular Biology: An Algorithmic Approach, MIT Press, 2000.

10.   M. S. Waterman. Introduction to Computational Biology: Maps, Sequences, and Genomes, CRC Press, 1995.

Potential Related Projects

This project is related to most research in data mining and in biological databases. In particular, it is related to the P.I.'s NSF IIS-02-09199 (Mining Sequential and Structured Patterns: Scalability, Flexibility, Extensibility and Applicability) and the P.I.'s NSF IIS-03-08215 (Mining Dynamics of Data Streams in Multi-Dimensional Space). We wish to collaborate or exchange research ideas with most of the research projects related to knowledge discovery in databases, biological databases and data analysis, bioinformatics, and their applications.

Project Web site URL:  http://www.cs.uiuc.edu/~hanj/projs/biobdi.htm

Online software:  Online software related to this project can be downloaded at www.illimine.cs.uiuc.edu

Online resources:  Research publications related to this project can be downloaded at Selected Publications