NSF III: Medium: Collaborative Research: StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research

National Science Foundation Award Number: NSF IIS 17-04532 (07/01/2017--06/30/2021)

 

Contact Information

 

·         Jiawei Han, Co-PI
Department of Computer Science
University of Illinois, Urbana-Champaign
201 N. Goodwin Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903

Fax: (217) 265-6494

E-mail: hanj at cs.uiuc.edu

URL: http://www.cs.uiuc.edu/~hanj

 

List of Supported Students and Staff

 

·         Ahmed El-Kishky, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign (collaborative)

·         Jiaming Shen, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

·         Chao Zhang, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

·         Honglei Zhuang, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign

Project Award Information

·         Award Number: NSF IIS 17-04532

·         Duration: 07/01/2017--06/30/2021

·         Title: NSF III: Medium: Collaborative Research: StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research

·         Keywords:  Data mining; text mining; information extraction; data integration; information network construction; information network mining; big data; efficiency and scalability; scientific applications

Project Summary

To transform massive, unstructured but interconnected data into actionable knowledge, we propose a new paradigm, data-to-network-to-knowledge (D2N2K): integrate semi-structured and unstructured data, construct organized heterogeneous information networks (called StructNet), and then develop powerful mining mechanisms on these networks. We investigate the principles, methodologies, and algorithms for (i) the construction of StructNet and (ii) its exploration and mining, and we build a comprehensive network construction and analysis engine for massive research data and explore its broad applications. We take the biomedical domain as a case study: we dive deeply into the contents of biomedical literature and other sources of biomedical data to construct a relatively structured heterogeneous information network (called MediNet) and explore its applications in the health domain.

Intellectual Merit:

 

·         Developing new principles, methods, and technologies for construction of research information networks from massive, unstructured research datasets: We develop methods including (1) attribute value extraction, (2) relation typing, and (3) high-quality claim mining. These methods aim to turn interrelated, unstructured and semi-structured research data into structured information networks and to enrich the semantic structures of existing heterogeneous networks; they will play a critical role in turning data into structured networks and actionable knowledge.

·         Enriching the principles and technologies of network science and data mining via exploration and mining of structured information networks: We develop novel methods for in-depth exploration and mining of such structured networks, including (1) context-aware multi-dimensional summarization of StructNet, and (2) task-guided embedding approaches for mining StructNet, which will advance research in both data science and network science.

 

Broader Impacts: 

 

·         Benefits scientific research: The work will generate an extensible framework to facilitate literature-based scientific research. Fundamentally different from existing research services (e.g., Google Scholar, Microsoft Academic Search, AMiner, and CiteSeerX), StructNet dives deeply into the contents to conduct corpus-wide information network construction and mining that benefits research and scientific discovery. The case study on MediNet will have a big impact on the biomedical domain and can potentially tackle problems ranging from disease diagnosis and drug discovery to precision medicine. The methodology can be transferred to many other domains that are rich in text data and semi-structured data.

·         Benefits network science, data mining, and information technology: The project will generate new methodologies and tools for structuring and exploring massive interconnected data. As in our previous projects, the technologies will be transferred to interested research centers and industry.

·         Benefits education and training: The project will train a good number of researchers, especially female and minority students, and will educate a great number of undergraduate and graduate students via our research publications, tutorials, massive online courses, workshops, and demo systems.

·         The research results are to be published in various research and application forums and integrated into the educational programs at UIUC. The progress of the project and the research results are also disseminated via the project Web site (http://www.cs.uiuc.edu/homes/hanj/projs/structnet.htm).

Selected Publications and Products:

Books (authored)

·         Jialu Liu, Jingbo Shang and Jiawei Han, Phrase Mining from Massive Text and Its Applications, Morgan & Claypool Publishers, 2017 (Series: Synthesis Lectures on Data Mining and Knowledge Discovery)

 

Journal and Refereed Conference Publications

1.      Jingbo Shang, Chao Zhang, Jiaming Shen, Jiawei Han, "Towards Multidimensional Analysis of Text Corpora", Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), (Conference Tutorial), London, UK, Aug. 2018

2.      Carl Yang, Xiaolin Shi, Jie Luo and Jiawei Han, "I Know You’ll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

3.      Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni and Jiawei Han, "TaxoGen: Constructing Topical Concept Taxonomy by Adaptive Term Embedding and Clustering", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

4.      Qi Li, Meng Jiang, Xikun Zhang, Meng Qu, Timothy Hanratty, Jing Gao and Jiawei Han, "TruePIE: Discovering Reliable Patterns in Pattern-Based Information Extraction", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

5.      Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler and Jiawei Han, "HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

6.      Yu Shi, Qi Zhu, Fang Guo, Chao Zhang and Jiawei Han, "Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

7.      Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu and Jiawei Han, "End-to-End Reinforcement Learning for Automatic Taxonomy Induction", in Proc. of 2018 Annual Meeting of the Association for Computational Linguistics (ACL'18), Melbourne, Australia, July 2018 

8.      Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha and Jiawei Han, "Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach", in Proc. of 2018  Int. ACM SIGIR Conf. on Research and Development in Information Retrieval  (SIGIR'18), Ann Arbor, MI, July 2018 

9.      Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke and Jiawei Han, "Entropy-Based Subword Mining for Word Embeddings", in Proc. of the 2nd Workshop on Subword and Character Level Models in NLP (SCLeM'18) (at NAACL 2018), New Orleans, LA, June 2018 

10.  Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan and Jiawei Han, “AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks”, Proc. of 2018 SIAM Int. Conf. on Data Mining (SDM’18), San Diego, CA, May 2018

11.  Meng Qu, Xiang Ren, Yu Zhang, and Jiawei Han, “Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning”, Proc. of 2018 Int. Conf. on World-Wide Web (WWW’18), Lyon, France, Apr. 2018

12.  Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Frank F. Xu and Jiawei Han, “Open Information Extraction with Global Structure Constraints”, (poster paper), Proc. of 2018 Int. Conf. on World-Wide Web (WWW’18), Lyon, France, Apr. 2018 (received WWW'18 best poster award honorable mention)

13.  Carl Yang, Chao Zhang, Jiawei Han, Xuewen Chen, Jieping Ye, "Did You Enjoy the Ride: Understanding Passenger Experience via Heterogeneous Network Embedding", Proc. of 2018 IEEE Int. Conf. on Data Engineering (ICDE'18), Paris, France, April 2018

14.  Liyuan Liu, Jingbo Shang, Frank Xu, Xiang Ren, Huan Gui, Jian Peng and Jiawei Han, "Empower Sequence Labeling with Task-Aware Neural Language Model", in Proc. of 2018 AAAI Conf. on Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018

15.  Chao Zhang, Mengxiong Liu, Zhengchao Liu, Carl Yang, Luming Zhang, Jiawei Han, "Spatiotemporal Activity Modeling Under Data Scarcity: A Graph-Regularized Cross-Modal Embedding Approach", in Proc. of 2018 AAAI Conf. on Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018

16.  Wanzheng Zhu, Chao Zhang, Shuochao Yao, Xiaobin Gao, Jiawei Han, "A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling", in Proc. of 2018 AAAI Conf. on Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018

17.  Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li and Jiawei Han, "Indirect Supervision for Relation Extraction using Question-Answer Pairs", in Proc. of 2018 ACM  Int. Conf. on Web Search and Data Mining (WSDM'18), Los Angeles, CA, Feb. 2018

18.  Meng Qu, Jian Tang, and Jiawei Han, "Curriculum Learning for Heterogeneous Star Network Embedding via Deep Reinforcement Learning",  in Proc. of 2018 ACM  Int. Conf. on Web Search and Data Mining (WSDM'18), Los Angeles, CA, Feb. 2018

19.  Jiawei Han, "On the Power of Massive Text Data", (keynote speech), in Proc. of 2018 ACM  Int. Conf. on Web Search and Data Mining (WSDM'18), Los Angeles, CA, Feb. 2018

20.  Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han, "Unsupervised Meta-path Selection for Text Similarity Measure based on Heterogeneous Information Networks", Data Mining and Knowledge Discovery (DMKD), to appear 2018

21.  Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, Jiawei Han, "Automated Phrase Mining from Massive Text Corpora", accepted by IEEE Transactions on Knowledge and Data Engineering, Feb. 2018

22.  Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han, "DPPred: An Effective Prediction Framework with Concise Discriminative Patterns", accepted by IEEE Transactions on Knowledge and Data Engineering, Sept. 2017

23.  Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, Jiawei Han, "An Attention-based Collaboration Framework for Multi-View Network Representation Learning", in Proc. of 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM'17), Singapore, Nov. 2017

24.  Chenguang Wang, Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, Jiawei Han, "Second-Order Heterogeneous Information Network Similarity for Text", in Proc. of 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM'17), Singapore, Nov. 2017

25.  Quan Yuan, Jingbo Shang, Xin Cao, Chao Zhang, Xinhe Geng, Jiawei Han, "Detecting Multiple Periods and Periodic Patterns in Event Time Sequences",  in Proc. of 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM'17), Singapore, Nov. 2017

26.  Shi Zhi, Yicheng Sun, Jiayi Liu, Chao Zhang and Jiawei Han, "ClaimVerif: A Real-time Claim Verification System Using the Web and Fact Databases" (system demo),  in Proc. of 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM'17), Singapore, Nov. 2017

27.  Mengxiong Liu, Zhengchao Liu, Chao Zhang, Keyang Zhang, Quan Yuan, Tim Hanratty and Jiawei Han, "Urbanity: A System for Interactive Exploration of Urban Dynamics from Streaming Human Sensing Data" (system demo), in Proc. of 2017 ACM Int. Conf. on Information and Knowledge Management (CIKM'17), Singapore, Nov. 2017

28.  Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, Brandon Norick, Lance Kaplan and Jiawei Han, "Embedding Learning with Events in Heterogeneous Information Networks", IEEE Transactions on Knowledge and Data Engineering, 29(11): 2428-2441, 2017

29.  Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han, "SetExpan: Corpus-based Set Expansion via Context Feature Selection and Rank Ensemble",  in Proc. of 2017 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2017), Skopje, Macedonia, Sept. 2017

30.  Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji and Jiawei Han, "Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach",  in Proc. of 2017 Conf. on  Empirical Methods in Natural Language Processing (EMNLP'17), Copenhagen, Denmark, Sept. 2017

31.  Honglei Zhuang, Chi Wang, Fangbo Tao, Lance Kaplan and Jiawei Han, "Identifying Semantically Deviating Outlier Documents",  in Proc. of 2017 Conf. on  Empirical Methods in Natural Language Processing (EMNLP'17), Copenhagen, Denmark, Sept. 2017

32.  Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance Kaplan, Timothy Hanratty and Jiawei Han, "MetaPAD: Meta Pattern Discovery from Massive Text Corpora", in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, Aug. 2017

33.  Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui and Jiawei Han, "PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks", in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, Aug. 2017

34.  Meng Qu, Xiang Ren and Jiawei Han, "Automatic Synonym Discovery with Knowledge Bases", in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, Aug. 2017

35.  Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan and Jiawei Han, "Bridging Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI recommendation", in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, Aug. 2017

36.  Chao Zhang, Liyuan Liu, Dongming Lei, Quan Yuan, Honglei Zhuang, Tim Hanratty and Jiawei Han, "TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams", in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, Aug. 2017

37.  Chao Zhang, Keyang Zhang, Quan Yuan, Fangbo Tao, Luming Zhang, Tim Hanratty, Jiawei Han, "ReAct: Online Multimodal Embedding for Recency-Aware Spatiotemporal Activity Modeling", in Proc. of 2017 ACM SIGIR Conf. on Research & Development in Information Retrieval (SIGIR'17), Tokyo, Japan, Aug. 2017

38.  Jingbo Shang, Xiang Ren, Meng Jiang, and Jiawei Han, "Mining Entity-Relation-Attribute Structures from Massive Text Data" (conference tutorial), Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, August 2017

39.  Xiuli Ma, Guangyu Zhou, Jingbo Shang, Jingjing Wang, Jian Peng, and Jiawei Han, "Detection of Complexes in Biological Networks through Diversified Dense Subgraphs Mining", Journal of Computational Biology, 24 (9): 923–941, 2017

40.  Xiang Ren, Jiaming Shen, Meng Qu, Xuan Wang, Zeqiu Wu, Qi Zhu, Meng Jiang, Fangbo Tao, Saurabh Sinha, David Liem, Peipei Ping, Richard Weinshilboum and Jiawei Han, "LifeNet: A Structured Network-Based Knowledge Exploration and Analytics System for Life Sciences", Proc. of 2017 Annual Meeting of the Association for Computational Linguistics (ACL'17), (system demo), Vancouver, Canada, July 2017

41.  Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji, Xiang Ren, Jiawei Han, Lin Zhao, James A. Hendler, "Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems", Big Data, 5(1): 19-31, 2017

42.  Wenjun Jiang, Chenglin Miao, Lu Su, Qi Li, Shaohan Hu, Shiguang Wang, Jing Gao, Hengchang Liu, Tarek F. Abdelzaher, Jiawei Han, Xue Liu, Yan Gao, and Lance Kaplan, "Towards Quality Aware Information Integration in Distributed Sensing Systems", in IEEE Transactions on Parallel and Distributed Systems, 2017

 

Project Impact

·         Education: Parts of the new research results are used in the Data Mining courses (CS412, CS512) taught to both undergraduate and graduate students in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Moreover, the research results have been and will continue to be published promptly in international conferences and journals and distributed worldwide for education and research. The new progress will also be integrated into the new edition of our data mining textbook and other research collections.

·         Collaborations: For this project we have established collaborations with ARL, NIH, IBM Research, Microsoft Research, Google Research, LinkedIn, UCLA Medical School, Mayo Clinic, and NCSA (National Center for Supercomputing Applications). Through such collaborations we expect to have access to real datasets and applications and to produce more research results.

 

Current and Future Activities

·         The following are some of the highlights of our ongoing work. Please refer to the Selected Publications and Products section for related references.

Area Background

·         This project builds on previous research on data mining, information network analysis, spatiotemporal data analysis, and data cube and multidimensional analysis.

·         Many research papers have been published on these themes. Several textbooks on data mining, information retrieval and information network analysis provide good overviews of the principles and algorithms, including (Han, Kamber and Pei, 2011) and (Sun and Han, 2012).

 

Area References

·         P. Yu, J. Han, and C. Faloutsos, editors, Link Mining: Models, Algorithms, and Applications, Springer, 2010.

·         Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.

·         Yizhou Sun and Jiawei Han, Mining Heterogeneous Information Networks: Principles and Methodologies, Morgan & Claypool Publishers, 2012.

·         Chi Wang and Jiawei Han, Mining Latent Entity Structures, Morgan & Claypool Publishers, 2015.

Potential Related Projects

·         Any project related to data mining, network construction, text mining, information extraction, information fusion, information and social network analysis, spatiotemporal data mining, and knowledge discovery.

Project Web site URL:  http://www.cs.uiuc.edu/~hanj/projs/structnet.htm

Online software:  Online software related to this project can be downloaded at www.illimine.cs.uiuc.edu and from the first authors' GitHub pages

Online resources:  Research publications related to this project can be downloaded at Selected Publications