NSF III: Medium: Collaborative Research: StructNet: Constructing and Mining Structure-Rich
Information Networks for Scientific Research
National Science Foundation Award Number: NSF IIS 17-04532
(07/01/2017--06/30/2021)
E-mail: hanj at illinois.edu
List of Supported Students and Staff
·
Ahmed El-Kishky, Ph.D. student, Department of Computer
Science, University of Illinois at Urbana-Champaign (collaborative)
·
Jiaming Shen, Ph.D. student, Department of Computer
Science, University of Illinois at Urbana-Champaign
·
Chao Zhang, Ph.D. student, Department of Computer
Science, University of Illinois at Urbana-Champaign
·
Honglei Zhuang,
Ph.D. student, Department of Computer Science, University of Illinois at
Urbana-Champaign
·
Keywords: Data mining; text mining;
information extraction; data integration; information network construction; information network mining; big data;
efficiency and scalability; scientific applications
Project Summary
To transform massive, unstructured but interconnected data into actionable
knowledge, we propose a new, promising paradigm, data-to-network-to-knowledge
(D2N2K): integrating semi-structured and unstructured data, constructing
organized heterogeneous information networks (called StructNet), and then
developing powerful mining mechanisms on such organized networks. We focus our
study on the biomedical sciences and investigate the principles, methodologies,
and algorithms for (i) constructing relatively structured heterogeneous
information networks (called MediNet) by mining biomedical research corpora,
and (ii) exploring and mining the networks so constructed. More broadly, we
propose to investigate the principles, methodologies, and algorithms for
constructing, exploring, and mining StructNet, to build a comprehensive network
construction and analysis engine from massive research data, and to explore its
broad applications. Taking the biomedical domain as a case study, we dive deeply
into the contents of biomedical literature and other sources of biomedical data
to construct MediNet and explore applications in the health domain.
Intellectual Merit:
·
Developing new principles, methods, and
technologies for construction of research information networks from massive,
unstructured research datasets: We develop methods including (1) attribute
value extraction, (2) relation typing, and (3) high-quality claim mining. These
methods aim at turning interrelated, unstructured and semi-structured research
data into structured information networks and at enriching the semantic
structures of existing heterogeneous networks; they will play a critical role
in turning data into structured networks and actionable knowledge.
·
Enriching the principles and technologies of
network science and data mining via exploration and mining of structured
information networks: We develop novel methods for in-depth exploration and
mining of such structured networks, including (1) context-aware
multi-dimensional summarization of StructNet, and (2) task-guided embedding
approaches for mining StructNet, which will advance
research in both data science and network science.
Broader Impacts:
·
Benefits scientific research: The work will
generate an extensible framework to facilitate literature-based scientific
research. Fundamentally different from existing research services (e.g.,
Google Scholar, Microsoft Academic Search, AMiner,
and CiteSeerX), StructNet
dives deeply into document contents to conduct corpus-wide information network
construction and mining to benefit research and scientific discovery. The case
study on MediNet is expected to have a significant impact on the
biomedical domain, potentially helping tackle problems ranging from disease
diagnosis and drug discovery to precision medicine. The methodology can be
transferred to many other domains that are rich in text data and
semi-structured data.
·
Benefits network science, data mining, and
information technology: The project will generate new methodologies and tools
for structuring and exploring massive interconnected data. As in our previous
projects, the technologies will be transferred to interested research centers
and industry.
·
Benefits education and training: The project
will train a good number of researchers, especially female and minority
students, and educate a great number of undergraduates and graduates via our
research publications, tutorials, massive online courses, workshops, and
demo systems.
·
The research results are to be published in various research and
application forums and be integrated into the educational programs at
UIUC. The progress of the project and the research results are also
disseminated via the project Web site
(http://www.cs.uiuc.edu/homes/hanj/projs/structnet.htm).
Selected Publications and Products:
Books (authored)
Journal and Refereed Conference
Publications
1.
Jingbo
Shang, Chao Zhang, Jiaming Shen, Jiawei Han, "Towards Multidimensional Analysis of Text Corpora",
Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'18), (Conference Tutorial), London, UK, Aug. 2018
2.
Carl Yang, Xiaolin Shi, Jie Luo and Jiawei Han, "I
Know You’ll Be Back: Interpretable New User Clustering and Churn Prediction on
a Mobile Social Application", in Proc. of 2018 ACM SIGKDD Int.
Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018
3.
Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler,
Michelle Vanni and Jiawei Han, "TaxoGen: Constructing Topical Concept Taxonomy by Adaptive Term
Embedding and Clustering", in Proc. of 2018 ACM SIGKDD
Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August
2018
4.
Qi Li, Meng Jiang, Xikun Zhang,
Meng Qu, Timothy Hanratty, Jing Gao and Jiawei Han, "TruePIE: Discovering Reliable Patterns in Pattern-Based
Information Extraction", in Proc. of 2018 ACM SIGKDD
Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August
2018
5.
Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler and Jiawei Han, "HiExpan: Task-Guided
Taxonomy Construction by Hierarchical Tree
Expansion", in Proc. of 2018 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018
6.
Yu Shi, Qi Zhu, Fang Guo, Chao Zhang and Jiawei Han,
"Easing Embedding Learning by Comprehensive Transcription
of Heterogeneous Information Networks", in Proc. of 2018 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK,
August 2018
7.
Yuning
Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu and Jiawei
Han, "End-to-End Reinforcement Learning for Automatic Taxonomy
Induction", in Proc. of 2018 Annual Meeting of the Association
for Computational Linguistics (ACL'18), Melbourne, Australia, July 2018
8.
Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh
Sinha and Jiawei Han, "Entity Set Search of Scientific Literature: An
Unsupervised Ranking Approach", in Proc. of 2018 Int. ACM
SIGIR Conf. on Research and Development in Information Retrieval
(SIGIR'18), Ann Arbor, MI, July 2018
9.
Ahmed El-Kishky, Frank Xu, Aston
Zhang, Stephen Macke and Jiawei Han, "Entropy-Based Subword Mining
for Word Embeddings", in Proc. of the 2nd Workshop on Subword and Character Level Models in NLP (SCLeM'18) (at
NAACL 2018), New Orleans, LA, June 2018
10.
Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan and Jiawei Han, "AspEm: Embedding Learning by Aspects in Heterogeneous
Information Networks", in Proc. of 2018 SIAM Int. Conf. on Data Mining (SDM'18),
San Diego, CA, May 2018
11.
Meng Qu, Xiang Ren, Yu Zhang and Jiawei Han, "Weakly-supervised
Relation Extraction by Pattern-enhanced Embedding Learning", in Proc.
of 2018 Int. Conf. on World-Wide Web (WWW'18), Lyon, France, Apr. 2018
12.
Qi Zhu, Xiang Ren, Jingbo Shang,
Yu Zhang, Frank F. Xu and Jiawei Han, "Open
Information Extraction with Global Structure Constraints" (poster
paper), in Proc. of 2018 Int. Conf. on World-Wide Web (WWW'18), Lyon, France, Apr.
2018 (received WWW'18 best poster award honorable mention)
13.
Carl Yang, Chao Zhang, Jiawei Han, Xuewen Chen and Jieping
Ye, "Did You Enjoy the Ride: Understanding Passenger
Experience via Heterogeneous Network Embedding", in Proc. of
2018 IEEE Int. Conf. on Data Engineering (ICDE'18), Paris, France,
April 2018
14.
Liyuan Liu, Jingbo Shang, Frank Xu, Xiang Ren, Huan Gui,
Jian Peng and Jiawei Han, "Empower
Sequence Labeling with Task-Aware Neural Language Model", in
Proc. of 2018 AAAI Conf. on Artificial
Intelligence (AAAI'18), New Orleans, LA, Feb. 2018
15.
Chao Zhang, Mengxiong Liu, Zhengchao Liu, Carl Yang, Luming
Zhang, Jiawei Han, "Spatiotemporal Activity Modeling Under Data Scarcity: A
Graph-Regularized Cross-Modal Embedding Approach", in
Proc. of 2018 AAAI Conf. on Artificial
Intelligence (AAAI'18), New Orleans, LA, Feb. 2018
16.
Wanzheng
Zhu, Chao Zhang, Shuochao Yao, Xiaobin
Gao, Jiawei Han, "A Spherical Hidden Markov Model for Semantics-Rich Human
Mobility Modeling", in Proc. of 2018 AAAI Conf. on
Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018
17.
Zeqiu Wu,
Xiang Ren, Frank F. Xu, Ji Li and Jiawei Han, "Indirect
Supervision for Relation Extraction using Question-Answer Pairs", in
Proc. of 2018 ACM Int. Conf. on Web Search and Data Mining
(WSDM'18), Los Angeles, CA, Feb. 2018
18.
Meng Qu, Jian Tang, and Jiawei Han, "Curriculum
Learning for Heterogeneous Star Network Embedding via Deep Reinforcement
Learning", in Proc. of 2018 ACM Int. Conf. on Web Search and
Data Mining (WSDM'18), Los Angeles, CA, Feb. 2018
19.
Jiawei Han, "On
the Power of Massive Text Data", (keynote speech), in
Proc. of 2018 ACM Int. Conf. on Web Search and Data Mining
(WSDM'18), Los Angeles, CA, Feb. 2018
20.
Chenguang
Wang, Yangqiu Song, Haoran
Li, Ming Zhang, and Jiawei Han, "Unsupervised Meta-path Selection for Text
Similarity Measure based on Heterogeneous Information Networks", Data
Mining and Knowledge Discovery (DMKD), to appear 2018
21.
Jingbo
Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R
Voss, Jiawei Han, "Automated Phrase Mining from Massive Text
Corpora", accepted by IEEE Transactions on Knowledge and Data
Engineering, Feb., 2018
22.
Jingbo
Shang, Meng Jiang, Wenzhu Tong, Jinfeng
Xiao, Jian Peng, Jiawei Han. "DPPred: An Effective Prediction Framework with Concise
Discriminative Patterns", accepted by IEEE
Transactions on Knowledge and Data Engineering, Sept. 2017
23.
Meng Qu, Jian Tang, Jingbo
Shang, Xiang Ren, Ming Zhang, Jiawei Han, "An
Attention-based Collaboration Framework for Multi-View Network Representation
Learning", in Proc. of 2017 ACM Int. Conf. on Information
and Knowledge Management (CIKM'17), Singapore, Nov. 2017
24.
Chenguang Wang,
Yangqiu Song, Haoran Li, Yizhou Sun, Ming Zhang, Jiawei Han, "Second-Order Heterogeneous Information Network Similarity for Text",
in Proc. of 2017 ACM Int. Conf. on Information and Knowledge Management
(CIKM'17), Singapore, Nov. 2017
25.
Quan Yuan, Jingbo Shang, Xin
Cao, Chao Zhang, Xinhe Geng,
Jiawei Han, "Detecting Multiple Periods and Periodic Patterns in Event
Time Sequences", in Proc. of 2017 ACM Int. Conf. on Information and
Knowledge Management (CIKM'17), Singapore, Nov. 2017
26.
Shi Zhi, Yicheng
Sun, Jiayi Liu, Chao Zhang and Jiawei Han, "ClaimVerif: A
Real-time Claim Verification System Using the Web and Fact Databases"
(system demo), in Proc. of 2017 ACM Int. Conf. on Information and
Knowledge Management (CIKM'17), Singapore, Nov. 2017
27.
Mengxiong
Liu, Zhengchao Liu, Chao Zhang, Keyang
Zhang, Quan Yuan, Tim Hanratty and Jiawei Han,
"Urbanity: A System for Interactive Exploration of Urban
Dynamics from Streaming Human Sensing Data" (system
demo), in Proc. of 2017 ACM Int. Conf. on Information and Knowledge
Management (CIKM'17), Singapore, Nov. 2017
28.
Huan Gui, Jialu
Liu, Fangbo Tao, Meng Jiang, Brandon Norick, Lance Kaplan and Jiawei Han, "Embedding
Learning with Events in Heterogeneous Information Networks",
IEEE Transactions on Knowledge and Data Engineering, 29(11): 2428-2441, 2017
29.
Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang
Ren, Jiawei Han, "SetExpan: Corpus-based
Set Expansion via Context Feature Selection and Rank Ensemble", in
Proc. of 2017 European Conf. on Machine Learning and Principles and
Practice of Knowledge Discovery in Databases (ECMLPKDD 2017), Skopje,
Macedonia, Sept. 2017
30.
Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi,
Huan Gui, Heng Ji and Jiawei Han, "Heterogeneous Supervision for Relation Extraction: A Representation
Learning Approach", in Proc. of 2017 Conf. on Empirical Methods
in Natural Language Processing (EMNLP'17), Copenhagen,
Denmark, Sept. 2017
31.
Honglei
Zhuang, Chi Wang, Fangbo Tao, Lance Kaplan and Jiawei
Han, "Identifying Semantically Deviating Outlier Documents", in
Proc. of 2017 Conf. on Empirical Methods in Natural Language
Processing (EMNLP'17), Copenhagen, Denmark, Sept. 2017
32.
Meng Jiang, Jingbo Shang,
Taylor Cassidy, Xiang Ren, Lance Kaplan, Timothy Hanratty and Jiawei Han,
"MetaPAD: Meta Pattern Discovery from
Massive Text Corpora", in Proc. of 2017 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17),
Halifax, Nova Scotia, Canada, Aug. 2017
33.
Yu Shi, Po-Wei Chan, Honglei
Zhuang, Huan Gui and Jiawei Han, "PReP: Path-Based
Relevance from a Probabilistic Perspective in Heterogeneous Information Networks", in Proc. of 2017 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17),
Halifax, Nova Scotia, Canada, Aug. 2017
34.
Meng Qu, Xiang Ren and Jiawei Han, "Automatic
Synonym Discovery with Knowledge Bases", in Proc. of 2017 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'17),
Halifax, Nova Scotia, Canada, Aug. 2017
35.
Carl Yang, Lanxiao Bai, Chao
Zhang, Quan Yuan and Jiawei Han, "Bridging
Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI
recommendation", in Proc. of 2017 ACM SIGKDD Int.
Conf. on Knowledge Discovery and Data Mining (KDD'17), Halifax, Nova Scotia,
Canada, Aug. 2017
36.
Chao Zhang, Liyuan Liu, Dongming
Lei, Quan Yuan, Honglei Zhuang, Tim Hanratty and
Jiawei Han, "TrioVecEvent: Embedding-Based
Online Local Event Detection in Geo-Tagged Tweet Streams",
in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data
Mining (KDD'17), Halifax, Nova Scotia, Canada, Aug. 2017
37.
Chao Zhang, Keyang Zhang, Quan
Yuan, Fangbo Tao, Luming
Zhang, Tim Hanratty, Jiawei Han, "ReAct: Online
Multimodal Embedding for Recency-Aware Spatiotemporal
Activity Modeling", in Proc. of 2017 ACM SIGIR Conf. on
Research & Development in Information Retrieval (SIGIR'17), Tokyo, Japan,
Aug. 2017
38.
Jingbo
Shang, Xiang Ren, Meng Jiang, and Jiawei Han, "Mining Entity-Relation-Attribute Structures from Massive Text Data"
(conference tutorial), Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge
Discovery and Data Mining (KDD'17), Halifax, Nova Scotia, Canada, August
2017
39.
Xiuli Ma,
Guangyu Zhou, Jingbo Shang,
Jingjing Wang, Jian Peng, and Jiawei Han,
"Detection of Complexes in Biological Networks through Diversified Dense
Subgraphs Mining", Journal of Computational Biology, 24 (9):
923–941, 2017
40.
Xiang Ren, Jiaming Shen, Meng Qu, Xuan Wang, Zeqiu Wu, Qi Zhu, Meng Jiang, Fangbo
Tao, Saurabh Sinha, David Liem, Peipei
Ping, Richard Weinshilboum and Jiawei Han, "LifeNet: A Structured
Network-Based Knowledge Exploration and Analytics System for Life Sciences", Proc.
of 2017 Annual Meeting of the Association for Computational Linguistics
(ACL'17), (system demo), Vancouver, Canada, July 2017
41.
Lifu
Huang, Jonathan May, Xiaoman Pan, Heng Ji, Xiang Ren,
Jiawei Han, Lin Zhao, James A. Hendler, "Liberal
Entity Extraction: Rapid Construction of Fine-Grained Entity Typing
Systems", Big Data. 5(1):19-31, 2017
42.
Wenjun Jiang, Chenglin Miao, Lu Su, Qi Li, Shaohan Hu, Shiguang Wang, Jing Gao, Hengchang
Liu, Tarek F. Abdelzaher, Jiawei Han, Xue Liu, Yan Gao, and Lance Kaplan, "Towards Quality
Aware Information Integration in Distributed Sensing Systems", in
IEEE Transactions on Parallel and Distributed Systems, 2017
Project Impact
·
Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann,
2011.
·
Yizhou Sun and
Jiawei Han, Mining Heterogeneous Information Networks: Principles and
Methodologies, Morgan & Claypool Publishers, 2012.
·
Chi Wang and
Jiawei Han, Mining Latent Entity Structures, Morgan & Claypool Publishers, 2015
Potential Related Projects
·
Any project related to data
mining, network construction, text mining, information extraction, information
fusion, information and social network analysis, spatiotemporal data mining,
and knowledge discovery.
Project Web site URL: http://hanj.cs.illinois.edu/projs/structnet.htm
Online software: Online software related to this project can be
downloaded at www.illimine.cs.uiuc.edu and from the GitHub repositories of the
papers' first authors
Online resources: Research
publications related to this project can be downloaded at Selected Publications