NSF BIGDATA: F: Collaborative Research: Taming Big Networks via Embedding

National Science Foundation Award Number: NSF IIS 1741317

(01/01/2018-12/31/2021)

 

 

 

Contact Information

 

Jiawei Han,  Co-PI (PI: Quanquan Gu, University of California at Los Angeles)
Department of Computer Science
University of Illinois, Urbana-Champaign
1304 West Springfield Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903, Fax: (217) 265-6494

E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj

 

List of Supported Students and Staff

 

§  Chao Zhang, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign (duration working on this project: 2018-2019)

§  Qi Li, Postdoc research fellow, Department of Computer Science, University of Illinois at Urbana-Champaign (duration working on this project: 2018-2019)

§  Yu Shi, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign (duration working on this project: 2018-2019)

§  Liyuan Liu, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign (duration working on this project: 2018-2019)

§  Xiaotao Gu, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign (duration working on this project: 2019-2020)

§  Jiaming Shen, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign (duration working on this project: 2019-2020)

Project Award Information

 

·         Award Number: NSF IIS 1741317

·         Duration: 01/01/2018-12/31/2021

·         Title: NSF BIGDATA: F: Collaborative Research: Taming Big Networks via Embedding

·         Keywords:  Massive network; heterogeneous information network; embedding; network construction, inference, and mining; efficiency and scalability

Project Summary

In the Internet age, information entities and objects are interconnected, forming gigantic information networks. In recent years, network embedding methods have proven highly beneficial for many unsupervised and supervised learning problems over networks, and they are now often the methods of choice, especially for Big Networks. However, existing network embedding methods lack theoretical guarantees, and the field is still in its infancy. In particular, most network embedding methods are cast as nonconvex optimization problems and solved by ad hoc algorithms without any convergence guarantee. Moreover, it is unclear under what conditions the latent network representation is learnable, and what the sample complexity of network embedding is. This prevents us from designing new algorithms in a principled way. To bridge this gap between theory and practice, the PI will develop a new generation of network embedding methods for taming massive networks, from homogeneous networks to heterogeneous networks, from transductive to inductive, from unsupervised to supervised, and from stochastic to online. The new methods to be developed will enjoy faster rates of convergence in optimization, lower computational complexity, and statistical learning guarantees. To evaluate the proposed algorithms, both theoretical analysis and extensive experimental evaluations on real-world massive network datasets will be performed. The targeted applications include, but are not limited to, semantic search and information retrieval in social and information network analysis, expert finding in bibliographic databases, and recommendation systems. The progress of the project and the research results are also disseminated via the project Web site (http://www.cs.uiuc.edu/homes/hanj/projs/embedding.htm).

 

Intellectual Merit:

 

The proposed research bridges the gap between the empirical success of network embedding and existing statistical learning and optimization theories. The core of this proposed research is the integration of modern network mining techniques with sophisticated statistical learning and optimization tools, which lays a foundation for designing a new generation of network embedding algorithms with strong theoretical guarantees and for deriving new theories for various setups of network embedding. Extensive empirical evaluations ensure the proposed algorithms' applicability in various application domains. The proposed research is expected to advance the frontier of network embedding and enable it to tame modern massive networks in the wild.

 

Broader Impacts:

 

The results of this research have the potential to impact the machine learning, data mining, information retrieval, and many other communities. The proposal also has the potential to reshape the way one approaches problems relating to graph mining and network analysis, and their roles in a wide range of applications with massive networks. Our education plan includes developing open course materials that integrate information network analysis and machine learning, and providing research-based training opportunities for both undergraduate and graduate students in engineering, art, and science. The PIs will actively involve underrepresented groups in research projects and train a new generation of data scientists. This project also supports outreach activities for K-12 students, to stimulate their interest and make the proposed research accessible to a broader audience. This project will produce open source software tools, and the PIs have a strong track record of developing and supporting widely used tools.

Publications and Products: (Note: major publications closely related to this project are in bold font)

Note:  Please search and download all the papers in PDF, if available, at our group’s publication website by following the link: Selected research publications.

Books

·         Xiang Ren and Jiawei Han, Mining Structures of Factual Knowledge from Text: An Effort-Light Approach, Morgan & Claypool Publishers, 2018 (Series: Synthesis Lectures on Data Mining and Knowledge Discovery)

·         Chao Zhang and Jiawei Han, Multidimensional Mining of Massive Text Data, Morgan & Claypool Publishers, 2019 (Series: Synthesis Lectures on Data Mining and Knowledge Discovery)

 

 

Journal articles

·         Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han, "Automated Phrase Mining from Massive Text Corpora", IEEE Transactions on Knowledge and Data Engineering, 30(10):1825-1837 (2018)

·         Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han. "DPPred: An Effective Prediction Framework with Concise Discriminative Patterns", IEEE Transactions on Knowledge and Data Engineering, 30(7): 1226-1239 (2018)

·         Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, Jiawei Han, "Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks", Data Mining and Knowledge Discovery, 32(6): 1735-1767 (2018)

·         Chao Zhang, Dongming Lei, Quan Yuan, Honglei Zhuang, Lance M. Kaplan, Shaowen Wang, Jiawei Han, "GeoBurst+: Effective and Real-Time Local Event Detection in Geo-Tagged Tweet Streams", ACM Transactions on Intelligent Systems and Technology (TIST) 9(3): 34:1-24 (2018)

·         Wei Shen, Jiawei Han, Jianyong Wang, Xiaojie Yuan, Zhenglu Yang, "SHINE+: A General Framework for Domain-Specific Entity Linking with Heterogeneous Information Networks",  IEEE Transactions on Knowledge and Data Engineering, 30(2): 353-366 (2018)

 

Refereed Conference Publications

 

·         Ahmed El-Kishky, Frank Xu, Aston Zhang, and Jiawei Han, "Parsimonious Morpheme Segmentation with an Application to Enriching Word Embeddings", in Proc. 2019 IEEE Int. Conf. on Big Data (IEEE BigData'19), Los Angeles, CA, Dec. 2019

·         Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, and Jiawei Han, "Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach", in Proc. 2019 IEEE Int. Conf. on Big Data (IEEE BigData'19), Los Angeles, CA, Dec. 2019

·         Xuan Wang, Yu Zhang, Qi Li, Jiawei Han, "Taming Unstructured Big Data: Automated Information Extraction from Massive Text"  (Conference tutorial), 2019 IEEE Int. Conf. on Big Data (IEEE BigData'19), Los Angeles, CA, Dec. 2019

·         Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan and Jiawei Han, "Spherical Text Embedding",  in Proc. 2019 Conf. on Neural Information Processing Systems (NeurIPS’19), Vancouver, Canada, Dec. 2019

·         Carl Yang, Peiye Zhuang, Wenhan Shi, Alan Luu and Pan Li, "Conditional Structure Generation through Graph Variational Generative Adversarial Nets",  in Proc. 2019 Conf. on Neural Information Processing Systems (NeurIPS’19), Vancouver, Canada, Dec. 2019

·         Xuan Wang, Yu Zhang, Qi Li, Xiang Ren, Jingbo Shang, and Jiawei Han, "Distantly Supervised Biomedical Named Entity Recognition with Dictionary Expansion", in Proc. 2019 IEEE Int. Conf. on Bioinformatics and Biomedicine (IEEE-BIBM'19), San Diego, CA, Nov. 2019

·         Yuning Mao, Jingjing Tian, Jiawei Han and Xiang Ren, “Hierarchical Text Classification with Reinforced Label Assignment”, in Proc. 2019 Conf. on Empirical Methods in Natural Language Processing and Int. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP'19), Hong Kong, China, Nov. 2019

·         Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu and Jiawei Han, “CrossWeigh: Training Named Entity Tagger from Imperfect Annotations”, in Proc. 2019 Conf. on Empirical Methods in Natural Language Processing and Int. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP'19), Hong Kong, China, Nov. 2019

·         Carl Yang, Mengxiong Liu, Frank He, Jian Peng, Jiawei Han, “cube2net: Efficient Quality Network Construction with Data Cube Organization”,  in Proc. of 2019 IEEE Int. Conf. on Data Mining: PhD Forum, Beijing, Nov. 2019

·         Carl Yang, Jieyu Zhang, and Jiawei Han, “Neural Embedding Propagation on Heterogeneous Networks”,  in Proc. of 2019 IEEE Int. Conf. on Data Mining (ICDM’19), Beijing, Nov. 2019

·         Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han, “HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories”, in Proc. of 2019 IEEE Int. Conf. on Data Mining (ICDM’19), Beijing, Nov. 2019

·         Jiawei Han, “From Unstructured Text to TextCube: Automated Construction and Multidimensional Exploration” (keynote speech), in Proc. 2019 ACM Int. Conf. on Information and Knowledge Management (CIKM’19), Beijing, China, Nov. 2019

·         Chanyoung Park, Donghyun Kim, Qi Zhu, Jiawei Han and Hwanjo Yu, “Task-Guided Pair Embedding in Heterogeneous Network”, in Proc. 2019 ACM Int. Conf. on Information and Knowledge Management (CIKM’19), Beijing, China, Nov. 2019

·         Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim and Jiawei Han, “Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity”, in Proc. 2019 ACM Int. Conf. on Information and Knowledge Management (CIKM’19), Beijing, China, Nov. 2019

·         Carl Yang, Lingrui Gan, Zongyi Wang, Jiaming Shen, Jinfeng Xiao and Jiawei Han, “Query-Specific Knowledge Summarization with Entity Evolutionary Networks”, in Proc. 2019 ACM Int. Conf. on Information and Knowledge Management (CIKM’19), Beijing, China, Nov. 2019

·         Yu Shi, Xinwei He, Naijing Zhang, Carl Yang, and Jiawei Han, "User-Guided Clustering in Heterogeneous Information Networks via Motif-Based Comprehensive Transcription", in Proc. 2019 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD'19), Würzburg, Germany, Sept. 2019

·         Yu Meng, Jiaxin Huang, Jingbo Shang, and Jiawei Han, “TextCube: Automated Construction and Multidimensional Exploration”, Conference tutorial at 2019 Int. Conf. on Very Large Data Bases (VLDB’19), Los Angeles, CA, Aug. 2019

·         Yu Meng, Jiaxin Huang, Zihan Wang, Chenyu Fan, Guangyuan Wang, Chao Zhang, Jingbo Shang, Lance Kaplan, Jiawei Han, "TopicMine: User-Guided Topic Mining by Category-Oriented Embedding", in Proc. of 2019 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'19), (demo paper), Anchorage, AK, August 2019

·         Carl Yang, Dai Teng, Siyang Liu, Sayantani Basu, Jieyu Zhang, Jingbo Shang, Chao Zhang, Jiaming Shen, Lance Kaplan, Timothy Hanratty, Jiawei Han, "CubeNet: Multi-Facet Hierarchical Heterogeneous Network Construction, Analysis, and Mining", in Proc. of 2019 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'19), (demo paper), Anchorage, AK, August 2019

·         Ahmed El-Kishky, Xingyu Fu, Aseel Addawood, Nahil Sobh, Clare Voss and Jiawei Han, "Constrained Sequence-to-sequence Semitic Root Extraction for Enriching Word Embeddings", in Proc. of the 4th Arabic Natural Language Processing Workshop (WANLP 2019), co-located with ACL 2019, Florence, Italy, July 2019

·         Jingbo Shang, Jiaming Shen, Liyuan Liu, and Jiawei Han, "Constructing and Mining Heterogeneous Information Networks from Massive Text", Conference tutorial at 2019 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'19), Anchorage, AK, Aug. 2019

·         Liyuan Liu, Jingbo Shang, and Jiawei Han, "Arabic Named Entity Recognition: What Works and What's Next", in Proc. of the 4th Arabic Natural Language Processing Workshop (WANLP 2019), co-located with ACL 2019, Florence, Italy, July 2019

·         Diya Li, Lifu Huang, Heng Ji, Jiawei Han, "Biomedical Event Extraction based on Knowledge-driven Tree-LSTM", in Proc. 2019 Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT'19), Minneapolis, MN, June 2019

·         Shuochao Yao, Ailing Piao, Wenjun Jiang, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Jinyang Li, Tianshi Wang, Shaohan Hu, Lu Su, Jiawei Han and Tarek Abdelzaher, “STFNets: Learning Sensing Signals from the Time-Frequency Perspective with Short-Time Fourier Neural Networks”, in Proc. the Web Conf. 2019 (WWW’19), San Francisco, CA, May 2019 

·         Carl Yang, Huy Hoang Do, Tomas Mikolov and Jiawei Han, “Place Deduplication with Embeddings”, in Proc. the Web Conf. 2019 (WWW’19), San Francisco, CA, May 2019

·         Honglei Zhuang, Timothy Hanratty, and Jiawei Han, "Aspect-Based Sentiment Analysis with Minimal Guidance", in Proc. 2019 SIAM Int. Conf. on Data Mining (SDM'19), Calgary, Alberta, Canada, May 2019

·         Sha Li, Chao Zhang, Dongming Lei, Ji Li, Jiawei Han, "GeoAttn: Fine-Grained Localization of Social Media Messages via Attentional Memory Network", in Proc. 2019 SIAM Int. Conf. on Data Mining (SDM'19), Calgary, Alberta, Canada, May 2019

·         Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky and Jiawei Han, "Integrating Local Context and Global Cohesiveness for Open Information Extraction", in Proc. 2019 ACM Int. Conf. on Web Search and Data Mining (WSDM'19), Melbourne, Australia, Feb. 2019

·         Jiaming Shen, Ruiliang Lyu, Xiang Ren, Michelle Vanni, Brian Sadler, Jiawei Han, “Mining Entity Synonyms with Efficient Neural Set Generation”, in Proc. 2019 AAAI Conf. on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, Jan. 2019

·         Yu Meng, Jiaming Shen, Chao Zhang and Jiawei Han, “Weakly-Supervised Hierarchical Text Classification”, in Proc. 2019 AAAI Conf. on Artificial Intelligence (AAAI-19), Honolulu, Hawaii, Jan. 2019

·         Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, and Jiawei Han, "Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning", Bioinformatics 35(10): 1745-1752 (2019)

·         Xuan Wang, Yu Zhang, Qi Li, Cathy Wu, and Jiawei Han, "PENNER: Pattern-enhanced Nested Named Entity Recognition in Biomedical Literature", Proc. 2018 Int. Conf. on Bioinformatics and Biomedicine (BIBM'18), Madrid, Spain, Dec. 2018

·         Qi Li, Xuan Wang, Yu Zhang, Fei Ling, Cathy Wu, and Jiawei Han, "Pattern Discovery for Wide-Window Open Information Extraction in Biomedical Literature", Proc. 2018 Int. Conf. on Bioinformatics and Biomedicine (BIBM'18), Madrid, Spain, Dec. 2018

·         Shi Zhi, Fan Yang, Zheyi Zhu, Qi Li, Zhaoran Wang, and Jiawei Han, "Dynamic Truth Discovery on Numerical Data", in Proc. of 2018 IEEE Int. Conf. on Data Mining (ICDM'18), Singapore, Nov. 2018

·         Carl Yang, Yichen Feng, Pan Li, Yu Shi, and Jiawei Han, "Meta-Graph Based HIN Spectral Embedding: Methods, Analyses, and Insights", in Proc. of 2018 IEEE Int. Conf. on Data Mining (ICDM'18), Singapore, Nov. 2018

·         Fangbo Tao, Chao Zhang, Xiusi Chen, Meng Jiang, Tim Hanratty, Lance Kaplan, and Jiawei Han, "Doc2Cube: Automated Document Allocation to Text Cube via Dimension-Aware Joint Embedding", in Proc. of 2018 IEEE Int. Conf. on Data Mining (ICDM'18), Singapore, Nov. 2018

·         Doris Xin, Ahmed El-Kishky, De Liao, Brandon Norick, and Jiawei Han, "Active Learning on Heterogeneous Information Networks: A Multi-armed Bandit Approach", in Proc. of 2018 IEEE Int. Conf. on Data Mining (ICDM'18), Singapore, Nov. 2018

·         Jingbo Shang, Liyuan Liu, Xiaotao Gu, Xiang Ren, Teng Ren and Jiawei Han, "Learning Named Entity Tagger using Domain-Specific Dictionary", in Proc. of 2018 Conf. on Empirical Methods in Natural Language Processing (EMNLP'18), Brussels, Belgium, Oct. 2018

·         Liyuan Liu, Xiang Ren, Jingbo Shang, Xiaotao Gu, Jian Peng and Jiawei Han, "Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling", in Proc. of 2018 Conf. on Empirical Methods in Natural Language Processing (EMNLP'18), Brussels, Belgium, Oct. 2018

·         Quan Yuan, Xiang Ren, Wenqi He, Chao Zhang, Xinhe Geng, Lifu Huang, Heng Ji, Chin-Yew Lin and Jiawei Han, "Open-Schema Event Profiling for Massive News Corpora", in Proc. of 2018 ACM Int. Conf. on Information and Knowledge Management (CIKM'18), Turin, Italy, Oct. 2018

·         Yu Meng, Jiaming Shen, Chao Zhang and Jiawei Han, "Weakly-Supervised Neural Text Classification", in Proc. of 2018 ACM Int. Conf. on Information and Knowledge Management (CIKM'18), Turin, Italy, Oct. 2018

·         Jingbo Shang, Jiaming Shen, Tianhang Sun, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu and Jiawei Han, "Investigating Rumor News Using Agreement-Aware Search", in Proc. of 2018 ACM Int. Conf. on Information and Knowledge Management (CIKM'18), Turin, Italy, Oct. 2018

·         Carl Yang, Mengxiong Liu, Frank He, Xikun Zhang, Jian Peng, and Jiawei Han, "Similarity Modeling on Heterogeneous Networks via Automatic Path Discovery", in Proc. of 2018 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD'18), Dublin, Ireland, Sept. 2018

·         Carl Yang, Mengxiong Liu, Vincent W. Zheng and Jiawei Han, "Node, Motif and Subgraph: Leveraging Network Functional Blocks Through Structural Convolution", in Proc. of 2018 IEEE/ACM Int. Conf. on Social Networks Analysis and Mining (ASONAM'18), Barcelona, Spain, Aug. 2018

·         Xuan Wang, Yu Zhang, Qi Li, Yinyin Chen and Jiawei Han, "Open Information Extraction with Meta-pattern Discovery in Biomedical Literature", in Proc. of 2018 ACM Conf. on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB'18), Washington, DC, August 2018 

·         Jingbo Shang, Chao Zhang, Jiaming Shen, Jiawei Han, "Towards Multidimensional Analysis of Text Corpora", Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), (Conference Tutorial), London, UK, Aug. 2018

·         Jingbo Shang, Qi Zhu, Jiaming Shen, Xuan Wang, Xiaotao Gu, Lance Kaplan, Timothy Hanratty and Jiawei Han, "AutoNet: Automated Network Construction and Exploration System from Domain-Specific Corpora", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), (demo paper), London, UK, August 2018

·         Jiaming Shen, Jinfeng Xiao, Yu Zhang, Carl Yang, Jingbo Shang, Jinda Han, Saurabh Sinha, Peipei Ping, Richard Weinshilboum, Zhiyong Lu and Jiawei Han, "SetSearch+: Entity-Set-Aware Search and Mining for Scientific Literature", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), (demo paper), London, UK, August 2018

·         Hanwen Zha, Jiaming Shen, Keqian Li, Warren Greiff, Michelle Vanni, Jiawei Han and Xifeng Yan, "FTS: Faceted Taxonomy Construction and Search for Scientific Publications", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), (demo paper), London, UK, August 2018

·         Carl Yang, Xiaolin Shi, Jie Luo and Jiawei Han, "I Know You’ll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

·         Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni and Jiawei Han, "TaxoGen: Constructing Topical Concept Taxonomy by Adaptive Term Embedding and Clustering", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

·         Qi Li, Meng Jiang, Xikun Zhang, Meng Qu, Timothy Hanratty, Jing Gao and Jiawei Han, "TruePIE: Discovering Reliable Patterns in Pattern-Based Information Extraction", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

·         Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler and Jiawei Han, "HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

·         Yu Shi, Qi Zhu, Fang Guo, Chao Zhang and Jiawei Han, "Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks", in Proc. of 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'18), London, UK, August 2018

·         Yuchen Li, Zhengzhi Lou, Yu Shi and Jiawei Han, "Temporal Motifs in Heterogeneous Information Networks", in Proc. of 2018 Int. Workshop on Mining and Learning with Graphs (MLG'18), co-located with KDD'18, London, UK, August 2018

·         Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu and Jiawei Han, "End-to-End Reinforcement Learning for Automatic Taxonomy Induction", in Proc. of 2018 Annual Meeting of the Association for Computational Linguistics (ACL'18), Melbourne, Australia, July 2018 

·         Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha and Jiawei Han, "Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach", in Proc. of 2018  Int. ACM SIGIR Conf. on Research and Development in Information Retrieval  (SIGIR'18), Ann Arbor, MI, July 2018 

·         Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke and Jiawei Han, "Entropy-Based Subword Mining for Word Embeddings", in Proc. of the 2nd Workshop on Subword and Character Level Models in NLP (SCLeM'18) (at NAACL 2018), New Orleans, LA, June 2018 

·         Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han, “AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks”, Proc. of 2018 SIAM Int. Conf. on Data Mining (SDM’18), San Diego, CA, May 2018

·         Meng Qu, Xiang Ren, Yu Zhang, and Jiawei Han, “Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning”, Proc. of 2018 Int. Conf. on World-Wide Web (WWW’18), Lyon, France, Apr. 2018

·         Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Frank F. Xu and Jiawei Han, "Open Information Extraction with Global Structure Constraints" (poster paper), Proc. of 2018 Int. Conf. on World-Wide Web (WWW’18), Lyon, France, Apr. 2018 (received WWW'18 Best Poster Award Honorable Mention)

·         Carl Yang, Chao Zhang, Jiawei Han, Xuewen Chen, Jieping Ye, "Did You Enjoy the Ride: Understanding Passenger Experience via Heterogeneous Network Embedding", in Proc. of 2018 IEEE Int. Conf. on Data Engineering (ICDE'18), Paris, France, April 2018

·         Liyuan Liu, Jingbo Shang, Frank Xu, Xiang Ren, Huan Gui, Jian Peng and Jiawei Han, "Empower Sequence Labeling with Task-Aware Neural Language Model", in Proc. of 2018 AAAI Conf. on Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018

·         Chao Zhang, Mengxiong Liu, Zhengchao Liu, Carl Yang, Luming Zhang, Jiawei Han, "Spatiotemporal Activity Modeling Under Data Scarcity: A Graph-Regularized Cross-Modal Embedding Approach", in Proc. of 2018 AAAI Conf. on Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018

·         Wanzheng Zhu, Chao Zhang, Shuochao Yao, Xiaobin Gao, Jiawei Han, "A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling", in Proc. of 2018 AAAI Conf. on Artificial Intelligence (AAAI'18), New Orleans, LA, Feb. 2018

·         Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li and Jiawei Han, "Indirect Supervision for Relation Extraction using Question-Answer Pairs", in Proc. of 2018 ACM  Int. Conf. on Web Search and Data Mining (WSDM'18), Los Angeles, CA, Feb. 2018

·         Meng Qu, Jian Tang, and Jiawei Han, "Curriculum Learning for Heterogeneous Star Network Embedding via Deep Reinforcement Learning",  in Proc. of 2018 ACM  Int. Conf. on Web Search and Data Mining (WSDM'18), Los Angeles, CA, Feb. 2018

 

Ph.D. Dissertations

 

·         Xiang Ren, Ph.D., January 2018, thesis title: “Mining Entity and Relation Structures from Text: An Effort-Light Approach”. The thesis won the 2018 ACM SIGKDD Doctoral Dissertation Award

·         Chao Zhang, Ph.D., Nov. 2018, thesis title: “Multi-dimensional Mining of Unstructured Data with Limited Supervision”. The thesis was the 2019 ACM SIGKDD Doctoral Dissertation Award Runner-Up

·         Yu Shi, Ph.D., March 2019, thesis title: “Harnessing Heterogeneous Association in Real-World Networks”

·         Honglei Zhuang, Ph.D., March 2019, thesis title: “Text Mining with Word Embedding for Outlier and Sentiment Analysis”

·         Jingbo Shang, Ph.D., Nov. 2019, thesis title: “Constructing and Mining Structured Heterogeneous Information Networks from Massive Text Corpora”

 

Project Impact

 

§  Education: Parts of the new research results are used in Data Mining courses (CS412, CS512, CS412 MCD-DS online Coursera courses) taught to both undergraduate and graduate students in the Department of Computer Science at the University of Illinois at Urbana-Champaign. The research results have been, and will continue to be, published promptly in international conferences and journals and distributed worldwide for education and research. Most of the software developed in this project has been released as open source on GitHub. New progress will also be integrated into the new edition of our data mining textbook and other research collections.

§  Collaborations: For this project, we have established collaborations with ARL, BBN, Adobe, IAI, MITRE, Microsoft Research, Mayo Clinic, UCLA Medical School, LinkedIn, Facebook, and other industry and research centers. Through such collaborations, we expect to explore many real applications and produce greater research impact.

 

Current and Future Activities

The following are some of the highlights of our ongoing work. Please refer to the Publications and Products section for related references.

1.      Study effective and scalable methods for embedding and mining heterogeneous information networks

2.      Study effective and scalable methods for embedding and text mining for the construction of heterogeneous information networks from unstructured data

3.      Study effective and scalable methods for embedding and mining for the construction of multidimensional text-cubes and cube networks to support new applications

 

Area Background

 

This project builds on previous research on data mining, text mining, embedding in networks, and data cube and multidimensional analysis. Many research papers have been published on these themes. Several textbooks on data mining, text mining, information retrieval, and information network analysis provide good overviews of the principles and algorithms.

 

Area References

·         Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011

·         Philip S. Yu, Jiawei Han, and Christos Faloutsos (eds), Link Mining: Models, Algorithms, and Applications, Springer, 2010

·         C. Aggarwal, Machine Learning for Text, Springer, 2017

 

 

 

 

Potential Related Projects

·          Information Network Academic Research Center, Network Science-Collaborative Technology Alliance

·          NIH BD2K: KnowEng (Knowledge Engine for Genomics) Center: Construction and Mining of Biological Networks

·         Multi-Dimensional Structuring, Summarizing and Mining of Social Media Data (NSF/IIS)

·         StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research (NSF/IIS)

Project Web site URL:  http://www.cs.uiuc.edu/~hanj/projs/embedding.htm

Online software:  Online software can be downloaded at http://illimine.cs.uiuc.edu, and an online system demo is available at http://dm.cs.uiuc.edu/movemine

Online resources:  Research publications related to this project can be downloaded at Selected Publications