NSF III: Medium: Collaborative Research: Mining and Leveraging Knowledge Hypercubes for Complex Applications: NSF-IIS 19-56151

(10/01/2020-09/30/2024)

 

 

Contact Information

 

Jiawei Han,  Co-PI, Michael Aiken Chair Professor 
Department of Computer Science
University of Illinois, Urbana-Champaign
201 N. Goodwin Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903, Fax: (217) 265-6494

E-mail: hanj at illinois.edu, URL: http://hanj.cs.illinois.edu

 

List of Supported Students and Staff

 

§  Xiaotao Gu, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign 

§  Priyanka Kargupta, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign 

§  Yu Zhang, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign 

§  Liyuan Liu, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign 

§  Yunyi Zhang, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign 

§  Ming Zhong, Ph.D. student, Department of Computer Science, University of Illinois at Urbana-Champaign 

Project Award Information

 

·         Award Number: NSF-IIS 19-56151

·         Duration: 10/01/2020-09/30/2024

·         Title: NSF III: Medium: Collaborative Research: Mining and Leveraging Knowledge Hypercubes for Complex Applications

·         Keywords:  text/data mining; knowledge bases; unsupervised and weakly supervised learning; multi-dimensional information extraction and analysis; knowledge discovery; efficiency and scalability

Project Summary

Recent years have witnessed the proliferation of various machine-readable knowledge repositories, such as general knowledge bases and domain-specific ontologies. Although existing knowledge repositories have shown their power at simple search and question answering, their usage in complex problem solving is very limited. In many domains, knowledge varies with respect to contexts, and the flat structure commonly adopted by existing knowledge repositories cannot capture the complicated knowledge associated with different contexts. To make knowledge resources more findable, accessible, interoperable, and reusable (FAIR), this project proposes to conceptualize a new structure, Knowledge Hypercube (K-CUBE), for organizing and retrieving knowledge that could support complex applications in various domains. A knowledge hypercube organizes knowledge with respect to selected important dimensions or aspects, and thus it allows people to easily access knowledge in any context, encapsulate distinctive entities and relationships, and conduct cross-dimensional comparison and inference. The major objective of this proposal is to form a paradigm of mining knowledge hypercubes from massive collections of text documents and leveraging such hypercubes for complex exploration and prediction tasks. The progress of the project and the research results are also disseminated via the project Web site (http://hanj.cs.illinois.edu/projs/hypercube.htm).
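As an illustration only, the idea of organizing knowledge along selected dimensions can be sketched as a small data structure. The sketch below is not the project's software: the class name KnowledgeCube, the dimension names (topic, region, year), and the placeholder facts are all hypothetical, chosen to show how cells keyed by one value per dimension support access in a given context and cross-dimensional comparison.

```python
from collections import defaultdict


class KnowledgeCube:
    """Toy cube (hypothetical): each cell is keyed by one value per
    dimension and holds a list of (head, relation, tail) facts."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = defaultdict(list)

    def add(self, fact, **coords):
        # Every dimension must be given a value for the fact's cell.
        key = tuple(coords[d] for d in self.dimensions)
        self.cells[key].append(fact)

    def _matches(self, key, fixed):
        # True if the cell key agrees with every fixed dimension value.
        return all(key[self.dimensions.index(d)] == v for d, v in fixed.items())

    def slice(self, **fixed):
        """Facts from all cells whose coordinates match the fixed values."""
        return [
            fact
            for key, facts in self.cells.items()
            if self._matches(key, fixed)
            for fact in facts
        ]

    def compare(self, dimension, **fixed):
        """Group matching facts by the values of one dimension."""
        idx = self.dimensions.index(dimension)
        grouped = defaultdict(list)
        for key, facts in self.cells.items():
            if self._matches(key, fixed):
                grouped[key[idx]].extend(facts)
        return dict(grouped)


# Placeholder facts and dimension values, made up for this sketch.
cube = KnowledgeCube(("topic", "region", "year"))
cube.add(("entity_a", "related_to", "entity_b"), topic="health", region="north", year=2020)
cube.add(("entity_a", "related_to", "entity_c"), topic="health", region="south", year=2020)
cube.add(("entity_d", "related_to", "entity_e"), topic="economy", region="north", year=2021)

print(cube.slice(topic="health", year=2020))      # knowledge in one context
print(cube.compare("region", topic="health"))     # comparison across one dimension
```

In this sketch, slice corresponds to accessing knowledge in a chosen context, and compare to a simple cross-dimensional comparison; mining the facts from text and performing inference over them are outside its scope.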

 

Intellectual Merit:

 

The proposed research bridges the gap between the empirical success of network embedding, and existing statistical learning and optimization theories. The core of this proposed research is the integration of modern network mining techniques with sophisticated statistical learning and optimization tools, which lays a foundation to design a new generation of network embedding algorithms with strong theoretical guarantees, and to derive new theories for various setups of network embedding. Extensive empirical evaluations ensure the proposed algorithms' applicability in various application domains. The proposed research is expected to advance the frontier of network embedding and enable it to be good at taming modern massive networks in the wild.

 

Broader Impacts:

 

The successful completion of this project will lead to a new advanced way to store, retrieve, share and exploit knowledge for complex applications. It will have immediate impact on the process of knowledge distillation, organization and exploitation and will broadly impact the field of data science, which centers around finding and using knowledge. The proposed research will provide an important source to advance knowledge-based machine learning approaches. Furthermore, the proposed research to mine and leverage knowledge can potentially benefit a wide range of domains that have vast literature and unsolved complex tasks, such as drug repurposing and fake news detection, by building a bridge between complex tasks and text collections. A repository of the developed software and constructed knowledge hypercubes for the proposed domains will be constructed, and the results of this project will be disseminated both within the computer science area and to many other disciplines. This project has the potential to promote the adoption of knowledge hypercubes by industry, making knowledge resources more findable, accessible, interoperable, and reusable (FAIR). Moreover, the proposed research work will be integrated tightly with education as we plan to leverage knowledge hypercubes for educational tasks such as knowledge tracing. We will also encourage the participation of undergraduate and minority students in data mining research at all three institutions.

 

The research results are to be published in various research and application forums and be integrated into the educational programs at UIUC.  The progress of the project and the research results are also disseminated via the project Web site (http://www.cs.uiuc.edu/homes/hanj/projs/hypercube.htm).

Publications and Products: (Note: major publications closely related to this project are in bold font)

Note:  Please search and download all the papers in PDF, if available, at our group’s publication website by following the link: Selected research publications.

Books

·         Xiang Ren and Jiawei Han, Mining Structures of Factual Knowledge from Text: An Effort-Light Approach, Morgan & Claypool Publishers, 2018 (Series: Synthesis Lectures on Data Mining and Knowledge Discovery)

·         Chao Zhang and Jiawei Han, Multidimensional Mining of Massive Text Data, Morgan & Claypool Publishers, 2019 (Series: Synthesis Lectures on Data Mining and Knowledge Discovery)

 

 

Journal articles

·         Zhizhi Yu, Di Jin, Ziyang Liu, Dongxiao He, Xiao Wang, Hanghang Tong, Jiawei Han, “Embedding text-rich graph neural networks with sequence and topical semantic structures”, Knowledge and Information Systems, 65(2): 613-640 (2023)

·         Wei Shen, Yuhan Li, Yinan Liu, Jiawei Han, Jianyong Wang, Xiaojie Yuan, “Entity Linking Meets Deep Learning: Techniques and Solutions”, IEEE Trans. Knowl. Data Eng., 35(3): 2556-2578 (2023)

·         Di Jin, Zhizhi Yu, Dongxiao He, Carl Yang, Philip S. Yu, Jiawei Han, “GCN for HIN via Implicit Utilization of Attention and Meta-Paths”, IEEE Trans. Knowl. Data Eng., 35(4): 3925-3937 (2023)

·         Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi Wu, “Heterogeneous Information Networks: the Past, the Present, and the Future”, Proc. VLDB Endow., 15(12): 3807-3811 (2022)

·         Di Jin, Wenjun Wang, Guojie Song, Philip S. Yu, Jiawei Han, “Guest Editorial: Special Issue on Network Structural Modeling and Learning in Big Data”, IEEE Transactions on Big Data, 8(4): 867-868 (2022)

·         Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun and Jiawei Han, “Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark”, IEEE Transactions on Knowledge and Data Engineering, 34(10): 4854-4873 (2022)

·         Wei Shen, Yuwei Yin, Yang Yang, Jiawei Han, Jianyong Wang, Xiaojie Yuan, “Toward Tweet Entity Linking with Heterogeneous Information Networks”, IEEE Transactions on Knowledge and Data Engineering, 34(12): 6003-6017 (2022)

·         Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, and Jiawei Han, “Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts”, Frontiers in Big Data, 3:9, 2020

·         Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han, "Automated Phrase Mining from Massive Text Corpora", IEEE Transactions on Knowledge and Data Engineering, 30(10):1825-1837 (2018)

·         Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han, "DPPred: An Effective Prediction Framework with Concise Discriminative Patterns", IEEE Transactions on Knowledge and Data Engineering, 30(7): 1226-1239 (2018)

Refereed Conference Publications

1.       Sizhe Zhou, Suyu Ge, Jiaming Shen, Jiawei Han, “Corpus-Based Relation Extraction by Identifying and Refining Relation Patterns”, in Proc. 2023 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’23), Turin, Italy, Sept. 2023

2.       Bowen Jin, Yu Zhang, Qi Zhu, Jiawei Han, “Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks”, in Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’23), Long Beach, CA, August 2023

3.       Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han, “Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers”, in Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’23), Long Beach, CA, August 2023

4.       Nishant Balepur, Shivam Agarwal, Karthik Venkat Ramanan, Susik Yoon, Diyi Yang and Jiawei Han, “DynaMiTE: Discovering Explosive Topic Evolutions with User Guidance”, in Proc. 2023 Annual Meeting of the Association for Computational Linguistics (ACL Findings’23), Toronto, Canada, July 2023, pp. 194-217

5.       Pengcheng Jiang, Shivam Agarwal, Bowen Jin, Xuan Wang, Jimeng Sun and Jiawei Han, “Text Augmented Open Knowledge Graph Completion via Pre-Trained Language Models”, in Proc. 2023 Annual Meeting of the Association for Computational Linguistics (ACL Findings’23), Toronto, Canada, July 2023, pp. 11161-11180

6.       Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu and Jiawei Han, “Patton: Language Model Pretraining on Text-Rich Networks”, in Proc. 2023 Annual Meeting of the Association for Computational Linguistics (ACL’23), Toronto, Canada, July 2023, pp. 7005-7020

7.       Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch and Jiawei Han, “Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification”, in Proc. 2023 Annual Meeting of the Association for Computational Linguistics (ACL’23), Toronto, Canada, July 2023, pp. 5677-5697

8.       Siru Ouyang, Jiaao Chen, Jiawei Han and Diyi Yang, “Compositional Data Augmentation for Abstractive Conversation Summarization”, in Proc. 2023 Annual Meeting of the Association for Computational Linguistics (ACL’23), Toronto, Canada, July 2023, pp. 1471-1488

9.       Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang and Jiawei Han, “ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision”, in Proc. 2023 Annual Meeting of the Association for Computational Linguistics (ACL Findings’23), Toronto, Canada, July 2023, pp. 12120-12130

10.    Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han, “Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning”, in Proc. 2023 Int. Conf. on Machine Learning (ICML’23), Honolulu, Hawaii, July 2023

11.    Susik Yoon, Dongha Lee, Yunyi Zhang and Jiawei Han, “Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding”, in Proc. 2023 ACM SIGIR Int. Conf. on Research and Development in Information Retrieval (SIGIR’23), Taipei, Taiwan, July 2023

12.    Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han, “Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks”, in Proc. 2023 Int. Conf. on Learning Representations (ICLR’23), Kigali, Rwanda, May 2023

13.    Susik Yoon, Hou Pong Chan and Jiawei Han, “PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream”, in Proc. 2023 The Web Conf. (WWW’23), Austin, TX, Apr. 2023, pp. 1650-1661

14.    Susik Yoon, Yu Meng, Dongha Lee and Jiawei Han, “SCStory: Self-supervised and Continual Online Story Discovery”, in Proc. 2023 The Web Conf. (WWW’23), Austin, TX, Apr. 2023, pp. 1853-1864

15.    Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng and Jiawei Han, “The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study”, in Proc. 2023 The Web Conf. (WWW’23), Austin, TX, Apr. 2023, pp. 1626-1637

16.    Yizhu Jiao, Ming Zhong, Jiaming Shen, Yunyi Zhang, Chao Zhang and Jiawei Han, “Unsupervised Event Chain Mining from Multiple Documents”, in Proc. 2023 The Web Conf. (WWW’23), Austin, TX, Apr. 2023, pp. 1948-1959

17.    Jinfeng Xiao, Mohab Elkaref, Nathan Herr, Geeth De Mel, and Jiawei Han, “Taxonomy-Guided Fine-Grained Entity Set Expansion”, in Proc. 2023 SIAM Conf. on Data Mining (SDM’23), Minneapolis, MN, Apr. 2023, pp. 1626-1637

18.    Suyu Ge, Jiaxin Huang, Yu Meng, and Jiawei Han, “FineSum: Target-Oriented, Fine-Grained Opinion Summarization”, in Proc. 2023 ACM Int. Conf. on Web Search and Data Mining (WSDM’23), Singapore, Feb. 2023, pp. 1093-1101

19.    Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, and Jiawei Han, “Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts”, in Proc. 2023 ACM Int. Conf. on Web Search and Data Mining (WSDM’23), Singapore, Feb. 2023, pp. 429-437

20.    Yizhu Jiao, Sha Li, Yiqing Xie, Ming Zhong, Heng Ji and Jiawei Han, “Open-Vocabulary Argument Role Prediction for Event Extraction”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP’22), Abu Dhabi, UAE, Dec. 2022

21.    Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng and Jiawei Han, “Unsupervised Multi-Granularity Summarization”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP’22), Abu Dhabi, UAE, Dec. 2022

22.    Sha Li, Heng Ji and Jiawei Han, “Open Relation and Event Type Discovery with Type Abstraction”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP’22), Abu Dhabi, UAE, Dec. 2022

23.    Yuning Mao, Ming Zhong and Jiawei Han, “CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP’22), Abu Dhabi, UAE, Dec. 2022

24.    Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, Pengfei Liu, Chenguang Zhu, Heng Ji and Jiawei Han, “Towards A Unified Multi-Dimensional Evaluator for Text Generation”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (Findings of EMNLP’22), Abu Dhabi, UAE, Dec. 2022

25.    Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu and Jiawei Han, “Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (Findings of EMNLP 2022), Abu Dhabi, UAE, Dec. 2022

26.    Xuan Wang, Vivian Hu, Minhao Jiang, Yu Zhang, Jinfeng Xiao, Danielle Cherrice Loving, Heng Ji, Martin Burke, Jiawei Han, “REACTCLASS: Cross-Modal Supervision for Subword-Guided Reactant Entity Classification”, in Proc. 2022 IEEE Int. Conf. on Bioinformatics and Biomedicine (BIBM’22), Las Vegas, NV, Dec. 2022, pp. 844-847

27.    Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han, “Generating Training Data with Language Models: Towards Zero-Shot Language Understanding”, in Proc. of 2022 Conf. on Neural Information Processing Systems (NeurIPS’22), New Orleans, LA, Nov. 2022

28.    Shivam Agarwal, Ramit Sawhney, Megh Thakkar, Preslav Nakov, Jiawei Han, and Tyler Derr, “THINK: Temporal Hypergraph Hyperbolic Network”, in Proc. of 2022 IEEE Int. Conf. on Data Mining (ICDM’22), Orlando, FL, Nov. 2022, pp. 849-854

29.    Jiaxin Huang, Yu Meng, and Jiawei Han, “Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation”, in Proc. of 2022 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’22), Washington, DC, Aug. 2022, pp. 605-614

30.    Yunyi Zhang, Fang Guo, Jiaming Shen, and Jiawei Han, “Unsupervised Key Event Detection from Massive Text Corpus”, in Proc. of 2022 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’22), Washington, DC, Aug. 2022, pp. 2535-2544

31.    Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han, “Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds”, in Proc. of 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’22), Seattle, WA, July 2022, pp. 279-290

32.    Yuxin Xiao, Zecheng Zhang, Yuning Mao, Carl Yang, Jiawei Han, “SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction”, in Proc. of 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’22), Seattle, WA, July 2022, pp. 2395-2409

33.    Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han, “Phrase-aware Unsupervised Constituency Parsing”, in Proc. of 2022 Annual Meeting of the Association for Computational Linguistics (ACL’22), Dublin, Ireland, May 2022, pp. 6406-6415

34.    Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa, “UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning”, in Proc. of 2022 Annual Meeting of the Association for Computational Linguistics (ACL’22), Dublin, Ireland, May 2022, pp. 6253-6264

35.    Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han, “EIDER: Evidence-enhanced Document-level Relation Extraction”, in Findings of the Association for Computational Linguistics (ACL’22 Findings), Dublin, Ireland, May 2022, pp. 257-268

36.    Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul N. Bennett, Jiawei Han, Xia Song, “Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators”, in Proc. 2022 Int. Conf. on Learning Representations (ICLR’22), April 2022

37.    Minhao Jiang, Xiangchen Song, Jieyu Zhang and Jiawei Han, “TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations”, in Proc. The ACM Web Conf. 2022 (WWW’22), April 2022, pp. 925-934

38.    Dongha Lee, Jiaming Shen, Seongku Kang, Susik Yoon, Jiawei Han and Hwanjo Yu, “TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters”, in Proc. The ACM Web Conf. 2022 (WWW’22), April 2022, pp. 2819-2829

39.    Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang and Jiawei Han, “Topic Discovery via Latent Space Clustering of Language Model Embeddings”, in Proc. The ACM Web Conf. 2022 (WWW’22), April 2022, pp. 3143-3152

40.    Yiqing Xie, Zhen Wang, Carl Yang, Yaliang Li, Bolin Ding, Hongbo Deng and Jiawei Han, “KoMen: Domain Knowledge Guided Interaction Recommendation for Emerging Scenarios”, in Proc. The ACM Web Conf. 2022 (WWW’22), April 2022, pp. 1301-1310

41.    Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang and Jiawei Han, “Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification”, in Proc. The ACM Web Conf. 2022 (WWW’22), April 2022, pp. 3162-3173

42.    Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han, “MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information”, in Proc. 2022 ACM Int. Conf. on Web Search and Data Mining (WSDM’22), Feb. 2022, pp. 1357-1367

43.    Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang, "UCPhrase: Unsupervised Context-aware Quality Phrase Tagging", in Proc. of 2021 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'21), Aug. 2021 

44.    Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han, "On the Power of Pre-Trained Text Representations: Models and Applications in Text Mining" (Conference Tutorial), in Proc. of 2021 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'21), Aug. 2021

45.    Sha Li, Heng Ji and Jiawei Han, "Document-Level Event Argument Extraction by Conditional Generation", in Proc. 2021 Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT'21), June 2021

46.    Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren and Jiawei Han, "TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names", in Proc. 2021 Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT'21), June 2021

47.    Xinyang Zhang, Chenwei Zhang, Xin Luna Dong, Jingbo Shang and Jiawei Han, “Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks”, in Proc. The Web Conf. 2021 (WWW’21), April 2021

48.    Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang and Jiawei Han, “MATCH: Metadata-Aware Text Classification in a Large Hierarchy”, in Proc. The Web Conf. 2021 (WWW’21), April 2021

49.    Qi Zhu, Fang Guo, Jingjing Tian, Yuning Mao, Jiawei Han, "SUMDocS: Surrounding-aware Unsupervised Multiple Document Summarization", in Proc. 2021 SIAM Int. Conf. on Data Mining (SDM'21), April 2021

50.    Yu Zhang, Xiusi Chen, Yu Meng and Jiawei Han, "Hierarchical Metadata-Aware Document Categorization under Weak Supervision", in Proc. 2021 ACM Int. Conf. on Web Search and Data Mining (WSDM'21), Feb. 2021

51.    Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng and Jiawei Han, "BiTe-GCN: A New GCN Architecture via Bidirectional Convolution of Topology and Features on Text-Rich Networks",  in Proc. 2021 ACM Int. Conf. on Web Search and Data Mining (WSDM'21), Feb. 2021

52.    Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun and Jiawei Han, "Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark", IEEE Transactions on Knowledge and Data Engineering, 2021

53.    Xuan Wang, Xiangchen Song, Bangzheng Li, Kang Zhou, Qi Li, and Jiawei Han, "Fine-Grained Named Entity Recognition with Distant Supervision in COVID-19 Literature", in Proc. 2020 IEEE Int. Conf. on Bioinformatics and Biomedicine (IEEE BIBM 2020), Dec. 2020

54.    Xuan Wang, Yu Zhang, Aabhas Chauhan, Qi Li, and Jiawei Han, "Textual Evidence Mining via Spherical Heterogeneous Information Network Embedding", in Proc. 2020 IEEE Int. Conf. on Big Data (IEEE BigData'20), Dec. 2020 

55.    Xuan Wang, Yingjun Guan, Yu Zhang, Qi Li, and Jiawei Han, "Pattern-enhanced Named Entity Recognition with Distant Supervision", in Proc. 2020 IEEE Int. Conf. on Big Data (IEEE BigData'20), Dec. 2020

56.    Carl Yang, Liyuan Liu, Mengxiong Liu, Zongyi Wang, Chao Zhang, and Jiawei Han, "Graph Clustering with Embedding Propagation", in Proc. 2020 IEEE Int. Conf. on Big Data (IEEE BigData'20), Dec. 2020  

57.    Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji and Jiawei Han, "Aspect-Based Sentiment Analysis by Aspect-Sentiment Joint Embedding", in Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP'20), Nov. 2020

58.    Yuning Mao, Yanru Qu, Yiqing Xie, Xiang Ren and Jiawei Han, "Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning", in Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP'20), Nov. 2020

59.    Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang and Jiawei Han, "Text Classification Using Label Names Only: A Language Model Self-Training Approach", in Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP'20), Nov. 2020

60.    Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren and Jiawei Han, "SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery", in Proc. 2020 Conf. on Empirical Methods in Natural Language Processing (EMNLP'20), Nov. 2020

61.    Edouard Fouche, Yu Meng, Fang Guo, Honglei Zhuang, Klemens Boehm, and Jiawei Han, "Mining Text Outliers in Document Directories", in Proc. 2020 IEEE Int. Conf. on Data Mining (ICDM'20), Nov. 2020

62.    Carl Yang, Jieyu Zhang, and Jiawei Han, "Co-Embedding Network Nodes and Hierarchical Labels with Taxonomy Based Generative Adversarial Networks", in Proc. 2020 IEEE Int. Conf. on Data Mining (ICDM'20), Nov. 2020 (Best Paper Award)

63.    Yu Meng, Jiaxin Huang, Jiawei Han, “Embedding-Driven Multi-Dimensional Topic Mining and Text Analysis”, (Conference tutorial), 2020 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’20), San Diego, CA, August 2020

64.    Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang and Jiawei Han, “CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring”, in Proc. of 2020 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’20), San Diego, CA, August 2020

65.    Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos and Jiawei Han, “Octet: Online Catalog Taxonomy Enrichment with Self-Supervision”, in Proc. of 2020 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’20), San Diego, CA, August 2020 

66.    Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang and Jiawei Han, “Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding”, in Proc. of 2020 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’20), San Diego, CA, August 2020 

67.    Chanyoung Park, Carl Yang, Qi Zhu, Donghyun Kim, Hwanjo Yu and Jiawei Han, “Unsupervised Differentiable Multi-aspect Network Embedding”, in Proc. of 2020 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’20), San Diego, CA, August 2020 

68.    Carl Yang, Aditya Pal, Andrew Zhai, Nikil Pancha, Jiawei Han, Chuck Rosenburg and Jure Leskovec, “MultiSage: Empowering GCN with Contextualized Multi-Embeddings on Web-Scale Multipartite Networks”, in Proc. of 2020 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’20), San Diego, CA, August 2020

69.    Yiqing Xie, Sha Li, Carl Yang, Raymond Chi-Wing Wong, Jiawei Han, “When Do GNNs Work: Understanding and Improving Neighborhood Aggregation”, in Proc. of 2020 Int. Joint Conf. on Artificial Intelligence and Pacific Rim Int. Conf. on Artificial Intelligence (IJCAI-PRICAI’20), Yokohama, Japan, July 2020

70.    Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang and Jiawei Han, “Minimally Supervised Categorization of Text with Metadata”, in Proc. 2020 ACM SIGIR Int. Conf. on Research and development in Information Retrieval (SIGIR’20), Xi’an, China, July 2020 

71.    Honglei Zhuang, Fang Guo, Chao Zhang, Liyuan Liu and Jiawei Han, “Joint Aspect-Sentiment Analysis with Minimal User Guidance”, in Proc. 2020 ACM SIGIR Int. Conf. on Research and development in Information Retrieval (SIGIR’20), Xi’an, China, July 2020

72.    Carl Yang, Jieyu Zhang, Haonan Wang, Bangzheng Li, Jiawei Han, "Neural Concept Map Generation for Effective Document Classification with Interpretable Structured Summarization" (short paper), in Proc. 2020 ACM SIGIR Int. Conf. on Research and development in Information Retrieval (SIGIR'20), Xi'an, China, July 2020

73.    Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren and Jiawei Han, “Facet-Aware Evaluation for Extractive Summarization”, in Proc. 2020 Annual Conf. of the Association for Computational Linguistics (ACL’20), Seattle, WA, July 2020

74.    Yunyi Zhang, Jiaming Shen, Jingbo Shang and Jiawei Han, “Empower Entity Set Expansion via Language Model Probing”, in Proc. 2020 Annual Conf. of the Association for Computational Linguistics (ACL’20), Seattle, WA, July 2020 

75.    Xuan Wang, Yingjun Guan, Weili Liu, Aabhas Chauhan, Enyi Jiang, Qi Li, David Liem, Dibakar Sigdel, John Caufield, Peipei Ping and Jiawei Han, “EVIDENCEMINER: Textual Evidence Discovery for Life Sciences”, in Proc. 2020 Annual Conf. of the Association for Computational Linguistics (ACL’20) (System demo), Seattle, WA, July 2020

76.    Xiaotao Gu, Yuning Mao, Jiawei Han, Jialu Liu, You Wu, Cong Yu, Daniel Finnie, Hongkun Yu, Jiaqi Zhai and Nicholas Zukoski, “Generating Representative Headlines for News Stories”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020

77.    Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang and Jiawei Han, “Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020

78.    Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang and Jiawei Han, “Discriminative Topic Mining via Category-Name Guided Text Embedding”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020

79.    Jingbo Shang, Xinyang Zhang, Liyuan Liu, Sha Li and Jiawei Han, “NetTaxo: Automated Topic Taxonomy Construction from Large-Scale Text-Rich Network”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020

80.    Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang and Jiawei Han, “TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020

81.    Qi Zhu, Hao Wei, Bunyamin Sisman, Da Zheng, Christos Faloutsos, Xin Luna Dong and Jiawei Han, “Collective Multi-type Entity Alignment Between Knowledge Graphs”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020

82.    Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han, "On the Variance of the Adaptive Learning Rate and Beyond", in Proc. 2020 Int. Conf. on Learning Representations (ICLR'20), Addis Ababa, Ethiopia, Apr. 2020

83.    Chanyoung Park, Donghyun Kim, Hwanjo Yu, Jiawei Han, “Unsupervised Attributed Multiplex Network Embedding”, in Proc. 2020 AAAI Int. Conf. on Artificial Intelligence (AAAI’20), New York, NY, Feb. 2020

84.    Aravind Sankar, Xinyang Zhang, Adit Krishnan and Jiawei Han, "A Deep Generative Approach to Integrate Social Homophily and Temporal Influence in Diffusion Prediction", in Proc. 2020 ACM Int. Conf. on Web Search and Data Mining (WSDM'20), Houston, TX, Feb. 2020

85.    Carl Yang, Jieyu Zhang, Haonan Wang, Sha Li, Myunghwan Kim, Matthew Walker, Yiou Xiao and Jiawei Han, "Relation Learning on Social Networks with Multi-Modal Graph Edge Variational Autoencoders", in Proc. 2020 ACM Int. Conf. on Web Search and Data Mining (WSDM'20), Houston, TX, Feb. 2020

86.    Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, and Jiawei Han, “Unsupervised Word Embedding Learning by Incorporating Local and Global Contexts”, Frontiers in Big Data, 3:9, 2020

Ph.D. Dissertations

 

·         Xuan Wang, Ph.D., Dec. 2022, thesis title: “Scientific Knowledge Extraction from Massive Text Data”

·         Yuning Mao, Ph.D., April 2022, thesis title: “Guided Text Summarization with Limited Supervision”

·         Xiaotao Gu, Ph.D., March 2022, thesis title: “Annotation-Free Knowledge Mining from Massive Text Corpora”

·         Jiaming Shen, Ph.D., Nov. 2021, thesis title: “Automated Taxonomy Discovery and Exploration”

·         Carl Ji Yang, Ph.D., Nov. 2020, thesis title: “Multi-Facet Graph Mining with Contextualized Projections”

·         Shi Zhi, Ph.D., Sept. 2020, thesis title: “Learning from Multiple Heterogeneous Sources—Handling source trustworthiness and incompleteness”

·         Ahmed El-Kishky, Ph.D., March 2020, thesis title: “Text Mining at Multiple Granularity: Leveraging Subwords, Words, Phrases, and Sentences”

·         Jingbo Shang, Ph.D., Nov. 2019, thesis title: “Constructing and Mining Structured Heterogeneous Information Networks from Massive Text Corpora”, Ph.D. Thesis won 2020 ACM SIGKDD Doctoral Dissertation Award Runner-Up

 

Project Impact

 

§  Education: Parts of the new research results are used in Data Mining courses (CS412, CS512, CS412 MCD-DS online Coursera courses) taught to both undergraduate and graduate students in the Department of Computer Science, University of Illinois at Urbana-Champaign. The research results have been and will continue to be published in a timely manner in international conferences and journals and distributed worldwide for education and research. Most of the software developed in this project has been released as open source on GitHub. The new progress will also be integrated into the new edition of our data mining textbook and other research collections.

§  Collaborations: For this project we have established collaborations with ARL, Google Research, Amazon, Adobe, IAI, Microsoft Research, UCLA Medical School, LinkedIn, Facebook, and other industry and research centers. Through such collaborations we expect to explore many real applications and produce greater research impact.

 

 

Current and Future Activities

The following are some of the highlights of our ongoing work.  Please refer to the Publications and Products section for related references.

1.        Study effective and scalable methods for embedding and mining text and heterogeneous information networks

2.        Study effective and scalable methods for embedding and text mining in the construction of heterogeneous knowledge cubes from unstructured data

3.       Study effective and scalable methods for exploration of multidimensional text and knowledge hypercubes to support new applications

Area Background

 

This project is based on the previous research on data mining, text mining, embedding in networks, and data cube and multidimensional analysis.    There have been many research papers published on these themes.   Several textbooks on data mining, text mining, information retrieval and information network analysis provide good overviews of the principles and algorithms.

 

Area References

·         Jiawei Han, Jian Pei, and Hanghang Tong, Data Mining: Concepts and Techniques, 4th edition, Morgan Kaufmann, 2022

·         C. Aggarwal, Machine Learning for Text, Springer 2017

·         Xiang Ren and Jiawei Han, Mining Structures of Factual Knowledge from Text: An Effort-Light Approach, Morgan & Claypool Publishers, 2018 

·         Jialu Liu, Jingbo Shang and Jiawei Han,  Phrase Mining from Massive Text and Its Applications, Morgan & Claypool, 2017

·         Yizhou Sun and Jiawei Han, Mining Heterogeneous Information Networks: Principles and Methodologies, Morgan & Claypool, 2012

 

 

Potential Related Projects

·         Information Network Academic Research Center, Network Science-Collaborative Technology Alliance

·         NIH BD2K: KnowEng (Knowledge Engine for Genomics) Center: Construction and Mining of Biological Networks

·         Multi-Dimensional Structuring, Summarizing and Mining of Social Media Data (NSF/IIS)

·         StructNet: Constructing and Mining Structure-Rich Information Networks for Scientific Research (NSF/IIS)

 

Project Web site URL:  http://hanj.cs.illinois.edu/projs/hypercube.htm

Online software:  Online software can be downloaded from GitHub by searching for the first authors of the corresponding papers

Online resources:  Research publications related to this project can be downloaded at Selected Publications