A Task-Agnostic Machine Learning Framework for Dynamic Knowledge Graphs


Many applications require well-structured and current information to enable downstream tasks. Knowledge graphs are a form of knowledge representation that organizes information by capturing elements and the relationships between them, so that it can be queried and/or reasoned over in more advanced applications. A particular challenge is ensuring that an application-specific knowledge graph is both comprehensive and current, which requires dynamic updating. Some available software frameworks for managing information as part of a data science pipeline are effective at collecting, labelling, and analysing textual data using natural language processing. Despite their utility, these frameworks can be daunting for industry professionals and/or researchers who may not be familiar with the specifics of each tool. In this work, we present a generalized, task-agnostic supervised machine learning framework that serves as a streamlined methodology for the creation and dynamic updating of knowledge graphs. A user needs only to define task-specific parameters, which allow the tool to scrape data from the internet and generate a candidate corpus. The user may then provide sample annotations from the corpus to train task-specific natural language processing models that extract the relevant knowledge graph elements and the relationships connecting them. We demonstrate the utility of this framework in a case study that builds knowledge graph representations of merger and acquisition events between companies from scraped online articles reporting these events. Our task-specific machine learning models achieve F1 scores upwards of 99.2% on candidate web page classification and 81.5% on sentence-level extraction of entity relationships, demonstrating the promise of this framework. Our framework is freely available at: github.com/Checktr/tadkg
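
To make the idea of dynamic updating concrete, the following is a minimal sketch of how extracted relationships might be merged into a knowledge graph over repeated scraping runs. It is not taken from the tadkg code base: the `Triple` and `KnowledgeGraph` classes, the relation labels, and the company names are all hypothetical, and the graph is assumed to be a simple in-memory mapping from each triple to the most recent date on which it was reported.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Triple:
    """One knowledge graph edge: head entity, relation, tail entity (hypothetical schema)."""
    head: str
    relation: str
    tail: str


@dataclass
class KnowledgeGraph:
    # Maps each triple to the most recent date on which it was reported.
    edges: Dict[Triple, date] = field(default_factory=dict)

    def update(self, triple: Triple, reported: date) -> bool:
        """Add a triple or refresh its date; return True if the triple is new."""
        is_new = triple not in self.edges
        if is_new or reported > self.edges[triple]:
            self.edges[triple] = reported
        return is_new

    def query(self, head: Optional[str] = None, relation: Optional[str] = None,
              tail: Optional[str] = None) -> List[Triple]:
        """Return the triples that match every field that is not None."""
        matches = [
            t for t in self.edges
            if head in (None, t.head)
            and relation in (None, t.relation)
            and tail in (None, t.tail)
        ]
        return sorted(matches, key=lambda t: (t.head, t.relation, t.tail))


# Hypothetical extraction output from successive scraping runs.
kg = KnowledgeGraph()
kg.update(Triple("AlphaCorp", "acquired", "BetaSoft"), date(2022, 3, 1))
# The same event reported again by a newer article: only the date is refreshed.
kg.update(Triple("AlphaCorp", "acquired", "BetaSoft"), date(2022, 3, 5))
kg.update(Triple("GammaBank", "merged_with", "DeltaTrust"), date(2022, 4, 2))
print(kg.query(head="AlphaCorp"))
```

Keying the graph on the full triple means that repeated reports of the same event do not create duplicate edges, while the stored date records how current each edge is. A real deployment would likely also need entity normalization (for example, resolving different spellings of a company name), which this sketch omits.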

Proceedings of the 32nd Annual International Conference on Computer Science and Software Engineering