Knowledge graphs and ontologies within AI

Knowledge graph introduction

Within artificial intelligence and data-driven technologies, one concept that has gained significant attention and prominence is the knowledge graph. With its ability to represent and connect vast amounts of structured and semantically rich information, the knowledge graph has emerged as a fundamental framework for organising and harnessing knowledge. In this blog post, we will delve into the essence of knowledge graphs and ontologies, exploring their definition, components, and the role they play in achieving a deeper understanding of data and explainability of AI models.

A knowledge graph is a flexible, reusable data layer used for answering complex queries across data silos. It connects contextualised data, represented and organised in the form of a graph. Built to capture the ever-changing nature of knowledge, it readily accepts new data, definitions, and requirements.

In 2012, Google announced that soon, users of its search engine would be able to search for “things, not strings” (of text). In other words, Google would return what is known as structured information about people, places, events, movies, and other concepts, not just the traditional list of web links containing words matching the search terms. This information is drawn from what is known as a knowledge graph. [1]

*Nowadays, much of the response for a Google search isn’t web links, but structured information from Google’s knowledge graph.*

It’s not just Google. Across the data science community, knowledge graphs have become a growing phenomenon in recent years, driving many applications, including virtual assistants like Siri and Alexa. At the same time, there’s some debate about what actually constitutes a knowledge graph (they’re a bit of a buzzword). But one common definition describes a knowledge graph as knowledge bases plus data integration. [2]

“things, not strings”

Components of a Knowledge Graph

A knowledge graph comprises three main components: entities, attributes, and relationships. Entities represent real-world objects, concepts, or instances, such as people, places, or events. Attributes describe properties or characteristics of these entities, providing additional contextual information. Relationships, on the other hand, establish connections and associations between entities.

A simple KG example

Knowledge graph representation

At the heart of knowledge graph representation lies the concept of a node-edge structure. Nodes represent entities or concepts, while edges capture the relationships or connections between these entities. When visualised as a graph, it allows for a clear and intuitive depiction of the interconnectedness and dependencies within the knowledge domain.

A widely adopted representation format for knowledge graphs is the Resource Description Framework (RDF). RDF represents knowledge as triples, consisting of subject-predicate-object statements. The subject represents the entity, the predicate denotes the relationship, and the object signifies the related entity or value.
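To make the triple structure concrete, here is a minimal sketch in plain Python rather than a real RDF library. The entity and predicate names are made up for illustration, and the `match` helper is our own hypothetical function, not part of any RDF toolkit.

```python
# Each fact is a (subject, predicate, object) triple, mirroring the RDF model.
# All entity and predicate names are illustrative assumptions.
triples = {
    ("Ada_Lovelace", "type", "Person"),
    ("Ada_Lovelace", "born_in", "London"),
    ("London", "type", "City"),
    ("London", "capital_of", "United_Kingdom"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything we know about one entity.
print(match(triples, s="Ada_Lovelace"))
# Every "type" statement in the graph.
print(match(triples, p="type"))
```

Leaving any position as a wildcard is the same idea that graph query languages build on: a query is a triple pattern with some slots left open.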

Understanding Ontology: The Backbone of Knowledge Graphs

In the realm of knowledge representation and organisation, ontologies play a crucial role in structuring information and enabling effective data integration and reasoning. An ontology holds the master knowledge scaffolding of an organisation: a complete, consistent data model of the business. It's a framework designed to make AI work more effectively. In this section, we will explore the concept of ontology, its purpose, and its significance for knowledge graphs.

Purpose of an Ontology

Ontologies facilitate data integration by providing a shared understanding of the underlying concepts and relationships. They are also instrumental in knowledge engineering, allowing for the formal representation and organisation of domain-specific knowledge.

Why Ontologies matter

With a company's speed and effectiveness at the forefront of industrial competition, building an ontology is not just another IT project. It is one that should matter to every CEO, CMO, and senior manager inside the company. Once you create the framework for the ontology, you can get more from your current investments in technology and apply emerging artificial intelligence techniques to drive your business. The ontology is the tool that teaches intelligent machines how your business runs. Without it, neither your systems nor your employees can truly understand how to access and organise the lifeblood of the business: the knowledge and information that provide value for your customers and the marketplace. [3]

Ontologies provide a strategic advantage in transforming your company into an AI-powered enterprise, enabling accelerated operations and enhanced decision-making. By leveraging AI capabilities supported by an ontology, your business can gain a competitive edge. It enables faster product and service delivery, efficient customer service, and the ability to capitalise on emerging market opportunities. The streamlined information flows facilitated by the ontology empower both automated systems and human decision-makers, driving improved outcomes.

Your ontology has the potential to accelerate your enterprise whether you work in retail, finance, healthcare, government, or any other sector. Additionally, it provides deep insights rooted in individuals’ underlying motivations rather than surface-level attributes like age, leading to product enhancements that align with those motivations.

Ontologies power meaningful AI capabilities

Ontologies begin as a holistic understanding of the language of the business and the customer, and are then designed into processes, applications, navigational structures, content, data models, and the relationships between concepts. They contain language variations, alternative spellings, translations, acronyms, and technical terms. They can describe “is-ness” and “about-ness” – this is a contract, it is about a services engagement, it is also about this vendor, for example.
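As a rough illustration of language variations and of "is-ness" versus "about-ness", the sketch below maps surface forms onto canonical concepts and records both kinds of statement for one document. The vocabulary, the concept names, and the `normalise` helper are all hypothetical.

```python
# Hypothetical vocabulary: each canonical concept lists the surface forms
# (synonyms, acronyms, alternative wordings) that refer to it.
LABELS = {
    "Statement_of_Work": ["statement of work", "SOW", "work order"],
    "Vendor": ["vendor", "supplier", "third party provider"],
}

# Invert into a case-insensitive lookup from surface form to concept.
LOOKUP = {
    form.lower(): concept
    for concept, forms in LABELS.items()
    for form in forms
}


def normalise(term):
    """Map a term as written by a user onto its canonical concept, if known."""
    return LOOKUP.get(term.strip().lower())


# "Is-ness" and "about-ness" kept as separate statements about one document.
document = {
    "id": "doc_42",
    "is_a": "Contract",
    "about": ["Services_Engagement", "Vendor"],
}

print(normalise("sow"))         # canonical concept for an acronym
print(normalise(" Supplier "))  # canonical concept for a synonym
```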

Ontologies can also support advanced capabilities to drive intelligent virtual assistants (bots). They can form the basis for inference engines: mechanisms that can answer a question that has not been preprogrammed into the bot. Bots powered by ontologies are faster to deploy, more scalable, and more cost-effective. Every aspect of business requires contextualised knowledge. The role of AI is to use the ontology to assist with this contextualisation.
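A very small sketch of what "answering a question that has not been preprogrammed" can look like: the code below derives class membership by following subclass links transitively. The class hierarchy and the `is_a` helper are illustrative assumptions, and real inference engines support far richer rules than this.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Master_Services_Agreement": "Contract",
    "Contract": "Legal_Document",
    "Legal_Document": "Document",
}

# The only fact stated explicitly about doc_42.
INSTANCE_OF = {"doc_42": "Master_Services_Agreement"}


def ancestors(cls):
    """All superclasses of cls, following subclass links transitively."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(entity, cls):
    """True if the entity belongs to cls directly or through inheritance."""
    direct = INSTANCE_OF.get(entity)
    if direct is None:
        return False
    return direct == cls or cls in ancestors(direct)


# Nobody stated that doc_42 is a legal document; it follows from the hierarchy.
print(is_a("doc_42", "Legal_Document"))
```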

Defining Ontology

At its core, an ontology is a formal and explicit specification of a shared conceptualisation. It provides a structured framework for representing knowledge by defining a set of concepts, their properties, and the relationships between them. Essentially, an ontology serves as a common vocabulary that enables effective communication and understanding between humans and machines.

Components of an Ontology

An ontology consists of several key components, including concepts, properties, and relationships. Concepts represent the various entities or classes within a domain, defining their characteristics and behaviours. Properties describe the attributes or features associated with these concepts, specifying their roles and relationships. Relationships establish connections and dependencies between concepts, capturing the associations and interactions within the knowledge domain.
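One way to picture these components is as a schema that instance data can be checked against. The sketch below is a simplification under our own assumptions: the classes, property names, and the domain/range check are hypothetical and much less expressive than a full ontology language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    domain: str  # class the subject must belong to
    range_: str  # class the object must belong to


# Hypothetical mini-ontology: concepts (classes) and the properties linking them.
CLASSES = {"Person", "Company", "City"}
PROPERTIES = {
    "works_for": Property("works_for", domain="Person", range_="Company"),
    "located_in": Property("located_in", domain="Company", range_="City"),
}

# Instance data: the class each entity belongs to.
ENTITY_TYPES = {"alice": "Person", "acme": "Company", "leeds": "City"}


def is_valid(subject, predicate, obj):
    """Check a candidate fact against the ontology's domain and range."""
    prop = PROPERTIES.get(predicate)
    if prop is None:
        return False  # the predicate is not part of the ontology
    return (
        ENTITY_TYPES.get(subject) == prop.domain
        and ENTITY_TYPES.get(obj) == prop.range_
    )


print(is_valid("alice", "works_for", "acme"))  # fits the schema
print(is_valid("acme", "works_for", "alice"))  # subject and object swapped
```

Keeping the schema separate from the instance data is the design choice that matters here: the same ontology can validate facts arriving from many different sources.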

Ontology recursion 🙂

“Knowledge” engineering?

Knowledge engineering refers to the process of capturing, organising, and modelling knowledge to create intelligent systems or applications. It involves a systematic and structured approach to acquiring, representing, and maintaining human knowledge within a knowledge-based system. The overall goal of knowledge engineering is to support dynamic and intelligent information architectures, so that systems can reason, learn, and make decisions based on the captured knowledge.

Relevance of Knowledge graphs to AI

Knowledge graphs are highly relevant to artificial intelligence (AI) because they provide a structured and interconnected representation of knowledge, enabling AI systems to access and reason over vast amounts of information. By capturing entities, attributes, and relationships, knowledge graphs facilitate a deeper understanding of data and enhance the interpretability and explainability of AI models.

Knowledge graphs play a vital role in achieving explainability in AI systems by providing transparency and justifications for AI-based decisions. They enable semantic search, data integration, and knowledge discovery, making AI systems more intelligent and capable of making informed decisions based on interconnected knowledge.
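To show how a graph can supply a justification alongside an answer, the sketch below returns the chain of facts that links two entities. The scenario, the edge names, and the `explain` helper are hypothetical; the point is only that the evidence is a readable list of statements.

```python
from collections import deque

# Hypothetical facts, stored as directed (subject, predicate, object) edges.
EDGES = [
    ("applicant_7", "employed_by", "acme"),
    ("acme", "flagged_in", "sanctions_list"),
    ("applicant_7", "lives_in", "leeds"),
]


def explain(start, goal):
    """Breadth-first search returning the chain of facts from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in EDGES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None  # no connection found


# Each step of the returned path is a fact a reviewer can inspect.
for fact in explain("applicant_7", "sanctions_list"):
    print(fact)
```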

“there can be no meaningful AI without IA”

Industry trends for information retrieval and LLMs

The importance of information retrieval for generative large language models (LLMs) is widely acknowledged in the data science community. While LLMs rely on retrieving information to enhance their reasoning abilities, their stored knowledge is often unreliable and difficult to update.

These models excel at reasoning based on the context provided during inference. However, a bias emerged in the first half of 2023, suggesting that vector search is the only way to achieve information retrieval for LLMs. This is incorrect, and it steers practitioners towards a single approach (vector databases) instead of the broader capabilities of knowledge graphs for information retrieval. This bias appears to be impeding advances in the interpretability of AI decision-making. [4]
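As a hedged sketch of graph-based retrieval for an LLM, the code below looks up facts about entities mentioned in a question and formats them as context for a prompt. No model is called, the matching is deliberately naive substring matching, and the `retrieve` and `build_prompt` helpers are our own illustrative names.

```python
# A tiny fact store as (subject, predicate, object) triples.
FACTS = [
    ("Knowledge_Graph", "announced_by", "Google"),
    ("Knowledge_Graph", "announced_in", "2012"),
    ("RDF", "represents_knowledge_as", "triples"),
]


def retrieve(question):
    """Return the facts whose subject is mentioned in the question."""
    q = question.lower()
    return [f for f in FACTS if f[0].replace("_", " ").lower() in q]


def build_prompt(question):
    """Format the retrieved facts as context ahead of the question."""
    lines = [f"{s} {p} {o}".replace("_", " ") for s, p, o in retrieve(question)]
    context = "\n".join(f"- {line}" for line in lines)
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_prompt("When was the knowledge graph announced?"))
```

Because the context is a list of explicit facts, it is easy to see afterwards which statements the model was given.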

Advantages of Knowledge graphs over vector-based information retrieval

While vector-based retrieval techniques, such as word embeddings or neural network-based models, have their merits in representing and retrieving information, knowledge graphs offer some distinct advantages:

Contextual Understanding: Knowledge graphs capture not only the individual entities and their attributes but also the relationships between them. This contextual understanding enables AI systems to go beyond surface-level associations and comprehend the semantic connections and dependencies within the data. In contrast, vector-based retrieval techniques often focus on capturing the similarity or proximity between words or documents without explicitly modelling the underlying relationships.
Interconnected Knowledge: Knowledge graphs excel at representing interconnected knowledge, where entities and concepts are linked through explicit relationships. This interconnectedness allows AI systems to traverse the graph, follow the relationships, and uncover hidden patterns, insights, and contextual information. Vector-based retrieval techniques, on the other hand, primarily rely on statistical associations and proximity, lacking the explicit representation of relationships that knowledge graphs provide.
Explainability and Interpretability: Knowledge graphs inherently provide a transparent and interpretable representation of knowledge. The explicit nature of entities, attributes, and relationships allows stakeholders to understand the reasoning behind AI-based decisions. In contrast, vector-based retrieval techniques often operate as black boxes, where the inner workings and decision-making processes are not easily explainable or interpretable.
Knowledge Integration and Data Enrichment: Knowledge graphs facilitate the integration of diverse and heterogeneous data sources. By incorporating external knowledge and aligning it within the graph structure, AI systems can benefit from a broader and richer knowledge base. In vector-based retrieval, external knowledge integration can be challenging and often relies on pre-processing steps or separate techniques.
Reasoning and Inference: Knowledge graphs provide a foundation for reasoning and inference capabilities. By leveraging logical rules and ontological constraints, AI systems can perform deductive, inductive, and abductive reasoning over the knowledge graph. This reasoning ability enables deeper insights, inference generation, and decision support. Vector-based retrieval techniques, while powerful for similarity-based retrieval, do not inherently possess explicit reasoning capabilities.

Knowledge graphs offer a structured representation of interconnected knowledge, fostering contextual understanding, explainability, reasoning, and data integration within AI systems. While vector-based retrieval techniques have their strengths in certain applications, knowledge graphs provide a distinct framework for capturing, organising, and leveraging knowledge to enhance AI capabilities.
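The difference can be seen in a toy comparison. In the sketch below, the vectors are hand-written numbers rather than output from a real embedding model, and the single triple is an illustrative assumption. Similarity yields a score, while the graph yields a named relationship.

```python
import math

# Toy, hand-written embeddings (NOT produced by a real model).
VECTORS = {
    "paris": [0.9, 0.1, 0.3],
    "france": [0.8, 0.2, 0.4],
    "berlin": [0.85, 0.15, 0.2],
}

# The same domain as an explicit graph with a single stated fact.
TRIPLES = {("paris", "capital_of", "france")}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relations(a, b):
    """Named relationships stated in the graph from a to b."""
    return [p for s, p, o in TRIPLES if s == a and o == b]


# Both pairs look "close" as vectors, but the score does not say why.
print(cosine(VECTORS["paris"], VECTORS["france"]))
print(cosine(VECTORS["paris"], VECTORS["berlin"]))

# The graph states the relationship explicitly, or states nothing at all.
print(relations("paris", "france"))
print(relations("berlin", "france"))
```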

NB: A nice introduction to vector-based embeddings written by our colleague Narin Meher can be found here: https://www.bigspark.dev/word-embedding-word2vec-algorithm-implementation-using-sparkmllib/

Summary points

Ontology = Structured framework defining concepts, relationships, and properties in a domain.
Knowledge graphs = Ontology + Data.
Knowledge engineering is the pursuit of structuring real-world semantics into a knowledge system.
Major competitive advantages exist when combining the power of knowledge graphs with AI systems.
Explainability of AI-based decisions can be anchored on top of knowledge graphs, whereas vector-based approaches are fundamentally black-box in nature.

References

[1] Singhal, A. (2012) Introducing the Knowledge Graph: things, not strings. https://blog.google/products/search/introducing-knowledge-graph-things-not/

[2] OCCRP (2020) Things not strings. https://medium.com/occrp-unreported/things-not-strings-knowledge-graphs-for-investigative-reporting-9d8a26913f65

[3] Earley, S. (2020) The AI-Powered Enterprise. LifeTree Media.

[4] Harman, C. (2023) Beware Tunnel Vision in AI retrieval. https://colinharman.substack.com/p/beware-tunnel-vision-in-ai-retrieval?sd=pf