

Table of Contents


  1. Overview of property graphs


    1. The structure of labelled property graphs

    2. Exploring their flexible schema

    3. Notable LPG databases

    4. Use cases

  2. Overview of RDF knowledge graphs


    1. The structure of RDF knowledge graphs

    2. Exploring their accompanying schema

    3. Notable RDF databases

    4. Use cases

  3. Key differences: RDF knowledge graphs and labelled property graphs


    1. Graph query languages: SPARQL vs. Cypher

  4. Bloor Research’s Graph Database Market Update 2023

Introduction

In today’s data-driven world, where information is generated and consumed at an unprecedented rate, traditional relational databases often fall short when it comes to handling complex and interconnected data. This has paved the way for the rise of graph databases, a revolutionary technology that allows for efficient storage and retrieval of highly connected data.

Graph databases are designed to represent and query relationships between entities, making them ideal for applications that heavily rely on connected data. However, not all graph databases are created equal. Two prominent paradigms have emerged: Knowledge Graphs and Property Graphs.

In this report, we will delve into graph databases and explore the key differences between Knowledge Graphs and Property Graphs. We will examine their underlying concepts, use cases, and the advantages they offer in different scenarios.

First, we’ll take a look at Property Graphs, a flexible and intuitive graph model that focuses on the properties of nodes and relationships. Property Graphs provide a more straightforward approach to data modelling, offering greater flexibility and scalability for applications that require rapid prototyping and agile development.

We’ll then explore RDF Knowledge Graphs, a powerful representation of information that captures the semantics of relationships and provides a rich context for data analysis. Knowledge Graphs leverage ontologies and semantic models to establish meaningful connections between entities, enabling advanced reasoning and inference capabilities.

Throughout we’ll compare the individual strengths and weaknesses of Knowledge Graphs and Property Graphs, highlighting the unique features that make each approach suitable for different use cases. We’ll discuss scenarios where Knowledge Graphs shine, such as in the domains of semantic search, recommendation systems, and knowledge representation. Additionally, we’ll uncover the advantages of Property Graphs in graph-based applications that require high-performance querying, real-time updates, and graph analytics.

 

There is no single database management system that meets all needs.

By the end of this exploration, you’ll have a solid understanding of the fundamental differences between Knowledge Graphs and Property Graphs, empowering you to make informed decisions when choosing the right graph database paradigm for your specific needs.

 


Labelled Property Graphs: Overview

Labelled Property Graphs (LPGs) represent one of the two primary categories of graph databases, the other being RDF (Resource Description Framework) Knowledge Graphs. Providing an inherently flexible and intuitive model for data representation, LPGs have become increasingly popular in a variety of domains.

The Structure of Labelled Property Graphs

At the heart of an LPG are three key components: nodes, relationships (or edges), and properties. Nodes typically represent entities, such as a person, place, or thing, and each node can have an associated label indicating its type. Relationships connect these nodes, denoting how they are related to one another. Both nodes and relationships can possess properties, which are essentially key-value pairs offering additional information about them.
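
To make this concrete, here is a minimal Cypher sketch (the labels, relationship type, and property names are illustrative, not taken from any particular dataset) showing nodes, a relationship, and properties on both:

```cypher
// Two labelled nodes, each with key-value properties...
CREATE (alice:Person {name: 'Alice', born: 1990}),
       (london:City {name: 'London'}),
       // ...connected by a relationship that carries its own property.
       (alice)-[:LIVES_IN {since: 2015}]->(london)
```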

 

One of the greatest strengths of LPGs lies in their semantic richness and expressive power. They allow a very granular and detailed representation of data, embodying both the structure of the data and its context in a single model. This naturally aligns with our cognitive model of the world and hence is easy to understand and work with.

Another major advantage of LPGs is their flexibility. Given their schema-free or schema-optional nature, they can easily handle changes in the structure of the data. This makes LPGs an excellent fit for domains with evolving data models.

Labelled Property Graphs: Exploring Their Flexible Schema

The flexible schema of Labelled Property Graphs offers several advantages, enabling organisations to efficiently manage and analyse data:

    1. Dynamic Entity Definition: Unlike traditional rigid schemas, the flexible schema of Labelled Property Graphs allows for dynamic entity definition. Entities can be created and defined on the fly without predefining the entire schema in advance. This flexibility accommodates evolving data requirements and eliminates the need for upfront schema design.

    2. Heterogeneous Data Representation: Labelled Property Graphs support the representation of heterogeneous data, where different nodes and edges can have varying sets of properties. This flexibility enables the inclusion of diverse data types and attributes within the same graph, making it suitable for capturing complex and multifaceted information.

    3. Schema Evolution: With a flexible schema, Labelled Property Graphs facilitate schema evolution over time. As data requirements change or new attributes emerge, the schema can be easily adapted to accommodate these changes (see the sketch after this list). This flexibility simplifies the process of extending or modifying the schema without requiring significant alterations to the existing data.

    4. Ad hoc Queries: The flexible schema allows for ad hoc querying and exploration of the data. Without strict constraints imposed by a fixed schema, users can discover patterns, relationships, and insights in a more intuitive manner. This empowers users to ask spontaneous and exploratory queries, facilitating data discovery and analysis.

    5. Contextual Insights: The flexible schema of Labelled Property Graphs enables the incorporation of contextual information through the assignment of labels. Labels provide semantic context and categorisation to nodes and relationships, enhancing the interpretability and relevance of the data. This contextualisation enriches the analysis and supports more targeted and precise querying.
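
As a sketch of points 1 and 3 above (using hypothetical labels and property names), a node can gain a new label and a new property in a single write, with no migration or upfront schema change:

```cypher
// Schema-optional evolution: attach a new label and a new property
// to an existing node; no DDL or migration step is required.
MATCH (p:Person {name: 'Alice'})
SET p:Customer,
    p.loyaltyTier = 'gold'
RETURN p
```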

 

Notable LPG databases

 

| Database        | Vendor        |
|-----------------|---------------|
| Neo4j           | Neo4j         |
| Memgraph DB     | Memgraph      |
| Astra DB        | DataStax Inc. |
| Azure Cosmos DB | Microsoft     |

 

Use Cases: When to Opt for Labelled Property Graphs

Labelled Property Graphs (LPGs) provide a distinct approach to data modelling compared to RDF. While RDF is widely used for representing linked data, there are specific use cases where LPGs offer significant advantages.

LPGs were originally designed for fast querying and data traversal. This is achieved through a compact key-value structure in which related nodes reference each other directly, avoiding the join-heavy access patterns of a relational setting. When it comes to performance-intensive applications, LPGs have an edge over RDF. RDF relies on triple stores and SPARQL query engines, which can sometimes result in performance bottlenecks, especially when dealing with large-scale graphs. LPGs, on the other hand, are designed for efficient graph traversals and can leverage optimised graph database engines for high-performance querying. If your use case involves complex graph analytics, real-time recommendations, or interactive querying over large datasets, LPGs offer better performance and responsiveness.

Example applications within this use case:
  1. E-commerce and Product Catalogues: In the realm of e-commerce, fast search is crucial for providing a seamless shopping experience. LPGs excel at modelling product catalogues, allowing efficient indexing and retrieval of products based on various attributes such as name, category, price, and specifications. By leveraging an LPG’s indexing capabilities, e-commerce platforms can deliver lightning-fast search results, ensuring customers can quickly find the products they are looking for.

  2. Content Management Systems: Content-heavy applications such as content management systems (CMS) benefit greatly from fast search capabilities. LPGs can efficiently index and search through vast amounts of content, including articles, documents, multimedia files, and web pages. By leveraging the indexing features of LPGs, CMS platforms can provide quick and accurate search results, enabling users to locate specific pieces of content based on keywords, categories, or metadata.

  3. Social Media and User-generated Content: Social media platforms and applications that handle user-generated content often require fast search to enable real-time interaction and content discovery. LPGs can efficiently index and search through user profiles, posts, comments, and other social media content. By leveraging the indexing and search capabilities of LPGs, social media platforms can provide instant search results, ensuring users can quickly find relevant content, discover new connections, and engage in meaningful interactions.

  4. Fraud Detection and Risk Management: Applications that deal with fraud detection and risk management require real-time search capabilities to identify anomalies and potential risks. LPGs can efficiently index and search through large datasets, allowing for quick identification of suspicious patterns, fraudulent activities, or deviations from normal behaviour. By leveraging the fast search capabilities of LPGs, fraud detection systems can expedite the identification and mitigation of risks, enhancing security and minimising potential losses.

Use case: Highly Connected Data

LPGs excel when dealing with highly interconnected data, such as social networks or complex relationship networks. Unlike RDF, which relies on triples and the subject-predicate-object structure, LPGs allow for flexible modelling of relationships by attaching properties to nodes and edges. This property-centric approach simplifies the representation of complex relationships and enables efficient traversal and querying of the graph. Therefore, if your use case involves managing and analysing interconnected data with varying relationship attributes, LPGs provide a more intuitive and effective solution.
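
The following Cypher sketch (with illustrative labels and relationship types) shows the kind of relationship-centric traversal this enables: a friends-of-friends query that would require multiple self-joins in a relational model:

```cypher
// Find people two hops from Alice whom she does not already know.
MATCH (me:Person {name: 'Alice'})-[:KNOWS]->(:Person)-[:KNOWS]->(fof:Person)
WHERE fof <> me
  AND NOT (me)-[:KNOWS]->(fof)
RETURN DISTINCT fof.name
```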

Example applications in this use case:
  1. Social Networks: Social networks represent a prime example of highly connected data, with individuals, communities, and relationships forming intricate webs. LPGs are an ideal choice for modelling social network data, as they can capture the relationships between users, their connections, and the properties associated with these connections. The flexibility of LPGs allows for efficient querying and analysis of social network data, enabling tasks such as identifying influential users, detecting communities, and recommending connections.

  2. IoT and Sensor Networks: Internet of Things (IoT) applications generate vast amounts of data from interconnected devices and sensors. LPGs can efficiently model the relationships between devices, sensors, and the properties they generate. This modelling enables effective querying and analysis of sensor data, allowing for tasks such as anomaly detection, predictive maintenance, and situational awareness. LPGs provide a flexible and scalable solution for handling the complexity and interconnectedness inherent in IoT and sensor networks.

  3. Biological and Genetic Networks: In biological research, understanding the relationships between genes, proteins, and biological pathways is critical. LPGs offer a suitable approach to modelling biological and genetic networks, as they can capture complex relationships and attributes associated with biological entities. LPGs enable researchers to explore gene expression patterns, identify regulatory networks, and analyse protein-protein interactions. The ability to represent and traverse highly connected data makes LPGs valuable in biological and genetic research.

  4. Supply Chain and Logistics Networks: Supply chain and logistics networks involve intricate relationships and dependencies between various entities, such as suppliers, distributors, and transportation routes. LPGs provide an effective means of modelling these interconnected networks, enabling efficient querying and analysis. LPGs can support tasks such as optimising supply chain routes, tracking product flows, and identifying bottlenecks or inefficiencies within the network.

Use case: Agile Development and Rapid Prototyping

LPGs offer a more agile development environment compared to RDF. With RDF, modifying the ontology or schema often requires significant effort and can impact the entire data model. In contrast, LPGs provide greater flexibility in data modelling, allowing for easier modifications and iterations without disrupting the existing graph structure. This makes LPGs a preferred choice for scenarios that demand frequent changes, quick prototyping, and iterative development processes.

Example applications within this use case:
    1. Software Development: LPGs are well-suited for agile software development projects, where iterative development and continuous feedback are crucial. Developers can use LPGs to model and represent various aspects of the software, such as user stories, requirements, modules, and their relationships. The flexible schema allows for quick adjustments and modifications as the project evolves. By leveraging the querying capabilities of LPGs, developers can validate application logic, perform data analysis, and ensure alignment with user needs throughout the development process.

    2. User Experience Design: In user experience (UX) design, rapid prototyping is key to iteratively refine and validate user interfaces. LPGs provide a visual representation of the application’s data structure, allowing UX designers to easily map out user flows, screen transitions, and interactions. By visually exploring the graph, designers can identify potential usability issues, make informed design decisions, and quickly iterate on the user interface. LPGs support rapid prototyping by providing a flexible foundation for UX designers to create interactive and dynamic prototypes.

 

    3. Research and Experimental Applications: In research and experimental applications, agile development and rapid prototyping are essential for testing hypotheses and exploring novel ideas. LPGs provide a flexible and expressive data model that accommodates the evolving nature of research projects. Researchers can leverage LPGs to model relationships, capture experimental data, and perform iterative analysis. The ability to refine the data model and query the graph in real time supports agile experimentation, enabling researchers to iterate quickly and adapt their approaches based on emerging insights.


 

RDF Knowledge Graphs: Overview

Resource Description Framework (RDF) Knowledge Graphs represent the other primary type of graph databases. Offering a standard model for data interchange on the web, RDF has become an integral part of the Semantic Web movement.

The Structure of RDF Knowledge Graphs

At the core of an RDF Knowledge Graph are three fundamental components: subject, predicate, and object, which together form a “triple”. Subjects and objects are entities (or resources), while predicates express the relationships between these entities. Subjects and predicates are identified by URIs (Uniform Resource Identifiers), whereas objects can be either URIs or literals (like a text string or number).
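
A minimal sketch in Turtle syntax (the ex: namespace and terms are illustrative) shows both forms of object:

```turtle
@prefix ex: <http://example.org/> .

ex:alice ex:worksFor ex:bigcorp .      # object is a URI (another resource)
ex:alice ex:name     "Alice Smith" .   # object is a literal
```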

 

RDF Knowledge Graphs: Exploring Their Accompanying Schema

A schema in RDF knowledge graphs serves as a blueprint that defines the structure and semantics of the data. It provides a shared vocabulary and a set of rules that govern how entities are represented and related to each other. The schema helps establish a common understanding of the data within an organisation or across different systems, enabling data interoperability and integration.

    1. Vocabulary Definition: The schema defines a vocabulary or ontology that describes the terms and relationships used in the knowledge graph. It specifies the classes (types of resources) and properties (relationships or attributes) that can be used to represent the data. By adhering to a shared vocabulary, organisations can ensure consistent and meaningful representation of their data.

    2. Data Integrity and Validation: The schema plays a vital role in ensuring data integrity within an RDF knowledge graph. It enables data validation by enforcing constraints and rules on the values and relationships. For example, it can define the expected data types, cardinality, and allowed value ranges for properties. By validating the data against the schema, organisations can maintain data quality and reliability.

 

    3. Interlinking and Inference: RDF knowledge graphs excel at connecting disparate data sources through the use of linked data principles. The schema facilitates the interlinking of resources by defining common properties and relationships. It enables the inference of new knowledge by applying reasoning techniques based on the defined schema (see the sketch after this list). Inference allows for the discovery of implicit relationships and enables more advanced querying and analysis.

    4. Domain Understanding and Data Exploration: The schema provides a high-level view of the domain and its concepts. It helps users understand the structure of the data and the relationships between different entities. By exploring the schema, users can gain insights into the available data and formulate queries to extract meaningful information. The schema acts as a guide, empowering users to navigate the knowledge graph effectively.
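
As a sketch of points 1 and 3 above (with illustrative vocabulary names), a few RDFS statements are enough to define a small schema from which new facts can be inferred:

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Employee rdfs:subClassOf ex:Person .       # class hierarchy
ex:worksFor rdfs:domain ex:Employee ;         # subjects of worksFor are Employees
            rdfs:range  ex:Organisation .     # objects are Organisations

# A reasoner can now infer that any subject of ex:worksFor is an
# ex:Employee, and therefore also an ex:Person, without those
# statements being asserted explicitly.
```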

 

Notable RDF databases

| Database       | Vendor                       | Link                            |
|----------------|------------------------------|---------------------------------|
| GraphDB        | Ontotext                     |                                 |
| Stardog        | Stardog Union                |                                 |
| Amazon Neptune | Amazon Web Services          | https://aws.amazon.com/neptune/ |
| AllegroGraph   | Franz Inc.                   |                                 |
| MarkLogic      | Progress                     |                                 |
| RDFox          | Oxford Semantic Technologies |                                 |

 

Closing the gap to LPGs – Addition of RDF*

RDF-star (or RDF*) and the associated query language SPARQL-star (also written as SPARQL*) are the most widely supported extensions of the existing standards. RDF* goes beyond the expressivity of Property Graphs in that one can make statements about statements, formally called statement-level annotations. For instance, one can provide a time span for a relationship or attach key-value pairs to relationships. These statement-level annotations enable a more efficient representation of scores, weights, temporal restrictions, and provenance information.
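
For example, a relationship’s time span and origin might be annotated like this in Turtle-star syntax (the vocabulary is illustrative):

```turtle
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The quoted triple is itself the subject of further statements.
<< ex:alice ex:worksFor ex:bigcorp >>
    ex:since  "2015-01-01"^^xsd:date ;
    ex:source ex:hrSystem .
```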

 

In the original RDF standard (RDF 1.1), this could only be achieved using suboptimal methods such as reification, which involved creating additional statements to represent metadata, leading to unnecessarily complex and convoluted graph structures.

The addition of RDF* allows the RDF standard to provide a more fine-grained approach to capturing metadata and additional contextual information. This enables the enrichment of data with annotations that can be used for various purposes, such as describing data quality, trustworthiness, temporal validity, or uncertainty. The ability to attach annotations directly to individual statements gives users a more nuanced representation of information.

Use Cases: Where RDF Knowledge Graphs Shine

RDF Knowledge Graphs have been widely adopted in areas where interoperability, standardisation, and semantic richness are paramount. They are extensively used in scholarly data, semantic search engines, digital libraries, bioinformatics, and linked open data projects.

RDF Knowledge Graphs are recognised for their universal interoperability. Since RDF is a standard model developed by the World Wide Web Consortium (W3C), RDF datasets can be readily combined or shared across different applications without loss of meaning. This is particularly important for open data initiatives and for organisations looking to integrate diverse data sources.

Another key advantage of RDF is its strong support for ontologies. Through languages such as RDFS (RDF Schema) and OWL (Web Ontology Language), RDF Knowledge Graphs can encode rich semantics and complex hierarchies, providing a way to define explicit, machine-understandable domain models.

When it comes to expressing and exploiting ontologies, RDF proves to be an incredibly powerful tool. It is capable of describing not only hierarchical classification (taxonomy) but also complex relationships between entities, and it carries these descriptive capabilities across a universal standard, lending immense interoperability.

RDF Knowledge Graphs, with their standardisation, interoperability, and robust semantics, provide a potent framework for linking and inferring data across an organisation.

Use case: Data Provenance

Statement-level annotations in RDF enable the inclusion of data provenance: information about the origin and history of data within the graph. Data provenance is significant because the interpretation of the same information can vary based on its history and context, such as the source of the data and how it was processed.

The introduction of statement-level annotations in RDF (via RDF*) simplifies the implementation of data provenance. Consequently, RDF becomes more suitable for applications that necessitate regulatory compliance and auditing. It can be utilised to audit transformations of datasets and evaluate the confidence levels in the data’s validity.
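
As a sketch of the kind of audit query this enables (the terms are illustrative), SPARQL-star can retrieve each statement together with its recorded source:

```sparql
PREFIX ex: <http://example.org/>

# For every worksFor statement, return the statement's own provenance.
SELECT ?person ?org ?source WHERE {
  << ?person ex:worksFor ?org >> ex:source ?source .
}
```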

Example applications within this use case:
  1. Data Versioning and Provenance Tracking: RDF databases provide a robust foundation for data versioning and provenance tracking, essential for applications that require tracking changes and maintaining historical context. By associating metadata and timestamps with RDF triples, data practitioners can track the origin, modification history, and lineage of data over time. This provenance information can be invaluable for data quality analysis, auditing, reproducibility, and data governance. RDF databases offer the ability to query and analyse data based on different versions, enabling temporal-aware analysis and understanding of data evolution.

  2. Scientific Research and Reproducibility: In scientific research, data provenance is vital for ensuring reproducibility and transparency. RDF databases provide a semantic representation that allows scientists to capture detailed information about the experimental setup, data collection methods, processing steps, and analysis workflows. By associating provenance metadata with RDF triples, researchers can trace the origin of data, understand the transformations it has undergone, and reproduce experiments reliably. RDF’s flexible structure enables the integration of provenance information with scientific data, enabling comprehensive documentation and sharing of research findings.

  3. Data Governance and Compliance: RDF databases support data governance efforts by capturing data provenance and facilitating compliance with regulatory requirements. Provenance information helps organisations ensure data integrity, traceability, and accountability. By associating metadata, timestamps, and authorship information with RDF triples, organisations can track the lineage of data, monitor data usage, and ensure compliance with data protection regulations. RDF’s standardised representation and support for ontologies enable the definition and enforcement of data governance policies, making it easier to manage and control data throughout its lifecycle.

  4. Data Integration and Trustworthiness: RDF databases excel at integrating heterogeneous data from diverse sources, and provenance information enhances the trustworthiness of integrated datasets. By associating provenance metadata with RDF triples, data integration workflows can capture the source of each ingested piece of data, the transformations applied during integration, and the confidence level associated with the integrated result. This provenance-aware integration enhances data quality assessment, allows users to evaluate the reliability of integrated data, and supports decision-making processes based on trustworthy and traceable information.

  5. Cybersecurity and Intrusion Detection: In the realm of cybersecurity, data provenance plays a critical role in detecting and investigating security incidents. RDF databases can capture provenance information about data sources, data processing steps, and data flow within systems. By associating timestamps, metadata, and access control information with RDF triples, organisations can track the origin of suspicious data, identify potential vulnerabilities, and analyse the propagation of security breaches. Provenance-enabled RDF databases facilitate forensic analysis, incident response, and threat intelligence in the field of cybersecurity.

  6. Data Lineage and Data Quality Analysis: Understanding the lineage of data is essential for assessing and ensuring data quality. RDF databases allow for capturing detailed data lineage information by associating provenance metadata with RDF triples. Data practitioners can trace the source of each piece of data, examine its transformation steps, and identify potential issues or anomalies. This lineage information supports data quality analysis, error detection, and data cleaning processes. RDF’s semantic representation also enables the integration of quality metrics and annotations, providing a comprehensive view of data quality within the database.

Use case: Applications requiring a Temporal Dimension

Statement-level annotations in RDF may be treated not just as literal values but as nodes connected to other nodes in the graph. This opens up the opportunity to represent temporal entity events. For example, a node describing an employee can include an annotation with the date of their last task, which is in turn connected to the node representing that task. This allows the semantic description of nodes to be enriched with temporal relationships, resulting in richer data queries.

By incorporating temporal aspects into a knowledge graph, it becomes possible to capture the temporal dimension of data, such as when facts were true or events occurred. This temporal information can be associated with entities, relationships, or individual statements within the graph. Temporal modelling techniques for knowledge graphs typically involve annotating data with timestamps or intervals to indicate when they were valid (similar to slowly changing dimensions in relational models). This can be achieved using specialised temporal extensions or by representing time as an explicit dimension within the graph structure.

With temporal modelling in place, time travel queries can be executed to retrieve data as it existed at a specific point in time, or to project data into the future based on certain assumptions. These queries can be formulated to retrieve historical states, track changes over time, or make predictions about future states.
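
A sketch of such a time-travel query in SPARQL-star, assuming (illustratively) that each statement is annotated with validFrom/validTo dates:

```sparql
PREFIX ex:  <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Reconstruct the graph "as of" 1 June 2020: keep only statements
# whose validity interval covers that date.
SELECT ?person ?role WHERE {
  << ?person ex:hasRole ?role >> ex:validFrom ?from ;
                                 ex:validTo   ?to .
  FILTER (?from <= "2020-06-01"^^xsd:date && ?to >= "2020-06-01"^^xsd:date)
}
```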

Example applications within this use case:
  1. Historical Data Analysis: RDF databases are well-suited for analysing historical data, where capturing and analysing temporal information is crucial. Whether it’s studying historical trends, analysing long-term patterns, or conducting retrospective analyses, RDF databases excel at representing time-stamped data and capturing the evolution of information over time. By associating timestamps or temporal intervals with RDF triples, data scientists can track changes, perform trend analysis, and gain insights into how data has evolved over different time periods.

  2. Event-based Data Processing: Applications that deal with event-based data often require capturing and processing events in a temporal context. RDF databases can effectively capture event data by associating events with timestamps, durations, or intervals. This temporal modelling allows for efficient event correlation, analysis of event sequences, and detection of temporal patterns or anomalies. Whether it’s analysing IoT sensor data, monitoring system logs, or processing real-time event streams, RDF databases provide a semantic foundation for event-based data processing and analysis.

  3. Temporal Reasoning and Temporal Ontologies: RDF’s flexibility and support for ontologies make it an excellent choice for temporal reasoning and the modelling of temporal knowledge. RDF databases can capture temporal ontologies that define temporal concepts, relationships, and reasoning rules. This enables data scientists to perform sophisticated temporal reasoning, such as temporal querying, temporal logic-based inference, and reasoning about temporal constraints. Applications that require temporal context awareness, such as scheduling systems, historical simulations, or temporal planning, can leverage RDF databases to incorporate temporal reasoning into their workflows.

  4. Temporal Data Integration: RDF’s capability to integrate heterogeneous data from various sources is particularly useful when dealing with temporal data integration. RDF databases allow for seamless integration of data with different temporal granularities, resolutions, or representations. By mapping temporal data from different sources into RDF triples and applying temporal reasoning, data scientists can harmonise temporal data, resolve inconsistencies, and perform temporal integration across diverse datasets. This integration can be essential in applications that require combining temporal data from multiple domains, such as environmental monitoring, epidemiology, or financial analysis.

Use case: Applications with Cross-Domain Knowledge

Many advanced data-intensive applications like ‘smart’ homes need to integrate data from different domains and sources and make real-time inferences based on this data. The same capabilities are required by many IoT applications used in smart urban environments and various industrial settings.

RDF offers many advantages for these kinds of applications. In particular, RDF graphs are better suited than LPGs to modelling ontologies: the sets of properties, relations, and categories that represent the specific domain or subject of the ‘smart’ component. RDF makes ontologies simpler to model thanks to its atomic decomposition of subjects and relations, the global uniqueness of nodes and edges, and its built-in shareability of data.

Ontologies implemented via RDF enable cross-domain data sharing, data interoperability, expert systems, and the domain-based inference required by many IoT applications. In contrast, the arbitrary data design used in LPGs makes it harder to implement ontologies in LPG-based graphs, making the LPG architecture less suitable for cross-domain data sharing and domain-based inference.

Example applications within this use case:
  1. Semantic Search and Recommendation Systems: RDF databases excel in applications that require semantic search and recommendation systems. By representing data using RDF triples, RDF databases enable the integration of diverse data sources and knowledge domains. This integration empowers search engines and recommendation systems to provide more precise and context-aware results. RDF’s semantic model facilitates better understanding of user queries, matching them with relevant entities, relationships, and context from various domains. This cross-domain knowledge enhances search accuracy, semantic relevance, and recommendation quality.

  2. Data Analytics and Insights: RDF databases facilitate data analytics and insights by providing a semantic framework for integrating and analysing data from multiple domains. By capturing relationships and context using RDF triples, organisations can gain a comprehensive understanding of complex datasets. RDF’s flexible structure allows for the incorporation of diverse data sources, including structured data, unstructured text, multimedia, and sensor data. Integrating cross-domain knowledge enables advanced analytics, pattern recognition, and data-driven insights that transcend individual domains, leading to more informed decision-making and innovation.

  3. Interdisciplinary Research: RDF databases are particularly valuable in interdisciplinary research, where insights from multiple domains need to be combined and analysed. By representing knowledge using RDF triples, researchers can capture and integrate diverse information, methodologies, and perspectives from different disciplines. RDF’s semantic model enables the representation of relationships, dependencies, and interdisciplinary connections, fostering collaboration and knowledge sharing across domains. RDF databases facilitate interdisciplinary research by providing a shared knowledge representation that transcends disciplinary boundaries.

  4. Contextualised Data Integration: Applications that require contextualised data integration can benefit greatly from RDF databases. RDF’s flexible data model allows for the representation of contextual information alongside data from multiple domains. By associating metadata, contextual attributes, and provenance information with RDF triples, applications can capture the context in which data was generated, the purpose it serves, and the relationships it has with other data. This contextualisation enhances the meaning and relevance of integrated data, providing a more comprehensive view and facilitating informed analysis and decision-making.

Use case: Data Science and Analytics

RDF graphs provide a powerful and flexible approach to data modelling that offers several benefits for data science and analytics. When compared to Labelled Property Graphs (LPGs), RDF graphs bring unique advantages that make them particularly suitable for certain data-intensive use cases.

Advantages within this use case:
    1. Standardised Semantic Representation: RDF follows a standardised semantic data model, which means that data in RDF graphs can be easily understood and interpreted by different systems and tools. This semantic representation enables seamless data integration from diverse sources, allowing data scientists to merge and analyse data from various domains and platforms. RDF’s standardised structure also facilitates interoperability and data exchange, making it easier to collaborate and share data with other researchers and organisations.

    2. Knowledge Representation and Inference: RDF’s ability to represent knowledge using ontologies and semantic models brings significant advantages as they can capture domain-specific knowledge, relationships, and semantic hierarchies. This knowledge representation facilitates advanced reasoning and inference, enabling data scientists to derive new insights, make logical deductions, and discover implicit relationships in the data. RDF’s foundation in formal mathematical logic provides a robust framework for performing complex queries and reasoning tasks.

 

    3. Linked Data Integration: RDF’s core principle of linking data through URIs (Uniform Resource Identifiers) makes it highly compatible with the Linked Data concept. Linked Data allows for the integration and interlinking of data across different sources, creating a vast web of interconnected information. For data scientists and analysts, this means access to a wealth of diverse and interlinked data that can be leveraged to enrich their analyses, gain a broader context, and discover new patterns and relationships across an organisation.

    4. SPARQL Querying and Reasoning Capabilities: SPARQL allows for expressive querying and advanced graph pattern matching. RDF’s inherent support for reasoning and inference also enhances the query capabilities, enabling data scientists to ask more complex questions, explore data relationships, and perform advanced analytics tasks.

 

Key Differences: RDF Knowledge Graphs and Labelled Property Graphs

| Feature | RDF | Property Graph |
|---|---|---|
| Expressivity | Arbitrarily complex descriptions via links to other nodes; no properties on edges. With RDF* the model becomes more expressive than LPG | Limited expressivity beyond the basic directed, cyclic, labelled graph; properties (key-value pairs) on nodes and edges balance complexity and utility |
| Formal semantics | Yes: standard schema and model semantics foster data reuse and inference | No formal model representation |
| Standardisation | Driven by W3C working groups and standardisation processes | Different competing vendors |
| Query language | SPARQL specifications: Query Language, Updates, Federation, Protocol (endpoint), etc. | Cypher, PGQL, G-CORE, GQL (no single standard) |
| Serialisation format | Multiple serialisation formats | No standard serialisation format |
| Schema language | RDFS, OWL, Shapes (SHACL) | None |
| Designed for | Linked Open Data (Semantic Web): publishing and linking data with formal semantics and no central control | Graph representation for analytics |
| Processing strengths | Set analysis operations (as in SQL, but with schema abstraction and flexibility) | Graph traversal; plenty of graph analytics and ML libraries |
| Data management strengths | Interoperability via global identifiers and via standards (schema language, federation protocol, reasoning semantics); data validation, data type support, multilinguality | Compact serialisation, shorter learning curve, functional graph traversal language (Gremlin) |
| Main use cases | Data-driven architecture; master/reference data sharing in enterprises; knowledge representation; data integration; metadata management | Graph analytics; highly performant path search; real-time data updates |

 

Graph Query Languages: SPARQL vs. Cypher

SPARQL

SPARQL (pronounced “sparkle”) is a declarative graph query language developed by the World Wide Web Consortium (W3C) as the standard for querying RDF (Resource Description Framework) data. RDF is a data model used to represent information on the web, making SPARQL particularly well suited to querying linked data and semantic web applications. SPARQL is supported by a variety of RDF stores, such as Virtuoso, Stardog, and GraphDB, as well as multi-model graph services such as Amazon Neptune.
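
A representative SPARQL query (against an illustrative ex: vocabulary) finds every person and the organisation they work for:

```sparql
PREFIX ex: <http://example.org/>

SELECT ?name ?org WHERE {
  ?person a ex:Person ;        # 'a' abbreviates rdf:type
          ex:name     ?name ;
          ex:worksFor ?org .
}
LIMIT 10
```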

Cypher

Cypher is a declarative graph query language developed by Neo4j, one of the leading graph database platforms. It is designed to be expressive and easy to read, with a syntax that closely resembles natural language. This makes it a great choice for developers who are new to graph databases, as well as for those who prefer a more human-readable query language. Cypher is optimised for querying Neo4j databases, but it has also been adopted by other graph database platforms, such as SAP HANA Graph and RedisGraph.
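
For comparison, the equivalent pattern in Cypher (again with illustrative labels and relationship types) reads close to natural language:

```cypher
// Find every person and the organisation they work for.
MATCH (p:Person)-[:WORKS_FOR]->(o:Organisation)
RETURN p.name AS name, o.name AS org
LIMIT 10
```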

The table below contains an informal mapping between Cypher and SPARQL constructs.

| Cypher | SPARQL |
|---|---|
| CREATE | INSERT |
| RETURN | SELECT |
| WITH | SELECT in a subquery |
| MATCH | WHERE |
| WHERE | FILTER |
| :label | owl:Class or rdfs:Class |
| [edge] | predicate <edge>, or quoted triple << <subject> <edge> ... >> |
| (node) | a graph node (blank node or IRI) |
| var:Person | ?var a :Person |
| var.name | ?name, assuming a match ?var <name> ?name |
| nodeVar:label {key: value, key2: value2} | ?nodeVar a :label ; <key> value ; <key2> value2 . |
| (ee)-[:KNOWS {since: 2001}]->(js) | << ?ee <knows> <js> >> <since> 2001 |

A quick look at Google Trends shows a striking dominance of searches for Cypher over SPARQL, illustrating the market share currently held by Neo4j.

 

Bloor Research’s Graph Database Market Update 2023

Bloor would argue that the market leaders in this space continue to be Neo4j and Ontotext (GraphDB), which are property graph and RDF database providers respectively. However, www.db-engines.com suggests that MarkLogic is the leader in the RDF space. This is a question of definition: GraphDB is a pure-play RDF database with multi-model capabilities, while MarkLogic is a multi-model database with an underlying XML engine that offers RDF capabilities. In any case, both are leading vendors in this space, along with Amazon Neptune.

