Deep dive into JSON-LD IRI IDs in TerminusDB

Sep 18, 2023 · 9 min read

TerminusDB is an RDF triple store for JSON-LD documents with a closed world approach and a strong schema.

RDF, the Resource Description Framework, is a foundational part of the Semantic Web. It organises content as triples of subject, predicate and object, which express information about and relationships between identifiers, often describing real-world phenomena.

TerminusDB for content

What this RDF engine does beautifully is provide a bridge between triples and their organisation into JSON-LD documents through its enterprise-oriented reference data model.

TerminusDB and its robust datalog query engine are built with Rust and Prolog, offering a unique blend of performance and sophistication.

All content comes with IDs

To work effectively with RDF in TerminusDB, you need to master its way of constructing and using Internationalized Resource Identifiers, IRIs, for addressing content and triples.

You can view these IRIs as unique identifiers for specific JSON-LD documents (a specific JSON format for Linked Data), and for addressing layered parts of these documents, called subdocuments.

The TerminusDB schema checking engine offers a schema language to control each object layer of a JSON-LD document, through classes, enumerations, foreign typed identifiers and what are called tagged unions.

In JSON-LD, each object layer requires a specific @type, and a unique ID for the collection of keys (triples) that are present in that keyed object.

From JSON-LD to TerminusDB content

A lot of the world’s JSON-LD is built using schema definitions from schema.org. It’s used to make content on the web more accessible, for fellow humans but mostly for machines. It enables inference of connections between resource identifiers, URIs and IRIs, across the web. People also use these JSON-LD expressions to make their websites known through Semantic SEO (Search Engine Optimisation).

As noted, most JSON-LD follows schema.org conventions, but usually with only loose control over whether the content is accurate. From a knowledge graph perspective, the JSON-LD is all too often an afterthought, as it is not stored in or generated from an RDF triple store.

With TerminusDB, all the parts of the JSON-LD can be connected together as a well-controlled RDF triple store, with strong rules for how it all fits together.

It should be noted that TerminusDB uses its own enterprise-oriented digital-twin-style meta model instead of OWL and SHACL for performance and accuracy reasons.

When using TerminusDB as a Content Management System, you can even co-locate the semantic descriptions of your content, all JSON-LD relationships, as accurately typed data. It can even serve as a homogeneous reference model for the enterprise, subdivided into data products if needed.

But let’s get back to the point of resource identifiers, IRI IDs.

RDF IRI IDs in TerminusDB

Now, armed with the necessary background on what the IRIs used as IDs in TerminusDB are, and how they connect with RDF, let’s explore them in more detail.

Prefixes in JSON-LD

RDF IRIs can have parts of them abbreviated into prefixes. A long IRI (which looks like a web address) can be decomposed into a prefix and a remainder. A schema ID like https://dfrnt.com/schema/Entity can be abbreviated as @schema:Entity.

{
  "@context": {
    "@schema": "https://dfrnt.com/schema/"
  },
  "@type": "Person",
  "@schema:name": "John Doe"
}

The above is an example JSON-LD document where the @schema prefix is defined to abbreviate an IRI. Such definitions are always located in the accompanying @context section of a JSON-LD document.

The @schema:name then, fully expanded, corresponds to https://dfrnt.com/schema/name.
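The expansion can be sketched in a few lines of Python. This is a minimal illustration of prefix substitution only, not a full JSON-LD processor, and the function name expand_compact_iri is made up for this example.

```python
def expand_compact_iri(compact: str, context: dict) -> str:
    """Expand 'prefix:suffix' using the prefixes defined in a JSON-LD @context."""
    prefix, sep, suffix = compact.partition(":")
    # Only expand when the part before the colon is a prefix known to the context.
    if sep and prefix in context:
        return context[prefix] + suffix
    # Anything else (for example an already full IRI) is returned unchanged.
    return compact


context = {"@schema": "https://dfrnt.com/schema/"}

print(expand_compact_iri("@schema:name", context))
# https://dfrnt.com/schema/name
print(expand_compact_iri("https://dfrnt.com/schema/Entity", context))
# https://dfrnt.com/schema/Entity
```

A full IRI passes through untouched because its scheme (https) is not a prefix defined in the context.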

Prefixes in TerminusDB

Schema elements such as classes in TerminusDB have a default prefix @schema, an abbreviation for terminusdb:///schema#

All schemas have this @schema prefix unless you change it in the context document of the TerminusDB schema.

JSON-LD documents returned by the system follow some assumptions. If you are not used to the world of JSON-LD, it is easy to miss the foundation: TerminusDB fully leverages RDF at the core, but hides the details until you need them.

To recap, if you don’t specify an @schema prefix when working with types and properties, the system assumes that you use the default, out-of-the-box @schema prefix.

You may have noted that the schema IRI prefix contains a fragment identifier. When schema classes are created, they are JSON-LD documents in the schema graph. Each is a part of the overall schema (document). This is why they are each referenced using the fragment identifier of their IRI.
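To make the role of the fragment concrete, here is a small sketch that resolves a bare class name against the default terminusdb:///schema# prefix and then splits the result at the fragment marker. The helper name schema_iri is hypothetical, and the code only illustrates string composition, not what the engine does internally.

```python
DEFAULT_SCHEMA_PREFIX = "terminusdb:///schema#"


def schema_iri(name: str, schema_prefix: str = DEFAULT_SCHEMA_PREFIX) -> str:
    """Resolve a bare class or property name against the @schema prefix."""
    return schema_prefix + name


iri = schema_iri("Person")
print(iri)
# terminusdb:///schema#Person

# Everything before '#' names the schema as a whole,
# and the fragment after '#' names one element inside it.
schema_document, _, fragment = iri.partition("#")
print(schema_document)  # terminusdb:///schema
print(fragment)         # Person
```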

Adjusting the @base prefix

Both the context object and classes in TerminusDB have a special key, the @base key that enables IDs to not follow regular conventions.

Changing the @base key has implications and the semantic meaning may become unclear to the processing engine. Ensure well-tested scenarios to avoid snags.

For document instances, TerminusDB uses the @base prefix first from the context, then from the class, in constructing the specific prefix to use for creating instance IDs stored in the graph. It is normally defined just as the @base in the context object and references terminusdb:///data/

When the @base prefix of a class needs to be changed, due to existing content IDs or for other reasons, it is possible either to replace the schema-based suffix of the prefix used for content ID creation with a custom string, or to replace the full IRI. It is important to know that the IDs of existing documents will not be changed.

TerminusDB does not use the prefix to locate the documents of a type. Instead, it uses the @type parameter of relevant triples to know which document instances (actually triples in the RDF triple store) you are referring to.
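The idea can be illustrated with a toy in-memory list of triples. This is a conceptual sketch only: the triples, the document IDs and the helper instances_of are invented for the example, and it does not reflect how TerminusDB stores or queries data internally.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "terminusdb:///schema#"

# Hypothetical (subject, predicate, object) triples; all IDs are made up.
triples = [
    ("terminusdb:///data/City/London", RDF_TYPE, SCHEMA + "City"),
    ("terminusdb:///data/City/London", SCHEMA + "name", "London"),
    ("terminusdb:///data/Destination/Paris", RDF_TYPE, SCHEMA + "City"),
    ("terminusdb:///data/Person/Jane", RDF_TYPE, SCHEMA + "Person"),
]


def instances_of(type_iri: str, triples: list) -> list:
    """Find subjects by their type triple, ignoring how their IDs are spelled."""
    return [s for (s, p, o) in triples if p == RDF_TYPE and o == type_iri]


print(instances_of(SCHEMA + "City", triples))
# ['terminusdb:///data/City/London', 'terminusdb:///data/Destination/Paris']
```

Both subjects are found as City instances even though only one of their IDs contains the City segment.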

The two levels of @base

The context has one level of @base, and each class has its own. If the class-level @base is defined and is a full IRI, it replaces the context @base IRI. If it is just a string, it replaces the class name in the resulting IRI ID of the document. See the TerminusDB schema documentation for the full details.

Assuming we have not adjusted @base at all, and we have a type City with a property "name", then the instance document for London could be set as terminusdb:///data/City/London

If we set the @base in the class to Destination, the resulting ID will be interpreted as terminusdb:///data/Destination/London
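The two examples can be reproduced with a short sketch that follows the description in this section. It assumes the default context @base of terminusdb:///data/ mentioned earlier, and the function name instance_iri is hypothetical. The full-IRI form of a class-level @base is deliberately left out, since its exact composition is not spelled out here.

```python
import re

# Matches a leading IRI scheme such as "https:" or "terminusdb:".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def instance_iri(key, class_name, context_base="terminusdb:///data/", class_base=None):
    """Compose an instance IRI from the context @base, a type segment and a key."""
    if class_base is None:
        # No class-level @base: the class name is used as the type segment.
        segment = class_name
    elif _SCHEME.match(class_base):
        # A full IRI as class-level @base is not modelled in this sketch.
        raise ValueError("full-IRI class @base is not modelled in this sketch")
    else:
        # A plain string replaces the class name in the resulting ID.
        segment = class_base.rstrip("/")
    return f"{context_base}{segment}/{key}"


print(instance_iri("London", "City"))
# terminusdb:///data/City/London
print(instance_iri("London", "City", class_base="Destination"))
# terminusdb:///data/Destination/London
```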

This document instance ID is set at creation time and can’t be changed.

When retrieving documents using the document interface, the ID will be shortened to City/London or Destination/London in the response, depending on whether the class uses the default or a changed @base. The full prefix is not returned by the RESTful document interface.
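The relationship between the short and the full form is plain prefix handling, as this sketch shows. The helper names shorten_id and expand_id are made up, and the default instance prefix terminusdb:///data/ is assumed.

```python
DATA_BASE = "terminusdb:///data/"


def shorten_id(full_iri: str, base: str = DATA_BASE) -> str:
    """Strip the instance prefix, giving the short form seen in responses."""
    if full_iri.startswith(base):
        return full_iri[len(base):]
    return full_iri


def expand_id(short_id: str, base: str = DATA_BASE) -> str:
    """Rebuild the full IRI from a short ID and the instance prefix."""
    return base + short_id


print(shorten_id("terminusdb:///data/City/London"))  # City/London
print(expand_id("Destination/London"))  # terminusdb:///data/Destination/London
```

Going from the short form back to the full IRI requires knowing the @base of the context.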

Do note that the GraphQL interface does expose full IRIs, since it does not expose a way to retrieve the contextual information.

Generation of specific ID segments

We used City/London as the type/key combination in the previous section, where City was the type, and London the key for the document’s ID segment of the IRI. The London ID segment was specifically chosen to be supplied by us when constructing the JSON document ID, and we set it ourselves in the @id key.

We could also let TerminusDB pick a key for us, using a key generation strategy. There are a few different ones, and DFRNT defaults to a random key generation strategy for documents to keep things simple and unique. Other keying strategies include the Lexical strategy, in which one or more keys are used together to build a unique key. Read more about keying strategies in the TerminusDB schema documentation.
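As a rough illustration, the sketch below declares a class with a Lexical key in the @key/@fields form used by the TerminusDB schema language, and then imitates the two strategies in Python. The function document_id is hypothetical, and the way the key text is built here (percent-encoded field values joined with "+", and a URL-safe random token) is an assumption for illustration. The IDs that TerminusDB actually generates may look different.

```python
import secrets
from urllib.parse import quote

# A schema class whose documents are keyed by their "name" field.
city_class = {
    "@type": "Class",
    "@id": "City",
    "@key": {"@type": "Lexical", "@fields": ["name"]},
    "name": "xsd:string",
}


def document_id(doc: dict, schema_class: dict) -> str:
    """Imitate a keying strategy; the exact output format is an assumption."""
    key = schema_class["@key"]
    if key["@type"] == "Lexical":
        # Build the key from the listed field values, in order.
        parts = [quote(str(doc[field]), safe="") for field in key["@fields"]]
        suffix = "+".join(parts)
    elif key["@type"] == "Random":
        # A fresh random token for every new document.
        suffix = secrets.token_urlsafe(16)
    else:
        raise ValueError(f"strategy not covered by this sketch: {key['@type']}")
    return f"{schema_class['@id']}/{suffix}"


print(document_id({"name": "London"}, city_class))
# City/London
```

With a Lexical key the same field values always give the same ID, while a Random key gives a new ID for every document.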

Changing prefixes in TerminusDB

When working with the data product schema in TerminusDB, we are offered full schema context flexibility. We can change the schema and instance prefixes, and even create our own prefixes for a data product that spans several IRI prefixes, so that a schema element can be written as dfrnt:Person if we use a specific prefix, such as dfrnt: for https://dfrnt.com/data/

The core idea in TerminusDB is that a data product is authoritative for a specific space in the Semantic Web. The reasoning engine follows a closed world approach, which means that if the ID is not returned by a query, and the instance prefix is recorded for a particular IRI, it means that ID does not exist.

RDF restrictions on the JSON-LD IRI ID

A TerminusDB document ID section, the part following its prefix and type, is limited to a single segment: either a segment between slashes or the last segment after a slash (/). It may not include a slash unless the slash is percent-encoded in the segment (%2F), and thus not used as a path separator.

The document ID part is a resource named by its segment and fully follows IRI semantics, with full support for UTF-8 encoding, allowing both schema types and document IDs to include the full Unicode character set (following IRI rules). This is really cool, and you can see it in action in the DFRNT Data Product Builder too!
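A minimal sketch of preparing an ID segment along these lines: reserved ASCII characters such as the slash are percent-encoded, while non-ASCII characters are kept as they are. The helper encode_id_segment is made up for this example, and treating every non-ASCII character as allowed is a simplification of the IRI rules.

```python
from urllib.parse import quote


def encode_id_segment(segment: str) -> str:
    """Percent-encode reserved ASCII characters, keep non-ASCII text literal."""
    return "".join(
        ch if ord(ch) > 127 else quote(ch, safe="")
        for ch in segment
    )


print(encode_id_segment("AC/DC"))  # AC%2FDC
print(encode_id_segment("Malmö"))  # Malmö
```

The encoded slash stays inside one segment, so AC%2FDC is a single document ID rather than a path with two parts.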

Full Unicode for types and document IDs

Being able to use full UTF-8 for both types and document IDs has some interesting implications, allowing types and document IDs to use emojis.

As an example, 📄🎉/🏠♨️ could represent a house-warming invitation (invitation to party/house warm). Maybe not a real-life example, but this ability opens up tremendous opportunities for local communities to express types and content in local languages, an important aspect in many areas that encode artifacts and data.

Conclusions on TerminusDB IRI IDs

The TerminusDB IRI model for RDF content is powerful and offers a simple way to engage with RDF content and gradually grow into advanced use of JSON-LD and the RDF underpinnings.

You don’t need to know all the details to get started, as the schema engine has clear error messages about what is wrong and the TerminusDB documentation helps clarify how things work.

About me

The details presented in this article come from lessons learned over years spent making the TerminusDB data modelling experience as painless as possible in the DFRNT data modelling tool.

The DFRNT Data Product Builder is built around the TerminusDB engine and helps you build portable graph data products with a strong schema for JSON-LD documents. These are data products with reference data that you can use in the cloud, on your laptop, and to power your startup or the enterprise.

The author, Philippe Höij, is the founder of DFRNT and the creator of the data product builder.