Reification Done Right

From Blazegraph

RDR

RDF syntax does not permit a Statement to be the Subject or Object of another statement. However, RDF Reification provides a syntax to create a model of an RDF statement and then allows statements to be made about that model. The work by Olaf Hartig and Bryan Thompson formalizes an extension to both RDF (known as RDF*) and SPARQL (known as SPARQL*). These extensions define a backwards compatible relationship between the RDF data model and the SPARQL query language, and an alternative perspective on RDF Reification. The RDF* and SPARQL* models are introduced and formally described in Foundations of an Alternative Approach to Reification in RDF. The key contributions of this paper are:

  • Formal extensions of the RDF data model and the SPARQL algebra that reconcile RDF Reification with statement-level metadata;
  • An extended syntax for TURTLE that permits easy interchange of statements about statements;
  • An extended syntax for SPARQL that makes it easy to express queries and data for statements about statements; and
  • Rewrite rules that may be used to translate RDF* into RDF and SPARQL* into SPARQL.
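To make the last contribution concrete, the following is a minimal Python sketch of how an RDF* triple with an embedded statement might be unfolded into plain RDF using the standard reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). It is an illustration only and not the paper's exact rewrite rules: the `Triple` class, the `unfold` function, and the `_:stmtN` blank node naming are hypothetical, and the sketch emits only the statement model and the outer triple, without asserting the embedded triple itself.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple, Union

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Triple:
    """An RDF* triple whose subject or object may itself be a Triple (hypothetical helper)."""
    s: Union[str, "Triple"]
    p: str
    o: Union[str, "Triple"]


def unfold(triple: Triple) -> List[Tuple[str, str, str]]:
    """Rewrite one RDF* triple into plain RDF triples using rdf:Statement models."""
    out: List[Tuple[str, str, str]] = []
    ids = count(1)
    cache: Dict[Triple, str] = {}

    def term(t: Union[str, Triple]) -> str:
        # Ordinary terms pass through unchanged.
        if not isinstance(t, Triple):
            return t
        # Reuse the same blank node if the same statement is embedded twice.
        if t in cache:
            return cache[t]
        s, o = term(t.s), term(t.o)  # handle recursively embedded statements first
        node = f"_:stmt{next(ids)}"
        cache[t] = node
        out.extend([
            (node, RDF + "type", RDF + "Statement"),
            (node, RDF + "subject", s),
            (node, RDF + "predicate", t.p),
            (node, RDF + "object", o),
        ])
        return node

    out.append((term(triple.s), triple.p, term(triple.o)))
    return out
```

Calling `unfold(Triple(Triple(":bob", "foaf:age", "23"), "dct:source", "<http://example.net/homepage-listing.html>"))` yields the four triples of the statement model followed by one triple that attaches dct:source to the model's blank node.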

RDF* and SPARQL* allow statements to appear as Subjects or Objects in other statements. Statements about these “inline” statements can be interpreted as if they were statements about statements. The paper shows that this is equivalent to statements about reified RDF statement models. For example, the following statements declare a name for some resource “:bob”, an age for :bob, and provide assertions about how and where that age was obtained:

Contents of a file (rdr_test.ttl):

@prefix : <http://bigdata.com> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dct:  <http://purl.org/dc/elements/1.1/> .

:bob foaf:name "Bob" .
<<:bob foaf:age 23>> dct:creator <http://example.com/crawlers#c1> ;
                     dct:source <http://example.net/homepage-listing.html> .

Load the file into a local Blazegraph instance running at localhost:9999. You must set the Content-Type header in the curl request, or select Turtle-RDR if using the workbench.

curl -D - -H 'Content-Type: application/x-turtle-RDR' --upload-file rdr_test.ttl -X POST 'http://localhost:9999/bigdata/sparql'

and then queried using:

PREFIX : <http://bigdata.com>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct:  <http://purl.org/dc/elements/1.1/>

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   <<?bob foaf:age ?age>> dct:source ?src .
}

In both cases the << >> notation denotes a statement appearing as the Subject or Object of another statement. Further, statements may become bound to variables as shown in this alternative syntax:

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   BIND( <<?bob foaf:age ?age>> AS ?t ) .
   ?t dct:source ?src .
}
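
The same query can also be submitted from a program. The sketch below builds a standard SPARQL protocol POST request with Python's standard library, assuming the endpoint URL from the curl example above. The function names `build_query_request` and `run_query` are hypothetical, and whether a given Blazegraph version accepts the << >> syntax depends on the namespace being in RDR mode.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above; adjust for your installation.
ENDPOINT = "http://localhost:9999/bigdata/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dct:  <http://purl.org/dc/elements/1.1/>

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   BIND( <<?bob foaf:age ?age>> AS ?t ) .
   ?t dct:source ?src .
}
"""


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded SPARQL protocol POST request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON result set (requires a running server)."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as resp:
        return json.load(resp)
```

With a server running, `run_query(ENDPOINT, QUERY)["results"]["bindings"]` would hold one entry per solution, each with an `age` and a `src` binding.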

The paper proves that these examples are equivalent to their counterparts expressed using RDF Reification. RDF Reification already gives us a mechanism to represent, interchange, and query statements about statements. However, the paper also shows that statements about statements may be modeled and queried within the database using a wide variety of physical schemas that allow great efficiency and data density when compared to naive indexing of RDF statement models. This gives database designers enormous freedom in how they choose to represent those statements about statements and helps to counter the impression that RDF databases are necessarily bad for problems requiring link attributes. For example, any of the following physical schemas could be used to represent these statements about statements:

  • Explicitly modeling the statements about statements as reified RDF statement models;
  • Associating a “statement identifier” with each statement in the database and then using it to represent statements about statements;
  • Directly embedding the statement “:bob foaf:age 23” into the representation of each statement about that statement (inlining within the statement indices using variable length and recursively embedded encodings of the Subject and Object of a statement); and
  • Extending the (s,p,o) table to include additional columns, in this case dct:creator and dct:source. This can be an advantage when some metadata predicate has a maximum cardinality of one and is used for most statements in the database (for example, this could be used to create an efficient bi-temporal database with statement-level metadata columns for the business-start-time, business-end-time, and transaction-time for each assertion).
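
As an illustration of the second option in the list above, here is a toy in-memory sketch of the statement identifier idea in Python. It is not how bigdata stores data; the `SidStore` class and its methods are hypothetical names, and it only shows that metadata can be keyed by a compact identifier rather than by a four-triple statement model.

```python
from typing import Dict, List, Tuple

Term = str
Triple = Tuple[Term, Term, Term]


class SidStore:
    """Toy store that assigns an integer statement identifier (sid) to each triple."""

    def __init__(self) -> None:
        self._sid_of: Dict[Triple, int] = {}
        self._meta: Dict[int, List[Tuple[Term, Term]]] = {}

    def add(self, s: Term, p: Term, o: Term) -> int:
        """Insert the triple if it is new and return its sid."""
        return self._sid_of.setdefault((s, p, o), len(self._sid_of) + 1)

    def annotate(self, triple: Triple, p: Term, o: Term) -> None:
        """Record a statement about `triple`, keyed by its sid."""
        sid = self.add(*triple)
        self._meta.setdefault(sid, []).append((p, o))

    def metadata(self, triple: Triple) -> List[Tuple[Term, Term]]:
        """Return the (predicate, object) pairs recorded about `triple`."""
        sid = self._sid_of.get(triple)
        return list(self._meta.get(sid, [])) if sid is not None else []
```

For example, after annotating `(":bob", "foaf:age", "23")` with a dct:source and a dct:creator, `metadata` returns both pairs while the base triple is stored only once.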

By clarifying the formal semantics of RDF Reification and offering a simplified syntax for data interchange, query, and update, database designers and database users can now more easily and confidently model domains that require statement-level metadata. There is a long list of such domains, including domains that model events, domains that require link attributes, sparse matrices, the property graph model, etc.

When to use RDR

Today, bigdata implements RDR support by inlining statements into statements within the statement indices. This approach is very efficient if you have a small number of statements (fewer than five) about each RDF statement in your dataset. However, as the number of statements about each statement grows, the RDR storage model becomes less efficient than storing reified RDF statement models as plain triples. This is because reified RDF statement models are essentially a fully normalized form. If you have large numbers of statements about each statement, then plain RDF reification is more space efficient, even if it is not as time efficient.
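
The trade-off can be illustrated with a back-of-envelope count. The Python sketch below is a toy model and not a description of bigdata's actual index encoding: it assumes every term occupies one "slot", that a reified model costs four triples plus one triple per metadata statement, and that each inlined metadata statement repeats the three terms of the base statement. The function names are hypothetical.

```python
def reified_slots(n: int) -> int:
    """Slots for one rdf:Statement model (4 triples) plus n metadata triples."""
    return 3 * (4 + n)


def inlined_slots(n: int) -> int:
    """Slots for the base triple plus n metadata statements, each of which
    embeds the base triple (3 terms) and adds its own predicate and object."""
    return 3 + n * (3 + 2)


def crossover() -> int:
    """Smallest n at which inlining uses more slots than reification."""
    n = 1
    while inlined_slots(n) <= reified_slots(n):
        n += 1
    return n
```

Under these assumptions inlining is smaller up to four metadata statements per statement and larger from five onward. That is in line with the rule of thumb above, but the agreement comes from the assumptions chosen here and should not be read as the actual derivation of that figure.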

If you can meaningfully associate large groups of statements with the same metadata statements, then the quads storage model can be a good choice. A good example is managing ACLs for all triples from a specific source.

In fact, quads could be modeled as RDR statements that place other statements into containers. SPARQL named graphs can be seen as a convenience for this.

Eventually we would like to hide the decision about how the data are organized internally from the user.

Enabling RDR

The default namespace ("kb") will be in one of three modes: Triples, RDR, or Quads. You can determine the actual mode by examining the service description of the namespace using the NAMESPACES tab of the workbench. It will be one of the following values. The "Sids" mode (aka statement identifiers) corresponds to the support for RDR with triples.

	<feature xmlns="http://www.w3.org/ns/sparql-service-description#" rdf:resource="http://www.bigdata.com/rdf#/features/KB/Mode/Triples"/>
	<feature xmlns="http://www.w3.org/ns/sparql-service-description#" rdf:resource="http://www.bigdata.com/rdf#/features/KB/Mode/Quads"/>
	<feature xmlns="http://www.w3.org/ns/sparql-service-description#" rdf:resource="http://www.bigdata.com/rdf#/features/KB/Mode/Sids"/>

You can also examine the properties for the namespace using the workbench. If the RDR support is enabled, you will see:

       com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers	 = true
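
The service-description check described above can also be scripted. The following Python sketch extracts the mode from a service description document by looking for the feature URI prefix shown in the listing. The `kb_mode` function and the `SAMPLE` wrapper document are hypothetical; a real service description contains many more elements, and how you fetch it is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SD_NS = "http://www.w3.org/ns/sparql-service-description#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODE_PREFIX = "http://www.bigdata.com/rdf#/features/KB/Mode/"

# Hypothetical, heavily trimmed service description used only for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description>
    <feature xmlns="http://www.w3.org/ns/sparql-service-description#"
             rdf:resource="http://www.bigdata.com/rdf#/features/KB/Mode/Sids"/>
  </rdf:Description>
</rdf:RDF>"""


def kb_mode(service_description_xml: str) -> Optional[str]:
    """Return "Triples", "Quads" or "Sids" if a mode feature is present, else None."""
    root = ET.fromstring(service_description_xml)
    for feature in root.iter(f"{{{SD_NS}}}feature"):
        resource = feature.get(f"{{{RDF_NS}}}resource", "")
        if resource.startswith(MODE_PREFIX):
            return resource[len(MODE_PREFIX):]
    return None
```

Here `kb_mode(SAMPLE)` returns `"Sids"`, which corresponds to RDR support with triples.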

Workbench

Create a new namespace using the NAMESPACES tab. Specify the mode as "RDR" rather than triples or quads. The new namespace will support the RDF* / SPARQL* extensions.

Properties file

Bigdata supports RDF* and SPARQL* for the efficient interchange, query, and update of statements about statements. Today, this is enabled through the “SIDS” option:

   com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=true

This enables the historical mechanism for efficient statements about statements in bigdata. In the future, we plan to add support for RDF* and SPARQL* to the quads mode of the platform as well. This will allow statement level metadata to co-exist seamlessly with the named graphs model.

Data Interchange

Bigdata automatically converts reified statement models when loading data in the RDR mode. So, you can load reified RDF data and then query it using SPARQL* (the << >> extension). In addition, you can load data using the following RDR (aka RDF*) aware data interchange formats.

RDF Data Interchange

Bigdata supports the following RDR-specific extensions. These are also declared at NanoSparqlServer#RDF_data.

| MIME Type | File extension | Charset | Name | URL | Comments |
|---|---|---|---|---|---|
| application/x-n-triples-RDR | .ntx | US-ASCII | N-Triples-RDR | http://www.w3.org/TR/rdf-testcases/#ntriples | The RDR extension supports the << >> syntax (parser only) |
| application/x-turtle-RDR | .ttlx | UTF-8 | Turtle-RDR | http://www.bigdata.com/whitepapers/reifSPARQL.pdf | The RDR extension supports the << >> syntax (parser only) |
| application/sparql-results+json, application/json | .srk, .json | UTF-8 | Bigdata JSON interchange for RDF/RDF* | N/A (see example below) | bigdata json interchange supports RDF RDR data and also SPARQL result sets. |

CONSTRUCT/DESCRIBE

You can also use a CONSTRUCT or DESCRIBE in the RDR mode to obtain statements and statements about statements. This example uses curl, but you can do the same thing using the bigdata workbench.

Given:


@prefix x:         <http://example/> .

# simple RDF statement.
<x:a1> <x:b1> <x:c1> .

# simple RDF* statement
<<<x:a> <x:b> <x:c>>> <x:d> <x:e> .

The CONSTRUCT query:

curl -X POST http://localhost:8080/bigdata/namespace/RDR/sparql --data-urlencode 'query=CONSTRUCT  WHERE { ?s ?p ?o }' -H 'Accept:application/json'

will produce the following JSON data:

{
  "head" : {
    "vars" : [ "subject", "predicate", "object", "context" ]
  },
  "results" : {
    "bindings" : [ {
      "subject" : {
        "type" : "sid",
        "subject" : {
          "type" : "uri",
          "value" : "x:a"
        },
        "predicate" : {
          "type" : "uri",
          "value" : "x:b"
        },
        "object" : {
          "type" : "uri",
          "value" : "x:c"
        }
      },
      "predicate" : {
        "type" : "uri",
        "value" : "x:d"
      },
      "object" : {
        "type" : "uri",
        "value" : "x:e"
      }
    }, {
      "subject" : {
        "type" : "uri",
        "value" : "x:a1"
      },
      "predicate" : {
        "type" : "uri",
        "value" : "x:b1"
      },
      "object" : {
        "type" : "uri",
        "value" : "x:c1"
      }
    }, {
      "subject" : {
        "type" : "uri",
        "value" : "x:a"
      },
      "predicate" : {
        "type" : "uri",
        "value" : "x:b"
      },
      "object" : {
        "type" : "uri",
        "value" : "x:c"
      }
    } ]
  }
}
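
A client can walk this structure recursively, because a term of type "sid" simply nests another subject/predicate/object triple. The Python sketch below turns such a document back into << >> notation. It is a minimal illustration based only on the shape of the example above: the function names are hypothetical, `SAMPLE` is a shortened copy of that output, and the literal handling uses JSON string escaping as an approximation of Turtle escaping.

```python
import json
from typing import Iterator


def render_term(term: dict) -> str:
    """Render one JSON term; "sid" terms recurse into their embedded triple."""
    kind = term["type"]
    if kind == "sid":
        inner = " ".join(render_term(term[k]) for k in ("subject", "predicate", "object"))
        return f"<<{inner}>>"
    if kind == "uri":
        return f"<{term['value']}>"
    if kind == "bnode":
        return f"_:{term['value']}"
    if kind in ("literal", "typed-literal"):
        # Assumption: JSON escaping is close enough to Turtle escaping for a sketch.
        text = json.dumps(term["value"], ensure_ascii=False)
        if "xml:lang" in term:
            return f"{text}@{term['xml:lang']}"
        if "datatype" in term:
            return f"{text}^^<{term['datatype']}>"
        return text
    raise ValueError(f"unsupported term type: {kind}")


def statements(doc: dict) -> Iterator[str]:
    """Yield one line per binding in subject, predicate, object order."""
    for b in doc["results"]["bindings"]:
        yield " ".join(render_term(b[k]) for k in ("subject", "predicate", "object")) + " ."


# Shortened copy of the example output above (first binding only).
SAMPLE = json.loads("""
{"head": {"vars": ["subject", "predicate", "object", "context"]},
 "results": {"bindings": [{
   "subject": {"type": "sid",
               "subject": {"type": "uri", "value": "x:a"},
               "predicate": {"type": "uri", "value": "x:b"},
               "object": {"type": "uri", "value": "x:c"}},
   "predicate": {"type": "uri", "value": "x:d"},
   "object": {"type": "uri", "value": "x:e"}}]}}
""")
```

Running `list(statements(SAMPLE))` gives back the RDF* statement from the input data in the same << >> form.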

SPARQL Result Sets

Bigdata supports the following RDR-specific extensions for SPARQL result sets. These are also defined at NanoSparqlServer#SPARQL_Result_Sets.

| MIME Type | Name | URL | Comments |
|---|---|---|---|
| application/sparql-results+json, application/json | SPARQL Query Results JSON Format | http://www.w3.org/TR/rdf-sparql-json-res/ | The bigdata extension allows the interchange of RDR data in result sets as well. |

References