Linked Data

Linked data depends on simple access patterns in which a GET of a resource will return a (machine readable) representation of that resource as RDF. This page documents linked data features in the bigdata platform.
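As a minimal sketch of this access pattern, the snippet below builds (but does not send) a GET request that asks for an RDF representation through the Accept header. The resource URL is the example IRI used later on this page, and the choice of application/rdf+xml is an assumption; this page does not say which RDF media types the server offers.

```python
from urllib.request import Request


def linked_data_request(resource_url, media_type="application/rdf+xml"):
    """Build, without sending, a GET request for an RDF representation.

    The default media type is an assumption made for illustration only.
    """
    return Request(resource_url, headers={"Accept": media_type}, method="GET")


req = linked_data_request("http://example.com/aReallyGreatBook")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual GET against a running server.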

VoID support

The SPARQL 1.1 Service Description response includes the VoID description of the default graph and may be configured to provide a VoID description of each named graph. This behavior is controlled through web.xml or com.bigdata.rdf.sail.webapp.ConfigParams.

DESCRIBE

A SPARQL "DESCRIBE" query provides a flexible mechanism for requesting resource descriptions. However, the SPARQL specifications do not specify the precise semantics of DESCRIBE. Therefore we have implemented several alternatives, as different semantics may be more appropriate depending on the application.

Describe Modes

We have added support for several DESCRIBE algorithms.

This feature will be available in 1.2.3 and is already in the development branch.

ForwardOneStep

The description contains only the attributes and forward links of the resource, that is, the statements in which the resource is the subject.

SymmetricOneStep

The description contains the attributes, the forward links, and the reverse links (statements in which the resource is the object). This is the historical behavior for bigdata.
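The difference between the two one-step modes can be sketched over a toy in-memory graph. This is an illustration only, not the bigdata implementation; the triples and the function names are hypothetical.

```python
# Toy graph as (subject, predicate, object) tuples. All names are illustrative.
TRIPLES = {
    ("ex:book", "ex:title", '"A Really Great Book"'),
    ("ex:book", "ex:author", "ex:alice"),
    ("ex:review", "ex:about", "ex:book"),
    ("ex:alice", "ex:name", '"Alice"'),
}


def forward_one_step(triples, resource):
    """Statements whose subject is the resource (attributes and forward links)."""
    return {t for t in triples if t[0] == resource}


def symmetric_one_step(triples, resource):
    """Forward statements plus statements whose object is the resource."""
    return {t for t in triples if t[0] == resource or t[2] == resource}


print(len(forward_one_step(TRIPLES, "ex:book")))
print(len(symmetric_one_step(TRIPLES, "ex:book")))
```

In this toy graph only the symmetric variant picks up the ex:review statement, because ex:book appears there as the object.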

CBD

The DESCRIBE is the Concise Bounded Description as defined by [1].

SCBD

The DESCRIBE is the Symmetric Concise Bounded Description, as defined by [1].
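The two bounded-description modes can be sketched in simplified form as follows. This is one reading of the idea, not the bigdata implementation: it ignores reified statements, which the CBD definition also covers, and it treats any term starting with "_:" as a blank node. The toy triples and the function names are hypothetical.

```python
# Toy graph as (subject, predicate, object) tuples. "_:" marks a blank node.
TRIPLES = {
    ("ex:book", "ex:author", "ex:alice"),
    ("ex:book", "ex:publisher", "_:b1"),
    ("_:b1", "ex:name", '"Acme"'),
    ("_:b1", "ex:address", "_:b2"),
    ("_:b2", "ex:city", '"Paris"'),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:review", "ex:about", "ex:book"),
}


def is_blank(term):
    return term.startswith("_:")


def cbd(triples, resource):
    """Forward statements, expanded recursively through blank-node objects."""
    description, frontier, seen = set(), [resource], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        for t in triples:
            if t[0] == node:
                description.add(t)
                if is_blank(t[2]):
                    frontier.append(t[2])  # expand the blank node later
    return description


def scbd(triples, resource):
    """CBD plus reverse statements, expanding blank-node subjects as well."""
    description = cbd(triples, resource)
    for t in triples:
        if t[2] == resource:
            description.add(t)
            if is_blank(t[0]):
                description |= cbd(triples, t[0])
    return description


print(len(cbd(TRIPLES, "ex:book")), len(scbd(TRIPLES, "ex:book")))
```

Each loop over a blank-node frontier corresponds to one iterative expansion, which is the reason the iteration and statement limits apply only to these two modes.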

Describe Configuration

The default behavior is specified in the QueryHints interface and can be overridden using the configuration parameters declared by BigdataSail.Options.

  • BigdataSail.Options.DESCRIBE_MODE - specifies the default DESCRIBE algorithm for a KB.
  • BigdataSail.Options.DESCRIBE_ITERATION_LIMIT - specifies the default limit on the number of iterations for iterative DESCRIBE algorithms (CBD, SCBD).
  • BigdataSail.Options.DESCRIBE_STATEMENT_LIMIT - specifies the default limit on the number of statements in the resource description for iterative DESCRIBE algorithms (CBD, SCBD).

Query Hints

In addition to configuration of the default behavior using BigdataSail.Options, it is possible to override the DESCRIBE behavior on a per-query basis using query hints.

hint:describeMode

The default DESCRIBE behavior is SymmetricOneStep, which is what bigdata has always implemented. You can now override this default when the KB is configured. For example, the following property will make Symmetric Concise Bounded Description the default DESCRIBE algorithm for a KB:

com.bigdata.rdf.sail.describeMode=SCBD

You can also specify the DESCRIBE algorithm as a query hint:

DESCRIBE <http://example.com/aReallyGreatBook>
{
   hint:Query hint:describeMode "SCBD"
}
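When queries are generated by a client, the hint can be added programmatically. The helper below is a hypothetical sketch that reproduces the query text shown above for any of the four mode names listed on this page. It does not declare the hint: prefix, on the assumption that the server resolves it as it does for the query above.

```python
# Mode names as listed under "Describe Modes" on this page.
DESCRIBE_MODES = ("ForwardOneStep", "SymmetricOneStep", "CBD", "SCBD")


def describe_query(iri, mode):
    """Return a DESCRIBE query that carries the describeMode query hint."""
    if mode not in DESCRIBE_MODES:
        raise ValueError(f"unknown describe mode: {mode!r}")
    return (
        f"DESCRIBE <{iri}>\n"
        "{\n"
        f'   hint:Query hint:describeMode "{mode}"\n'
        "}"
    )


print(describe_query("http://example.com/aReallyGreatBook", "SCBD"))
```

Rejecting unknown mode names on the client side is a design choice of this sketch; how the server reacts to an unrecognized value is not covered here.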

The advantage of Symmetric Concise Bounded Description over SymmetricOneStep is that blank nodes are always expanded, so the description includes the statements about each blank node it mentions. This is an important advantage in a Linked Data world because a blank node has no IRI, so a client cannot ask for it by name in a follow-up SPARQL query.

hint:describeIterationLimit

This query hint provides an optional constraint on the maximum number of iterative expansions that will be performed for iterative DESCRIBE algorithms. The DESCRIBE limits are ANDed together. Therefore, if both the iteration limit and the statement limit are specified, both limits must be met before the DESCRIBE query is cut off.

hint:describeStatementLimit

This query hint provides an optional constraint on the maximum number of statements that may be present in a DESCRIBE query response and is only applied for iterative DESCRIBE algorithms. The DESCRIBE limits are ANDed together. Therefore, if both the iteration limit and the statement limit are specified, both limits must be met before the DESCRIBE query is cut off.
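The ANDed semantics of the two limits can be illustrated with a small predicate. This is a sketch under assumptions: it reads "met" as "reached or exceeded", and the function and parameter names are hypothetical rather than part of bigdata.

```python
def should_cut_off(iterations, statements,
                   iteration_limit=None, statement_limit=None):
    """Return True once every specified limit has been reached.

    A limit of None means "not specified". With no limits the expansion
    is never cut off by this check.
    """
    checks = []
    if iteration_limit is not None:
        checks.append(iterations >= iteration_limit)
    if statement_limit is not None:
        checks.append(statements >= statement_limit)
    return bool(checks) and all(checks)


# Only the iteration limit has been reached, so expansion continues.
print(should_cut_off(5, 100, iteration_limit=5, statement_limit=1000))
# Both limits have been reached, so the DESCRIBE is cut off.
print(should_cut_off(5, 1000, iteration_limit=5, statement_limit=1000))
```

A practical consequence of this reading is that specifying both limits makes the cutoff more permissive than specifying either limit alone.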

Linked Data Cache

This feature is under development. It will provide a maintained cache, with invalidation, for DESCRIBE queries. The goal is an amortized O(1) cost for linked data requests: the cache maintains resource descriptions, so the cost of the DESCRIBE query is paid only once per resource until its description changes. The cache is integrated with the change log listener, so cache entries are invalidated automatically as the data set is updated.
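Since the feature is still under development, the following is only a conceptual sketch of a maintained cache with invalidation, not the bigdata design. The class, its methods, and the shape of the change notification are all hypothetical. Invalidating only the subject and object of a changed statement is sufficient for a one-step symmetric description; CBD and SCBD descriptions that reach through blank nodes would need broader invalidation.

```python
class DescribeCache:
    """Hypothetical cache of resource descriptions with change-driven invalidation."""

    def __init__(self, describe):
        self._describe = describe  # function: resource -> set of triples
        self._entries = {}
        self.misses = 0            # how many times the DESCRIBE was computed

    def get(self, resource):
        """Return the cached description, computing it on the first request."""
        if resource not in self._entries:
            self.misses += 1
            self._entries[resource] = self._describe(resource)
        return self._entries[resource]

    def on_change(self, triple):
        """Change notification: drop descriptions that mention the triple's ends."""
        subject, _, obj = triple
        self._entries.pop(subject, None)
        self._entries.pop(obj, None)


store = {("ex:a", "ex:p", "ex:b")}
cache = DescribeCache(lambda r: {t for t in store if r in (t[0], t[2])})
cache.get("ex:a")
cache.get("ex:a")   # served from the cache, no second computation
print(cache.misses)
```

Repeated requests for an unchanged resource are served with a dictionary lookup, which is where the amortized constant cost comes from in this sketch.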