SPARQL Update

From Blazegraph

Availability

Bigdata supports the full SPARQL 1.1 Update specification since r6172 and in all releases after 1.1.

The SPARQL UPDATE extensions described on this page are available in bigdata release 1.2.3 (support is only available for the unisolated connection in that release).

Bigdata Extensions

SPARQL 1.1 UPDATE has two drawbacks: (a) it does not let you store solution sets, so you must redo the work of joining triples unless your goal is a CONSTRUCT; and (b) you cannot query the stored data from within an update sequence. We have introduced some small but very powerful extensions to SPARQL UPDATE that address these problems.

Named Solution Sets

Bigdata already supports the Anzo SPARQL extension for NamedSubquery and the concept of a "named solution set". As originally formulated, named solution sets let you compute and cache a sub-SELECT result, which can then be reused (via INCLUDE %namedSet) at multiple locations within a query. The NamedSubquery syntax is thus really a shorthand which tells the query plan generator that some sub-SELECT is used at multiple locations within the query: it directs the query plan generator to explicitly lift out the sub-SELECT and run it before the main WHERE clause of the query.
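For example, a named subquery is declared with WITH ... AS %name and joined back into the main WHERE clause via INCLUDE; the sub-SELECT runs once, before the main WHERE clause (the prefixes and data in this sketch are illustrative):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x ?label
WITH {
  SELECT ?x WHERE { ?x a foaf:Person }   # computed once, cached as %namedSet1
} AS %namedSet1
WHERE {
  ?x rdfs:label ?label .
  INCLUDE %namedSet1                     # join against the cached solutions
}
```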

It is important to realize that much of the work of the database is join processing. Once you have built up a complex solution set, you can reuse it over and over again, for example when slicing the solutions to provide a paged view in a Web UI. We have extended the SPARQL UPDATE syntax in a few simple ways to provide operations on solution sets as well as graphs. This is a simple but powerful concept: you can now use SPARQL UPDATE to save not only graphs but solution sets as well, and you can make those solution sets durable or transient (cached).

For example, assume that you have pre-computed a solution set using SPARQL UPDATE. Once created, the named solution set can be reused whenever you want to page through its solutions:

SELECT ... { INCLUDE %solutionSet1 } OFFSET    0 LIMIT 1000
SELECT ... { INCLUDE %solutionSet1 } OFFSET 1000 LIMIT 1000
SELECT ... { INCLUDE %solutionSet1 } OFFSET 2000 LIMIT 1000

Or when you want to produce different aggregations or ORDER BYs for the same solution set:

SELECT ... { INCLUDE %solutionSet1 } ORDER BY ASC(?x)

or

SELECT ... { INCLUDE %solutionSet1 } GROUP BY ?x
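From client code, the paged queries above are easy to generate mechanically. A minimal Python sketch (the helper name is ours, not part of bigdata; the query template mirrors the examples above):

```python
def paged_queries(solution_set, page_size, num_pages):
    """Build SELECT queries that page through a named solution set.

    Illustrative client-side helper; `solution_set` is the name used
    after the % sigil in the INCLUDE clause.
    """
    template = "SELECT * {{ INCLUDE %{name} }} OFFSET {offset} LIMIT {limit}"
    return [
        template.format(name=solution_set, offset=i * page_size, limit=page_size)
        for i in range(num_pages)
    ]

# Generate the three page queries shown above.
for q in paged_queries("solutionSet1", 1000, 3):
    print(q)
```

Each generated string can then be submitted to the SPARQL query endpoint like any other SELECT.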

Solution Set Update

DELETE/INSERT

The top-level grammar production is identical to SPARQL 1.1 UPDATE.

( WITH IRIref )?
( ( DeleteClause InsertClause? ) | InsertClause )
( USING ( NAMED )? IRIref )*
WHERE GroupGraphPattern

The DeleteClause and InsertClause have been extended to support named solution sets.

DeleteClause ::= DELETE  ( QuadPattern | FROM %VARNAME SelectClause )
InsertClause ::= INSERT  ( QuadPattern | INTO %VARNAME SelectClause )

The FROM and INTO clauses are covered below. They specify the target named solution set. The SelectClause specifies the projection of the WHERE clause which will be inserted into or removed from the named solution set.

As described on the NamedSubquery page, you can use the INCLUDE keyword to join a named solution set within any GroupGraphPattern.

For example:

SELECT ?x ?o
WHERE {
  ?x rdfs:label ?o
  INCLUDE %namedSet1 
}

INSERT INTO

This extension allows you to add solutions into a named solution set. This is conceptually very similar to INSERT INTO a named graph. However, the projection of the query is being saved, not triples or quads constructed from the INSERT clause. This offers tremendous advantages. You can do all the work to create a complex solution set once, and then reuse it efficiently across multiple queries.

The INSERT INTO syntax looks like this:

INSERT INTO %solutionSet1
SELECT ?product ?reviewer
WHERE {
          ?product a bsbm-inst:ProductType1 .
          ?review bsbm:reviewFor ?product ;
                  rev:reviewer ?reviewer .
          ?reviewer bsbm:country ?country .
}

INSERT INTO with ORDER BY

The INSERT INTO syntax does not allow an ORDER BY clause on the top-level WHERE clause. Therefore, to order the solutions before inserting them into the named solution set, you need to push down a sub-SELECT and apply the ORDER BY clause to that sub-SELECT. This lets you create a named solution set with a desired ordering, which can then be sliced using LIMIT and OFFSET. Please see the example below.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT INTO %namedSet1
SELECT ?x ?name
WHERE {
    SELECT ?x ?name
    WHERE {
        ?x rdf:type foaf:Person .
        ?x rdfs:label ?name .
    }
    ORDER BY ?name # ORDER BY applied to the sub-SELECT
}

DELETE FROM

The DELETE FROM syntax allows you to remove some solutions from an existing named solution set. Again, the syntax is straightforward:

DELETE FROM %solutionSet1
SELECT ?product ?reviewer
WHERE {
          INCLUDE %solutionSet1 .
          FILTER (sameTerm(?product,<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromProducer1092/Product53999>))
}

This UPDATE operation removes a specific product from the named solution set. It works by joining against the named solution set and filtering it for the specified product. The matching solutions are projected from the WHERE clause and then removed from the named solution set.

Solution Set Management

CREATE SOLUTIONS

Create a named solution set. The SILENT keyword is supported and will suppress an error if the named solution set already exists.

Named solution sets may be created implicitly using the INSERT INTO ... SELECT syntax. The CREATE SOLUTIONS syntax gives you control over the life cycle of the named solution set.

CREATE ( SILENT )? (GRAPH IRIref | SOLUTIONS %VARNAME ( QuadData )? )

The optional QuadData contains ground statements, which follow the general pattern for query hints. Currently (as of 2013) no query hints are supported for solution sets.

For example:

CREATE SOLUTIONS %solutionSet1 {
   # query hints 
}

will create a solution set named "solutionSet1".

DROP SOLUTIONS

Explicitly drop the named solution set. It is an error if the named solution set does not exist. The SILENT keyword is supported and will suppress an error if the named solution set does not exist.

DROP ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL | GRAPHS | SOLUTIONS | SOLUTIONS %VARNAME)

For example:

DROP SOLUTIONS %solutionSet1

will drop the named solution set "solutionSet1". This can be used to drop a cached solution set before it expires or to drop a persistent solution set.

and

DROP SOLUTIONS

will drop ALL named solution sets.

While

DROP GRAPHS

will drop all graphs without dropping the named solution sets.

CLEAR SOLUTIONS

Clear the named solution set (the solution set will be empty as a post-condition). It is an error if the named solution set does not exist. The SILENT keyword is supported and will suppress an error if the named solution set does not exist.

CLEAR ( SILENT )? (GRAPH IRIref | DEFAULT | NAMED | ALL | GRAPHS | SOLUTIONS | SOLUTIONS %VARNAME)

For example:

CLEAR SOLUTIONS %solutionSet1

will clear all solutions in the named solution set "solutionSet1".

and

CLEAR SOLUTIONS

will clear ALL named solution sets.

While

CLEAR GRAPHS

will clear all graphs without clearing the named solution sets.

Truth Maintenance

When bigdata is configured for truth maintenance, it maintains the closure of the specified entailment rules over graphs. Truth maintenance is NOT performed over solution sets. However, if data in a solution set is converted into triples (through a CONSTRUCT) and INSERTed into a graph, then truth maintenance is performed for the triples in that graph.
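For example, an INSERT whose WHERE clause INCLUDEs a named solution set materializes those solutions as triples in a graph, at which point truth maintenance applies to the inserted triples (the graph IRI and predicate in this sketch are illustrative):

```sparql
PREFIX ex: <http://example.org/>
INSERT {
  GRAPH ex:reviews {
    ?product ex:reviewedBy ?reviewer .   # triples built from the solutions
  }
}
WHERE {
  INCLUDE %solutionSet1                  # the cached solutions to materialize
}
```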

Manage truth maintenance in SPARQL UPDATE

(This feature is available in releases starting with 2.0)

If you are going to upload a large dataset split across many files, you can significantly improve the performance of the operation by switching off incremental truth maintenance and computing the closure once after all of the files have been uploaded.

This can be done by using the following operations in SPARQL UPDATE:

DISABLE ENTAILMENTS;

Disable incremental truth maintenance.

ENABLE ENTAILMENTS;

Enable incremental truth maintenance.

CREATE ENTAILMENTS;

(Re-)compute the entailments using an efficient "database-at-once" closure operation. This is much more efficient than incremental truth maintenance if you are loading a large amount of data into the database. It is not necessary to "DROP ENTAILMENTS" before calling "CREATE ENTAILMENTS" unless you have retracted some assertions. If you do not "DROP ENTAILMENTS" first, then "CREATE ENTAILMENTS" will have the semantics of "updating" the current entailments. Entailments which can be re-proven will have no impact and new entailments will be inserted into the KB. This is significantly more efficient than re-computing the fixed point closure of the entailments from scratch (that is, after a "DROP ENTAILMENTS").

DROP ENTAILMENTS;

Drop the entailments. This is only required if you have removed some statements from the database. If you are only adding statements, then just execute "CREATE ENTAILMENTS".
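When a batch contains only additions, the DROP step can therefore be skipped. A sketch of the load-only sequence (the file URIs are illustrative):

```sparql
DISABLE ENTAILMENTS;  # stop incremental truth maintenance
LOAD <file:///data/part1.nt>;
LOAD <file:///data/part2.nt>;
CREATE ENTAILMENTS;   # database-at-once closure updates the entailments
ENABLE ENTAILMENTS;   # resume incremental truth maintenance
```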

The following pattern illustrates a valid use of this feature when some assertions are retracted. This sequence of operations is ACID against a Journal. Clients will never observe an intermediate state where the full set of entailments is not available.

# mutations before this point are tracked by truth maintenance.
DISABLE ENTAILMENTS; # disable truth maintenance.
# mutations do not update entailments.
DELETE DATA { triples };
LOAD file1;
LOAD file2;
INSERT DATA { triples };
DROP ENTAILMENTS; # drop existing entailments and proof chains
CREATE ENTAILMENTS; # create new entailments using the database-at-once closure.
ENABLE ENTAILMENTS; # reenable truth maintenance.
# mutations after this point are tracked by truth maintenance.

Use this feature with care: you can end up with inconsistent data if you omit "DROP ENTAILMENTS" after a "DELETE" operation or omit the "CREATE ENTAILMENTS" operation.