Named Subquery

Bigdata supports the ANZO SPARQL extension for named subqueries. A named subquery pre-computes a solution set that can be used multiple times within a query, which is useful when you want to process the same subset of your data in several ways. A query may declare multiple named subqueries, and each named subquery result can be INCLUDEd into the query in one or more places. If the analytic query mode is enabled, the solution sets are stored on the native heap (HTree).

Syntax

SELECT ...
WITH {
  SELECT ... WHERE { ... }
} AS %NAME
WHERE {
 ...
 INCLUDE %NAME .
}
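
Since a query may declare more than one named subquery, the following is a minimal sketch of what that could look like. It assumes that the WITH clauses are simply written one after another before the main WHERE clause. The names %persons and %friends and the use of foaf:knows are illustrative only and do not come from the Blazegraph test suite.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?x ?label ?friend
# First named subquery (hypothetical name): every foaf:Person.
WITH {
  SELECT ?x WHERE { ?x rdf:type foaf:Person }
} AS %persons
# Second named subquery (hypothetical name): who knows whom.
WITH {
  SELECT ?x ?friend WHERE { ?x foaf:knows ?friend }
} AS %friends
WHERE {
  # Each INCLUDE joins a pre-computed solution set into the group.
  INCLUDE %persons .
  INCLUDE %friends .
  ?x rdfs:label ?label .
}
```

Both solution sets share ?x, so including them in the same group joins them on that variable before the label pattern is applied.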

Examples

Simple Example

The following two queries are nearly identical. In the second query, however, the sub-select has been lifted out into a named subquery. Because the named solution set is pre-computed, this effectively guarantees that the sub-select runs first.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x ?o
WHERE {
    ?x rdfs:label ?o .
    {
      SELECT ?x WHERE {?x rdf:type foaf:Person}
    }
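}

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x ?o
  WITH {
    SELECT ?x WHERE { ?x rdf:type foaf:Person }
  } AS %namedSet1
WHERE {
  ?x rdfs:label ?o .
  INCLUDE %namedSet1 .
}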
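
The named solution set is stored on the native heap only when the analytic query mode is enabled. One way to request that mode for a single query is a query hint. The sketch below assumes the hint namespace <http://www.bigdata.com/queryHints#> and the hint:analytic hint with query scope. Treat both as assumptions and check them against the query hints documentation for your release before relying on them.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# Assumed namespace for query hints.
PREFIX hint: <http://www.bigdata.com/queryHints#>

SELECT ?x ?o
  WITH {
    SELECT ?x WHERE { ?x rdf:type foaf:Person }
  } AS %namedSet1
WHERE {
  # Assumed hint: request analytic query mode for the whole query,
  # so that %namedSet1 is kept on the native heap (HTree).
  hint:Query hint:analytic "true" .
  ?x rdfs:label ?o .
  INCLUDE %namedSet1 .
}
```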

More complex example

Here is a significantly more complex example from our test suite. It creates one named solution set (%_set15) and then reuses that result at two different locations within the query: once inside the nested sub-select and once in the outer WHERE clause. The real power of named subqueries comes from this kind of reuse.

prefix : <http://example.org/> 
prefix xsd: <http://www.w3.org/2001/XMLSchema#> 

SELECT ?_set17
      (coalesce((COUNT(?_set16) / SAMPLE(?_set19)), 0) AS ?_set10)

WITH {
      SELECT DISTINCT ?_set12
      WHERE {
          ?_set13 :p1 ?_set12.
      }
      ORDER BY ?_set12
      LIMIT 4
} AS %_set15

WHERE {
      ?_set13 :i2d ?_set16.
      {
          SELECT (COUNT(?_set16) AS ?_set19)
          WHERE {
              ?_set13 :i2d  ?_set16.
              ?_set13 :i2c  ?_set17.
              ?_set13 :p1  ?_set12 
              INCLUDE %_set15
          }
      }
      ?_set13 :i2c ?_set17 .
      ?_set13 :p1 ?_set12
      INCLUDE %_set15

}
GROUP BY ?_set17
ORDER BY DESC(?_set10)

BSBM BI Q5

Here is an example from the BSBM BI benchmark. Per the benchmark specification, this query:

Show[s] the most popular products of a specific product type for each country - by review count
Use Case Motivation: For advertisement reasons the owners of the e-commerce platform want to
generate profiles for the two dimensions product type and the country of a customer. 

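The original query has two sub-selects which are semantically identical. They differ only in whether the two review triples are abbreviated with ";" or written out in full.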

prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

Select ?country ?product ?nrOfReviews ?avgPrice
{
  { Select ?country (max(?nrOfReviews) As ?maxReviews)
    {
      { Select ?country ?product (count(?review) As ?nrOfReviews)
        {
          ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
          ?review bsbm:reviewFor ?product ;
                  rev:reviewer ?reviewer .
          ?reviewer bsbm:country ?country .
        }
        Group By ?country ?product
      }
    }
    Group By ?country
  }
  { Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
    {
      ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
      ?offer bsbm:product ?product .
      ?offer bsbm:price ?price .
    }
    Group By ?product
  }
  { Select ?country ?product (count(?review) As ?nrOfReviews)
    {
      ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
      ?review bsbm:reviewFor ?product .
      ?review rev:reviewer ?reviewer .
      ?reviewer bsbm:country ?country .
    }
    Group By ?country ?product
  }
  FILTER(?nrOfReviews=?maxReviews)
}
Order By desc(?nrOfReviews) ?country ?product

The common sub-select is:

        Select ?country ?product (count(?review) As ?nrOfReviews)
        {
          ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
          ?review bsbm:reviewFor ?product ;
                  rev:reviewer ?reviewer .
          ?reviewer bsbm:country ?country .
        }
        Group By ?country ?product

Here is the query rewritten using a named subquery, so that the common sub-select is evaluated once and included twice. The rewritten query is about 25% faster than the original.

prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

Select ?country ?product ?nrOfReviews ?avgPrice

WITH {
    Select ?country ?product (count(?review) As ?nrOfReviews)
    {
      ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
      ?review bsbm:reviewFor ?product .
      ?review rev:reviewer ?reviewer .
      ?reviewer bsbm:country ?country .
    }
    Group By ?country ?product
} AS %namedSet1

WHERE {
  { Select ?country (max(?nrOfReviews) As ?maxReviews)
    {
       INCLUDE %namedSet1
    }
    Group By ?country
  }
  { Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
    {
      ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
      ?offer bsbm:product ?product .
      ?offer bsbm:price ?price .
    }
    Group By ?product
  }
  INCLUDE %namedSet1 .
  FILTER(?nrOfReviews=?maxReviews)
}
Order By desc(?nrOfReviews) ?country ?product