Query Hints

From Blazegraph
Revision as of 13:08, 3 July 2015 by Michael

Bigdata supports query hints (since 1.1.0) using magic triples in SPARQL queries. Query hints may be used to change the default behavior of the query plan generator or the runtime evaluation of the compiled query plan. They are documented on the com.bigdata.rdf.sparql.ast.QueryHints interface. For example, the following SPARQL query uses a query hint to disable the join order optimizer. The Basic Graph Patterns (BGPs) will be run in the given order rather than being reordered.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?x ?o
WHERE {

  # Disable the join order optimizer for the entire query (scope: Query).
  hint:Query hint:optimizer "None" .

  ?x rdfs:label ?o .
  ?x rdf:type foaf:Person .
}

Query hints are bound to a scope. The possible scopes are declared by com.bigdata.rdf.sparql.ast.hints.QueryHintScope. They include:

| Scope | Definition |
|---|---|
| Query | The entire query. |
| SubQuery | Either the top-level Select or a Sub-Select. |
| Group | The current Graph Pattern Group (also called a "join group"). |
| GroupAndSubGroups | The current Graph Pattern Group and all of its subgroups. |
| Prior | The previous construct in the current scope of the SPARQL query which was not itself a query hint. This may be used to bind the query hint to the previous Basic Graph Pattern, Group Graph Pattern, UNION, SERVICE, etc. This is typically used to bind a query hint to a Graph Pattern Group or a Basic Graph Pattern (also called a triple pattern). |
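
As a rough illustration of how a narrower scope might be used, the sketch below applies the optimizer hint with the Group scope so that it affects only the OPTIONAL group it appears in. The data shape (foaf:name alongside rdfs:label) is a made-up assumption, and the hint: prefix is assumed to be predeclared, as in the example at the top of this page.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?x ?o ?name
WHERE {

  # No hint here: the join order optimizer still applies to the outer group.
  ?x rdf:type foaf:Person .

  OPTIONAL {
    # Scope Group: only the joins in this OPTIONAL group keep the given order.
    hint:Group hint:optimizer "None" .

    ?x rdfs:label ?o .
    ?x foaf:name ?name .   # hypothetical data, for illustration only
  }
}
```

Replacing hint:Group with hint:GroupAndSubGroups would, per the table above, extend the hint to any groups nested inside the OPTIONAL as well.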

Some query hints may require a specific scope, as indicated by the scope column in the table below. This is because some hints are interpreted by the query engine and apply to all operators while others are interpreted by the query plan generator and control either what kind of query plan is generated (when there are options) or parameters associated with specific operators.

Query hints that bind to a specific join generally require that you use the scope Prior in order to clearly identify which join should be runFirst or runLast. Query hints that can bind to a Graph Pattern Group, SubQuery, or Query allow more values.
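
A minimal sketch of binding a hint to one specific join with the Prior scope: the hint triple is placed directly after the triple pattern it should apply to. Writing the boolean value as the quoted literal "true" mirrors the quoting style of the optimizer example above and is an assumption, not a documented requirement.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?x ?o
WHERE {

  ?x rdfs:label ?o .

  ?x rdf:type foaf:Person .
  # Scope Prior: binds to the triple pattern immediately above,
  # asking for that join to be run first in this group.
  hint:Prior hint:runFirst "true" .
}
```

Because the hint attaches to whatever construct precedes it, moving the hint line changes which join it refers to.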

When experimenting with query hints, it is a good idea to use the Explain view of the NanoSparqlServer (NSS) to verify that the query hint has caused the intended change in the behavior of the query plan. See the com.bigdata.rdf.sparql.ast.QueryHints interface and the specific com.bigdata.rdf.sparql.ast.hints.IQueryHint implementations for more details.

Commonly used query hints include:

| Name | Scope | Definition | Values (default) |
|---|---|---|---|
| optimizer | Query, SubQuery, Group, GroupAndSubGroups | Control the join order optimizer. | "None", "Static", "Runtime" (Static) |
| runFirst | Prior | The join should be run first in the current Graph Pattern Group. This can be used only once within a given Graph Pattern Group. | xsd:boolean (false) |
| runLast | Prior | The join should be run last in the current Graph Pattern Group. This can be used only once within a given Graph Pattern Group. | xsd:boolean (false) |
| runOnce | SubQuery | The sub-select should be lifted into a named subquery such that it is evaluated exactly once. See NamedSubquery. | xsd:boolean (false) |
| atOnce | Any | The join(s) should not run until all of their source solutions are fully buffered. All solutions for an "atOnce" operator are materialized before the operator is evaluated. It is then evaluated against those materialized solutions exactly once. Note: "atOnce" evaluation is a general property of the query engine. This query hint does not change the structure of the query plan; it only directs the query engine to buffer all source solutions before running the operator. This query hint is allowed in any scope. The hint is transferred as an annotation onto all query plan operators generated from the annotated scope. | xsd:boolean (false) |
| chunkSize | Any | Sets the target chunk size (aka vector size) for the output buffer of the operator. This query hint does not change the structure of the query plan; it only directs the query engine to allocate an output buffer for the operator that will emit chunks of the indicated target capacity. This query hint is allowed in any scope, but is generally used to affect the behavior of a join group, a subquery, or the entire query. | xsd:int (100) |
| maxParallel | Any | The operator(s) should not execute more than this many times concurrently within a given query. Note: "maxParallel" is a general property of the query engine. This query hint does not change the structure of the query plan; it only directs the query engine not to allow more than the indicated number of parallel instances of the operator to execute concurrently. This query hint is allowed in any scope. The hint is transferred as an annotation onto all query plan operators generated from the annotated scope. | xsd:int (5) |
| analytic | Query | Enable or disable the analytic query mode. | xsd:boolean (false) |
| RTO-sampleType | Query, SubQuery, Group, GroupAndSubGroups | Specify the sampling mode for the Runtime Query Optimizer. | EVEN, RANDOM, DENSE (DENSE) |
| RTO-limit | Query, SubQuery, Group, GroupAndSubGroups | Specify the initial vertex and cutoff join sampling limit for the Runtime Query Optimizer. The limit will be dynamically adapted as necessary during RTO execution. | xsd:int (100) |
| RTO-nedges | Query, SubQuery, Group, GroupAndSubGroups | Specify the number of join graph edges that will be explored as starting paths for the Runtime Query Optimizer. | xsd:int (1) |
| describeMode | Query | Specify the algorithm for a DESCRIBE query. | SymmetricOneHop, CBD, SCBD (SymmetricOneHop) |
| describeIterationLimit | Query | Specify the maximum number of iterations for an iterative DESCRIBE algorithm (CBD, SCBD), or ZERO (0) for no limit. Note that BOTH the iterations and statements limits must be reached before a DESCRIBE query will be terminated. | xsd:int (5) |
| describeStatementLimit | Query | Specify the maximum number of statements in a DESCRIBE query result for an iterative DESCRIBE algorithm (CBD, SCBD), or ZERO (0) for no limit. Note that BOTH the iterations and statements limits must be reached before a DESCRIBE query will be terminated. | xsd:int (5000) |
| filterExists | Query, SubQuery | Specify the evaluation mode for FILTER (NOT) EXISTS. There are two basic strategies. One vectors the source solutions into a sub-plan; this is more efficient if the FILTER is lightweight and there are a lot of source solutions flowing into the FILTER. The other uses a sub-query per source solution and imposes a LIMIT ONE on each subquery; this is more efficient if the FILTER is relatively expensive to fully evaluate, has multiple solutions per source solution, and there are relatively few solutions flowing into the FILTER. Available in bigdata r1.3.2. | SubqueryLimitOne, VectoredSubPlan (VectoredSubPlan) |
| queryId | Query | Assign a UUID to a query. This may be used to CANCEL a running query. | UUID (assigned automatically if not specified in the query) |
| normalizeFilterExpressions | Query, SubQuery, Group, GroupAndSubGroups | Starting with Blazegraph 1.5.2, filter expressions are normalized into conjunctive normal form and decomposed. This query hint can be used to turn normalization off. You may consider this option when you have very large, complex filter expressions and observe that the decomposition leads to a blow-up in the query size. | xsd:boolean |
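
Several hints from the table can be combined in one query. The sketch below is illustrative only: the foaf:knows pattern and the chunk size of 1000 are made-up choices, the values are written as quoted literals by analogy with the optimizer example (an assumption), and whether any of these settings actually helps depends on the data and should be checked in the Explain view.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?x
WHERE {

  # Query-scope hints: each applies to the entire query.
  hint:Query hint:analytic "true" .                   # enable the analytic query mode
  hint:Query hint:chunkSize "1000" .                  # target output chunk size (illustrative value)
  hint:Query hint:filterExists "SubqueryLimitOne" .   # one LIMIT ONE sub-query per source solution

  ?x rdf:type foaf:Person .

  # Hypothetical filter, chosen only to give the filterExists hint something to act on.
  FILTER NOT EXISTS { ?x foaf:knows ?y . }
}
```

Here every hint uses the Query scope, which the table lists as valid for analytic and filterExists and which is covered by "Any" for chunkSize.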