Query Hints

Bigdata supports query hints (since 1.1.0) using magic triples in SPARQL queries. Query hints may be used to change the default behavior of the query plan generator or the runtime evaluation of the compiled query plan. They are documented on the com.bigdata.rdf.sparql.ast.QueryHints interface. For example, the following SPARQL query uses a query hint to disable the join order optimizer. The Basic Graph Patterns (BGPs) will be run in the given order rather than being reordered.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX hint: <http://www.bigdata.com/queryHints#> 

SELECT ?x ?o
WHERE {

  # Disable the join order optimizer for the entire query (scope: Query).
  hint:Query hint:optimizer "None" .

  ?x rdfs:label ?o .
  ?x rdf:type foaf:Person .
}

Query hints are bound to a scope. The possible scopes are declared by com.bigdata.rdf.sparql.ast.hints.QueryHintScope. They include:

Scope Definition
Query The entire query.
SubQuery Either the top-level Select or a Sub-Select.
Group The current Graph Pattern Group (also called a "join group").
GroupAndSubGroups The current Graph Pattern Group and all of its subgroups.
Prior The previous construct in the current scope of the SPARQL query, which was not a query hint itself. This may be used to bind the query hint to the previous Basic Graph Pattern, Group Graph Pattern, UNION, SERVICE, etc. This is typically used to bind a query hint to a Graph Pattern Group or a Basic Graph Pattern (also called a triple pattern).
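To illustrate how a scope narrows the effect of a hint, the following sketch binds the optimizer hint to a single group rather than to the whole query. The foaf:name and foaf:mbox patterns are illustrative assumptions added for this example only; the prefixes match the query at the top of the page.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX hint: <http://www.bigdata.com/queryHints#>

SELECT ?x ?o ?name ?mbox
WHERE {

  # These two patterns are outside the hinted group, so the
  # join order optimizer may still reorder them.
  ?x rdf:type foaf:Person .
  ?x rdfs:label ?o .

  OPTIONAL {
    # Scope "Group": the hint applies to this OPTIONAL group only.
    hint:Group hint:optimizer "None" .

    # Illustrative patterns (assumed data); they should run in the given order.
    ?x foaf:name ?name .
    ?x foaf:mbox ?mbox .
  }
}
```

Replacing hint:Group with hint:GroupAndSubGroups would be expected to extend the hint to any groups nested inside the OPTIONAL as well.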

Some query hints may require a specific scope, as indicated by the scope column in the table below. This is because some hints are interpreted by the query engine and apply to all operators while others are interpreted by the query plan generator and control either what kind of query plan is generated (when there are options) or parameters associated with specific operators.

Query hints that bind to a specific join generally require that you use the scope Prior in order to clearly identify which join should be runFirst or runLast. Query hints that can bind to a Graph Pattern Group, SubQuery, or Query allow more values.
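As a minimal sketch of the Prior scope, the query below reuses the patterns from the example at the top of the page and asks that the rdf:type join be run first. The hint is written immediately after the triple pattern it should bind to.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX hint: <http://www.bigdata.com/queryHints#>

SELECT ?x ?o
WHERE {

  ?x rdfs:label ?o .

  ?x rdf:type foaf:Person .
  # Scope "Prior": binds to the triple pattern directly above.
  # runFirst may be used only once within a given group.
  hint:Prior hint:runFirst true .
}
```

The remaining joins in the group are still ordered by the join order optimizer, since only one join has been pinned.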

When experimenting with query hints, it is a good idea to use the Explain view of the NanoSparqlServer (NSS) to verify that the query hint has caused an appropriate change in the behavior of the query plan. See the com.bigdata.rdf.sparql.ast.QueryHints interface and the specific com.bigdata.rdf.sparql.ast.hints.IQueryHint implementations for more details.

Commonly used query hints include:

Name Scope Definition Values (default)
optimizer Query, SubQuery, Group, GroupAndSubGroups Control the join order optimizer. "None", "Static", "Runtime" (Static)
runFirst Prior The join should be run first in the current Graph Pattern Group. This can be used only once within a given Graph Pattern Group. xsd:boolean (false)
runLast Prior The join should be run last in the current Graph Pattern Group. This can be used only once within a given Graph Pattern Group. xsd:boolean (false)
runOnce SubQuery The sub-select should be lifted into a named subquery such that it is evaluated exactly once. See NamedSubquery. xsd:boolean (false)
atOnce Any The join(s) should not run until all of their source solutions are fully buffered. All solutions for an "atOnce" operator are materialized before the operator is evaluated. It is then evaluated against those materialized solutions exactly once. Note: "atOnce" evaluation is a general property of the query engine. This query hint does not change the structure of the query plan, but simply serves as a directive to the query engine that it should buffer all source solutions before running the operator. This query hint is allowed in any scope. The hint is transferred as an annotation onto all query plan operators generated from the annotated scope. xsd:boolean (false)
rangeSafe Prior Declare that the data touched by the query for a specific triple pattern is strongly typed, thus allowing a range filter to be pushed down onto an index. Blazegraph stores data in its natural (lexicographic) order in the POS(C) index and the OSP(C) index. Thus, for a known predicate and an object position with a key-range restriction, the range restriction can be pushed down onto the POS(C) index. Likewise, if the predicate is not known, a key-range filter can still be pushed down onto the OSP(C) index. These key-range push downs are not enabled by default due to the data level typing of RDF/SPARQL. However, if your data is strongly typed (e.g., all xsd:int, all xsd:float, etc.) for a given predicate, then this filter is "safe" to push down onto a key range. To use this, specify this query hint immediately after the triple pattern in the query. This tells the query optimizer that the data in the Object position of that triple pattern are "safe" (e.g., of a single data type) such that the optimizer can push down a key range constraint onto the index. Blazegraph will assume that the data covered by the Object position of that triple pattern have the same data type as the values used in the FILTER's LTE, LT, GT, or GTE expressions. xsd:boolean (false)
chunkSize Any Sets the target chunk size (aka vector size) for the output buffer of the operator. This query hint does not change the structure of the query plan, but simply serves as a directive to the query engine that it should allocate an output buffer for the operator that will emit chunks of the indicated target capacity. This query hint is allowed in any scope, but is generally used to affect the behavior of a join group, a subquery, or the entire query. xsd:int (100)
queryEngineChunkHandler Any Available in 2.1.4+. Sets the query engine chunk handler to use either the managed heap or the native heap. See BLZG-533. This query hint is allowed in any scope, but is generally used to affect the behavior of a join group, a subquery, or the entire query. "Managed", "Native"
maxParallel Any The operator(s) should not execute more than this many times concurrently within a given query. Note: "maxParallel" is a general property of the query engine. This query hint does not change the structure of the query plan, but simply serves as a directive to the query engine saying that it should not allow more than the indicated number of parallel instances of the operator to execute concurrently. This query hint is allowed in any scope. The hint is transferred as an annotation onto all query plan operators generated from the annotated scope. xsd:int (5)
analytic Query Enable or disable the analytic query mode. xsd:boolean (false)
RTO-sampleType Query, SubQuery, Group, GroupAndSubGroups Specify the sampling mode for the Runtime Query Optimizer. EVEN, RANDOM, DENSE (DENSE)
RTO-limit Query, SubQuery, Group, GroupAndSubGroups Specify the initial vertex and cutoff join sampling limit for the Runtime Query optimizer. The limit will be dynamically adapted as necessary during RTO execution. xsd:int (100)
RTO-nedges Query, SubQuery, Group, GroupAndSubGroups Specify the number of join graph edges that will be explored as starting paths for the Runtime Query optimizer. xsd:int (1)
describeMode Query Specify the algorithm for a DESCRIBE query. SymmetricOneHop, CBD, SCBD (SymmetricOneHop)
describeIterationLimit Query Specify the maximum number of iterations for an iterative DESCRIBE algorithm (CBD, SCBD), or ZERO (0) for no limit. Note that BOTH the iterations and statements limits must be reached before a DESCRIBE query will be terminated. xsd:int (5)
describeStatementLimit Query Specify the maximum number of statements in a DESCRIBE query result for an iterative DESCRIBE algorithm (CBD, SCBD), or ZERO (0) for no limit. Note that BOTH the iterations and statements limits must be reached before a DESCRIBE query will be terminated. xsd:int (5000)
constructDistinctSPO Query Query hint for disabling the DISTINCT SPO behavior for a CONSTRUCT query. When disabled, the CONSTRUCT will NOT eliminate duplicate triples from the constructed graph. Note that CONSTRUCT automatically avoids duplicate detection and removal for cases where a CONSTRUCT is already "obviously" distinct. Thus this query hint is really only for very large graphs where you do not want the overhead of that DISTINCT SPO filter and are willing to accept duplicate triples in the output. In order to work correctly, the query MUST NOT use a quads mode default graph access path (use the GRAPH keyword instead) and MUST NOT use any other constructions that require a HASH JOIN in the query plan. (You can also use the analytic query hint to use the native heap for the DISTINCT SPO filter, but this still imposes a RAM burden for the query.) Since Blazegraph 1.5.2. xsd:boolean (false)
filterExists Query, SubQuery Specify the evaluation mode for FILTER (NOT) EXISTS. There are two basic strategies. One vectors the source solutions into a sub-plan; this is more efficient if the FILTER is lightweight and there are a lot of source solutions flowing into the FILTER. The other uses a sub-query per source solution and imposes a LIMIT ONE on each subquery. This is more efficient if the FILTER is relatively expensive to fully evaluate, has multiple solutions per source solution, and there are relatively few solutions flowing into the FILTER. Available in bigdata r1.3.2. VectoredSubPlan, SubqueryLimitOne (VectoredSubPlan)
queryId Query Assign a UUID to a query. This may be used to cancel a running query through the REST API (see REST_API#CANCEL). UUID (assigned automatically if not specified in the query)
normalizeFilterExpressions Query, SubQuery, Group, GroupAndSubGroups Starting with Blazegraph 1.5.2, filter expressions are normalized into conjunctive normal form and decomposed. This query hint can be used to turn normalization off. You may consider this option when you have very large, complex filter expressions and observe that the decomposition leads to a blow-up in the query size. xsd:boolean
defaultGraphDistinctFilter Query Disable the distinct filter for default graph access. This hint has an effect only in quads mode and can be safely used whenever no triple accessed by the query is present in more than one named graph. As an alternative to the query hint, you may specify the system parameter -Dcom.bigdata.rdf.sparql.ast.QueryHints.defaultGraphDistinctFilter=false to obtain the same effect for all queries. xsd:boolean
regexMatchNonString Query Starting in 2.0.2, by default, regex is only applied to Literal String values. Enabling this query hint will attempt to autoconvert non-String literals into their string value. This is the equivalent of always using the str(...) function. As an alternative to the query hint, you may specify the system parameter -Dcom.bigdata.rdf.sparql.ast.QueryHints.regexMatchNonString=true to obtain the same effect for all queries. xsd:boolean
pipelinedHashJoin Query, SubQuery, Group, GroupAndSubGroups, Prior If set to true, the annotated subpattern(s) such as a complex join group, OPTIONAL, EXISTS, VALUES, or complex property path node will be executed using a pipelined hash join. If set to false, we use pipelined hash joins only for LIMIT queries that have no ORDER BY clause. Defaults to false. xsd:boolean
gearing Prior Starting with 2.1.5, users can use this query hint to set the gearing for a property path. The prior node referred to by the query hint must be an (arbitrary length) property path such as pred+ or pred*. The gearing determines whether we start out iterating with the subject (preceding the predicate) or the object (following the predicate). Note that the query hint only takes effect if the respective subject (in mode forward) or object (in mode reverse) is bound at the time when executing the property path. If not specified, the system will make a choice, typically preferring constants over dynamically bound variables. forward, reverse
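
The following sketch combines several of the hints from the table in one query. The ex: namespace, the ex:price and ex:partOf predicates, and the ex:catalog resource are hypothetical and exist only for illustration. The example also assumes that every ex:price value is stored as xsd:integer, which is the condition under which rangeSafe is described as safe. The exact literal forms accepted for hint values are an assumption here, so the result should be checked in the Explain view.

```sparql
PREFIX hint: <http://www.bigdata.com/queryHints#>
PREFIX ex:   <http://example.org/>   # hypothetical namespace

SELECT ?item ?price
WHERE {

  # Scope "Query": enable the analytic query mode for the whole query.
  hint:Query hint:analytic true .

  # Property path with a constant object, so the object is bound and
  # the "reverse" gearing (start from the object) can take effect.
  ?item ex:partOf+ ex:catalog .
  hint:Prior hint:gearing "reverse" .

  # Assumption: all ex:price objects share one datatype (xsd:integer),
  # matching the integer constants used in the FILTER below.
  ?item ex:price ?price .
  hint:Prior hint:rangeSafe true .

  FILTER (?price >= 10 && ?price < 100)
}
```

Each Prior-scoped hint is placed directly after the pattern it annotates, while the Query-scoped hint can appear anywhere in the top-level group.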