Analytic Query

From Blazegraph

Overview

Starting with our 1.1 release, bigdata includes an optional “analytic query mode”. Enabling analytic query turns on support for the MemoryManager and the HTree and allows bigdata to scale to 4TB of data on the native process heap with zero GC overhead. In the future it will also turn on the runtime query optimizer (RTO).

Background

The problem is the Java architecture for managed memory: data materialized on the JVM object heap has to be managed by the garbage collector (GC). You can read about this, and about how we fix it, here. For queries that materialize a lot of data, what you need to do is turn on the “analytic” mode for bigdata (see below).

What the analytic query mode will do is buffer the data on the native process heap rather than on the JVM object heap. This will reduce the GC overhead associated with the query to basically zero. It performs this feat entirely within Java by leveraging the java.nio package.
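As a rough illustration of the idea (not bigdata's MemoryManager, whose API is not shown on this page), the sketch below uses a java.nio direct buffer to hold data outside the JVM object heap. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only: this is NOT the bigdata MemoryManager.
public class NativeHeapSketch {

    /** Writes the values into a direct (off-heap) buffer and sums them back. */
    static long sumViaDirectBuffer(long[] values) {
        // allocateDirect reserves memory outside the JVM object heap, so the
        // buffered bytes are not objects that the garbage collector must trace.
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip(); // switch the buffer from writing to reading
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getLong();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumViaDirectBuffer(new long[] {1, 2, 3}));
    }
}
```

Only the small ByteBuffer handle lives on the object heap; the data itself sits in native memory, which is the property the analytic mode relies on.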

There are analytic and non-analytic versions of the join, DISTINCT, and other operators. The analytic versions use the MemoryManager and the HTree. The non-analytic versions use Java collection classes. The Java collection classes are somewhat faster as long as you are not materializing a lot of data on the Java object heap. For example, for the BSBM “explore” use case the Java operators are about 10% faster overall.

DISTINCT is a special case. The Java version of that operator uses a ConcurrentHashMap under the covers and can give you much higher concurrency in the query. However, if you are trying to DISTINCT a large number of solutions, you are going to run into trouble with the garbage collector.
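To make the trade-off concrete, here is a minimal sketch of a DISTINCT filter backed by a ConcurrentHashMap. It is not the bigdata operator; the class and method names are hypothetical. It shows why the approach is highly concurrent, and also why every distinct solution stays on the Java object heap.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: this is NOT the bigdata DISTINCT operator.
public class HeapDistinctSketch<T> {

    // Set view backed by a ConcurrentHashMap; safe to share across threads.
    // Every distinct solution is retained here as a Java heap object.
    private final Set<T> seen = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given solution is offered. */
    public boolean accept(T solution) {
        // add() is atomic, so many threads can filter solutions concurrently.
        return seen.add(solution);
    }

    /** Number of distinct solutions retained on the heap so far. */
    public int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        HeapDistinctSketch<String> distinct = new HeapDistinctSketch<>();
        System.out.println(distinct.accept("a")); // first time "a" is offered
        System.out.println(distinct.accept("a")); // duplicate
    }
}
```

The retained set grows with the number of distinct solutions, which is where the garbage collector pressure comes from for large results.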

Turning on Analytic Query

There are several ways to turn on the “analytic” mode for bigdata.

Using the Workbench

The easiest way to do this is to check the:

[x] analytic

option on the NanoSparqlServer’s SPARQL query form page.

Via the REST API

If you are using the NanoSparqlServer you can also specify the URL query parameter

...&analytic=true
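A minimal sketch of building such a request URL in Java follows. The endpoint address is an assumption for illustration (substitute the address of your own NanoSparqlServer), the class and method names are hypothetical, and the code assumes Java 10 or later for the URLEncoder overload that takes a Charset.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a SPARQL GET URL with analytic mode enabled.
public class AnalyticUrlSketch {

    static URI analyticQueryUri(String endpoint, String sparql) {
        // The query text must be URL-encoded before it is placed in the URL.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        // Append the analytic=true parameter described above.
        return URI.create(endpoint + "?query=" + encoded + "&analytic=true");
    }

    public static void main(String[] args) {
        // ASSUMPTION: example endpoint only; use your own server's address.
        String endpoint = "http://localhost:9999/bigdata/sparql";
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        System.out.println(analyticQueryUri(endpoint, sparql));
    }
}
```

The resulting URI can be passed to any HTTP client.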

You can also enable this with a query hint, a “magic” triple placed directly in the SPARQL query:

SELECT ...
...
hint:Query hint:analytic "true" .
...

Just put that triple somewhere in the WHERE clause of the query and the query will run with the “analytic” options enabled. You do not need to declare the “hint:” prefix, but if you want to, the namespace is “http://www.bigdata.com/queryHints#”.
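As a sketch of what a complete query might look like when assembled from application code, the hypothetical helper below wraps a set of triple patterns in a SELECT query and adds the hint triple. The class and method names and the sample pattern are illustrative assumptions. The prefix declaration is included for clarity even though it is optional.

```java
// Hypothetical helper: adds the analytic query hint to a SELECT query.
public class AnalyticHintSketch {

    // Namespace for the "hint:" prefix, as given above.
    static final String HINT_NS = "http://www.bigdata.com/queryHints#";

    static String withAnalyticHint(String wherePatterns) {
        return "PREFIX hint: <" + HINT_NS + ">\n"
             + "SELECT * WHERE {\n"
             + "  " + wherePatterns + "\n"
             // The hint triple can go anywhere in the WHERE clause.
             + "  hint:Query hint:analytic \"true\" .\n"
             + "}";
    }

    public static void main(String[] args) {
        // ASSUMPTION: sample pattern chosen only for illustration.
        System.out.println(withAnalyticHint("?s ?p ?o ."));
    }
}
```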

Globally for All Queries

You can pass the JVM system property below on the command line when starting an instance to globally enable the Analytic Query mode for all queries.

-Dcom.bigdata.rdf.sparql.ast.QueryHints.analytic=true
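If you want to confirm from your own code that the property reached the JVM, a small check like the one below can help. This is only a sketch with hypothetical class and method names; it does not show how bigdata itself reads the property.

```java
// Hypothetical check: reports whether the system property was set on this JVM.
public class AnalyticPropertyCheck {

    // Property name as given above.
    static final String ANALYTIC_PROPERTY =
            "com.bigdata.rdf.sparql.ast.QueryHints.analytic";

    static boolean isAnalyticPropertySet() {
        // Boolean.getBoolean returns true only if the system property
        // exists and its value equals "true" (ignoring case).
        return Boolean.getBoolean(ANALYTIC_PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(ANALYTIC_PROPERTY + " = " + isAnalyticPropertySet());
    }
}
```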