Difference between revisions of "Analytic Query"

From Blazegraph
(Turing on Analytic Query)
(updated documentation 1.5.2)
Revision as of 11:31, 21 July 2015

Starting with our 1.1 release, bigdata includes an optional “analytic query mode”. Enabling analytic query turns on support for the MemoryManager and the HTree and allows bigdata to scale to 4TB of data on the native process heap with zero GC overhead. In the future it will also turn on the runtime query optimizer (RTO).

Background

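The underlying problem is the Java architecture for managed memory: a query that materializes a lot of data on the JVM object heap pays a heavy garbage collection (GC) cost. You can read about this, and about how we fix it, here: http://www.bigdata.com/bigdata/blog/?p=339. What you need to do for such queries is turn on the “analytic” mode for bigdata (see below).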

What the analytic query mode will do is buffer the data on the native process heap rather than on the JVM object heap. This will reduce the GC overhead associated with the query to basically zero. It performs this feat entirely within Java by leveraging the java.nio package.
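
As an illustration of the general java.nio mechanism only (this is not bigdata's MemoryManager code, and the class and method names are hypothetical), the following minimal sketch stores values in a direct buffer. A direct ByteBuffer is allocated outside the JVM object heap, so the values written to it are raw bytes rather than objects the garbage collector has to trace.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not part of bigdata.
public class DirectBufferSketch {

    /** Writes the longs 0..count-1 into an off-heap buffer and returns their sum. */
    static long sumOffHeap(int count) {
        // allocateDirect reserves native memory outside the JVM object heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Long.BYTES);
        for (int i = 0; i < count; i++) {
            // Absolute put: the value is stored as 8 raw bytes, not as an object.
            buf.putLong(i * Long.BYTES, (long) i);
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getLong(i * Long.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocateDirect(1024).isDirect());
        System.out.println(sumOffHeap(1000));
    }
}
```

The amount of native memory available to direct buffers is bounded separately from the object heap (on HotSpot JVMs via `-XX:MaxDirectMemorySize`), so it has to be sized independently of `-Xmx`.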

There are analytic and non-analytic versions of the join, DISTINCT, and other query operators. The analytic versions use the MemoryManager and the HTree. The non-analytic versions use Java collection classes. The Java collection classes are somewhat faster as long as you are not materializing a lot of data on the Java object heap. For example, for the BSBM “explore” use case the Java operators are about 10% faster overall. DISTINCT is a special case. The Java version of that operator uses a ConcurrentHashMap under the covers and can give you much higher concurrency in the query. However, if you apply DISTINCT to a large number of solutions, you are going to run into trouble with the garbage collector.
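
To make the trade-off concrete, here is a minimal sketch of a heap-based DISTINCT filter built on ConcurrentHashMap. It is an assumption-laden illustration, not the actual bigdata operator: the class name and methods are hypothetical, and solutions are modeled as plain objects with sensible `equals`/`hashCode`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; not the bigdata DISTINCT operator.
public class HeapDistinctSketch<T> {

    // A concurrent set view backed by a ConcurrentHashMap, held on the JVM object heap.
    private final Set<T> seen = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time an equal solution is offered; safe to call from many threads. */
    public boolean accept(T solution) {
        return seen.add(solution);
    }

    /** Number of distinct solutions retained so far. */
    public int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        HeapDistinctSketch<String> distinct = new HeapDistinctSketch<>();
        for (String s : new String[] {"a", "b", "a", "c", "b"}) {
            if (distinct.accept(s)) {
                System.out.println(s);
            }
        }
    }
}
```

Many threads can call `accept` without a global lock, which is where the concurrency benefit comes from. The cost is that every distinct solution stays referenced on the object heap until the query finishes, so the garbage collector has more live data to trace as the set grows.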

Turning on Analytic Query

There are several ways to turn on the “analytic” mode for bigdata. The easiest way to do this is to check the:

[x] analytic

option on the NanoSparqlServer’s SPARQL query form page. If you are using the NanoSparqlServer, you can also specify the URL query parameter:

...&analytic=true
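
A minimal sketch of building such a request URL from a client follows. The endpoint address `http://localhost:9999/bigdata/sparql` and the class and method names are assumptions for illustration; substitute the address of your own NanoSparqlServer. The `query` parameter is the standard SPARQL protocol parameter, and `analytic=true` is appended as described above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the endpoint used in main is an assumed example address.
public class AnalyticUrlSketch {

    /** Builds a GET URL that carries the SPARQL query and requests analytic mode. */
    static URI analyticQueryUri(String endpoint, String sparql) {
        // Form-encode the query text so it is safe inside a URL query string.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded + "&analytic=true");
    }

    public static void main(String[] args) {
        URI uri = analyticQueryUri(
                "http://localhost:9999/bigdata/sparql", // assumed endpoint
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10");
        System.out.println(uri);
    }
}
```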

Finally, you can enable this with a magic triple directly in the SPARQL query:

SELECT ...
...
hint:Query hint:analytic "true" .
...

Just put that triple somewhere in the WHERE clause of the query and the query will run with the “analytic” options enabled. You do not need to declare the “hint:” prefix, but if you want to declare it, the namespace is “http://www.bigdata.com/queryHints#”.
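
For a client that assembles queries in code, a minimal sketch of placing the hint triple inside the WHERE clause might look like the following. The class name, the method name, the `SELECT DISTINCT ?s` projection, and the graph pattern passed in `main` are all hypothetical; only the hint triple and the namespace come from the text above. The prefix declaration is included for clarity even though it is optional.

```java
// Hypothetical helper for illustration only.
public class AnalyticHintSketch {

    static final String HINT_NS = "http://www.bigdata.com/queryHints#";

    /** Wraps a graph pattern in a SELECT query whose WHERE clause carries the analytic hint. */
    static String selectWithAnalyticHint(String graphPattern) {
        return "PREFIX hint: <" + HINT_NS + ">\n"
             + "SELECT DISTINCT ?s\n"
             + "WHERE {\n"
             + "  " + graphPattern + "\n"
             // The hint triple can go anywhere inside the WHERE clause.
             + "  hint:Query hint:analytic \"true\" .\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(selectWithAnalyticHint("?s ?p ?o ."));
    }
}
```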