From Blazegraph
Revision as of 17:06, 20 December 2011 by Thompsonbry (Talk | contribs) (Build lubm: Added caution about the openrdf dependency version)

The following instructions will let you run the LUBM benchmark against an embedded bigdata database.

Get the code

The LUBM benchmark can be downloaded from [1]; directions for its use are available from the project home page. A modified version of the benchmark, which is somewhat easier to use with bigdata, can be downloaded from [2]. The core benchmark is the same. We have added an HTTP SPARQL endpoint, which is used to connect to bigdata, and some new generator options that are useful when generating very large data sets for a cluster. Please contact the project maintainers if you have questions about this modified version of the LUBM benchmark.

The rest of this page assumes that you are working with the modified version of the LUBM test harness.

Obtain and unpack the code.
 tar xvfz bigdata-lubm.tgz
 cd bigdata-lubm

Configure LUBM

Edit, paying attention to at least:

  1. bigdata.dir - Where to find the bigdata source code distribution.
  2. lubm.univ - The data set size.
  3. lubm.maxMem - The JVM heap used by the NanoSparqlServer in the tests.
  4. lubm.baseDir - Where to put the generated data files, etc.
  5. lubm.journalFile - The bigdata backing store file.
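
Before building, it can help to confirm that all five properties listed above are actually set. The following is a minimal sketch, not part of the bigdata-lubm distribution: the function name check_lubm_props is hypothetical, and the properties file is passed as an argument because its name is not assumed here.

```shell
#!/bin/sh
# Report which of the five properties listed above are missing from a
# Java-style properties file. check_lubm_props is a hypothetical helper,
# not something shipped with bigdata-lubm.
check_lubm_props() {
    props_file="$1"
    missing=0
    for key in bigdata.dir lubm.univ lubm.maxMem lubm.baseDir lubm.journalFile; do
        # Escape the dots so they match literally, then look for
        # "key=value" or "key: value" at the start of a line.
        # Commented-out entries (lines starting with '#') do not match.
        escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
        if ! grep -Eq "^[[:space:]]*${escaped}[[:space:]]*[=:]" "$props_file"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Call it with the path of the file you edited. It prints one line per missing property and returns a non-zero status if any are absent, so it can gate a later build step.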

Build bigdata

 cd ...
 ant bundleJar

Build lubm

Note: The openrdf dependencies are required to build the bigdata-lubm project. You MUST use the openrdf version that matches the version of bigdata you are testing. If you compile the bigdata-lubm project against the wrong openrdf version, you may get run-time dependency errors when you try to load or query the data.

 cd ...

Generate a data set

Generate the LUBM data set per the file.

 ant run-generator

Load a data set

Load an LUBM data set into bigdata per the file.

 ant run-load
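
The generate and load steps can be chained so that loading only starts if generation succeeds. This is a sketch under a few assumptions: the helper names run_step and prepare_lubm are hypothetical, it is run from the bigdata-lubm directory, and the DRY_RUN switch is an illustration that only prints the commands.

```shell
#!/bin/sh
# Run one command, or just print it when DRY_RUN=1.
# run_step and prepare_lubm are hypothetical helper names.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Generate the data set, then load it. Stop at the first failing step
# so that a failed generation is never followed by a load.
prepare_lubm() {
    run_step ant run-generator || return 1
    run_step ant run-load || return 1
}
```

Running `DRY_RUN=1 prepare_lubm` prints the two ant commands without executing them, which is a cheap way to check the sequence before starting a long generation run.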


The NanoSparqlServer is used to answer SPARQL queries. It is aware of bigdata's MVCC (multi-version concurrency control) semantics and issues each query against a read-only connection that reads from the last commit time on the database, which may give somewhat better performance or concurrency. You can follow more or less the same instructions to run against a bigdata federation, but the federation must already be up and running, and you must use the federation's bulk data loader to get the data into the database.

Start an HTTP SPARQL endpoint for that bigdata database instance.

  ant start-nano-server

Run the LUBM queries (do this in a different terminal window).

  ant run-query


Here are some sample results.


LUBM U50 using the Journal in the WORM mode. The load time was 122 seconds (56,183 triples per second). Closure time was 44 seconds.

     [java] query       Time    Result#
     [java] query1      40      4
     [java] query3      8       6
     [java] query4      48      34
     [java] query5      59      719
     [java] query7      22      61
     [java] query8      260     6463
     [java] query10     22      0
     [java] query11     20      0
     [java] query12     27      0
     [java] query13     19      0
     [java] query14     3068    393730
     [java] query6      2800    430114
     [java] query9      3590    8627
     [java] query2      999     130
     [java] Total       10982

LUBM U50 (RWStore)

     [java] query       Time    Result#
     [java] query1      28      4
     [java] query3      17      6
     [java] query4      29      34
     [java] query5      39      719
     [java] query7      16      61
     [java] query8      166     6463
     [java] query10     29      0
     [java] query11     29      0
     [java] query12     25      0
     [java] query13     27      0
     [java] query14     2778    393730
     [java] query6      2920    430114
     [java] query2      540     130
     [java] query9      3356    8627
     [java] Total       9999