Installation guide

From Blazegraph

Java Requirements

Java 7 is required to build Blazegraph.

Encoding Requirements

Errors have been observed with encoding settings other than UTF-8 (see ticket BLZG-1383). It is recommended to explicitly pass the encoding settings to the JVM. For the Oracle/Sun JVM, the settings below are recommended.

 java ... -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 ...
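As a sanity check, the encoding the JVM actually picked up can be printed from within Java. This is a minimal sketch using only the standard library (the class name is illustrative; `sun.jnu.encoding` may be absent on non-Oracle/Sun JVMs):

```java
public class EncodingCheck {
    // Print the encoding properties the JVM actually picked up, so a
    // misconfigured launch script can be detected at startup.
    public static void main(String[] args) {
        String fileEncoding = System.getProperty("file.encoding");
        String jnuEncoding = System.getProperty("sun.jnu.encoding"); // may be null on some JVMs
        System.out.println("file.encoding=" + fileEncoding);
        System.out.println("sun.jnu.encoding=" + jnuEncoding);
        if (!"UTF-8".equalsIgnoreCase(fileEncoding)) {
            System.err.println("WARNING: file.encoding is not UTF-8");
        }
    }
}
```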

Download the code

- You can download the WAR, Executable Jar, or HA installer from the bigdata sourceforge project page.

GIT

You can checkout bigdata from GIT. Older branches and tagged releases have names like BLAZEGRAPH_RELEASE_2_1_0.

Cloning the latest branch:


     git clone -b BLAZEGRAPH_RELEASE_X_Y_Z --single-branch https://github.com/blazegraph/database.git BLAZEGRAPH_RELEASE_X_Y_Z

Note that the --single-branch option requires Git v1.7.10 or later; for earlier versions, please use:

     git clone -b BLAZEGRAPH_RELEASE_X_Y_Z https://github.com/blazegraph/database.git BLAZEGRAPH_RELEASE_X_Y_Z


Tagged releases are available in GIT with tags in the form BLAZEGRAPH_RELEASE_X_Y_Z.

    https://github.com/blazegraph/database

Eclipse

For embedded development, we highly recommend checking out bigdata from GIT into Eclipse as its own project. See Maven Notes to get started in Eclipse with Maven.

Other Environments

If you check out the source from GIT, use ./scripts/mavenInstall.sh to build a local copy. You can then run ./scripts/startBlazegraph.sh to start Blazegraph from that local copy.

Blazegraph deployment models

Blazegraph supports several different deployment models (embedded, standalone, replicated, and scale-out). We generally recommend that applications decouple themselves at the REST layer (SPARQL, SPARQL UPDATE, and the REST API). However, there are applications where an embedded RDF/graph database makes more sense.

Non-Embedded Deployment Models

  • See NanoSparqlServer for easy steps on how to deploy a bigdata SPARQL endpoint + REST API using an embedded jetty server (same JVM as your application), an executable Jar file, or a WAR (in a servlet container such as Tomcat).
  • See Using_Bigdata_with_the_OpenRDF_Sesame_HTTP_Server for the significantly more complicated procedure required to deploy inside of the Sesame WAR. Note: We do NOT recommend this approach. The Sesame Server does not use bigdata's non-blocking query mode, which can significantly limit query throughput. The NanoSparqlServer, by contrast, delivers non-blocking query evaluation.
  • See CommonProblems page for a FAQ on common problems and how to fix them.
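A client decoupled at the REST layer only needs HTTP. The following sketch issues a SPARQL query using the JDK's HttpURLConnection; it assumes a NanoSparqlServer running at the default endpoint http://localhost:9999/blazegraph/sparql (adjust host, port, and path for your deployment):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SparqlRestClient {
    public static void main(String[] args) throws Exception {
        // Assumed default NanoSparqlServer endpoint; adjust for your deployment.
        URL url = new URL("http://localhost:9999/blazegraph/sparql");
        String query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        byte[] body = ("query=" + URLEncoder.encode(query, "UTF-8")).getBytes("UTF-8");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        // Print the SPARQL result set (JSON) to stdout.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Because the coupling is plain HTTP + SPARQL, the same client works unchanged against standalone, replicated, or scale-out deployments.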

Embedded RDF/Graph Database

We have implemented the Sesame API over Blazegraph. Sesame is an open source framework for storing, inferencing over, and querying RDF data, much like Jena. The best place to start is openrdf.org (http://www.openrdf.org): download Sesame, read the User Guide (specifically Chapter 8, "The Repository API"), and try writing some code using their pre-packaged memory or disk based triple stores. Once you have a handle on this, you are 90% of the way to being able to use bigdata as an embedded RDF store.
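For example, the "hello world" of the Repository API looks roughly like the following (a sketch against the Sesame 2.x openrdf API using the pre-packaged in-memory store; the URIs and class name are illustrative). The essence of embedding Blazegraph is then swapping the MemoryStore-backed repository for one backed by the BigdataSail:

```java
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.memory.MemoryStore;

public class SesameHelloWorld {
    public static void main(String[] args) throws Exception {
        // Sesame's pre-packaged in-memory triple store.
        Repository repo = new SailRepository(new MemoryStore());
        repo.initialize();
        RepositoryConnection cxn = repo.getConnection();
        try {
            ValueFactory vf = repo.getValueFactory();
            URI alice = vf.createURI("http://example.org/alice");
            URI knows = vf.createURI("http://example.org/knows");
            URI bob   = vf.createURI("http://example.org/bob");
            cxn.add(alice, knows, bob); // add one statement

            // Query it back out with SPARQL.
            TupleQueryResult result = cxn.prepareTupleQuery(
                    QueryLanguage.SPARQL,
                    "SELECT ?s ?o WHERE { ?s ?p ?o }").evaluate();
            while (result.hasNext()) {
                System.out.println(result.next());
            }
            result.close();
        } finally {
            cxn.close();
            repo.shutDown();
        }
    }
}
```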

Running Bigdata

Make sure you are running with the -server JVM option and provide at least several GB of RAM for the embedded database (e.g., -Xmx4G). You should encounter extremely quick loading and great query performance. If your experience is not satisfactory, please contact us and let us help you get the most out of our product. Also see QueryOptimization, IOOptimization, and PerformanceOptimization.
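A quick way to confirm the heap setting took effect is to ask the running JVM itself. A standard-library sketch (the 4 GB threshold simply mirrors the -Xmx4G example above; tune it to your deployment):

```java
public class HeapCheck {
    // Report the maximum heap the JVM will use, as configured by -Xmx.
    public static void main(String[] args) {
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
        if (maxMb < 4096) {
            System.err.println("Consider raising -Xmx for an embedded database");
        }
    }
}
```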

Bundling Bigdata

Maven

You can use maven - see MavenRepository for the POM.

Scala

SBT is a popular and very powerful build tool. To add bigdata to an SBT project, add the following to your build definition:

1) the bigdata dependency:

libraryDependencies ++= Seq(
    "com.bigdata" % "bigdata" % bigDataVersion 
)

2) several Maven repositories to the resolvers:

  resolvers += "nxparser-repo" at "http://nxparser.googlecode.com/svn/repository/",

  resolvers += "Bigdata releases" at "http://systap.com/maven/releases/",

  resolvers += "Sonatype OSS Releases" at "https://oss.sonatype.org/content/repositories/releases",

  resolvers += "apache-repo-releases" at "http://repository.apache.org/content/repositories/releases/"

Bigdata Modules and Dependencies

There are several project modules at this time. Each module bundles all necessary dependencies in its lib subdirectory.

  • bigdata (indices, journals, services, etc)
  • bigdata-rdf (the RDFS++ database)
  • bigdata-sails (the Sesame integration for the RDFS++ database)
  • bigdata-jini (jini integration providing for distributed services - this is NOT required for embedded or standalone deployments)

The following dependencies are required only for the scale-out architecture:

  • jini
  • zookeeper

ICU is required only if you want to take advantage of compressed Unicode sort keys. This is a great feature if you are using Unicode and is available for both scale-up and scale-out deployments. ICU will be used by default if the ICU dependencies are on the classpath. See the com.bigdata.btree.keys package for further notes on ICU and Unicode options. ICU also has an optional JNI library.

Removing jini and zookeeper can save you about 10 MB. Removing ICU can save you about 30 MB.

The fastutils dependency is quite large, but it is automatically pruned in our WAR release to just those classes that bigdata actually uses.