- 1 Java Requirements
- 2 Download the code
- 3 Blazegraph deployment models
- 4 Running Bigdata
- 5 Bundling Bigdata
Java 7 is required to run Blazegraph.
Errors have been observed with encoding settings other than UTF-8 (see BLZG-1383). We recommend passing the encoding settings to the JVM explicitly. For the Oracle/Sun JVM, the settings below are recommended.
java ... -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 ...
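To confirm that the settings actually took effect, an application can inspect the two system properties at startup. The following is a minimal sketch, not part of Blazegraph: the class name EncodingCheck and the helper isUtf8 are hypothetical, and it only reads the same two properties named above.

```java
import java.nio.charset.Charset;

// Hypothetical startup check; not part of the Blazegraph code base.
class EncodingCheck {

    /** Returns true if the name resolves to the UTF-8 charset (aliases such as "utf8" included). */
    static boolean isUtf8(String charsetName) {
        if (charsetName == null) {
            return false;
        }
        try {
            return Charset.forName(charsetName).equals(Charset.forName("UTF-8"));
        } catch (IllegalArgumentException e) {
            // Thrown for illegal or unsupported charset names.
            return false;
        }
    }

    public static void main(String[] args) {
        // The two properties set by the recommended JVM flags.
        String[] keys = { "file.encoding", "sun.jnu.encoding" };
        for (String key : keys) {
            String value = System.getProperty(key);
            if (isUtf8(value)) {
                System.out.println(key + "=" + value + " (OK)");
            } else {
                System.err.println("WARNING: " + key + "=" + value
                        + "; consider starting the JVM with -D" + key + "=UTF-8");
            }
        }
    }
}
```

Run it with the same JVM flags you plan to use for Blazegraph; a warning means the flag was not picked up.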
Download the code
You can check out bigdata from Git. Older branches and tagged releases have names like BLAZEGRAPH_RELEASE_2_1_0.
To clone the latest release branch (replace X_Y_Z with the release version):
git clone -b BLAZEGRAPH_RELEASE_X_Y_Z --single-branch https://github.com/blazegraph/database.git BLAZEGRAPH_RELEASE_X_Y_Z
Note that the --single-branch option requires Git v1.7.10 or later. For earlier versions, use:
git clone -b BLAZEGRAPH_RELEASE_X_Y_Z https://github.com/blazegraph/database.git BLAZEGRAPH_RELEASE_X_Y_Z
Tagged releases are available in Git with tags in the form BLAZEGRAPH_RELEASE_X_Y_Z.
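If you script your checkouts, the naming pattern is easy to generate. The sketch below assumes that X_Y_Z is the dotted release version with the dots replaced by underscores, as in the BLAZEGRAPH_RELEASE_2_1_0 example; the class ReleaseTag and its methods are hypothetical helpers, not part of the distribution.

```java
// Hypothetical helper that builds branch/tag names and the matching clone command.
class ReleaseTag {

    /** Maps a dotted version such as "2.1.0" to "BLAZEGRAPH_RELEASE_2_1_0". */
    static String tagFor(String version) {
        if (version == null || !version.matches("\\d+\\.\\d+\\.\\d+")) {
            throw new IllegalArgumentException("Expected a version like 2.1.0, got: " + version);
        }
        return "BLAZEGRAPH_RELEASE_" + version.replace('.', '_');
    }

    /** Builds the single-branch clone command shown above for the given version. */
    static String cloneCommand(String version) {
        String tag = tagFor(version);
        return "git clone -b " + tag
                + " --single-branch https://github.com/blazegraph/database.git " + tag;
    }

    public static void main(String[] args) {
        // Prints the command rather than running it.
        System.out.println(cloneCommand("2.1.0"));
    }
}
```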
For embedded development, we highly recommend checking out bigdata from GIT into Eclipse as its own project. See Maven Notes to get started in Eclipse with Maven.
If you check out the source from Git, use ./scripts/mavenInstall.sh to build a local copy. You can then run ./scripts/startBlazegraph.sh to start Blazegraph from that local copy.
Blazegraph deployment models
Blazegraph supports several different deployment models (embedded, standalone, replicated, and scale-out). We generally recommend that applications decouple themselves at the REST layer (SPARQL, SPARQL UPDATE, and the REST API). Decoupled applications can scale more gracefully and there is an easy path from the single machine deployment model to the highly available replication cluster deployment model. However, there are applications where an embedded RDF/graph database makes more sense.
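As an illustration of decoupling at the REST layer, the sketch below sends a SPARQL query over HTTP using only the JDK, following the standard SPARQL protocol (a form-encoded POST with a query parameter). The endpoint URL http://localhost:9999/blazegraph/sparql is an assumption about a local deployment, so substitute your own; the class SparqlRestClient is hypothetical and not part of Blazegraph.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical client; it depends only on the JDK, not on any Blazegraph jar.
class SparqlRestClient {

    // Assumed endpoint of a locally running server; adjust to your deployment.
    static final String ENDPOINT = "http://localhost:9999/blazegraph/sparql";

    /** Encodes the query as an application/x-www-form-urlencoded request body. */
    static String formBody(String sparql) {
        try {
            return "query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** POSTs the query and returns the raw response body (SPARQL JSON results). */
    static String query(String endpoint, String sparql) throws IOException {
        byte[] body = formBody(sparql).getBytes("UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            conn.setRequestProperty("Accept", "application/sparql-results+json");
            OutputStream out = conn.getOutputStream();
            try {
                out.write(body);
            } finally {
                out.close();
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"));
            try {
                StringBuilder sb = new StringBuilder();
                String line;
                while ((line = in.readLine()) != null) {
                    sb.append(line).append('\n');
                }
                return sb.toString();
            } finally {
                in.close();
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(query(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10"));
    }
}
```

Because the client speaks only HTTP, the same code should work unchanged whether the endpoint is a single machine or a replication cluster behind the same URL.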
Non-Embedded Deployment Models
- See NanoSparqlServer for easy steps on how to deploy a bigdata SPARQL end point + REST API either using an embedded jetty server (same JVM as your application), executable Jar file, or as a WAR (in a servlet container such as tomcat).
- See HAJournalServer for deploying a highly available replication cluster (SPARQL end point + REST API).
- See Using_Bigdata_with_the_OpenRDF_Sesame_HTTP_Server for the significantly more complicated procedure required to deploy inside of the Sesame WAR. Note: We do NOT recommend this approach. The Sesame Server does not use the non-blocking query mode of bigdata. This can significantly limit the query throughput. However, the NanoSparqlServer and HAJournalServer both deliver non-blocking query.
- See CommonProblems page for a FAQ on common problems and how to fix them.
Embedded RDF/Graph Database
We have implemented the Sesame API over Blazegraph. Sesame is an open source framework for storing, inferencing and querying of RDF data, much like Jena. The best place to start would be to head to openrdf.org (http://www.openrdf.org), download Sesame, read their User Guide (specifically Chapter 8 - “The Repository API”), and try writing some code using their pre-packaged memory or disk based triple stores. If you have a handle on this you are 90% of the way to being able to use bigdata as an embedded RDF store.
Make sure you are running with the -server JVM option and provide at least several GB of RAM for the embedded database (e.g., -Xmx4G). You should encounter extremely quick loading and great query performance. If your experience is not satisfactory, please contact us and let us help you get the most out of our product. Also see QueryOptimization, IOOptimization, and PerformanceOptimization.
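A quick way to verify these JVM settings from inside your application is to print the maximum heap and the VM name at startup. This is a rough sketch: the class HeapCheck is hypothetical, the 2048 MiB threshold is an arbitrary stand-in for "several GB", and detecting the server VM by looking for "Server" in java.vm.name is a HotSpot-specific heuristic. Note that the reported maximum can be somewhat lower than the -Xmx value.

```java
// Hypothetical startup check; the threshold and the VM-name heuristic are assumptions.
class HeapCheck {

    // Arbitrary lower bound standing in for "at least several GB".
    static final long MIN_HEAP_MIB = 2048;

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Heuristic: HotSpot server VMs report a name containing "Server". */
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    public static void main(String[] args) {
        long maxHeapMiB = toMiB(Runtime.getRuntime().maxMemory());
        String vmName = System.getProperty("java.vm.name");

        System.out.println("Max heap: " + maxHeapMiB + " MiB");
        System.out.println("JVM: " + vmName);

        if (maxHeapMiB < MIN_HEAP_MIB) {
            System.err.println("WARNING: heap looks small; consider a larger -Xmx (e.g., -Xmx4G)");
        }
        if (!isServerVm(vmName)) {
            System.err.println("WARNING: this does not look like a server VM; consider -server");
        }
    }
}
```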
You can use maven - see MavenRepository for the POM.
SBT is a popular and powerful build tool. To add Bigdata to an SBT project, add the following to your build definition:
1) The bigdata dependency:
libraryDependencies ++= Seq( "com.bigdata" % "bigdata" % bigDataVersion )
2) Several Maven repositories to resolvers:
resolvers += "nxparser-repo" at "http://nxparser.googlecode.com/svn/repository/",
resolvers += "Bigdata releases" at "http://systap.com/maven/releases/",
resolvers += "Sonatype OSS Releases" at "https://oss.sonatype.org/content/repositories/releases",
resolvers += "apache-repo-releases" at "http://repository.apache.org/content/repositories/releases/"
Bigdata Modules and Dependencies
There are several project modules at this time. Each module bundles all necessary dependencies in its lib subdirectory.
- bigdata (indices, journals, services, etc)
- bigdata-rdf (the RDFS++ database)
- bigdata-sails (the Sesame integration for the RDFS++ database)
- bigdata-jini (jini integration providing for distributed services - this is NOT required for embedded or standalone deployments)
The following dependencies are required only for the scale-out architecture:
ICU is required only if you want to take advantage of compressed Unicode sort keys. This is a great feature if you are using Unicode and is available for both scale-up and scale-out deployments. ICU will be used by default if the ICU dependencies are on the classpath. See the com.bigdata.btree.keys package for further notes on ICU and Unicode options. ICU also has an optional JNI library.
Removing jini and zookeeper can save you 10 MB. Removing ICU can save you 30 MB.
The fastutils dependency is quite large, but it is automatically pruned in our WAR release to just those classes that bigdata actually uses.