Using Blazegraph with the OpenRDF Sesame HTTP Server

Note: It is much easier to get started using the NanoSparqlServer, which provides a high performance native HTTP interface for bigdata. If you need to use the OpenRDF Sesame HTTP server, then also see the footnote at the bottom of this page for information on improving concurrent query performance in the Sesame HTTP Server.

Note: The cluster is not compatible with the Sesame WAR. The OpenRDF API is too "wide" for a cluster and there are major performance penalties since the Sesame WAR is not "aware" of bigdata snapshot isolation semantics (read-only versus read/write connections). For the cluster, you need to focus on the movement of data, not the OpenRDF API: use bulk load and SPARQL query.

The following instructions will help you run bigdata with the OpenRDF Sesame HTTP Server:

  1. As a pre-requisite you must install a Java 6.0 SDK and configure the JAVA_HOME environment variable. See the tomcat documentation for more information. You will also need to install Apache ant in order to run the ant install task for bigdata (below).
  2. Install Apache tomcat and make a note of where you've installed it. We’ll refer back later to this directory as the TOMCAT_HOME.
  3. Install Sesame 2.3. We have tested with Sesame 2.3.0, but 2.3.1 should probably work just fine. Download and unpack the SDK, and make a note of where you’ve installed it. We’ll refer back later to this directory as the SESAME_DIR.
  4. Locate the SESAME_DIR/war directory and copy both the openrdf-sesame.war and the openrdf-workbench.war web application into TOMCAT_HOME/webapps/.
  5. Start tomcat using either startup.bat or startup.sh in the TOMCAT_HOME/bin directory. When tomcat starts, it will automatically unpack these war files and cause Sesame to create its working data directory, known as ADUNA_DATA. Locate this ADUNA_DATA directory (this is also referred to as the ADUNA_DATA_DIR in the forum post). For us it gets created at:
    Windows XP: C:/Documents and Settings/[user]/Application Data/Aduna/OpenRDF Sesame console
    Linux     : ~/.aduna/openrdf-sesame-console
    Note that there is also an OpenRDF Sesame directory (called openrdf-sesame under Linux). However, the bigdata.ttl file described below must be installed into the OpenRDF Sesame console directory (openrdf-sesame-console under Linux), not the OpenRDF Sesame directory. It is also possible that this directory does not exist yet, which is OK: it will be created if necessary when you run the Sesame console (below) and/or when you run the 'ant install.sesame.server' task (below).
  6. Shut down Tomcat. We will start it again below.
  7. Follow the instructions for GettingStarted.
  8. Locate the build.properties file in the root directory of the bigdata source tree. Search in the build.properties file for the text “OpenRDF Sesame HTTP Server” (it is near the bottom of the file). You will find four properties that need to be set: sesame.dir, sesame.server.dir, workbench.server.dir, and aduna.data.dir. Set these as appropriate to the directories from the steps above.
  9. Set up a bigdata properties file (this will be referred to as BIGDATA_PROPERTIES below). This file will be used to configure your remote bigdata instance. Inside this file is where you will specify the location of the journal (for a scale-up instance), along with all the configuration properties you want for this database instance. There are several examples of bigdata properties files in the bigdata-sails/src/samples/com/bigdata/samples directory of the checked-out source. Those examples include: fastload.properties, quads.properties, rdfonly.properties, and fullfeature.properties. Choose one of these as the starting point for your own bigdata properties file. This will be your BIGDATA_PROPERTIES file. For example, the value of this might be "c:/bigdata/bigdata.properties". At a minimum you MUST specify the name of the file that bigdata will use as the persistence store, for example:
    com.bigdata.journal.AbstractJournal.file=/data/sesame/bigdata.jnl
    Also see [1] which discusses the bigdata properties file in the context of the OpenRDF Sesame HTTP Server.
    1. Note: The Sesame Server uses auto-commit to handle update, thus you must set the bigdata property "com.bigdata.rdf.sail.allowAutoCommit" to true. Bigdata does not allow auto-commit by default, because it dramatically reduces update performance. Thus the Sesame Server is not a good choice for loading data into a journal. For high-performance load, please see the com.bigdata.rdf.store.DataLoader.
      com.bigdata.rdf.sail.allowAutoCommit=true
  10. You can now use the ant task to compile the bigdata source, prepare a jar file, and install the various files necessary to get bigdata running behind the Sesame HTTP Server using
    ant install.sesame.server
  11. Start Tomcat to get the Sesame HTTP Server web application running.
  12. Use the Sesame console application (located in SESAME_DIR/bin) to create a new bigdata repository instance. You will need to specify a repository ID (default is “bigdata”), a repository title, and the location of a bigdata properties file.
    1. run console.bat or console.sh
    2. Assuming your Sesame HTTP Server is running on localhost:9999, connect to that from the Sesame console using
      connect http://localhost:9999/openrdf-sesame .
    3. Now create the bigdata repository instance.
      create bigdata .
      The "bigdata" in this command corresponds directly to the bigdata.ttl template file that describes a bigdata repository. If this command fails, the console is not finding the template file. Refer to the steps above where you edited the build.properties file for bigdata and ran the ant install.sesame.server task, and make sure the bigdata.ttl template file is being installed in the right place.
    4. If the create command can locate the bigdata.ttl template file, you will be prompted with a series of questions:
      1. For Repository ID [bigdata]:, this is the ID used to identify the specific remote database instance you are creating. The default is simply "bigdata", but use whatever ID you like.
      2. For Repository title [Bigdata store]:, this is a label to associate with your remote database instance.
      3. For Properties:, specify the name of the BIGDATA_PROPERTIES file you configured above. If you are using Windows, you MUST use the appropriate path separator when you specify the BIGDATA_PROPERTIES file name and also in the file names which you specify in that properties file (such as the location of the bigdata database).
  13. These steps "create" a repository within the OpenRDF Sesame HTTP Server BUT the backing bigdata database instance is not created until you try to use it for the first time. Therefore, to validate your installation you must actually connect to the repository from an application. The DemoSesameServer can be used for this purpose or you can use your own application. The DemoSesameServer application [2] is located in bigdata-sails/src/samples/com/bigdata/samples/remoting in the checkout. This application will create some statements and then perform a simple SPARQL query. The backing bigdata database instance will be created automatically when this application is run. Please note that the initial extent of the database will be 10MB given the default configuration. The backing file will be grown in chunks as it fills up. You can run this application via an ant task:
    ant DemoSesameServer
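
The two property requirements from the BIGDATA_PROPERTIES step above (a journal file must be named, and auto-commit must be enabled for the Sesame Server) can be checked before the repository is created. The following is only a minimal sketch in Python and is not part of the bigdata distribution. The function names and the sample text are hypothetical, and the parser handles only plain key=value lines, not the full Java properties syntax (no line continuations, no ':' separators).

```python
# Hypothetical helper, not shipped with bigdata: sanity-check the two
# properties that the steps above say the Sesame Server setup needs.

JOURNAL_FILE = "com.bigdata.journal.AbstractJournal.file"
ALLOW_AUTO_COMMIT = "com.bigdata.rdf.sail.allowAutoCommit"


def load_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' or '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def check_bigdata_properties(props):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not props.get(JOURNAL_FILE):
        problems.append("missing " + JOURNAL_FILE)
    if props.get(ALLOW_AUTO_COMMIT, "").lower() != "true":
        problems.append(ALLOW_AUTO_COMMIT + " must be true for the Sesame Server")
    return problems


# Sample content for illustration only; the journal path is the example
# value used in the steps above.
sample = """\
# example BIGDATA_PROPERTIES content
com.bigdata.journal.AbstractJournal.file=/data/sesame/bigdata.jnl
com.bigdata.rdf.sail.allowAutoCommit=true
"""

print(check_bigdata_properties(load_properties(sample)))  # prints []
```

To check a real file, read its contents (Java properties files are conventionally ISO-8859-1 encoded) and pass the text to load_properties.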

For more information, see this discussion in the bigdata developers forum, GettingStarted, and the Sesame 2.3 Users Guide. Assuming that you have tomcat running on localhost:8080, then you can connect to the Sesame Workbench on your local machine at [3].
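
Besides the Workbench and the DemoSesameServer, a repository can also be queried directly over HTTP. The sketch below assumes the Sesame 2 HTTP protocol layout, in which a repository is addressed as <server>/repositories/<ID> and a SPARQL query is passed in the query parameter. The server URL and the repository ID "bigdata" are taken from the console example above; the function names are hypothetical, and the port must match your own Tomcat configuration.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def query_url(server_url, repository_id, sparql):
    """Build the GET URL for a SPARQL query against one repository."""
    base = server_url.rstrip("/") + "/repositories/" + quote(repository_id, safe="")
    return base + "?" + urlencode({"query": sparql})


def run_select(server_url, repository_id, sparql, timeout=30):
    """Send the query and return the raw SPARQL XML results as text.

    Requires a running Sesame HTTP Server, so it is not called here.
    """
    request = Request(
        query_url(server_url, repository_id, sparql),
        headers={"Accept": "application/sparql-results+xml"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Assumed values: adjust the host, port, and repository ID to your setup.
SERVER = "http://localhost:9999/openrdf-sesame"
print(query_url(SERVER, "bigdata", "SELECT * WHERE { ?s ?p ?o } LIMIT 1"))
```

Calling run_select(SERVER, "bigdata", "SELECT * WHERE { ?s ?p ?o } LIMIT 1") with Tomcat running should return a result document if the repository was created as described above.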


See [4] for a thread which shows you how to modify the Sesame HTTP Server for better concurrent query performance with bigdata.