Common Problems

From Blazegraph

This page provides a FAQ for common problems and how to fix them.

Problem: How do I build / install bigdata on a single machine?

To checkout, compile, and run see Compiling.

To set up with Eclipse, see Eclipse Setup.

Legacy (prior to 2.0)

1. bigdata is an Eclipse project, so you can just check it out from within Eclipse and it should build automatically.

2. There is an ant build script (build.xml). Use "ant" to generate the jar.

3. Use "ant bundleJar" to generate the jar and bundle all of the dependencies into the build/lib directory. You can then copy those jars to wherever you need them.

Problem: How do I install bigdata on a cluster?

"ant install" is the cluster install.

There are notes in build.properties and in build.xml for the "install" target on how to set up a cluster install. There are examples of bigdata configuration files for a 3-node cluster and for a 15-node cluster in src/resources/config. The cluster install is currently Linux-specific, but we would be happy to help with ports to other platforms. It installs a bash script which should be run from a cron job. There is also a dependency on sysstat, http://pagesperso-orange.fr/sebastien.godard/, for collecting performance counters from the O/S.

Please see the ClusterGuide for more detail.

We recommend that you ask for help before attempting your first cluster install.

Problem: What are all these pieces?

Solution: There are several layers to the bigdata architecture. At the most basic layer, you can create and manage named indices on a com.bigdata.journal.Journal. The journal is a fast append-only persistence store suitable for purely local applications. Scale-out applications are written to the com.bigdata.service.IBigdataClient and com.bigdata.service.IBigdataFederation APIs. There are several implementations of the IBigdataFederation interface:

com.bigdata.journal.Journal: This is not a federation at all. However, the Journal may be used for a fast local persistence store with named indices.

com.bigdata.service.LocalDataServiceFederation: Provides a lightweight federation instance backed by a single com.bigdata.service.DataService. The DataService provides the building block for the scale-out architecture and handles concurrency control for write access to indices hosted by the DataService. This federation class does NOT support key-range partitioned indices, but it is plug-and-play compatible with the federations that do, which makes this a good place to develop your applications.

com.bigdata.service.EmbeddedDataServiceFederation: Provides an embedded (in-process) federation instance supporting key-range partitioned indices. This is mainly used for testing those aspects of bigdata or of specific applications which are sensitive to key-range partitioning of indices and to overflow events. An overflow event occurs when the live journal absorbing writes for a DataService reaches its target maximum extent. Synchronous overflow processing is very fast. It creates a new "live" journal and defines new views of the indices found on the old live journal on the new journal. A background process then provides asynchronous compacting merges and related operations for the index views and also makes decisions concerning whether to split, join, or move index partitions.

com.bigdata.service.jini.JiniFederation: This is the scale-out architecture deployed using jini. Services may be started on machines throughout a cluster. The services use jini to register themselves and to discover other services.

Problem: You see a javac internal error from the ant build script.

Solution: Use JDK 1.6.0_07 or better. Several problems with javac parsing that were present in the 1.5 releases of the JDK were apparently resolved in 1.6.0_07.

Problem: You see an ArrayIndexOutOfBoundsException from the KeyDecoder in a SparseRowStore write.

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.bigdata.rdf.sail.BigdataSail.setUp(BigdataSail.java:405)
        at com.bigdata.rdf.sail.BigdataSail.<init>(BigdataSail.java:430)
        at Test.main(Test.java:73)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at com.bigdata.rdf.sail.BigdataSail.setUp(BigdataSail.java:398)
        ... 2 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at com.bigdata.sparse.KeyDecoder.<init>(KeyDecoder.java:261)
        at com.bigdata.sparse.AbstractAtomicRowReadOrWrite.atomicRead(AbstractAtomicRowReadOrWrite.java:234)
        at com.bigdata.sparse.AbstractAtomicRowReadOrWrite.atomicRead(AbstractAtomicRowReadOrWrite.java:152)
        at com.bigdata.sparse.AtomicRowWriteRead.apply(AtomicRowWriteRead.java:167)
        at com.bigdata.sparse.AtomicRowWriteRead.apply(AtomicRowWriteRead.java:25)

Solution: You don't have the ICU libraries on the classpath. The ICU libraries are in the bigdata/lib/icu folder. ICU provides fast, correct Unicode support for C and Java. The JDK's Unicode support is based on ICU, but does not support compressed Unicode sort keys, hence we recommend the ICU package instead. If you do NOT want Unicode sort keys (that is, if all String data in your index keys is ASCII), then you can use the com.bigdata.btree.keys.KeyBuilder.Options.COLLATOR option to disable Unicode support. This can also be done on a per-index basis when the index is provisioned.
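As a sketch of how that option might be set (the exact option string and the "ASCII" value here follow the KeyBuilder.Options javadoc, but you should verify both against your release):

```java
import java.util.Properties;

public class CollatorConfig {
    // Assumed option name, following com.bigdata.btree.keys.KeyBuilder.Options.COLLATOR;
    // check the KeyBuilder javadoc for your release before relying on it.
    static final String COLLATOR = "com.bigdata.btree.keys.KeyBuilder.collator";

    // Disable Unicode sort keys. Only safe when every String used in
    // index keys is pure ASCII, as described above.
    public static Properties noUnicodeSortKeys() {
        Properties p = new Properties();
        p.setProperty(COLLATOR, "ASCII");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(noUnicodeSortKeys().getProperty(COLLATOR));
    }
}
```

These properties would then be passed to the Journal or triple store when it is created.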

Problem: You are seeing a LOT of log statements.

Solution: Configure log4j correctly! Bigdata uses log4j and conditional logging throughout. Some parts of bigdata (especially the B+Trees) produce an absolutely ENORMOUS amount of logging data unless you have configured logging correctly. Logging is also an enormous performance burden, as the StringBuilder operations required to (a) generate the log messages and (b) generate and parse stack traces in order to give you nice metadata in your log (e.g., the classname and line number at which the log message was issued) drive the heap at a frantic pace.
By default, log4j will log at a DEBUG level. This is NOT acceptable. You MUST configure log4j to log at no more than ERROR, or possibly WARN, for bigdata.
In general, you configure log4j using a command line option such as:

-Dlog4j.configuration=file:src/resources/logging/log4j.properties

Notice that the log4j configuration file is specified as a URL, not a file name!
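A minimal log4j.properties along these lines might look as follows (the appender name is arbitrary; the essential line for bigdata is the com.bigdata logger set to WARN):

```properties
# Default everything to ERROR on the console.
log4j.rootCategory=ERROR, dest1
log4j.appender.dest1=org.apache.log4j.ConsoleAppender
log4j.appender.dest1.layout=org.apache.log4j.PatternLayout
log4j.appender.dest1.layout.ConversionPattern=%-5p: %r %l: %m%n

# Never let com.bigdata log above WARN.
log4j.logger.com.bigdata=WARN
```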

Note: This issue is resolved in CVS. The log level for com.bigdata will be defaulted to WARN if no default has been specified. This should prevent most surprises.

Problem: You see a stack trace out of com.bigdata.btree.Node.dump() or Leaf.dump()

  ERROR child[0] does not have parent reference.
Exception in thread "main" java.lang.RuntimeException: While loading: /tmp/data/test.owl
        at com.bigdata.rdf.store.DataLoader.loadFiles(DataLoader.java:800)
        at com.bigdata.rdf.store.DataLoader.loadData2(DataLoader.java:706)
        at com.bigdata.rdf.store.DataLoader.loadData(DataLoader.java:552)
        at com.bigdata.rdf.store.DataLoader.loadData(DataLoader.java:513)
        at TestDataLoader.main(TestDataLoader.java:26)
Caused by: java.lang.NullPointerException
        at com.bigdata.btree.Node.dump(Node.java:2545)

Solution: Configure log4j correctly! (see above).

The dump() method is only invoked when the log4j level is at DEBUG. The BTree code makes some assertions within dump() that are not valid during some kinds of mutation (rotation of a key, split or join of a node or leaf). We've tracked down most of these cases and simply commented out the dump() invocations, but there are clearly some left over. This does not indicate a problem with the BTree -- you just need to configure log4j correctly!

Note: This issue is resolved in CVS. The log level for com.bigdata will be defaulted to WARN if no default has been specified. This should prevent most surprises.

Problem: The write performance is slow when using the SAIL API.

- Turn off auto-commit in the SAIL. Auto-commit forces a commit for each statement loaded, and a commit is the slowest possible operation for a database, so a commit for each statement added or retracted is the worst possible case. Turn off auto-commit!

Problem: I am using the Journal and the file size grows very quickly.

- Make sure that you are using the RWStore (BufferMode.DiskRW) backend.

- Make sure that you do not hold open query connections across a long series of updates. Bigdata provides snapshot isolation. If a query connection is open, then the data visible to that connection cannot be recycled. Under such circumstances, the size of the backing file on the disk will continue to grow. Storage will be recycled once the query connection is closed. See the documentation for the RWStore for more information about storage recycling.

See RWStore and TxGuide for more information.
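As a sketch, the RWStore backend is selected through the Journal properties. The property name below follows com.bigdata.journal.Options.BUFFER_MODE; check the Options javadoc for your release:

```properties
# Select the read/write store so that released storage can be recycled,
# instead of the append-only (WORM) journal.
com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
```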

Problem: How do I create a scale-out RDF database?

The BigdataSail is just a wrapper over a com.bigdata.rdf.store.AbstractTripleStore. It provides some constructors which make things easy for you by creating the backing persistence store and the AbstractTripleStore. If you are trying to create a scale-out RDF database, then you need to work with the constructor that accepts the AbstractTripleStore object. The RDFDataLoadMaster (http://www.bigdata.com/bigdata/docs/api/com/bigdata/rdf/load/RDFDataLoadMaster.html) has some code which creates an RDF database instance automatically, based on the description of an RDF database instance in the bigdata configuration file, if one does not already exist. You are basically running a distributed data load job. This is a good time to ask for help.

Problem: How do I use bigdata with Hadoop?

Bigdata uses zookeeper, but does not have any other integration points with hadoop at this time. This is something that we are interested in doing. It should be possible to deploy bigdata over HDFS using FUSE, but we have not tried it.

One of the most common things that people want to do is pre-process a (huge) amount of data using map/reduce and then bulk load that data into a scale-out RDF database instance where it can then be queried using a high-level query language (SPARQL). The easiest way to do this is to have bigdata clients running on the same hosts as your reduce operations, which are presumably aggregating RDF/XML, N3, etc. for bulk load into bigdata. You can use the file system loader to bulk load files out of a named directory where they are being written by a map/reduce job. Files will be deleted once they have been loaded safely into the bigdata RDF database instance. If you are trying to do this, let us know and we can work with you to get things set up.

Problem: You have errors when compiling with Sesame 2.2.4.

Exception in thread "main" java.lang.NoSuchMethodError:
org.openrdf.sail.helpers.SailBase.getConnection()Lorg/openrdf/sail/NotifyingSailConnection;
    at com.bigdata.rdf.sail.BigdataSail.getConnection(BigdataSail.java:794)

The problem was an API compatibility change. This has long since been fixed.

Problem: MagicTupleSerializer

bigdata-rdf/src/java/com/bigdata/rdf/magic/MagicTupleSerializer.java:184: error: name clash: serializeVal(MagicTuple) in MagicTupleSerializer overrides a method whose erasure is the same as another method, yet neither overrides the other

You are trying to compile the trunk. The trunk is no longer in use. See GettingStarted.

Problem: Blank nodes appear in the subject position when loading RDF/XML.

Note: The RDF/XML support for statement identifiers is no longer present with the 1.4.0 release. See the Reification Done Right page for the new statement identifier support mechanisms.

Solution: Make sure that the bigdata JARs appear before the Sesame JARs in your CLASSPATH.

The problem arises from an extension to the RDF/XML handling to support statement identifiers. We override some of the Sesame classes for RDF/XML handling to support that extension. We will introduce a MIME type and file extension so that improper handling of RDF/XML will not occur for standard RDF/XML when the JARs are out of order. In the meantime, you can fix this issue by ordering the bigdata JARs before the Sesame JARs.

ICUVersionChange

You see a stack trace whose root cause complains about an "ICUVersionChange".

java.lang.RuntimeException: ICUVersionChange:
store=com.bigdata.btree.keys.ICUVersionRecord{icuVersion=3.4.3.0,ucolRuntimeVersion=6.0.0.0,ucolBuilderVersion=7.0.0.0,ucolTailoringsVersion=1.0.0.0},
runtime=com.bigdata.btree.keys.ICUVersionRecord{icuVersion=3.6.1.0,ucolRuntimeVersion=6.0.0.0,ucolBuilderVersion=7.0.0.0,ucolTailoringsVersion=1.0.0.0}
        at com.bigdata.journal.AbstractJournal.<init>(AbstractJournal.java:1033)

ICU (International Components for Unicode, http://site.icu-project.org/) is used by bigdata to generate compressed Unicode sort keys (aka collation keys). The JDK bundles similar functionality, but it is unable to generate compressed sort keys, which can make a substantial difference in the size of indices with Unicode keys, and in general it provides significantly less support for Unicode.

Unicode sort keys are used internally by the Name2Addr index which maps Unicode index names onto checkpoint addresses and also by the triple store index which maps RDF Values onto term identifiers. The Unicode collation keys used by those indices MUST be stable.

In order for the collation keys to be stable (the same input generates the same key), you MUST NOT change the ICU version number. ICU provides some guidance in this regard [1]. In particular, applications which rely on the exact behavior of Unicode collation (sort keys) must link against a specific version of ICU. The ICU version scheme is major.minor.milli.micro. Changes to milli and micro are generally OK, but check the ICU change log and test before you deploy (read more about this below). Changes to major or minor do NOT provide a guarantee of binary compatibility for the generated collation keys and in general will require an export/import of your data.
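The compatibility rule above can be illustrated with a small, purely hypothetical helper (not part of bigdata) that compares two major.minor.milli.micro version strings:

```java
public class IcuVersionCheck {
    // Returns true when the change is confined to the milli/micro fields,
    // which is "generally OK" per the rule above (still check the ICU
    // change log and test!), and false when major or minor differ, in
    // which case the collation keys may be binary-incompatible.
    public static boolean mayBeBinaryCompatible(String storeVersion, String runtimeVersion) {
        String[] s = storeVersion.split("\\.");
        String[] r = runtimeVersion.split("\\.");
        return s[0].equals(r[0]) && s[1].equals(r[1]);
    }

    public static void main(String[] args) {
        // The versions from the stack trace above differ in the minor
        // field (3.4 vs 3.6), so the keys cannot be trusted.
        System.out.println(mayBeBinaryCompatible("3.4.3.0", "3.6.1.0")); // false
        // A milli/micro-only change is generally OK.
        System.out.println(mayBeBinaryCompatible("3.4.3.0", "3.4.5.1")); // true
    }
}
```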

Bigdata is tested against a specific set of dependencies and those dependencies are bundled with the project in the release and in deployment artifacts. It is NOT always safe to choose a newer version of some dependency, as that can result in API and/or binary incompatibility. In the specific case of ICU, it is known that some ICU version changes will result in a Journal which cannot be "read" because the Unicode collation keys are different from those used when the Journal was written. Since Name2Addr maps indices by name onto their checkpoint records, this can make it seem as if your data has "disappeared." Restoring the correct ICU dependency version fixes this problem.

A check was added to bigdata in March 2011 to detect ICU version changes at runtime. If the version of the dependency has been changed, an exception is thrown reporting an "ICUVersionChange". In general, the right step is to restore the correct ICU dependency to the classpath. If you are undergoing a deliberate upgrade of the ICU dependency, then com.bigdata.journal.Options defines an option which may be used to force an ICU version update. However, if the new ICU dependency is not binary compatible with your data, then you will not be able to read the journal using the new ICU version. In any attempt to upgrade the ICU dependency version, always review the change log notes for ICU, create a backup of your data, and test extensively to verify that the data remains accessible before deploying the new ICU version. In the event that you need to change to an incompatible ICU version, you will have to export/import your data, e.g., as RDF/XML.

[1] http://userguide.icu-project.org/design

java.lang.IllegalStateException: UNISOLATED connection is not reentrant

The highest throughput on the SAIL is achieved when using unisolated connections [1]. However, there can be at most ONE (1) open unisolated connection on a Journal. This exception is thrown if there is an attempt to acquire a second unisolated connection from within a thread which already owns the unisolated connection.

Bigdata offers you opportunities for a much higher performance when you architect your application to work with unisolated operations. However, it also supports full read/write transactions. These can be configured using the following option:

com.bigdata.sail.isolatableIndices=true

There can be multiple read-write transactions open concurrently, which is more in line with the assumptions of the openrdf APIs. However, full read-write transactions have less throughput for two reasons. First, each transaction must be fully buffered on an "isolated" index. When the transaction commits, the write set is reviewed for conflicts with the current committed state of the corresponding unisolated index. If a write-write conflict is observed and cannot be reconciled (add-add conflicts are reconciled), then the transaction will be aborted and must be retried. Second, since RDF data describes a graph, irreconcilable write-write conflicts can be quite common if transactions are performing retractions as well as assertions. For both reasons, an application will have higher throughput for writers using the (single) unisolated connection. (The situation is somewhat different in scale-out; see [1].)
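The "aborted and must be retried" requirement above can be sketched generically. ConflictException and TxBody below are hypothetical stand-ins; the real SAIL signals an aborted transaction with its own exception type:

```java
public class TxRetry {
    /** Hypothetical stand-in for the exception thrown when a transaction
        aborts on an irreconcilable write-write conflict. */
    public static class ConflictException extends RuntimeException {}

    /** A transaction body that can be re-executed from scratch. */
    public interface TxBody<T> { T run(); }

    /** Re-run the transaction body when it aborts with a conflict,
        up to maxRetries additional attempts. */
    public static <T> T runWithRetry(TxBody<T> tx, int maxRetries) {
        for (int i = 0; ; i++) {
            try {
                return tx.run();
            } catch (ConflictException e) {
                if (i >= maxRetries) throw e; // give up
                // otherwise loop and retry the whole transaction
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulate a body that conflicts twice before committing.
        String result = runWithRetry(() -> {
            if (attempts[0]++ < 2) throw new ConflictException();
            return "committed";
        }, 5);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

Note that each retry re-executes the entire transaction body, since the aborted write set is discarded.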

[1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=TxGuide

java.io.FileNotFoundException: rules.log (Permission denied)

This occurs when the process running bigdata does not have sufficient permissions to create or write to the named file. You can either modify the file permissions for the parent directory and/or the file, or you can edit the appropriate log4j.properties file so that it does not log to a file.

java.net.NoRouteToHostException: Cannot assign requested address

This exception can show up on Linux platforms if sockets linger in their open state too long after a connection is closed. For example, we have encountered this when running the BSBM EXPLORE mixture several hundred times in a row (that is, several hundred runs of the BSBM benchmark without the warmup protocol). Similar problems can be observed on production servers with very high-volume, low-latency workloads.

Could not connect to SPARQL Service.
java.net.NoRouteToHostException: Cannot assign requested address
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:579)
 at java.net.Socket.connect(Socket.java:528)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:203)
 at sun.net.www.http.HttpClient.New(HttpClient.java:290)
 at sun.net.www.http.HttpClient.New(HttpClient.java:306)
 at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
 at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
 at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
 at benchmark.testdriver.NetQuery.exec(NetQuery.java:79)
 at benchmark.testdriver.SPARQLConnection.executeQuery(SPARQLConnection.java:109)
 at benchmark.testdriver.ClientThread.run(ClientThread.java:74)
java.net.NoRouteToHostException: Cannot assign requested address
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)

See http://www.speedguide.net/articles/linux-tweaking-121 for the source of this fix/workaround.

TCP_TW_REUSE
This allows reusing sockets in the TIME_WAIT state for new connections when it is safe from the protocol viewpoint. The default value is 0 (disabled). It is generally a safer alternative to tcp_tw_recycle.

The fix:

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
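To make the setting persistent across reboots, it can also be added to /etc/sysctl.conf (and applied immediately with "sysctl -p"):

```properties
# /etc/sysctl.conf
net.ipv4.tcp_tw_reuse = 1
```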

Gateway Timeout

This HTTP message with a 504 status code can occur when the HALoadBalancer is used and the service to which the request was proxied is experiencing a very heavy workload or a prolonged GC event. GC pause time should be minimized as described on the QueryOptimization page, e.g., by restricting the size of the JVM heap, by using a garbage collector that favors liveness, by using the analytic query mode, etc.

 Query execution: Received error code 504 from server
 Error message: Gateway Timeout

Problem: Calling RDFWriterRegistry.getInstance() and RDFParserRegistry.getInstance() from different threads at the same time causes a deadlock in ServiceProviderHook.

Solution: Call ServiceProviderHook.forceLoad() first (from the main application thread) to force an eager initialization of the RDFParserRegistry and RDFWriterRegistry classes.