- Support Subscriptions
- Change Log for Backwards Compatibility Issues
- Data Migration
Customers with support subscriptions should contact their support provider.
Change Log for Backwards Compatibility Issues
This page provides information on changes which break binary compatibility, data migration procedures, and links to utilities which you can use to migrate your data from one bigdata version to another. We try to minimize the need for data migration as much as possible by building versioning information into the root blocks and persistent data structures. However, sometimes implementing a new feature or performance optimization requires us to make a change to bigdata which breaks binary compatibility with older data files. Typically this is because there is a change in the physical schema of the RDF database.
version 1.0.4 => 1.0.6
If a 1.0.4 journal was created using MIN_RELEASE_AGE greater than ZERO (0) then an exception will be reported when writing on the journal using 1.0.6. The default for MIN_RELEASE_AGE is ZERO (0), so this will only affect people who have explicitly configured deferred deletes (the recycler) rather than session protection. The exception is caused by an attempt to re-process deferred deletes associated with older commit records. There is a migration utility which fixes this by pruning the older commit records. This issue is also fixed in the 1.0.x maintenance branch after r6008. Opening a journal with a post-1.0.6 release will not encounter this problem.
See Error releasing deferred frees using 1.0.6 against a 1.0.4 journal for the migration utility.
version 1.0.0 => 1.0.1
The following changes in 1.0.1 cause problems with backwards compatibility.
- https://sourceforge.net/apps/trac/bigdata/ticket/107 (Unicode clean schema names in the sparse row store).
- https://sourceforge.net/apps/trac/bigdata/ticket/124 (TermIdEncoder should use more bits for scale-out).
- https://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder limits Journal to 2B distinct RDF Values per triple/quad store instance).
These changes were applied to the 1.0.0 release branch.
Please note: if you are already using the 1.0.0 release branch after r4863 then you do NOT need to migrate your data as these changes were already in the branch.
version 1.0.x => 1.1.0
The following changes in 1.1.0 cause problems with backwards compatibility. Of these, the main change was the introduction of the BLOBS index for large literals and URIs. This change in the physical schema of the RDF database made it impossible to maintain backward compatibility with the 1.0.x branch.
- http://sourceforge.net/apps/trac/bigdata/ticket/109 (Store large literals as "blobs")
- http://sourceforge.net/apps/trac/bigdata/ticket/401 (inline xsd:unsigned datatypes)
- http://sourceforge.net/apps/trac/bigdata/ticket/324 (Inline predeclared URIs and namespaces in 2-3 bytes)
version 1.1.0 => 1.2.0
session protection mode
If a 1.1.0 journal was created using MIN_RELEASE_AGE greater than ZERO (0) then an exception may be reported when writing on the journal using 1.2.0. The default for MIN_RELEASE_AGE is ZERO (0), so this will only affect people who have explicitly configured deferred deletes (the recycler) rather than session protection. The exception is caused by an attempt to re-process deferred deletes associated with older commit records. There is a migration utility which fixes this by pruning the older commit records. This issue is also fixed in the 1.1.x maintenance branch after r6008.
See Error releasing deferred frees using 1.0.6 against a 1.0.4 journal for the migration utility.
full text index
As part of the refactor to support a subject-centric full text index, the schema for the integrated bigdata value-centric full text index has been changed. The changes are (a) the term weight is now stored in the key within the B+Tree tuples; and (b) the term weight is now modeled by a single byte (rather than 4 or 8 bytes). These changes reduce the size on disk of the full text index and allow search results for a single keyword to be delivered in relevance order directly from the index without sorting.
These changes ONLY affect stores using the bigdata full text index. While this property is on by default, it is explicitly disabled in many of the sample property files. If you are uncertain, check your property file to see whether this change will affect you.
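As a sketch of that check, the stdlib-only Java snippet below parses a KB property file and reports whether the full text index is enabled. The property key shown (com.bigdata.rdf.store.AbstractTripleStore.textIndex) and the default of "true" follow bigdata's conventions, but verify both against your release.

```java
import java.io.StringReader;
import java.util.Properties;

public class TextIndexCheck {

    /**
     * Returns true if the bigdata full text index is enabled in the given
     * KB properties. The key name and the "true" default are assumed from
     * bigdata conventions; check them against your release.
     */
    static boolean isTextIndexEnabled(final Properties p) {
        return Boolean.parseBoolean(p.getProperty(
                "com.bigdata.rdf.store.AbstractTripleStore.textIndex",
                "true"));
    }

    public static void main(String[] args) throws Exception {
        // Simulated contents of an exported KB property file.
        final Properties p = new Properties();
        p.load(new StringReader(
                "com.bigdata.rdf.store.AbstractTripleStore.textIndex=false\n"));
        System.out.println("full text index enabled: " + isTextIndexEnabled(p));
    }
}
```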
Data migration can be achieved through an export / import.
version 1.3.1 => 1.3.2 (Metabits Demi Spaces)
As of 1.3.2, new and old RWStore instances will implicitly be converted to use a demi-space for the metabits if and only if the maximum size of the metabits region is exceeded. This conversion permits the addressing of more than 8k allocators, which was the maximum number of allocators supported in releases up through 1.3.1. Before conversion, the metabits (which identify the addresses of the allocators) were stored in a single allocation slot on the store. The maximum recommended allocation slot size is 8k, which corresponds to the maximum number of supported allocators. After conversion the metabits are stored in two alternating demi-spaces near the head of the RWStore file structure. Older code is NOT able to read the RWStore after conversion. However, older code was unable to address more than 8k of metabits and so could not have read or written on a store which addressed more than 8k metabits.
A utility class (MetabitsUtil.java) exists to convert between these two operational modes for the metabits. However, it is not possible to convert an RWStore to the older (non-demi-space) mode once the number of allocators is greater than 8k since the allocators can no longer be stored in an 8k allocation slot.
If the maximum size of the allocators has been overridden from the default / recommended 8k, then the conversion point is also changed to the overridden maximum slot size.
- RWStore version before conversion:
- RWStore version after conversion:
Data migration

The most straightforward way to migrate data between bigdata versions is an export/import pattern. The ExportKB utility described below may be used to facilitate this, but this is also easy to do within program code.
Each bigdata instance may contain multiple RDF triple stores or quad stores (aka Knowledge Base or KB). Each KB has its own configuration options. If you have only one KB or if all of your KBs have the same configuration, then things are simpler. If you have KBs with different configurations then you will need to pay attention to the export/import procedure for each one.
Standard RDF semantics for blank nodes requires that an export/import process maintains a mapping from the blank node ID to the internal BNode object used to model that blank node. This works fine as long as you export / import a KB as a single RDF document. However, if references to the same blank node ID appear in different RDF documents then they will be construed as distinct blank nodes!
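This point can be illustrated with a stdlib-only sketch: a parser-style map from blank node IDs to node objects makes two occurrences of "_:b1" in one document resolve to the same node, while a fresh map for a second document yields a distinct node. The BNode class below is a hypothetical stand-in, not the bigdata implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class BNodeScope {

    /** Hypothetical stand-in for an internal blank node object. */
    static final class BNode {
        private static int next = 0;
        final int internalId = next++;
    }

    /**
     * Resolves a blank node ID within one document's scope: the same ID
     * always maps back to the same node, as an RDF parser would ensure.
     */
    static BNode resolve(final Map<String, BNode> docScope, final String id) {
        return docScope.computeIfAbsent(id, k -> new BNode());
    }

    public static void main(String[] args) {
        final Map<String, BNode> doc1 = new HashMap<>();
        final Map<String, BNode> doc2 = new HashMap<>();

        // Within a single document, "_:b1" resolves to one node...
        System.out.println("same within document: "
                + (resolve(doc1, "_:b1") == resolve(doc1, "_:b1"))); // true

        // ...but the same ID in a second document becomes a distinct node.
        System.out.println("same across documents: "
                + (resolve(doc1, "_:b1") == resolve(doc2, "_:b1"))); // false
    }
}
```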
Bigdata also supports a "told bnodes" option. When using this option, the blank node IDs are treated in much the same manner as URIs. They are stable identifiers which may be used to refer to the blank node. In this case, the interchange of RDF data may be broken down into multiple documents and blank node identity will be preserved.
Bigdata supports three main modes for a KB: triples, triples with statement identifiers (SIDs), and quads. Statement identifiers provide for statements about statements. See [[http://wiki.bigdata.com/wiki/index.php/GettingStarted#You_claim_that_you.27ve_.22solved.22_the_provenance_problem_for_RDF_with_statement_identifiers._Can_you_show_me_how_that_works.3F]].
Statement identifiers behave in many ways like blank nodes. However, the identity of the statement is grounded in the blank node identifier associated with the context position of a "ground" statement. Interchange of SIDs mode data MAY be broken into multiple documents as long as each document contains all ground statements for any metadata statement also found in that document.
SIDs mode data interchange MUST use the bigdata extension of RDF/XML. The simplest solution is simply to export all data in a KB as a single RDF/XML document and then import that RDF/XML document into a new KB instance.
Axioms and Inferences
When a KB contains materialized inferences you will typically want to export only the "told" triples (those explicitly written onto the KB by the application). After you import the data you can then recompute the materialized inferences. If you export the inferences and/or axioms as well then they will become "told" triples when you import the data into a new bigdata instance.
The com.bigdata.rdf.sail.ExportKB class may be used to facilitate data migration. The ExportKB utility will write each KB onto a separate subdirectory. Both the configuration properties for the KB and the data will be written out. By default, only told triples/quads will be exported.
```
java -cp ... -server -Dlog4j.configuration=file:bigdata/src/resources/logging/log4j.properties com.bigdata.rdf.sail.ExportKB [options] propertyFile (namespace*)
```
Note: This class was introduced after the 1.0.0 release, but the code is backwards compatible with that release. People seeking to migrate from 1.0.0 to 1.0.1 should check out both the 1.0.0 release and the 1.0.1 release, then copy the ExportKB class into the same package in the 1.0.0 release and compile a new jar (ant jar). You can then use the ExportKB to export data from the 1.0.0 release. You can also download the ExportKB class directly from .
Before you import your data, make sure that the new KB is created with the appropriate configuration properties. If you want to change any configuration options for the KB, now is the time to do it. Simply edit the exported configuration properties file (or copy it to a new location and edit the copy).
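For example, an edited fragment of an exported KB property file might look like the following. The key names follow bigdata's naming conventions and the values are purely illustrative; verify both against the file actually produced by your export.

```properties
# Illustrative fragment of an exported KB property file.
# Key names follow bigdata conventions; verify against your release.
com.bigdata.rdf.store.AbstractTripleStore.quads=true
com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
```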
If you have only a single KB instance on a Journal, then you just need to copy the exported configuration properties for your KB into the configuration properties for your new Journal.
If there are multiple KBs to be imported, then you need to first create the Journal and then create and import each KB in turn using its exported configuration file.
Once you have created the Journal and are ready to import your data, there are a number of ways to import data.
- The DataLoader class. See the javadoc for more detailed information.
- The NanoSparqlServer.
- The openrdf API.
Data Migration in Scale-Out
Data migration for a bigdata federation is more complex due to the data scales involved. If the KBs in the federation are using told blank nodes mode then export/import can be achieved using the same patterns described above. However, if the cluster is using standard RDF blank node semantics then export/import is more complex as the data can only be reliably interchanged as a single massive RDF "document" or via special purpose code designed to handle the isomorphism of a very large number of blank nodes between two graphs.