B Tree Guide

From Blazegraph
Jump to: navigation, search

Bigdata is a key-range partitioned distributed B+Tree architecture. When running on a single machine, all data is stored in a com.bigdata.journal.Journal, which is backed by a single file on the disk. When running on a cluster, then B+Tree is dynamically sharded and its state will be distributed across the cluster.

The main B+Tree interface is IIndex. This interface is used for both single machine and scale-out B+Tree instances. The main APIs for creating and accessing IIndex objects are IIndexManager and IIndexStore. The easiest way to get started is with the Journal. It implements IIndexManager so you can register named indices and recover them.

The core B+Tree implementation is BTree. All keys in bigdata are unsigned byte arrays. All values are byte arrays. There is autoboxing for Java objects. If you have a simple indexing problem, there are Map and Set implementations (BigdataMap, BigdataSet).

There are a lot of options for the Journal and for IndexMetadata, but the defaults will generally work well. If you have custom key requirements, then you will want to use KeyBuilder to format various Java data types and compound fields as unsigned byte arrays. You can also override the ITupleSerializer, which can make the handling of custom data types for keys and values transparent to your appliction.

Here are the basic steps to get started with a B+Tree on a Journal.

Properties properties = new Properties();
properties.set(Journal.Options.FILE,filename+".jnl");

// create or re-open the journal.
Journal jnl = new Journal(properties);

// create a new index.
BTree btree = BTree.create(jnl,new IndexMetadata(name,UUID.randomUUID()));
jnl.registerIndex(name,btree);
jnl.commit();

// lookup an index by name.
btree = jnl.getIndex(name);

// Close the journal (normal shutdown).
jnl.shutdown();

If you are managing indices at this level, then remember to call jnl.commit(). Basically you are in charge of concurrency control when you approach the B+Tree from the IIndexManager API. The alternative is to submit tasks to the ConcurrencyManager for execution, in which case it handles concurrency control and commits. For scale-out, everything goes through the ConcurrencyManager and uses auto-commit.

The IIndex API itself is pretty straight forward:

btree.insert(key,val):oldVal
btree.lookup(key):val
btree.remove(key):oldVal

You can get range counts and range iterators:

btree.rangeCount(fromKey,toKey):long
btree.rangeIterator(fromKey,toKey):iterator

There is some sample code in GIT. The examples also cover the use of transactions on the Journal.


Scale-out tips

IIndexProcedures are the bigdata equivalent of a stored procedure, except you are running Java code. There are many things that can be written efficiently inline against a local B+Tree which will have miserable performance against a scale-out B+Tree. Sometimes the code can be made more efficient by passing a filter through a range iterator. Other times that code needs to be rewritten as an IIndexProcedure.

There are three kinds of index procedures. Those which handle point tests and are always mapped to a single shard, those which handle a key-range identified by a fromKey and a toKey and are mapped onto the shards which span that key range, and those which handle a set of keys and are mapped onto the specific shards which have data for those keys. Index procedures can be used for efficient batch inserts, batch lookups, scattered writes, gathered reads, etc. The range iterator mechanism is also very flexible.

Also, for scale-out each task submitted for remote execution uses "auto-commit" semantics. Full distributed transactions are in the works but not ready yet.


Please provide us with feedback so we can make this documentation more useful for you!