Tx Guide

Background

Blazegraph uses Multi-Version Concurrency Control (MVCC) for transactions. MVCC belongs to the family of optimistic concurrency control algorithms. Blazegraph does not obtain a lock when you start a transaction. Instead, it validates the transaction when it commits. The advantage of MVCC is that readers and writers never block one another, and a writer always succeeds unless there is a conflict. This can yield higher concurrency than Two Phase Locking (2PL).

Timestamps are central to transaction processing in Blazegraph. There is a unique timestamp for each commit point and each transaction. When a transaction commits, it first validates each tuple in its write set and then annotates each tuple with the revision time for that transaction. A transaction will abort if there is a write-write conflict. This occurs when a concurrent transaction (one running at the same time) modifies the same tuple and commits the changes first. A write-write conflict is detected when a tuple in the write set carries a revision timestamp that is more recent than the start time of the transaction.
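The validation rule described above can be sketched with a small in-memory model. This is a toy illustration, not Blazegraph's implementation: the class MvccValidationSketch, its Tx type, and the integer clock are all hypothetical names introduced here, and the model assumes a single thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of commit-time validation (hypothetical; not Blazegraph code). */
final class MvccValidationSketch {

    /** Committed values: key -> value. */
    private final Map<String, String> values = new HashMap<>();
    /** Key -> revision timestamp of the last committed write on that key. */
    private final Map<String, Long> revisionTimes = new HashMap<>();
    /** Monotonic counter standing in for a timestamp service. */
    private long clock = 0;

    /** A transaction: its start time plus a buffered write set. */
    static final class Tx {
        final long startTime;
        final Map<String, String> writeSet = new HashMap<>();
        Tx(long startTime) { this.startTime = startTime; }
    }

    /** Starts a transaction without taking any lock. */
    Tx begin() { return new Tx(++clock); }

    /**
     * Validates the write set, then applies it and stamps each key with the
     * commit time. Returns false (abort) if any key in the write set has a
     * revision timestamp more recent than the transaction's start time.
     */
    boolean commit(Tx tx) {
        for (String key : tx.writeSet.keySet()) {
            Long revision = revisionTimes.get(key);
            if (revision != null && revision > tx.startTime) {
                return false; // write-write conflict
            }
        }
        long commitTime = ++clock;
        for (Map.Entry<String, String> e : tx.writeSet.entrySet()) {
            values.put(e.getKey(), e.getValue());
            revisionTimes.put(e.getKey(), commitTime);
        }
        return true;
    }

    public static void main(String[] args) {
        MvccValidationSketch db = new MvccValidationSketch();
        Tx t1 = db.begin();
        Tx t2 = db.begin();
        t1.writeSet.put("k", "a");
        t2.writeSet.put("k", "b");
        System.out.println(db.commit(t1)); // true: nothing committed on "k" since t1 started
        System.out.println(db.commit(t2)); // false: t1 committed "k" after t2 started
    }
}
```

In this model neither transaction blocks the other while it is running. The conflict surfaces only when the second transaction tries to commit.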

Unisolated Indices

By default, Blazegraph registers indices that do NOT support transactions. Write operations on such indices are always "unisolated". Unisolated write operations provide higher throughput since writes are not double-buffered, but writes on a given index are serialized.

Against a Journal, unisolated writes can provide full ACID semantics with high performance.

In scale-out, unisolated writes provide shard-wise ACID semantics.

Note that read-only transactions with snapshot isolation are always supported, even when the indices are not configured to support full read/write transactions.

Registering an index that supports transactions

You MUST explicitly enable transaction support when you register an index. Transaction processing requires that the index maintains both per-tuple delete markers and per-tuple version identifiers. While scale-out indices always maintain per-tuple delete markers, neither local nor scale-out indices maintain the per-tuple version identifiers by default.

final IndexMetadata indexMetadata = new IndexMetadata("testIndex", UUID.randomUUID());

// This index will support transactions.
indexMetadata.setIsolatable(true);

// Register the index.
store.registerIndex(indexMetadata);

Kinds of transactions

There are two kinds of transactions:

  • read-only transactions
  • read-write transactions

Read-only transactions are always supported. They provide extremely fast, highly concurrent snapshot isolation. You request a read-only transaction by telling the transaction service which commit point you want to read from. The returned transaction identifier provides snapshot isolation with a fully consistent view of the state of the database as of that commit point.
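To make "a fully consistent view as of a commit point" concrete, here is a minimal sketch that keeps one immutable snapshot per commit time and resolves a read to the most recent commit point at or before the requested timestamp. The class SnapshotReadSketch and its methods are hypothetical; the sketch copies the whole map on every commit, which a real store would not do, and it assumes commit times are supplied in increasing order.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of reading as of a commit point (hypothetical; not Blazegraph code). */
final class SnapshotReadSketch {

    /** Commit time -> immutable snapshot of the data at that commit point. */
    private final TreeMap<Long, Map<String, String>> commitPoints = new TreeMap<>();

    /** Records a new commit point that extends the previous snapshot by one write. */
    void commit(long commitTime, String key, String value) {
        Map<String, String> next = new HashMap<>();
        if (!commitPoints.isEmpty()) {
            next.putAll(commitPoints.lastEntry().getValue());
        }
        next.put(key, value);
        commitPoints.put(commitTime, Collections.unmodifiableMap(next));
    }

    /** Returns the snapshot for the latest commit point at or before the timestamp. */
    Map<String, String> readAsOf(long timestamp) {
        Map.Entry<Long, Map<String, String>> e = commitPoints.floorEntry(timestamp);
        return e == null ? Collections.<String, String>emptyMap() : e.getValue();
    }

    public static void main(String[] args) {
        SnapshotReadSketch db = new SnapshotReadSketch();
        db.commit(10L, "k", "v1");
        Map<String, String> view = db.readAsOf(10L); // reader pins commit point 10
        db.commit(20L, "k", "v2");                   // later write does not disturb it
        System.out.println(view.get("k"));
        System.out.println(db.readAsOf(20L).get("k"));
    }
}
```

A reader that holds the view from commit point 10 keeps seeing the old value even after the commit at time 20, which is the behaviour the paragraph above describes.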

Read-write transactions fully buffer writes on "isolated" indices, then validate those writes during the commit protocol, and will fail a transaction if the write set cannot be validated (due to intervening commits). Read-write transaction support must be configured when you create an index.

In addition to transactions, you can have unisolated operations. Unisolated operations are key to extremely high concurrency since they do not require any global coordination. Both the RDF database and the "row store" make extensive use of unisolated operations.

Local transaction support

Creating and using transactions with the Journal is straightforward.

Journal store = ...

// start a read-write transaction.
final long txid = store.newTx(ITx.UNISOLATED);

// Obtain a view of a named index isolated by that transaction.
final IIndex isolatedBTree = store.getIndex("testIndex", txid);

// Write on the index.
isolatedBTree.insert("Hello", "World!");

// Commit the transaction.
store.commit(txid);

Sample code in Git covers the use of transactions on the Journal.

BigdataSail Update Transactions

The BigdataSail wraps the Journal or the Scale-Out architecture. When wrapping the Scale-Out architecture, the index updates are shard-wise ACID as described below.

When wrapping the Journal, the index updates are fully ACID. The following pattern shows how to obtain a connection that supports mutation, work on that connection, and then commit the connection. If anything goes wrong, the pattern rolls back the work performed on the connection. A similar pattern may be used with the BigdataSailRepository. That class is just a wrapper over the BigdataSail, and the connection objects that it returns are just wrappers over the BigdataSailConnection objects.

BigdataSailConnection conn = null;
boolean ok = false;
try {
    conn = sail.getConnection();
    doWork(conn);
    conn.commit();
    ok = true;
} finally {
    if (conn != null) {
        if (!ok) {
            conn.rollback();
        }
        conn.close();
    }
}
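The ordering guarantees of this pattern (commit on success, rollback on failure, close in both cases) can be checked in isolation with a stand-in connection. The sketch below is an assumption-laden illustration: Conn, Work, runInTx, and RecordingConn are hypothetical names, not Blazegraph types. It also nests a second finally so that close() still runs if rollback() itself throws, which is a small variation on the pattern above.

```java
import java.util.ArrayList;
import java.util.List;

/** Self-contained illustration of the commit / rollback / close pattern. */
final class ConnectionPatternSketch {

    /** Hypothetical stand-in for a connection: only the calls the pattern uses. */
    interface Conn {
        void commit();
        void rollback();
        void close();
    }

    /** Hypothetical unit of work performed on a connection. */
    interface Work {
        void run(Conn conn);
    }

    /** Commits if the work succeeds, rolls back otherwise, and always closes. */
    static void runInTx(Conn conn, Work work) {
        boolean ok = false;
        try {
            work.run(conn);
            conn.commit();
            ok = true;
        } finally {
            try {
                if (!ok) {
                    conn.rollback();
                }
            } finally {
                conn.close(); // runs even if rollback() throws
            }
        }
    }

    /** Records the calls made on it so that the ordering can be inspected. */
    static final class RecordingConn implements Conn {
        final List<String> calls = new ArrayList<>();
        public void commit() { calls.add("commit"); }
        public void rollback() { calls.add("rollback"); }
        public void close() { calls.add("close"); }
    }

    public static void main(String[] args) {
        RecordingConn conn = new RecordingConn();
        runInTx(conn, c -> { /* successful work */ });
        System.out.println(conn.calls); // [commit, close]
    }
}
```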

Scale-out transaction support

At this time, only read-only transactions are supported by the distributed database. What good is that, you ask? A read-only transaction asserts a read lock, which guarantees that the view you are reading from will not disappear during your operation. This is important for the scale-out architecture since the release of older resources (and the views based on historical commit points on those resources) is driven by write activity. If there is a high write volume on the cluster, it becomes increasingly likely that older views will be discarded. The read lock prevents that.

For the distributed database, you use the IBigdataFederation to obtain a proxy for the ITransactionService, and then you request the transaction identifier from the transaction service.

IBigdataFederation fed = ...

// create a read-only transaction from the most recent commit point on the federation.
final long txid = fed.getTransactionService().newTx(ITx.READ_COMMITTED);

...

// discard the read-only transaction (releases the read-lock).
fed.getTransactionService().abort(txid);

It is important to terminate read-only transactions for the federation since they will prevent resources from being released. If the read-only transaction is very long lived and there is heavy write volume on the database, then the storage demands will continue to increase since older commit points cannot be released while there are outstanding read locks.
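The effect of an outstanding read lock on resource release can be illustrated with a toy model that tracks which commit points still have open readers. The class ReadLockSketch and its methods are hypothetical, and the release rule used here (everything older than the earliest pinned commit point may be released, or everything older than the last commit when there are no readers) is a simplifying assumption rather than the federation's actual policy.

```java
import java.util.TreeMap;

/** Toy model of read locks pinning historical commit points (hypothetical). */
final class ReadLockSketch {

    /** Commit time being read -> number of open read-only transactions on it. */
    private final TreeMap<Long, Integer> readLocks = new TreeMap<>();
    private long lastCommitTime = 0;

    /** Records a new commit point. */
    void commit(long commitTime) { lastCommitTime = commitTime; }

    /** Opens a read-only transaction on the most recent commit point. */
    long open() {
        readLocks.merge(lastCommitTime, 1, Integer::sum);
        return lastCommitTime;
    }

    /** Closes a read-only transaction, releasing its read lock. */
    void close(long readsOnCommitTime) {
        readLocks.computeIfPresent(readsOnCommitTime, (time, count) -> {
            if (count == 1) {
                return null; // last reader on this commit point: drop the entry
            }
            return count - 1;
        });
    }

    /** In this model, commit points strictly older than this may be released. */
    long earliestRetainedCommitTime() {
        return readLocks.isEmpty() ? lastCommitTime : readLocks.firstKey();
    }

    public static void main(String[] args) {
        ReadLockSketch fed = new ReadLockSketch();
        fed.commit(10L);
        long tx = fed.open();   // reader pins commit point 10
        fed.commit(20L);
        fed.commit(30L);
        System.out.println(fed.earliestRetainedCommitTime()); // 10: still pinned
        fed.close(tx);
        System.out.println(fed.earliestRetainedCommitTime()); // 30: nothing pinned
    }
}
```

While the reader is open, every commit point from 10 onward must be kept regardless of how many writes follow, which is why long-lived read-only transactions drive up storage demand.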

Transaction Logger

Recycling behavior depends critically on the close of open transactions. The MVCC architecture of Blazegraph means that data for the historical commit points cannot be recycled until there are no active transactions reading on those commit points. If you are holding open a transaction (either a read-only or a read-write transaction) while writing on the database, the database cannot recycle storage and will start to grow in size on the disk once it fills up the available allocations. See the page on RetentionHistory for more about this issue, including the specifics of the RWStore recycler behavior.

If you suspect a storage leak, you should turn on the following logger in the log4j configuration file:

 com.bigdata.txLog=INFO

This will cause the following events to be logged:


Event | Fields | Description
OPEN-JOURNAL | The UUID, file, and BufferMode of the Journal | A Journal was opened.
CLOSE-JOURNAL | The UUID and file of the Journal | A Journal was closed.
COMMIT | commitTime | The unisolated write set was committed.
OPEN | txId, readsOnCommitTime | A read-only or read-write transaction was opened.
CLOSE | txId, readsOnCommitTime | A read-only or read-write transaction was closed.
RECYCLER | lastCommitTime, latestReleasableTime, lastDeferredReleaseTime, activeTxCount | This is an information message generated when the recycler runs. The recycler cannot recycle allocations unless activeTxCount is ZERO (0). If the counter never becomes ZERO (0), then the RWStore will "leak storage". This is generally an application bug.
RECYCLED | fromTime, toTime, totalFreed, commitPointsRecycled, commitPointsRemoved | Deferred frees of allocations were released (recycled). Check totalFreed and commitPointsRemoved to see if anything was actually recycled.
ABORT | N/A | The unisolated write set of the Journal was discarded.
ROLLBACK | N/A | The state of the Journal was restored to the previous root block.
SAIL-CREATE-NAMESPACE | namespace | A new namespace was created (since 2.2.0).
SAIL-DESTROY-NAMESPACE | namespace | A namespace was destroyed (since 2.2.0).
SAIL-START-CONN | conn | A new BigdataSailConnection was created.
SAIL-NEW-TX | txId, conn | A new read/write transaction identifier was assigned to a BigdataSailConnection. This occurs when a read/write tx is created and each time you call rollback() or commit() on a read/write tx.
SAIL-COMMIT-CONN | commitTime, conn | commit() was invoked on a BigdataSailConnection.
SAIL-ROLLBACK-CONN | conn | rollback() was invoked on a BigdataSailConnection.
SAIL-CLOSE-CONN | conn | close() was invoked on a BigdataSailConnection.
REST-API-TASK-OPEN | task | A REST API task was created in response to an HTTP request (since 2.2).
REST-API-TASK-SUCCESS | task | A REST API task completed normally (since 2.2).
REST-API-TASK-ERROR | task, cause | A REST API task failed (since 2.2).
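When hunting for a storage leak, one useful check is to pair up the OPEN and CLOSE events and list the transactions that were never closed. The sketch below works on events that have already been extracted from the log. The exact text layout of the log lines is not specified here, so parsing is deliberately left out. TxLeakSketch and Event are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Finds transactions with an OPEN event but no matching CLOSE (hypothetical helper). */
final class TxLeakSketch {

    /** An already-parsed transaction log event: its name and its txId field. */
    static final class Event {
        final String type;
        final long txId;
        Event(String type, long txId) {
            this.type = type;
            this.txId = txId;
        }
    }

    /** Returns the txIds that were opened and never closed, in order of opening. */
    static Set<Long> unclosed(List<Event> events) {
        Set<Long> open = new LinkedHashSet<>();
        for (Event e : events) {
            if ("OPEN".equals(e.type)) {
                open.add(e.txId);
            } else if ("CLOSE".equals(e.type)) {
                open.remove(e.txId);
            }
            // Other event types (COMMIT, RECYCLER, ...) are ignored here.
        }
        return open;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("OPEN", 1L),
                new Event("OPEN", 2L),
                new Event("CLOSE", 1L));
        System.out.println(unclosed(events)); // [2]
    }
}
```

Any txId reported by this check corresponds to a transaction that keeps activeTxCount above zero, so it is a natural starting point for finding the application code that failed to close it.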