Contributors

This page needs updating for the migration to GIT.

Getting Set Up as a Contributor

Contributor Agreements

If you are interested in being a contributor, we will need you to complete a contributor agreement. Please contact the project managers for more details.

Wiki

To get wiki edit permission, someone with wiki admin permissions must edit the privileges for your SourceForge username on the wiki [1].

Developer Email List

Please use the developer email list to communicate about project development issues. This will help us to create artifacts which will be of use to people who are interested in getting involved in the future. Membership in this list is restricted to project developers, but the list is archived and searchable by anyone.

Use this link to subscribe to the developers list:

   https://lists.sourceforge.net/lists/listinfo/bigdata-developers

Use this email address to post to the list:

   bigdata-developers@lists.sourceforge.net

The list is archived and can be searched using this link:

   http://sourceforge.net/mailarchive/forum.php?forum_name=bigdata-developers

Please note that your email server MUST accept mail for the postmaster account or the list serve will reject your posts (the postmaster account is part of the email RFCs and the list serve software insists that it exist).

Commit Email List

Developers interested in observing the commit traffic may subscribe to this list. Membership in this list is restricted to project developers, but the list is archived and searchable by anyone.

You MUST specify your sourceforge email address for the commit email list subscription since that is what is associated with your commits.

Use this link to subscribe to the list:

   https://lists.sourceforge.net/lists/listinfo/bigdata-commit

Use this email address to post to the list:

   bigdata-commit-request@lists.sourceforge.net

The list is archived and can be searched using this link:

   http://sourceforge.net/mailarchive/forum.php?forum_name=bigdata-commit

Please note that your email server MUST accept mail for the postmaster account or the list serve will reject your posts (the postmaster account is part of the email RFCs and the list serve software insists that it exist).

Help Forum

Please monitor the help forum. Help is provided using a forum so that anyone can post regardless of whether their mail server is set up correctly (this is a lower bar to entry than the developers list).

Development

The main development branch should remain stable at all times. Releases are tagged as branches for maintenance. Major change sets should be created in branches (see below). Discussion regarding the project should take place on the developers list so everyone can participate and benefit.

Consistency and coherence in the architecture and the implementation are critical for database correctness and performance. Coordinate with component owners before making changes to those components. When in doubt, ask first on the developers list. Final resolution for questions concerning the database architecture will be made by the project administrators.

Maintaining Tickets

Issues are maintained on jira.blazegraph.com.

Developers must:

  • file an issue on jira.blazegraph.com for any planned work;
  • accept the issue before making changes; and
  • update the status for accepted issues at least weekly (Friday morning).

This provides everyone with oversight on planned and active change sets via the jira dashboard and makes it easier to minimize conflicts in the code base.


Note: The older trac system can be accessed at trac.bigdata.com. This system is read-only. See the mapping of trac tickets to jira if you need to cross walk a ticket backward or forward between these systems.

Pull Requests (GIT)

The proper process for getting changes into the code base is:

  1. Discuss the feature on the mailing list (bigdata-developers@lists.sourceforge.net). Do this first to make sure that the concept has traction with the developer community. Make sure that you are subscribed to the mailing list first since it will not accept email if you are not subscribed.
  2. Create a ticket for the feature.
  3. Create a feature branch.
  4. Do your work in that branch.
  5. Make sure that you have not broken the tests.
  6. Create a pull request.
  7. Email bigdata-developers@lists.sourceforge.net with the pull request (make sure that you are subscribed to the mailing list first).

Do not commit to master. Changes will be merged to master from the pull request by one of the project maintainers.
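
As a rough illustration of steps 3 through 5, the sketch below runs the branch-and-commit part of the workflow in a throwaway local repository, so it can be tried without touching the real code base. The branch name, file name, commit messages, and developer identity are placeholders (assumptions). Pushing the branch and opening the pull request happen on GitHub and are not shown.

```shell
#!/bin/sh
# Sketch only: exercises the feature-branch steps in a scratch repository.
set -e

repo=$(mktemp -d)                        # throwaway working directory
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is "master"

# Placeholder identity so the commits below work on an unconfigured machine.
git config user.name  "Example Developer"
git config user.email "dev@example.org"

echo "baseline" > README.txt
git add README.txt
git commit -q -m "Initial commit on master"

# Step 3: create a feature branch (hypothetical name).
git checkout -q -b my_feature

# Step 4: do the work on that branch, never on master.
echo "new feature" >> README.txt
git commit -q -a -m "Work for the ticket (placeholder message)"

# Step 5 would run the test suite here, before a pull request is opened.

current_branch=$(git rev-parse --abbrev-ref HEAD)
master_commits=$(git rev-list --count master)
echo "on branch $current_branch; master still has $master_commits commit(s)"
```

The point of the final two lines is that master is untouched: all new commits live on the feature branch until a maintainer merges the pull request.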

Eclipse EGit Plugin

There is an extremely nice feature in the EGit integration where you can hover over a line of code to see who last modified it. Make sure that EGit is installed. Configure the Git perspective to point to your local git repository. Right click on an editor and select Team => Show Annotations.

Branching and Merging (GIT)

We strongly recommend taking an hour to work through a Git tutorial:

https://www.atlassian.com/git/tutorials/using-branches

Branching and merging are much, much easier under git. If you want to create your own new branch:

git checkout -b my_branch

If you want to check out someone else's branch:

git checkout --track origin/daves_branch

To switch back to master:

git checkout master

Note: The following will put you in a detached HEAD state, where you are not on any local branch.

git checkout origin/master

This is generally undesirable. To recover from this, do

git checkout master

To check out a tagged release, do the following.

git checkout tags/BIGDATA_RELEASE_1_5_0

Again, this puts you in a detached HEAD state, so do

git checkout master

to get back to master.
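
If you are unsure whether you are currently in a detached HEAD state, a small check along the following lines can help. This is a sketch: head_state is a hypothetical helper name, and the scratch repository and the tag EXAMPLE_RELEASE exist only to demonstrate it.

```shell
#!/bin/sh
# Sketch only: report whether the repository in the current directory is on a
# branch or in a detached HEAD state. "head_state" is a hypothetical helper.
head_state() {
    # git symbolic-ref exits non-zero when HEAD is not a symbolic ref,
    # which is the detached case.
    if branch=$(git symbolic-ref -q --short HEAD); then
        echo "on branch $branch"
    else
        echo "detached"
    fi
}

# Demonstration in a scratch repository (placeholder identity and tag name).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name  "Example Developer"
git config user.email "dev@example.org"
echo "baseline" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git tag EXAMPLE_RELEASE

git checkout -q tags/EXAMPLE_RELEASE   # checking out a tag detaches HEAD
state_at_tag=$(head_state)
git checkout -q master                 # back on a local branch
state_at_master=$(head_state)

echo "at tag: $state_at_tag; after recovery: $state_at_master"
```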

Pulling changes up from master

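If your feature branch is behind master, you can merge the latest changes from master into it. Because origin/master is your local copy of the remote branch, run git fetch origin first to bring it up to date, then run the following command from your feature branch: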

git merge origin/master

Private Branches (Sandboxes)

Individual developers interested in exploring new concepts may create a private branch to serve as a sandbox in which they can explore those ideas without introducing changes into master.

To discard all changes and revert to a previous commit

Find the commit point to restore (or just look at GitHub).

git log

Reset to that commit point:

git reset --hard FULL-COMMIT-HASH
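
The sketch below walks through the same two commands in a scratch repository, to make it clear what is lost. It assumes nothing about the real code base: the file name, commit messages, and identity are placeholders. Note that git reset --hard discards both the later commits on the branch and any uncommitted edits in the working tree.

```shell
#!/bin/sh
# Sketch only: demonstrates "git log" followed by "git reset --hard" in a
# throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name  "Example Developer"
git config user.email "dev@example.org"

echo "good" > notes.txt
git add notes.txt
git commit -q -m "Known good state"
good_commit=$(git rev-parse HEAD)     # full hash of the commit to return to

echo "experiment" > notes.txt
git commit -q -a -m "Experiment to be discarded"
echo "uncommitted edit" > notes.txt   # a local change that is not committed

git log --oneline                     # find the commit point to restore
git reset -q --hard "$good_commit"    # drop the experiment and the local edit

restored=$(cat notes.txt)
echo "notes.txt now contains: $restored"
```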

Continuous Integration (CI)

CI results are available at [2]. You can download the results of test suite runs. There is an additional artifact for analyzing the logs for the HA CI test suite. If you need a specific branch to be entered into CI, please contact one of the project admins or, if you have GitHub access, follow the guide below.

Jenkins Configuration

Jenkins is accessed using your GitHub credentials. It is configured to pull automatically from GitHub and to spawn up to four EC2 instances dynamically to handle the CI workload. In general, you should not need to create new Jenkins jobs, since CI should be run through the GitHub Pull Request integration.

Known Good JVM Settings

ANT_OPTIONS="-XX:MaxPermSize=256m -Dfile.encoding=UTF-8 -Xmx8g -server -Dsun.jnu.encoding=UTF-8"

The maven options are set in the global Jenkins configuration, but are also included here for reference.

MAVEN_OPTIONS="-XX:MaxPermSize=256m -Dfile.encoding=UTF-8 -Xmx8g -server -Dsun.jnu.encoding=UTF-8"

Getting thread dumps

To get a thread dump, you must have your SSH public key installed on the Jenkins SSH Slave EC2 image. Create a JIRA ticket to make this request and include your public ssh key. Then, determine the SLAVE IP that ran the job and ssh directly to the machine to grab the thread dump.

Generating an SSH Public Key

ssh-keygen -t rsa -b 2048 -f ~/.ssh/blazegraph

Hit enter twice for a blank pass phrase or choose one.

cat ~/.ssh/blazegraph.pub

Include the output of your public key in the JIRA ticket.

Initiating a CI Run from GitHub

To initiate a CI run from GitHub, first create a Pull Request (PR) on GitHub. The CI job will run automatically. If you need to retest without a code change, add a comment to the PR containing the text "Computer, please test this". This will trigger an automatic CI run in Jenkins. The results will be posted back to the PR, and you can take appropriate action based on the outcome (Success, Failure, Error).

CI Job Naming Conventions

<github-module>* (CI job for master for github-module). The exception is the bigdata master, which is called GIT_DEVELOPMENT_MAVEN.

<github-module>*-PR-tester (Pull request tester for github-module), i.e. bigdata-github-maven-PR-tester

Unit tests

Bigdata has a large and growing test suite. Whether you code the unit tests first or after, do not commit code without writing a test suite for that code and verifying the test suite for your changes plus any affected modules. When in doubt, ask or run the entire test suite. After you commit, please review the CI results to see if you have broken anything.

Proxy Test Suites

Some of the test suites use a "proxy" pattern to allow the same test suite to execute against different implementations or parameterizations of a given implementation. This feature is heavily used to:

  • exercise different backend storage models (the RWStore, MemStore, etc.);
  • run (nearly identical) test suites in triples vs RDR vs quads modes; and
  • exercise the REST API test suite against both embedded and scale-out architectures.

You can specify the delegate for the proxy using

-DtestClass=fully-qualified-class-name

You may need to hunt around a little (typically in the TestAll suite) to find out which proxy class names you can use with a given proxy test suite. Some of the common ones are:

  • TestBigdataSailWithQuads
  • TestLocalTripleStore
  • TestRWJournal
  • TestWORMStrategy
  • TestNanoSparqlServerWithProxyIndexManager

Running the test suite with maven

You can run the entire test suite using:

mvn clean package

You can run the tests in an individual class in the test suite using:

mvn clean package -Dtest=com.bigdata.journal.jini.ha.TestHA1GroupCommit

An example using the proxy test suite from the command line:

mvn -DtestClass=com.bigdata.rdf.sail.webapp.TestNanoSparqlServerWithProxyIndexManager -Dtest=Test_REST_ASK test
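
When the same test has to be run against several delegates, it can be convenient to generate the command line instead of retyping it. The helper below is a sketch only: proxy_test_cmd is a hypothetical name, not part of the build, and it prints the Maven command rather than running it. The delegate and test names are the ones used in the example above.

```shell
#!/bin/sh
# Sketch only: print (not run) the Maven command for one proxy test.
# "proxy_test_cmd" is a hypothetical helper, not part of the project.
proxy_test_cmd() {
    delegate=$1    # fully qualified delegate class, passed as -DtestClass
    test_name=$2   # test class to run, passed as -Dtest
    echo "mvn -DtestClass=$delegate -Dtest=$test_name test"
}

cmd=$(proxy_test_cmd \
    com.bigdata.rdf.sail.webapp.TestNanoSparqlServerWithProxyIndexManager \
    Test_REST_ASK)
echo "$cmd"
```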

Running the test suite with eclipse

Many of the test suites can be run directly under Eclipse. However, some of the test suites depend on external services that must be running before the tests are executed:

  • HA depends on the river ClassServer and LookupStarter
  • Scale-out depends on the river ClassServer and LookupStarter
  • The external text index feature depends on SOLR

These external resources are set up through the maven POM associated with the appropriate projects. You can also start these resources yourself; there are examples of how to do this for HA/scale-out at the bottom of this page.

Hunting resource leaks in CI

Add this to Manage Jenkins => Configure under "Global Properties", "Environment variables". The specific path depends on the version of YourKit that is installed on the CI node.

LD_LIBRARY_PATH
/nas/install/yjp-2014-build-14100/bin/linux-x86-64

Add this to the Advanced options for the Jenkins project configuration, e.g., where it says "-server -ea" etc. This specific command starts the profiler with everything disabled. Once you connect to the process, you can then selectively enable things. Replace port=XXXXX with something like port=10001. This is the port that you will use to connect to YourKit. See http://yourkit.com/docs/75/help/getting_started/running_with_profiler/agent.jsp for the background on setting this up.

-agentlib:yjpagent=disableexceptiontelemetry,disablestacktelemetry,port=XXXXX

Set up local port forwarding for the CI machine and ssh into it. Again, replace XXXXX with the specific port.

# ~/.ssh/config
Host ci.bigdata.com
#...
LocalForward XXXXX localhost:XXXXX

You can then start yourkit locally and connect to the running CI job (if any).

Intellectual Property

Everyone who is a contributor is bound by a signed contributor license agreement (CLA).

Your own work

Your contributions MUST be your own work. DO NOT incorporate code from other projects or other sources. There MUST be an explicit contribution made by the copyright holders before 3rd party intellectual property may be incorporated into the project. Please refer any such matters to the project administrators.

Dependencies

The choice of a dependency is very important and must be made in consultation with the project administrators. In addition to choosing technically sound dependencies, there are also a number of legal rules that must be followed to properly acknowledge the copyright for the dependency and a number of administrative tasks that must be performed to ensure that the dependency is correctly integrated into development, CI, and the various deployment environments.

Adding a dependency

You MUST NOT add a dependency without contacting the project administrators.

The following all need to be addressed when adding a dependency:

  • build.properties - The dependency version number needs to be declared.
  • build.xml - The dependency needs to be integrated into the WAR, stage, bundleJar, and javadoc (external links), and various other deployment targets. This is both tricky and vital.
  • pom.xml - The dependency needs to be declared.
  • Depends.java - The dependency needs to be declared. This is responsible for generating the list of dependencies at runtime as part of the banner.
  • bigdata-XXX/lib - The dependency needs to be placed into an appropriate library directory with the correct bigdata module. The choice of the module depends on the scope in which the dependency will be used.
  • bigdata-XXX/LEGAL - The license for the dependency must be placed into the LEGAL directory within the module in which the dependency is housed. The name of the license should include the name of the dependency. E.g., "jetty-license.txt". Many dependencies have the same license, but a separate license file MUST be present for each dependency.
  • bigdata/NOTICE - This file must include any text from a NOTICE file associated with the dependency. This is a requirement of the Apache license!

Updating a Dependency

You MUST verify that the license associated with a dependency has not changed BEFORE updating that dependency.

You MUST NOT update a dependency if there has been a license change. Instead, refer the matter to the project administrators.

Coding Style, Copyright comment blocks, and related matters.

Head of file comment block

The correct comment block for the head of each source file is the GPL license block as follows:

/*

Copyright (C) SYSTAP, LLC 2006-2014.  All rights reserved.

Contact:
     SYSTAP, LLC
     4501 Tower Road
     Greensboro, NC 27410
     licenses@bigdata.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

Author Tags

Author tags should be provided on each class you create and on each class where you make major changes. This helps us to track who are the most knowledgeable people for a given class.

TODOs

Please use the following tags to mark todos in the code:

  • FIXME - Encouraged for more important tasks.
  • TODO - Encouraged for minor tasks or possible future directions in the code.

Margins and Code Formatting

Please:

  1. Set margins to 80 columns.
  2. Wrap comments and code at the margin.
  3. Set display width of tabs to 4 spaces and set editor to convert tabs to spaces (4). In Eclipse, to set tabs to spaces, there are two settings that must be updated:
    1. Preferences => Java => Code Style => Formatter => Indentation => Tab Policy := Spaces Only
    2. Preferences => General => Editors => Text Editors => [x] Insert spaces for tabs

Please do not broadly reformat existing code, especially code for which you are not the primary maintainer, since that makes it significantly more difficult to handle merges.

Conditional Logging

Each class which will have log output should declare its own logger. Loggers should be private, static, and final. Logging at INFO, DEBUG, or TRACE MUST be conditional, using the pattern:

if (log.isInfoEnabled()) {
   log.info(...);
}

Conditional logging is critical for performance. Generating log messages (when they are not directly given strings such as "Hello") produces a tremendous amount of heap churn from String concatenation. Heap churn is evil and must be avoided for performance. Hence, the conditional logging pattern.

System.out and System.err

Do NOT use either System.out or System.err in anything other than a main() routine. It is very difficult to locate the code where such output is being produced, and unconditional output not only churns the heap but also clogs the CI servers, since CI buffers the output of the test suite in memory during the test run.

Eclipse-based developers can obtain colorization of their output using grep-console. The defaults colorize java.util.logging output. They can be edited (by removing the square brackets) to also colorize log4j output.

Developer Setups

Scale-out

See ScaleOutCI for some notes on how to set up the scale-out architecture on a single machine.

HA Developers

Note: This is NOT the recommended way to DEPLOY HA. See the HAJournalServer page for deployment guidance. There are also some useful scripts in src/resources/HAJournal that can be used to start the ClassServer and lookup service from the command line.

The following pattern can be used to instantiate 3 HA services on a local workstation. This is only recommended for developers. There are HAJournal configuration files that are designed to operate on the same machine. Assuming that you have bigdata checked out, these files are in the same directory as the HAJournal.java file. They are:

bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-A.config
bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-B.config
bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-C.config

Zookeeper

Zookeeper must be running and is assumed to be at port 2081 on localhost. You can set up zookeeper by downloading it, creating a file conf/zoo.cfg similar to the one below, and then running bin/zkServer.sh start.

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/tmp/zookeeper
# the port at which the clients will connect
clientPort=2081
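
The same setup can be scripted. The sketch below writes the configuration shown above into conf/zoo.cfg and prints the start command. ZK_HOME is an assumption: point it at the directory where you unpacked zookeeper. It defaults to a scratch directory so that the script is safe to try, in which case bin/zkServer.sh will not exist there.

```shell
#!/bin/sh
# Sketch only: write a minimal conf/zoo.cfg for a single local zookeeper.
# ZK_HOME is an assumed variable naming the unpacked zookeeper directory.
set -e
ZK_HOME=${ZK_HOME:-$(mktemp -d)}
mkdir -p "$ZK_HOME/conf"

# Same values as the configuration listed above.
cat > "$ZK_HOME/conf/zoo.cfg" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/tmp/zookeeper
clientPort=2081
EOF

# Read the port back so a mismatch with the HAJournal configs is easy to spot.
zk_port=$(sed -n 's/^clientPort=//p' "$ZK_HOME/conf/zoo.cfg")
echo "wrote $ZK_HOME/conf/zoo.cfg (client port $zk_port)"
echo "start with: $ZK_HOME/bin/zkServer.sh start"
```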

ClassServer

java ... com.sun.jini.tool.ClassServer -verbose -stoppable -port 23333 -dir bigdata/lib-dl

LookupStarter

The lookup service must be started and must use the same federation name as the HAJournal config files ("benchmark"). This example works within eclipse and references eclipse variables.

java ... \
-Djini.lib=${project_loc}/bigdata-jini/lib/jini/lib \
-Djini.lib.dl=${project_loc}/bigdata-jini/lib/jini/lib-dl \
-Djava.security.policy=${project_loc}/policy.all \
-Djava.security.debug=off \
-Djava.protocol.handler.pkgs=net.jini.url \
-Dlog4j.configuration=file:bigdata/src/resources/logging/log4j-dev.properties \
-Dcodebase.port=23333 \
-Djava.net.preferIPv4Stack=true \
-Dbigdata.fedname=benchmark \
-Ddefault.nic= \
-Dapp.home=${project_loc} \
com.bigdata.service.jini.util.LookupStarter 

HA Services

You can start these services as follows. They will expose the REST API at ports 8080, 8081, and 8082 as specified in the commands below.

A:

java \
-Djava.security.policy=policy.all \
-Djava.util.logging.config.file=bigdata/src/resources/logging/logging.properties \
-Dlog4j.configuration=bigdata/src/resources/logging/log4j-dev.properties \
-Djetty.port=8080 \
com.bigdata.journal.jini.ha.HAJournalServer \
bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-A.config

B:

java \
-Djava.security.policy=policy.all \
-Djava.util.logging.config.file=bigdata/src/resources/logging/logging.properties \
-Dlog4j.configuration=bigdata/src/resources/logging/log4j-dev.properties \
-Djetty.port=8081 \
com.bigdata.journal.jini.ha.HAJournalServer \
bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-B.config

C:

java \
-Djava.security.policy=policy.all \
-Djava.util.logging.config.file=bigdata/src/resources/logging/logging.properties \
-Dlog4j.configuration=bigdata/src/resources/logging/log4j-dev.properties \
-Djetty.port=8082 \
com.bigdata.journal.jini.ha.HAJournalServer \
bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-C.config
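
Since the three commands differ only in the port and the config file, they can be generated in a loop. The following is a sketch under stated assumptions: ha_cmd is a hypothetical helper, it only prints the commands, it assumes that the config file is passed as the argument to the HAJournalServer main class, and it leaves out the classpath, which you must supply for a real launch.

```shell
#!/bin/sh
# Sketch only: print the launch command for each of the three local HA
# services. "ha_cmd" is a hypothetical helper; no classpath is included.
ha_cmd() {
    svc=$1         # service letter: A, B, or C
    rest_port=$2   # REST API port for this service
    echo "java -Djava.security.policy=policy.all" \
         "-Djava.util.logging.config.file=bigdata/src/resources/logging/logging.properties" \
         "-Dlog4j.configuration=bigdata/src/resources/logging/log4j-dev.properties" \
         "-Djetty.port=$rest_port" \
         "com.bigdata.journal.jini.ha.HAJournalServer" \
         "bigdata-jini/src/java/com/bigdata/journal/jini/ha/HAJournal-$svc.config"
}

# Services A, B, and C use consecutive ports starting at 8080.
port=8080
for letter in A B C; do
    ha_cmd "$letter" "$port"
    port=$((port + 1))
done
```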

HA Test Suite

Regression tests for HA should be created in the directory below. You can find it by searching for the TestHAJournal.java class.

bigdata-jini/src/test/com/bigdata/journal/jini/ha