Cluster Guide

We recommend that you ask for help when attempting your first cluster install!

Cluster Setup

Preconditions

  • Linux. The installer assumes Linux. We are happy to do ports to other platforms, but right now bigdata installs on Linux. The main dependency is how we collect performance counters from the OS. Some counters are collected from vmstat, but many are taken from sysstat. Performance counter collection for Windows uses typeperf, but we do not have an installer for Windows. There is a ClusterSetupGuide, which shows you how to configure a basic Fedora Core 10 install for use with bigdata.
  • Swap. As root, do 'sysctl -w vm.swappiness=0' on each host. By default, Linux will start swapping out processes proactively once about 1/2 of the RAM has been allocated. This is done to ensure that RAM remains available for processes which might run, but it effectively denies you 1/2 of your RAM for the processes that you actually want to run. Changing vm.swappiness from its default (60) to zero (0) fixes this (see the sketch after this list).
  • Either procmail (for the lockfile command) or dotlockfile must be installed on each machine. This is used to coordinate the bigdata subsystem lock.
  • sysstat must be installed on each machine. This is used to collect performance counters from the OS. (Let us know if you can't install sysstat on your cluster. We are considering workarounds to make this dependency optional).
  • ntpd. This is not a strict requirement, but we highly recommend having ntpd set up on your cluster. This will make the bigdata script much more predictable, since all hosts will run the cron job at the same moment. It will also make it easier to correlate performance counters and events reported by the system, since they will be on a common time base. There are pointers in build.properties to resources describing how to do this. There are sample configurations for an ntpd server and an ntpd client in src/resources/config.
  • crontab. The easiest way to control bigdata is by running the 'bigdata' script from cron. You can do this as root or as a normal user. Specific directions are given when you run the bigdata installer.
  • JDK 1.6 must be installed on each machine. We recommend 1.6.0_17.
  • Ant must be installed on the machine where you will build bigdata.
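
These preconditions can be checked with a few commands. The sketch below assumes a Fedora-style host (package manager and service names will vary) and is run as root on each machine.

sysctl -w vm.swappiness=0
echo "vm.swappiness = 0" >> /etc/sysctl.conf       # optional: keep the setting across reboots
which lockfile dotlockfile                         # at least one must be present (procmail or dotlockfile)
which sar || yum install sysstat                   # sar is part of the sysstat package
service ntpd status                                # recommended, not strictly required
java -version                                      # should report a 1.6 JDK, e.g. 1.6.0_17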

Bundled components

  • jini 2.1 is bundled with bigdata. It will be installed and set up automatically. If you are already running jini, then you can edit the bigdata configuration file to specify the jini groups and locators and use your existing setup.
  • Zookeeper is bundled with bigdata. It will be installed and set up automatically. The bigdata configuration file can even start an ensemble of multiple zookeeper servers. If you want to use your existing instance, you need to make sure that zookeeper is not listed as one of the services that bigdata is trying to manage -- this is done in the main configuration file.
  • logging. The installer will set up a log4j SimpleSocketServer writing to files in $NAS/log, where $NAS is the value you specify in the build.properties file. If you want to do something different, you will need to edit the bigdataup script as well.

Getting the software

You can download a pre-packaged deployer or check out the source code and build it.
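
If you build from source, the sequence is roughly as follows; the repository URL is a placeholder for wherever you obtained the code, and the default ant target is assumed to build the jars.

svn checkout <bigdata-repository-url> bigdata
cd bigdata
ant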

Configuration

There are two main files:

  • build.properties. This is used by the installer. You MUST specify the NAS (shared volume) and LAS (either local or shared storage for the data) properties, JAVA_HOME, the BIGDATA_CONFIG property (which identifies the main configuration file), the names of the hosts on which certain logging services are running (look for 'XXX' in the build.properties file), and a number of similar properties (a minimal example follows this list).
  • bigdataCluster.config. This is the main bigdata configuration file. There are two bigdata configuration file templates in src/resources/config: bigdataCluster.config is for a 3-node 32-bit Linux install, while bigdataCluster16.config is for a 15-node 64-bit Linux install. Figure out which template is closest to your purposes and start from there.
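
For orientation, a minimal build.properties fragment might look like the sketch below. The paths are placeholders; the build.properties template itself is the authoritative list of property names and comments.

NAS=/nas/bigdata                                   # shared volume visible to every host
LAS=/var/bigdata                                   # local (or shared) storage for the data
JAVA_HOME=/usr/java/jdk1.6.0_17
BIGDATA_CONFIG=src/resources/config/bigdataCluster.config
# ...plus the host names for the logging services (search for 'XXX') and the remaining properties.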

Install

The installer will create a directory structure on the NAS directory, including bin, config, log, etc. The configuration file will be copied into the NAS/config directory, and the installer will inject values from build.properties into the configuration file and the various script files.

ant install

Post-install

The installer will write post-install notes on the console -- please read them! Those notes are in src/resources/scripts/POST-INSTALL so you can read them first.

  • After the installer runs, you can define a bunch of useful environment variables by sourcing the installed bigdataenv script as follows.
source ${NAS}/bin/bigdataenv
  • If you are installing as a normal user, the presumption is that you will start bigdata on all machines in the cluster as that user.
  • Edit crontab on each host (crontab -e). There are examples for both root and normal users in the $NAS/bin directory; a combined sketch of this and the following steps appears after this list.
  • Edit $NAS/state. This is the run state for bigdata. The default value is 'status'. Change this to 'start'. The next time cron runs, bigdata instances will start on each host. If you are running ntpd, then this is very predictable and all machines should start at the same time.
  • Watch $NAS/log/state.log (the output from the cron jobs, which is assigned to the alias $stateLog) and $NAS/log/error.log (the aggregated log4j messages from the bigdata services, which are assigned to the alias $errorLog). The $stateLog may show overwrites since many shared file systems do not support atomic append. However, you should still be able to see some activity there when the bigdata script ($NAS/bin/bigdata) tries to start services on each host. Once services are started, you should also see log messages appearing in $NAS/log/error.log (if there are errors) and in $NAS/log/detail.log (for various other messages).
  • After 1-2 minutes, run $NAS/bin/listServices.sh. This will connect to the jini registrar(s) and report which services it finds running, with some detail on the number and type of bigdata services. It will also tell you whether zookeeper is running and whether jini is running (or at least, whether a jini registrar was discovered). If you don't see all the services yet, make sure that cron is running the $NAS/bin/bigdata script, that the $NAS/state file reads "start", and check the $stateLog for obvious errors. If you don't see anything in $NAS/log/detail.log, then verify that the SimpleSocketServer is running on the configured host.
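
Taken together, the steps above look roughly like this; the details of the crontab entry come from the installed examples in $NAS/bin, so treat this only as a sketch.

source ${NAS}/bin/bigdataenv                       # defines $NAS, $stateLog, $errorLog, among others
crontab -e                                         # paste in the appropriate example from $NAS/bin (root or normal user)
echo start > $NAS/state                            # the default run state is 'status'
tail -f $NAS/log/state.log                         # aka $stateLog: output from the cron jobs
tail -f $NAS/log/error.log                         # aka $errorLog: aggregated log4j messages
$NAS/bin/listServices.sh                           # after 1-2 minutes, list the discovered services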

bigdata generally starts in 1-2 minutes. Most of this latency is getting some basic services bootstrapped (zookeeper, jini, logging). The $NAS/bin/bigdataup script is responsible for starting the SimpleSocketServer on the configured host. It also starts a bigdata ServicesManagerServer on each host. The services manager reads the bigdata service configuration out of the main configuration file, which was installed into $NAS/config, and then loads the services templates into zookeeper. The services managers then compete for locks in zookeeper. When they get a lock, they consider whether they can start a given service on their host. If yes, then the service is started. The bigdata services start very quickly.
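
A quick way to verify that these pieces are up on a given host (a sketch; the process names are matched loosely):

pgrep -f SimpleSocketServer                        # on the configured logging host
pgrep -f ServicesManagerServer                     # on each bigdata host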

The $NAS/state file is the bigdata run state file. The following values are understood:

  • start - start the ServicesManagerServer on the local host if it is not already running.
  • stop - stop the ServicesManagerServer and any child processes (the bigdata services).
  • hup - causes the ServicesManagerServer to reload the main configuration file and push the updated configuration into zookeeper. This can cause new services to start.
  • destroy - destroy the bigdata services including ALL your persistent data.
  • status - report whether or not bigdata is running on the local host.

Note: 'bigdata destroy' will destroy your data! Also, it will issue "killall -9 java" by default to provide a sure kill for the bigdata services. You can disable the sure kill in build.properties, but the 'stop' and 'destroy' commands may not succeed (known bug). There are also known bugs when going from 'stop' to 'start'. We figured that "Go!" was the most important feature.
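
All of these transitions are driven by writing the desired value into the run state file, for example (assuming bigdataenv has been sourced so that $NAS is defined):

echo status > $NAS/state                           # report whether bigdata is running on each host
echo stop > $NAS/state                             # stop the services manager and its child processes
echo start > $NAS/state                            # start again (note the stop/start caveat above)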

Using a bigdata federation

Working with the scale-out system is a bit different. The main class for connecting to the federation is com.bigdata.jini.JiniClient. You pass the name of the main configuration file (as installed to $NAS/config) as an argument. Then you call JiniClient#connect() to obtain a JiniFederation instance. The JiniFederation is an IIndexManager, which is the same interface used by the scale-up architecture.
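
A minimal Java sketch of that sequence follows. The import path for JiniFederation, the newInstance factory, and the disconnect call are assumptions to check against your bigdata release; the text above only guarantees passing the configuration file as an argument and calling JiniClient#connect().

import com.bigdata.jini.JiniClient;
import com.bigdata.jini.JiniFederation;            // assumed package; check your release
import com.bigdata.journal.IIndexManager;

public class FederationConnectExample {
    public static void main(final String[] args) throws Exception {
        // args[0] is the main configuration file, e.g. $NAS/config/bigdataCluster.config.
        final JiniClient client = JiniClient.newInstance(args); // factory name assumed; your release may use a constructor
        final JiniFederation fed = client.connect();
        try {
            // The federation is an IIndexManager -- the same interface used by the scale-up architecture.
            final IIndexManager indexManager = fed;
            // ... register and use scale-out indices via indexManager ...
        } finally {
            client.disconnect(true /* immediateShutdown */);  // assumed shutdown method; check the JiniClient API
        }
    }
}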

See ScaleOutBasics and ScaleOutTripleStore.

Performance counters

Performance counters from the operating system, the services, and the clients are aggregated, logged, and exposed via httpd for inspection. Each service runs an httpd instance on a random open port, but the load balancer runs an httpd instance on a known port whose value you specified in the build.properties file. You can bring up that link in a browser using

http://localhost:9999/

on the machine running the load balancer. If you set up port forwarding from local port 9999 to port 9999 on the machine running the load balancer, then you can examine the performance counters directly from your local web browser and load views of the counters into Excel worksheets.
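
For example, assuming ssh access to the load balancer host and that 9999 is the port you configured:

ssh -L 9999:localhost:9999 yourname@load-balancer-host

With that tunnel in place, http://localhost:9999/ in your local browser reaches the load balancer's httpd instance.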

There is a navigational view that lets you see what is happening on each host. There are also ways to look at cross sections of the performance counters. The easiest way to do this is to use the Excel worksheets in src/resources/analysis. The main worksheet is bigdataServices.xls. It has tabs for CPU, DISK, RAM, and a whole bunch of other goodies. Each tab is just a URL using Excel's web data request mechanism. You can paste the same URLs into a web browser and get back the raw data. "On the list" is a web app that directly renders graphs from the data in the browser. There are a large number of ways to investigate what the system is doing.

If you are trying to report a problem, we will ask you to run 'extractCounters.sh', which will archive the configuration, the error log, and a bunch of interesting performance counters and wrap it up as a tarball.

Debugging a cluster configuration

See ClusterStartupFAQ for tricks, tips and a general approach to debugging a cluster configuration. Also see the notes above.

Feedback

Please give us feedback! We would like to make the cluster install as easy as possible.