Cluster Guide

From Blazegraph
Revision as of 22:43, 31 July 2009 by Thompsonbry (Talk | contribs) (Imported from wikispaces)


We recommend that you ask for help when attempting your first cluster install!

Setup:

  • bigdata. Check out the source from svn:
svn co https://bigdata.svn.sourceforge.net/svnroot/bigdata/trunk bigdata
  • Linux. The installer assumes Linux. We are happy to do ports to other platforms, but right now bigdata installs on Linux only. The main dependency is how we collect performance counters from the OS: some counters are collected from vmstat, but many are taken from sysstat. Performance counter collection for Windows uses typeperf, but we do not have an installer for Windows.
  • crontab. The easiest way to control bigdata is by running the 'bigdata' script from cron. You can do this as root or as a normal user.
  • kernel. As root, do 'sysctl -w vm.swappiness=0' on each host. By default, Linux will start swapping out processes proactively once about 1/2 of the RAM has been allocated. This is done to ensure that RAM remains available for processes which might run, but it basically denies you 1/2 of your RAM for the processes that you want to run. Changing vm.swappiness from its default (60) to ZERO (0) fixes this.
  • JDK 1.6 must be installed on each machine. We recommend 1.6.0_10 or higher.
  • procmail must be installed on each machine. The specific dependency is the lockfile command.
  • sysstat must be installed on each machine. This is used to collect performance counters from the OS. (Let us know if you can't install sysstat on your cluster. We are considering workarounds to make this dependency optional).
  • jini 2.1 is bundled with bigdata. It will be installed and set up automatically. If you are already running jini, then you can edit the bigdata configuration file to specify the jini groups and locators and use your existing setup.
  • zookeeper is bundled with bigdata. It will be installed and set up automatically. The bigdata configuration file can even start an ensemble of multiple zookeeper servers. If you want to use your existing instance, make sure that zookeeper is not listed as one of the services that bigdata is trying to manage -- this is done in the main configuration file.
  • logging. The installer will set up a log4j SimpleSocketServer writing to files in $NAS/log, where $NAS is the value you specify in the build.properties file. If you want to do something different, you will need to edit the bigdataup script as well.
  • ntpd. This is not a strict requirement, but we highly recommend having ntpd set up on your cluster. This will make the bigdata script much more predictable since all hosts will run the cron job at the same moment. It will also make it easier to correlate performance counters and events reported by the system since they will be on a common time base. There are pointers to resources for how to do this in build.properties. There are sample configurations for an ntpd server and an ntpd client in src/resources/config.
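
The host prerequisites above (the JDK, procmail's lockfile, sysstat, and the vm.swappiness setting) can be sanity-checked before installing. The script below is a minimal sketch, not part of the bigdata distribution; adjust it for your environment.

```shell
#!/bin/sh
# Preflight check for the prerequisites listed above. Not part of the
# bigdata distribution -- a minimal sketch; adjust for your environment.

check_cmd() {
    # Report whether a required command is on the PATH.
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK      $1"
    else
        echo "MISSING $1"
    fi
}

check_cmd java       # JDK 1.6+ (1.6.0_10 or higher recommended)
check_cmd lockfile   # from procmail
check_cmd sar        # from sysstat
check_cmd vmstat     # some counters are taken from vmstat

# vm.swappiness should be 0 (see the kernel note above).
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness = $(cat /proc/sys/vm/swappiness)"
fi
```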

Configuration

There are two main files:

  • build.properties. This is used by the installer. You MUST specify the NAS (shared volume) and LAS (either local or shared storage for the data) properties, JAVA_HOME, the BIGDATA_CONFIG property (which identifies the main configuration file) and a bunch of similar properties.
  • bigdataCluster.config. This is the main bigdata configuration file. There are two bigdata configuration file templates in src/resources/config. bigdataCluster.config is for a 3-node 32-bit Linux install. bigdataCluster16.config is for a 15-node 64-bit Linux install. Figure out which template is closest to your purposes and start from there.
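
For orientation, a build.properties fragment covering the required properties might look like the lines below. The values are placeholders, and the exact property names should be checked against the comments in the bundled build.properties, which is authoritative.

```properties
# Illustrative values only -- see the comments in the bundled
# build.properties for the authoritative property names and semantics.
NAS=/nas/bigdata            # shared volume visible to all hosts
LAS=/var/bigdata            # local (or shared) storage for the data
JAVA_HOME=/usr/java/jdk1.6.0_10
BIGDATA_CONFIG=src/resources/config/bigdataCluster.config
```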

Install

The installer will create a directory structure on the NAS volume, including bin, config, log, etc. The configuration file will be copied into the NAS/config directory, and the installer will inject values from build.properties into the configuration file and the various script files.

ant install

Post-install:

The installer will write post-install notes on the console -- please read them! Those notes are in src/resources/scripts/POST-INSTALL so you can read them first.

  • After the installer runs, you can define a bunch of useful environment variables by executing the installed bigdataenv script as follows.
source $NAS/bin/bigdataenv


  • If you are installing as root, you will need to fix the owner/group and permissions for all files on $NAS so that they are read/write for all hosts with access to that share. The correct commands will be written on the console as part of the post-install notes.
  • If you are installing as a normal user, the presumption is that you will start bigdata on all machines in the cluster as that user.
  • Edit crontab on each host (crontab -e). There are examples for both root and normal users in the $NAS/bin directory.
  • Edit $NAS/state. This is the run state file for bigdata. The default value is 'status'. Change it to 'start'. The next time cron runs, bigdata instances will start on each host. If you are running ntpd, then this is very predictable and all machines should start at the same time.
  • Watch $NAS/log/state.log (the output from the cron jobs, which is assigned the alias $stateLog) and $NAS/log/error.log (the output from the aggregated log4j messages from the bigdata services, which is assigned the alias $errorLog). The $stateLog may have overwrites since many shared files systems do not support atomic append. However, you should still be able to see some activity there when the bigdata script ($NAS/bin/bigdata) tries to start services on each host. Once services are started, you should also see log messages appearing in $NAS/log/error.log (if there are errors) and in $NAS/log/detail.log (for various things).
  • After 1-2 minutes, run $NAS/bin/listServices.sh. This will connect to the jini registrar(s) and report on services which it finds running. It provides some detail on the number and type of bigdata services. It will also tell you if zookeeper is running and if jini is running (or at least, if a jini registrar was discovered). If you don't see all the services yet, make sure that cron is running the $NAS/bin/bigdata script, that the $NAS/state file reads "start", and check the stateLog for obvious errors. If you don't see anything in $NAS/log/detail.log, then verify that the SimpleSocketServer is running on the configured host.
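
As a sketch, a crontab entry for a normal user might look like the line below. The bundled examples in $NAS/bin are authoritative, and /nas/bigdata is a placeholder for your actual NAS path (cron does not expand $NAS).

```
# m h dom mon dow  command -- run the bigdata script once per minute.
# /nas/bigdata is a placeholder for your NAS mount point.
* * * * * /nas/bigdata/bin/bigdata
```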

bigdata generally starts in 1-2 minutes. Most of this latency is getting some basic services bootstrapped (zookeeper, jini, logging). The $NAS/bin/bigdataup script is responsible for starting the SimpleSocketServer on the configured host. It also starts a bigdata ServicesManagerServer on each host. The services manager reads the bigdata service configuration out of the main configuration file, which was installed into $NAS/config, and then loads the services templates into zookeeper. The services managers then compete for locks in zookeeper. When they get a lock, they consider whether they can start a given service on their host. If yes, then the service is started. The bigdata services themselves start very quickly.

The $NAS/state is the bigdata run state file. There are several values which are understood:

  • start - start the ServicesManagerServer on the local host if it is not already running.
  • stop - stop the ServicesManagerServer and any child processes (the bigdata services).
  • hup - causes the ServicesManagerServer to reload the main configuration file and pushes the updated configuration into zookeeper. This can cause new services to start.
  • destroy - destroy the bigdata services including ALL your persistent data.
  • status - report whether or not bigdata is running on the local host.
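
Since $NAS/state is just a one-word text file, changing run states amounts to overwriting it. A minimal sketch (falling back to a temporary directory for a dry run when $NAS is unset):

```shell
# NAS should point at your install's shared volume (e.g. via bigdataenv);
# for a dry run we fall back to a temporary directory.
NAS="${NAS:-$(mktemp -d)}"

echo start > "$NAS/state"    # cron will start services on its next pass
cat "$NAS/state"             # prints: start
echo status > "$NAS/state"   # revert to passive status reporting
```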

Note: 'bigdata destroy' will destroy your data! It also issues "killall -9 java" by default to provide a sure kill for the bigdata services. You can disable the sure kill in build.properties, but then the 'stop' and 'destroy' commands may not succeed (known bug). There are also known bugs with going from 'stop' to 'start'. We figured that "Go!" was the most important feature :-)

Using a bigdata federation:

Working with the scale-out system is a bit different. The main class for connecting to the federation is com.bigdata.jini.JiniClient. You pass in the name of the main configuration file (as installed to $NAS/config) as an argument. Then call JiniClient#connect() to obtain a JiniFederation instance. The JiniFederation is an IIndexManager, which is the same interface used by the scale-up architecture.

See ScaleOutBasics, ScaleOutTripleStore

Performance counters:

Performance counters from the operating system, the services, and the clients are aggregated, logged, and exposed via httpd for inspection. Each service runs an httpd instance on a random open port, but the load balancer runs an httpd instance on a known port whose value you specified in the build.properties file. You can bring up that link in a browser. E.g.,

http://192.168.10.1/

Of course, you must use the URL for your installation.

There is a navigational view that lets you see what is happening on each host. There are also ways to look at cross sections of the performance counters. The easiest way to do this is to use the Excel worksheets in src/resources/analysis. The main worksheet is bigdataServices.xls. It has tabs for CPU, DISK, RAM and a whole bunch of other goodies. Each tab is just a URL using Excel's web data request mechanism. You can paste the same URLs into the web browser and get back the raw data. A web app that renders the graphs from the data directly in the browser is on the to-do list. There are a large number of ways to investigate what the system is doing.

If you are trying to report a problem, we will ask you to run 'extractCounters.sh', which will archive the configuration, the error log, and a bunch of interesting performance counters and wrap it up as a tarball.

Feedback:

Please give us feedback! We would like to make the cluster install as easy as possible.




Common problems


Exception occurred during unicast discovery

INFO: exception occured during unicast discovery to 192.168.0.3:4160
with constraints InvocationConstraints[reqs: {}, prefs: {}]
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:519)
        at java.net.Socket.connect(Socket.java:469)

This stack trace is normal. Jini logs this message if registrar discovery lookup fails for a given IP address. However, this is how we test to see whether or not jini is running on a host where jini is configured to start. If the lookup fails, then jini SHOULD be started automatically.


Configuration requires exact match for host names (build.properties only)

The build.properties file has some values which are host names. For the build.properties file ONLY, it is critical that the configuration value for the property is the value reported on that host by the *hostname* command. The bigdata and bigdataup shell scripts do exact string comparisons on the hostnames in order to handle some conditional service bootstrapping. Those shell script string compares will fail if the hostname command reports a different value. The main configuration file is more flexible since the configured hostnames are evaluated using DNS.
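
A quick way to avoid this pitfall is to compare, on each host, the value you plan to configure against what the hostname command actually reports. HOST_IN_PROPERTIES below is a placeholder for your intended build.properties value.

```shell
# HOST_IN_PROPERTIES is a placeholder -- substitute the value you intend
# to put into build.properties for this host.
HOST_IN_PROPERTIES="$(hostname)"

if [ "$(hostname)" = "$HOST_IN_PROPERTIES" ]; then
    echo "exact match: $HOST_IN_PROPERTIES"
else
    echo "MISMATCH: hostname reports $(hostname)"
fi
```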