Cluster Setup Guide CentOS 5.3/5.4


This page describes how to set up bigdata on a CentOS 5.3/5.4 minimal install. It presumes that you have root privileges and will install bigdata to run as root (running as root is not necessary, but that is what is shown here). See the ClusterGuide for more general information on a bigdata cluster install.


Install the following packages. Several of these are optional (man, mlocate, emacs, telnet, nfs-utils, ntp), as noted in the comments.

yum -y install man # optional (man page support).
yum -y install mlocate # optional (used to locate procmail's lockfile, which is at /usr/bin/lockfile).
yum -y install emacs-nox # optional.
yum -y install make # install make (required to build sysstat).
yum -y install vixie-cron # install vixie cron (cron is used to manage the bigdata runstate).
yum -y install telnet # optional (useful for testing services and firewall settings)
yum -y install nfs-utils  # optional (used iff you will use NFS for the shared volume).
yum -y install ntp # optional, but highly recommended.
yum -y install subversion # used to checkout bigdata from its SVN repository (only necessary for the main server).
yum -y install procmail # needed for lockfile command.
yum -y install gcc # needed for sysstat compilation, below.


CentOS 5.3 ships an earlier build of sysstat, which does not include pidstat, so DO NOT install the sysstat RPM. If it is already installed, it must be removed. Then download the sysstat source tarball to /tmp and build and install it as follows. You will have to do this on each node (or you can unpack it once on a shared volume and then just run 'make install' on each node).

rpm -e sysstat # uninstall the sysstat rpm bundle (iff installed)
yum -y install kernel-devel gettext
cd /tmp
tar xvfz sysstat-9.0.6.tar.gz
cd sysstat-9.0.6
make install
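
After the build, it can be worth confirming that pidstat and the other tools this guide relies on are actually on the PATH. The following is a minimal sketch; missing_cmds is a hypothetical helper name and is not part of bigdata or sysstat.

```shell
# Hypothetical helper: print the names of any commands that are not on the PATH.
missing_cmds() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Strip the leading space and print the result (empty if nothing is missing).
    echo "${missing# }"
}
```

For example, 'missing_cmds pidstat lockfile make gcc' prints the names that could not be found, or an empty line if all of them are present.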


CentOS 5.3 ships an earlier build of ant, so DO NOT install the ant RPM. Download an appropriate ant binary distribution to /tmp and install it instead.

cd /tmp
tar xvfz apache-ant-1.8.1-bin.tar.gz
cp -r apache-ant-1.8.1 /usr/java
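
The copy above does not by itself put ant on the PATH. One common way to do that is to set ANT_HOME and extend PATH, for example in /etc/profile or in root's ~/.bash_profile. This is a sketch that assumes the archive unpacked to apache-ant-1.8.1 and was copied to /usr/java exactly as shown; adjust the path if yours differs.

```shell
# Assumes ant was copied to /usr/java/apache-ant-1.8.1 as in the step above.
export ANT_HOME=/usr/java/apache-ant-1.8.1
# Put the ant launcher ahead of any other ant that may be on the PATH.
export PATH="$ANT_HOME/bin:$PATH"
```

Running 'ant -version' afterwards is a quick way to check which ant is being picked up.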


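Linux, like many other operating systems, has a very aggressive posture towards free memory. With the default vm.swappiness setting (60), the kernel may begin to swap application memory out well before RAM is exhausted in order to keep room for the file system cache. You can minimize this by turning down the swappiness parameter to ZERO. Note that 'sysctl -w' changes only the running kernel; the setting is lost on reboot unless it is also recorded in /etc/sysctl.conf.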

sysctl -w vm.swappiness=0
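
To keep the setting across reboots, it can be recorded in /etc/sysctl.conf. The function below is a minimal sketch of one way to do that without creating duplicate entries; persist_swappiness is a hypothetical helper name, and it accepts an alternate file path so it can be tried on a scratch file first. It relies on GNU sed's -i option, which is what CentOS provides.

```shell
# Hypothetical helper: record vm.swappiness = 0 in a sysctl configuration file.
# The file defaults to /etc/sysctl.conf; pass another path to try it safely.
persist_swappiness() {
    conf="${1:-/etc/sysctl.conf}"
    touch "$conf"
    if grep -q '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf"; then
        # An entry already exists: rewrite it in place.
        sed -i 's/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness = 0/' "$conf"
    else
        # No entry yet: append one.
        echo 'vm.swappiness = 0' >> "$conf"
    fi
}
```

Run 'persist_swappiness' as root, then 'sysctl -p' to load the file. The live value can be read back with 'cat /proc/sys/vm/swappiness'.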

Host Configuration

You MUST be able to resolve the hostnames in the cluster using DNS. Normally someone is administering DNS and you don't have to worry about this. If that is not the case, then the easy fix is to edit /etc/hosts on each host so that it knows the name and IP address of every host in the cluster.

Here is a sample /etc/hosts file. Your file must reflect the IP addresses and host names in your cluster.

127.0.0.1     localhost localhost.localdomain
x.y.z.129     BigData0
x.y.z.130     BigData1
x.y.z.131     BigData2
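
Once DNS or /etc/hosts is in place, name resolution can be checked on each node. The sketch below uses getent, which consults the same resolver configuration (including /etc/hosts) that most applications use; check_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: report which of the given host names fail to resolve.
# Returns 0 if every name resolves and 1 otherwise.
check_hosts() {
    failed=0
    for host in "$@"; do
        if getent hosts "$host" > /dev/null; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return "$failed"
}
```

With the sample file above, the call would be 'check_hosts BigData0 BigData1 BigData2', run on every host in the cluster.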

VNC (optional, "main" host only)

VNC can be used to log in remotely to the X-Windows desktop on the machines in the cluster. This can be very useful and it can be done securely using an ssh tunnel. This installs X-Windows, the KDE desktop, and the VNC server. See [1] for more information.

# install X and KDE
yum -y install xorg*
yum -y install xfce*
yum -y update # required to get around kdebase-wallpapers conflict for fc10.
yum -y install kde*

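It appears that NetworkManager (the network-manager package) can cause a conflict if you are using static IPs, in which case it should be removed. The following command lists any installed NetworkManager packages; remove the ones it reports (for example, with 'rpm -e').

rpm -qa | grep -i network | egrep -i 'manager|management'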

Once you have removed those packages, continue with the vnc install.

Install vnc.

yum -y install vnc-server #(0:4.1.3-1.fc10)

Set the vnc password by running vncpasswd as the user who will own the VNC session (root in this guide).

vncpasswd

Edit /etc/sysconfig/vncservers. You must define at least one vncserver here. Choose your own display resolution. Use the "-localhost" option to restrict connections to SSH tunnels. The client machine should forward its local port 5901 to localhost:5901 on the VNC host and then connect using "localhost:1".

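# Display 1, owned by root (an assumption, since this guide runs everything as root).
VNCSERVERS="1:root"
VNCSERVERARGS[1]="-geometry 1280x1024 -nolisten tcp -nohttpd -localhost"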
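
VNC display N listens on TCP port 5900 + N, which is why display 1 maps to port 5901. As a sketch of the ssh tunnel described above, the following builds the port-forwarding command for a given display; vnc_tunnel_cmd is a hypothetical helper name, and root@BigData0 is only an example login taken from the sample host names on this page.

```shell
# Hypothetical helper: print the ssh command that tunnels a VNC display.
#   $1 = VNC display number (1 for "localhost:1")
#   $2 = ssh login for the VNC host, e.g. root@BigData0
vnc_tunnel_cmd() {
    display="$1"
    host="$2"
    # VNC display N listens on TCP port 5900 + N.
    port=$((5900 + display))
    # -N: do not run a remote command, only forward the port.
    echo "ssh -N -L ${port}:localhost:${port} ${host}"
}

vnc_tunnel_cmd 1 root@BigData0
```

Run the printed command on the client machine, leave it running, and then point the VNC viewer at "localhost:1".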

Specify KDE as the display manager by editing /etc/sysconfig/display. This takes effect only when vncserver is started. It will not affect a session that is already running.


Start vncserver and configure the vncserver runlevels.

/etc/init.d/vncserver start
chkconfig vncserver on

Edit ~/.vnc/xstartup and uncomment the two lines that follow the comment shown here.

# Uncomment the following two lines for normal desktop:
unset SESSION_MANAGER
exec /etc/X11/xinit/xinitrc

See the notes above on how to connect using an ssh tunnel.

NFS (optional, done differently for the NFS server and the clients)

Bigdata requires a shared volume to hold the JARs, configuration files, and similar things. This volume must be mounted by each host in the cluster. One way to do this is to use NFS. See the ClusterSetupGuide for guidance on how to set up NFS while leaving iptables enabled.

Open up the iptables firewall for log4j, zookeeper and jini

If this is necessary in your environment, then see ClusterSetupGuide for information on how to configure the firewall.

Install JDK

Install the JDK on each node in the cluster. The JDK must be installed into the same location on each machine. Java 7 is required to run Blazegraph.
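
Since every node needs a suitable JDK, a quick version check on each node can catch mismatches early. The sketch below extracts the major version from the first line of 'java -version' output; java_major is a hypothetical helper name, and the parsing assumes the usual 'version "x.y.z"' banner format.

```shell
# Hypothetical helper: print the major Java version found in a version banner.
#   $1 = first line of `java -version` output, e.g. 'java version "1.7.0_80"'
java_major() {
    # Pull out the text between the double quotes after the word "version".
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        # Old numbering scheme: 1.7.0_80 means Java 7.
        1.*) echo "$ver" | cut -d. -f2 ;;
        # Newer numbering scheme: 11.0.2 means Java 11.
        *)   echo "$ver" | cut -d. -f1 ;;
    esac
}
```

On a node, call it as 'java_major "$(java -version 2>&1 | head -n 1)"' ('java -version' writes its banner to stderr, hence the redirect).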

Checkout, configure and install bigdata

Now that you have the cluster nodes prepped, please see the ClusterGuide for details on how to check out, configure and install bigdata.