Cluster Setup Guide CentOS 5.3/5.4
This page describes how to set up bigdata on a CentOS 5.3/5.4 minimal install. It presumes that you have root privileges and will install bigdata to run as root (the latter is not necessary, but that is what is shown here). See the ClusterGuide for more general information on a bigdata cluster install.
Install the following packages. Some of these are optional (man, mlocate, emacs, telnet, nfs-utils, ntp), as noted in the comments.
    yum -y install man         # optional (man page support).
    yum -y install mlocate     # optional (used to locate procmail's lockfile, which is at /usr/bin/lockfile).
    yum -y install emacs-nox   # optional.
    yum -y install make        # required to build sysstat.
    yum -y install vixie-cron  # cron is used to manage the bigdata runstate.
    yum -y install telnet      # optional (useful for testing services and firewall settings).
    yum -y install nfs-utils   # optional (used iff you will use NFS for the shared volume).
    yum -y install ntp         # optional, but highly recommended.
    yum -y install subversion  # used to checkout bigdata from its SVN repository (only necessary for the main server).
    yum -y install procmail    # needed for the lockfile command.
    yum -y install gcc         # needed for sysstat compilation, below.
CentOS 5.3 ships an older build of sysstat, which does not include pidstat, so DO NOT install the RPM. If it is already installed, it must be removed. Then download sysstat and build and install it from source as follows. You will have to do this on each node (or you can build it once on a shared volume and then just run 'make install' on each node).
    rpm -e sysstat # uninstall the sysstat rpm bundle (iff installed)
    yum -y install kernel-devel gettext
    cd /tmp
    wget http://pagesperso-orange.fr/sebastien.godard/sysstat-9.0.6.tar.gz
    tar xvfz sysstat-9.0.6.tar.gz
    cd sysstat-9.0.6
    ./configure
    make
    make install
CentOS 5.3 uses an earlier build of ant, so DO NOT install the RPM. Download and install an appropriate ant binary instead.
    cd /tmp
    wget http://www.apache.org/dist/ant/binaries/apache-ant-1.8.1-bin.tar.gz
    tar xvfz apache-ant-1.8.1-bin.tar.gz
    mkdir -p /usr/java # make sure the target directory exists so that cp copies INTO it
    cp -r apache-ant-1.8.1 /usr/java
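The commands above copy ant to /usr/java/apache-ant-1.8.1 but do not put it on the PATH. A minimal sketch of the environment settings follows, assuming that install location. Where you persist these lines is up to you; a file such as /etc/profile.d/ant.sh is one option, and that file name is only an assumption, not something this guide prescribes.

```shell
# Assumes ant was copied to /usr/java/apache-ant-1.8.1 as shown above.
export ANT_HOME=/usr/java/apache-ant-1.8.1

# Put the ant launcher ahead of any older ant that may be on the PATH.
export PATH="$ANT_HOME/bin:$PATH"
```

Afterwards, `ant -version` should report 1.8.1 if the binary distribution was unpacked as described.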
Linux, like many other operating systems, has a very aggressive posture towards free memory. With the default vm.swappiness setting, the kernel may begin to swap application memory out long before RAM is exhausted, by some accounts once applications occupy about half of the available RAM. You can avoid this by turning the swappiness parameter down to ZERO.
sysctl -w vm.swappiness=0
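A value set with `sysctl -w` is lost at the next reboot. One way to make it permanent is to record it in /etc/sysctl.conf. The sketch below does this idempotently; the helper name `persist_swappiness` is hypothetical, and it relies on GNU sed's `-i` option, which is what CentOS provides.

```shell
# persist_swappiness VALUE [FILE]
# Hypothetical helper: records vm.swappiness in FILE (default /etc/sysctl.conf).
# An existing vm.swappiness line is rewritten in place; otherwise one is appended.
persist_swappiness() {
    value=$1
    conf=${2:-/etc/sysctl.conf}
    if grep -q '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf" 2>/dev/null; then
        # Rewrite the existing setting (GNU sed in-place edit).
        sed -i "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness = $value/" "$conf"
    else
        # No setting yet, so append one.
        echo "vm.swappiness = $value" >> "$conf"
    fi
}
```

Run as root, `persist_swappiness 0` followed by `sysctl -p` writes the setting and reloads /etc/sysctl.conf.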
You MUST be able to resolve the hostnames in the cluster using DNS. Normally someone is administering DNS and you don't have to worry about this. If that is not the case, then the easy fix is to edit /etc/hosts on every host so that each host in the cluster knows the name and IP address of all the hosts in the cluster.
Here is a sample /etc/hosts file. Your file must reflect the IP addresses and host names in your cluster.
    127.0.0.1 localhost localhost.localdomain
    x.y.z.129 BigData0
    x.y.z.130 BigData1
    x.y.z.131 BigData2
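Since every node needs an entry for every other node, it can be worth checking each node's hosts file for omissions. The following is a minimal sketch; the function name `missing_hosts` is hypothetical, and the host names passed to it are the ones from the sample file above. It only inspects the file and does not test DNS.

```shell
# missing_hosts FILE NAME...
# Hypothetical helper: prints each NAME that has no entry in the hosts FILE.
# Comments are ignored and names are compared case-insensitively.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        if ! awk -v n="$name" '
            {
                sub(/#.*/, "")               # drop trailing comments
                for (i = 2; i <= NF; i++)    # field 1 is the IP address
                    if (tolower($i) == tolower(n)) found = 1
            }
            END { exit !found }
        ' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, `missing_hosts /etc/hosts BigData0 BigData1 BigData2` prints nothing when all three names are present.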
VNC (optional, "main" host only)
VNC can be used to log in remotely to the X-Windows desktop on the machines in the cluster. This can be very useful and it can be done securely using an ssh tunnel. The steps in this section install X-Windows, the KDE desktop, and the VNC server.
    # install X and KDE (the patterns are quoted so that the shell does not expand them)
    yum -y install 'xorg*'
    yum -y install 'xfce*'
    yum -y update # required to get around kdebase-wallpapers conflict for fc10.
    yum -y install 'kde*'
It appears that NetworkManager (the network-manager package) can cause a conflict if you are using static IPs, in which case it should be removed. See http://ubuntuforums.org/showthread.php?t=253221 and https://bugs.launchpad.net/ubuntu/+source/knetworkmanager/+bug/280762.
    rpm -qa | grep -i network | egrep -i 'manager|management'
    NetworkManager-0.7.1-1.fc10.x86_64
    kde-plasma-networkmanagement-0.1-0.12.20090519svn.fc10.x86_64
    NetworkManager-glib-0.7.1-1.fc10.x86_64
    NetworkManager-vpnc-0.7.0.99-1.fc10.x86_64
    kde-plasma-networkmanagement-openvpn-0.1-0.12.20090519svn.fc10.x86_64
    NetworkManager-glib-devel-0.7.1-1.fc10.x86_64
    kde-plasma-networkmanagement-vpnc-0.1-0.12.20090519svn.fc10.x86_64
    kde-plasma-networkmanagement-devel-0.1-0.12.20090519svn.fc10.x86_64
    NetworkManager-openvpn-0.7.0.99-1.fc10.x86_64
    NetworkManager-devel-0.7.1-1.fc10.x86_64
Once you have removed those packages, continue with the vnc install.
yum -y install vnc-server #(0:4.1.3-1.fc10)
Set the VNC password by running vncpasswd as the user who will own the VNC session (root in this guide).
Edit /etc/sysconfig/vncservers. You must define at least one vncserver here. Choose your own display resolution. Use the "-localhost" option to restrict connections to SSH tunnels. The remote machine should port forward local 5901 to remote localhost:5901 and then connect using "localhost:1".
    VNCSERVERS="1:root"
    VNCSERVERARGS="-geometry 1280x1024 -nolisten tcp -nohttpd -localhost"
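The port forwarding described above can be sketched as follows. VNC display N listens on TCP port 5900 + N, so display 1 maps to 5901. The function names `vnc_port` and `open_vnc_tunnel` are hypothetical, and the host name in the usage note is taken from the sample /etc/hosts file.

```shell
# vnc_port DISPLAY
# Prints the TCP port for a VNC display number (display N uses port 5900 + N).
vnc_port() {
    echo $((5900 + $1))
}

# open_vnc_tunnel USER@HOST [DISPLAY]
# Hypothetical helper: forwards the local VNC port to the same port on the
# remote host's loopback interface. -N keeps the ssh session open without
# running a remote command; press Ctrl-C to close the tunnel.
open_vnc_tunnel() {
    host=$1
    port=$(vnc_port "${2:-1}")
    ssh -N -L "$port:localhost:$port" "$host"
}
```

On the remote workstation, run `open_vnc_tunnel root@BigData0 1` and then point the VNC viewer at `localhost:1`.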
Specify KDE as the display manager by editing /etc/sysconfig/display. This takes effect only when vncserver starts. It will not affect a session that is already running.
Start vncserver and configure the vncserver runlevels.
    /etc/init.d/vncserver start
    chkconfig vncserver on
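To get the normal desktop, uncomment these two lines in the ~/.vnc/xstartup file of the user who owns the VNC session:
    # Uncomment the following two lines for normal desktop:
    unset SESSION_MANAGER
    exec /etc/X11/xinit/xinitrc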
See the notes above on how to connect using an ssh tunnel.
NFS (optional, done differently for the NFS server and the clients)
Bigdata requires a shared volume to hold the JARs, configuration files, and similar things. This volume must be mounted by each host in the cluster. One way to do this is to use NFS. See the ClusterSetupGuide for guidance on how to set up NFS while leaving iptables enabled.
Open up the iptables firewall for log4j, zookeeper and jini
If this is necessary in your environment, then see ClusterSetupGuide for information on how to configure the firewall.
Install the JDK on each node in the cluster. The JDK must be installed into the same location on each machine. Java 7 is required to run Blazegraph.
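Before moving on, it may help to confirm on each node that the tools installed in the earlier steps are actually on the PATH. The sketch below is one way to do that; the function name `missing_commands` is hypothetical, and the command list in the usage note is an assumption drawn from the packages named in this guide.

```shell
# missing_commands NAME...
# Hypothetical helper: prints each command NAME that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

For example, `missing_commands java pidstat lockfile crontab` prints nothing on a fully prepared node. Add `svn` to the list on the main server only, and note that `java` is found only if the JDK's bin directory has been added to the PATH.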
Check out, configure and install bigdata
Now that you have the cluster nodes prepped, please see the ClusterGuide for details on how to check out, configure and install bigdata.