Cheap HA KVM Cluster w/ Shared File System

This document walks through setting up a clustered KVM installation on CentOS 7, using GlusterFS for shared storage and Pacemaker/Corosync for high availability.

Network Configuration

Configure hostname

# hostnamectl set-hostname centos-vm1.home.labrats.us

Configure Layer 3 Interface Bonding and Bridge Interface

For this documentation, we are going to use 2 ethernet interfaces, one on the system board (em1) and one on an expansion card (p1p1).

Configure em1

/etc/sysconfig/network-scripts/ifcfg-em1. If you pin the interface with a HWADDR line, make sure the MAC address is correct.

DEVICE=em1
NAME=em1
UUID=a401d981-52cd-4c4d-8c0d-3f81f182fe45
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes

Configure p1p1

/etc/sysconfig/network-scripts/ifcfg-p1p1. If you pin the interface with a HWADDR line, make sure the MAC address is correct.

DEVICE=p1p1
NAME=p1p1
UUID=75ba61cf-0781-4023-b451-911f2e0b69d3
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
MASTER=bond0
SLAVE=yes

Configure bond0

/etc/sysconfig/network-scripts/ifcfg-bond0.

DEVICE=bond0
NAME=bond0
TYPE=Bond
BOOTPROTO=none
#DEFROUTE=yes
#PEERDNS=yes
#PEERROUTES=yes
#IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_PEERDNS=yes
#IPV6_PEERROUTES=yes
#IPV6_FAILURE_FATAL=no
ONBOOT=yes
BRIDGE=virbr0

/etc/modprobe.d/bond0.conf

alias bond0 bonding
options bond0 primary=em1 miimon=100 mode=1 updelay=30000
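
As an alternative on CentOS 7 (not used in the rest of this document), the initscripts also accept the bonding options directly in the bond's ifcfg file via BONDING_OPTS, in which case the modprobe.d file is not needed:

BONDING_OPTS="mode=1 primary=em1 miimon=100 updelay=30000"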

Configure virbr0

/etc/sysconfig/network-scripts/ifcfg-virbr0.

DEVICE="virbr0"
TYPE=BRIDGE
ONBOOT=yes
BOOTPROTO=none
IPADDR="192.168.1.201"
NETMASK="255.255.255.0"
GATEWAY="192.168.1.1"
DNS1="216.136.95.2"
DNS2="64.132.94.250"
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=no
IPV6_FAILURE_FATAL=no
NM_CONTROLLED="no"

Start libvirtd and enable IP forwarding:

# systemctl start libvirtd
# systemctl enable libvirtd
# echo "net.ipv4.ip_forward = 1" | tee /etc/sysctl.d/99-ipforward.conf
# sysctl -p /etc/sysctl.d/99-ipforward.conf

Configure resolv.conf

domain home.labrats.us
search home.labrats.us labrats.us
nameserver 216.136.95.2
nameserver 64.132.94.250

Disable Network Manager

# /bin/systemctl disable NetworkManager
# /bin/systemctl disable NetworkManager-dispatcher

Delete Network Manager Packages

# yum erase NetworkManager-tui NetworkManager-glib NetworkManager

Restart networking

# /etc/init.d/network restart
Restarting network (via systemctl):                        [  OK  ]

It is possible that you will need to reboot the server.

# shutdown -r now

Show Interface Bonding

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: p1p1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 30000
Down Delay (ms): 0

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Slave queue ID: 0

Slave Interface: p1p1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Slave queue ID: 0

Setup Bond1 and Bond2

Use similar configuration steps as above to map em2/p1p2 to bond1, and em3/p1p3 to bond2. We will skip the bridging setup for these interfaces, as they are used for the Cluster and Storage networks.

Bond interface configurations look like this:

Bond1 interface

/etc/modprobe.d/bond1.conf

alias bond1 bonding
options bond1 primary=em2 miimon=100 mode=1 updelay=30000

/etc/sysconfig/network-scripts/ifcfg-em2

DEVICE=em2
NAME=em2
UUID=903d0880-7c69-4bfd-b928-35f507be0d73
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
MASTER=bond1
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-p1p2

DEVICE=p1p2
NAME=p1p2
UUID=7516861d-5dc2-4cd7-a4b0-525bb398d8f9
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
MASTER=bond1
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-bond1

DEVICE=bond1
NAME=bond1
TYPE=Bond
BOOTPROTO=none
IPADDR="192.168.101.201"
NETMASK="255.255.255.0"
NM_CONTROLLED="no"

Bond2 interface

/etc/modprobe.d/bond2.conf

alias bond2 bonding
options bond2 primary=em3 miimon=100 mode=1 updelay=30000

/etc/sysconfig/network-scripts/ifcfg-em3

DEVICE=em3
NAME=em3
UUID=2215a365-2d29-45c1-8f68-fbee29713c86
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
MASTER=bond2
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-p1p3

DEVICE=p1p3
NAME=p1p3
UUID=7aff546a-1dec-4763-b2fc-8cfcfdaf086b
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
MASTER=bond2
SLAVE=yes

/etc/sysconfig/network-scripts/ifcfg-bond2

DEVICE=bond2
NAME=bond2
TYPE=Bond
BOOTPROTO=none
IPADDR="192.168.102.201"
NETMASK="255.255.255.0"
NM_CONTROLLED="no"

Setup /etc/hosts file

Run the following command to set up the /etc/hosts file.

# cat << EOF >> /etc/hosts
192.168.1.200    centos-vm.home.labrats.us centos-vm
192.168.1.201    centos-vm1.home.labrats.us centos-vm1
192.168.1.202    centos-vm2.home.labrats.us centos-vm2
192.168.1.211    centos-vm1-ipmi.home.labrats.us centos-vm1-ipmi
192.168.1.212    centos-vm2-ipmi.home.labrats.us centos-vm2-ipmi
192.168.101.201  centos-vm1-cn.home.labrats.us centos-vm1-cn
192.168.101.202  centos-vm2-cn.home.labrats.us centos-vm2-cn
192.168.102.201  centos-vm1-sn.home.labrats.us centos-vm1-sn
192.168.102.202  centos-vm2-sn.home.labrats.us centos-vm2-sn

EOF

Configure Networking On Each Additional Node
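
Repeat the network configuration above on each additional node, substituting that node's hostname and the addresses from the /etc/hosts table below. As a rough sketch for the second node (assuming its interface names match node 1):

# hostnamectl set-hostname centos-vm2.home.labrats.us
# sed -i 's/192\.168\.1\.201/192.168.1.202/'     /etc/sysconfig/network-scripts/ifcfg-virbr0
# sed -i 's/192\.168\.101\.201/192.168.101.202/' /etc/sysconfig/network-scripts/ifcfg-bond1
# sed -i 's/192\.168\.102\.201/192.168.102.202/' /etc/sysconfig/network-scripts/ifcfg-bond2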

Setup Gluster

Create Gluster Brick

# mkfs.xfs -i size=512 /dev/sdb1
# mkdir -p /data/brick1
# vi /etc/fstab

Mount Brick

Add to /etc/fstab

/dev/sdb1 /data/brick1 xfs defaults 1 2

Mount the brick

# mount /data/brick1

Install Gluster

Install EPEL and Gluster Repos

We'll need to install the Gluster repo for the server package, and EPEL repo to satisfy dependencies.

# rpm -i https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm 
# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo

Install glusterfs-server

Now we can install glusterfs-server package.

# yum install glusterfs-server

Start Gluster Server

# service glusterd start
Redirecting to /bin/systemctl start  glusterd.service

Check the status of the gluster service

# service glusterd status
Redirecting to /bin/systemctl status  glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled)
   Active: active (running) since Wed 2015-09-09 12:27:58 MDT; 12s ago
  Process: 13774 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
 Main PID: 13775 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─13775 /usr/sbin/glusterd -p /var/run/glusterd.pid

Sep 09 12:27:58 centos-vm1.home.labrats.us systemd[1]: Started GlusterFS, a clustered file-system server.

Add Firewall Rules

We'll use firewall-cmd in this example. IPTables would have similar rules.

First we have to open up the port for the Gluster service. This runs on TCP port 24007.

# firewall-cmd --add-port=24007/tcp
# firewall-cmd --permanent --add-port=24007/tcp

Next we have to open up a port for each brick, starting at 49152 (for GlusterFS 3.4 and later, 24009 for GlusterFS 3.3 and older). Since we are running two bricks in each node, we only need to listen to 49152 and 49153.

# firewall-cmd --add-port=49152-49153/tcp
# firewall-cmd --permanent --add-port=49152-49153/tcp
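
If you manage the firewall with plain iptables instead of firewalld, the equivalent rules would look roughly like this (make them persistent with your usual iptables-save mechanism):

# iptables -I INPUT -p tcp --dport 24007 -j ACCEPT
# iptables -I INPUT -p tcp --dport 49152:49153 -j ACCEPT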

If we want to open up GlusterFS to external nodes, we will also need to open up TCP ports 38465, 38466, 38468, 38469 and 2049.
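
If needed, those can be opened the same way; with firewalld that would look like this:

# firewall-cmd --add-port=38465-38466/tcp --add-port=38468-38469/tcp --add-port=2049/tcp
# firewall-cmd --permanent --add-port=38465-38466/tcp --add-port=38468-38469/tcp --add-port=2049/tcp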

Let's validate that it shows up:

# firewall-cmd --list-all
public (default, active)
  interfaces: virbr0
  sources: 
  services: dhcpv6-client ssh
  ports: 24007/tcp 49152-49153/tcp
  masquerade: no
  forward-ports: 
  icmp-blocks: 
  rich rules: 

Configure each additional node

Configure each node in the Gluster cluster with the same configuration.

Configure Gluster Service

Probe each server from the other one

Node 1:

# gluster peer probe centos-vm2-sn

Node 2:

# gluster peer probe centos-vm1-sn

Create volume directory on each node

This needs to be run on each node.

# mkdir /data/brick1/gv0

Create Gluster Volume

This only needs to be run on ONE node.

# gluster volume create gv0 replica 2 centos-vm1-sn:/data/brick1/gv0 centos-vm2-sn:/data/brick1/gv0
volume create: gv0: success: please start the volume to access data

Start Gluster Volume

Again, this only needs to be done on one node.

# gluster volume start gv0
volume start: gv0: success

Check Gluster Volume Status

This can be run on either node, and should be run on each to validate operation.

# gluster volume info
 
Volume Name: gv0
Type: Replicate
Volume ID: 877e2a93-89c5-4f19-b98c-72d79abbae83
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: centos-vm1-sn:/data/brick1/gv0
Brick2: centos-vm2-sn:/data/brick1/gv0
Options Reconfigured:
performance.readdir-ahead: on

Set Gluster Service to start with the system

Run this command on each node.

# systemctl enable glusterd.service

Testing Gluster

Perform the following steps to test and validate that Gluster is operating correctly.

Mount Gluster volume on each server

On each node, mount the local Gluster volume.

# mount -t glusterfs localhost:/gv0 /mnt

Test Write & Synchronization

On node 1, run the following commands.

# for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/copy-test-$i; done

On node 1, check to see that this has been written to the local brick.

# ls -lA /data/brick1/gv0/copy* | wc -l

There should be 100 files on the local brick.

On node 2, check both the /mnt directory, and the local brick.

# ls -lA /mnt/copy* | wc -l
# ls -lA /data/brick1/gv0/copy* | wc -l

Both should list 100 files. If the numbers are different, check that synchronization is working correctly.
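
If the counts do not match, one way to check replication health is Gluster's self-heal status, which can be run on either node:

# gluster volume heal gv0 info

This lists any files still pending heal on each brick.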

Setup Second Gluster Volume

We will also need to set up a "gv1" volume, using the same steps as above.
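
A minimal sketch, assuming gv1 uses a gv1 directory on the same brick mount as gv0 (use a separate brick if you prefer):

On each node:

# mkdir /data/brick1/gv1

On ONE node:

# gluster volume create gv1 replica 2 centos-vm1-sn:/data/brick1/gv1 centos-vm2-sn:/data/brick1/gv1
# gluster volume start gv1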

Setup KVM

Install KVM

# yum -y install kvm virt-manager libvirt virt-install qemu-kvm
# yum -y install xauth xorg-x11-apps

Mount GlusterFS

# mkdir -p /var/lib/libvirt/images /var/lib/libvirt/configs
# mount -t glusterfs localhost:/gv0 /var/lib/libvirt/images
# mount -t glusterfs localhost:/gv1 /var/lib/libvirt/configs

Start KVM (libvirtd)

We must start libvirtd temporarily to create the virtual machine disk images and config files.

# systemctl start libvirtd.service

Create Virtual Machines

We will create two machines, test-centos-1 and test-centos-2.

# virt-install --connect qemu:///system -n test-centos-1 -r 2048 --vcpus=1 \
  --disk path=/var/lib/libvirt/images/test-centos-1.img,size=10 --graphics vnc,listen=0.0.0.0 \
  --noautoconsole --os-type linux --os-variant rhel7 --accelerate --network=bridge:virbr0 --hvm \
  --cdrom /var/lib/libvirt/images/CentOS-7.0-1406-x86_64-DVD.iso
# virt-install --connect qemu:///system -n test-centos-2 -r 2048 --vcpus=1 \
  --disk path=/var/lib/libvirt/images/test-centos-2.img,size=10 --graphics vnc,listen=0.0.0.0 \
  --noautoconsole --os-type linux --os-variant rhel7 --accelerate --network=bridge:virbr0 --hvm \
  --cdrom /var/lib/libvirt/images/CentOS-7.0-1406-x86_64-DVD.iso

Connect to the VNC console and install the OS from the CD image. This can be done via a VNC client application, or via the virt-manager GUI application.
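
To find which VNC display (and therefore TCP port, 5900 plus the display number) each guest is listening on, you can ask libvirt:

# virsh vncdisplay test-centos-1
# virsh vncdisplay test-centos-2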

Configure Virtual Machines

Configure Guest Network

Remove Network Manager software.

# /bin/systemctl disable NetworkManager
# /bin/systemctl disable NetworkManager-dispatcher
# yum erase NetworkManager-tui NetworkManager-glib NetworkManager

Run the commands below to set up a static ip address (192.168.1.221) and hostname (test-centos-1).

# export remote_hostname=test-centos-1
# export remote_ip=192.168.1.221
# export remote_gateway=192.168.1.1

# hostnamectl set-hostname $remote_hostname

# sed -i.bak "s/.*BOOTPROTO=.*/BOOTPROTO=none/g" /etc/sysconfig/network-scripts/ifcfg-eth0

# cat << EOF >> /etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR0=$remote_ip
PREFIX0=24
GATEWAY0=$remote_gateway
DNS1="216.136.95.2"
DNS2="64.132.94.250"
NM_CONTROLLED="no"
EOF

# systemctl restart network
# systemctl enable network.service
# systemctl enable sshd
# systemctl start sshd

# echo "checking connectivity"
# ping www.google.com

To simplify the tutorial we'll go ahead and disable selinux on the guest. We'll also need to poke a hole through the firewall on port 3121 (the default port for pacemaker_remote) so the host can contact the guest.

# setenforce 0
# sed -i.bak "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
# firewall-cmd --add-port 3121/tcp --permanent
# firewall-cmd --add-port 3121/tcp

At this point you should be able to ssh into the guest from the host.

Configure pacemaker_remote

On the 'host' machine, run these commands to generate an authkey and copy it to the /etc/pacemaker folder on both the host and guest.

# mkdir -p --mode=0750 /etc/pacemaker
# chgrp haclient /etc/pacemaker
# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
# scp -r /etc/pacemaker root@192.168.1.221:/etc/

Now on the 'guest', install the pacemaker-remote package, and enable the daemon to run at startup. In the commands below, you will notice the pacemaker package is also installed. It is not required; the only reason it is being installed for this tutorial is because it contains the Dummy resource agent that we will use later for testing.

# yum install -y pacemaker pacemaker-remote resource-agents
# systemctl enable pacemaker_remote.service

Now start pacemaker_remote on the guest and verify the start was successful.

# systemctl start pacemaker_remote.service

# systemctl status pacemaker_remote

  pacemaker_remote.service - Pacemaker Remote Service
	  Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; enabled)
	  Active: active (running) since Thu 2013-03-14 18:24:04 EDT; 2min 8s ago
	Main PID: 1233 (pacemaker_remot)
	  CGroup: name=systemd:/system/pacemaker_remote.service
		  └─1233 /usr/sbin/pacemaker_remoted

  Mar 14 18:24:04 guest1 systemd[1]: Starting Pacemaker Remote Service...
  Mar 14 18:24:04 guest1 systemd[1]: Started Pacemaker Remote Service.
  Mar 14 18:24:04 guest1 pacemaker_remoted[1233]: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.

Verify Host Connection to Guest

Before moving forward, it's worth verifying that the host can contact the guest on port 3121. Here's a trick you can use. Connect using ssh from the host. The connection will get destroyed, but how it is destroyed tells you whether it worked or not.

First add test-centos-1 to the host machine's /etc/hosts file if you haven't already. This is required unless you have DNS set up so that test-centos-1's address can be resolved.

# cat << EOF >> /etc/hosts
192.168.1.221     test-centos-1
EOF

If running the ssh command on one of the cluster nodes results in this output before disconnecting, the connection works.

# ssh -p 3121 test-centos-1
ssh_exchange_identification: read: Connection reset by peer

If you see this, the connection is not working.

# ssh -p 3121 test-centos-1
ssh: connect to host test-centos-1 port 3121: No route to host

Repeat for second (and additional) guest virtual machines.
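
For the second guest the same commands apply with its own values, for example (the guest address here is an assumption; pick one that fits your network and add it to the host's /etc/hosts as well):

# export remote_hostname=test-centos-2
# export remote_ip=192.168.1.222
# export remote_gateway=192.168.1.1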

Shut Down Virtual Machines

Power down guest virtual machines, as they will be controlled by Pacemaker, as documented below.

From the host, you can run these commands.

# virsh shutdown test-centos-1
# virsh shutdown test-centos-2

Export Virtual Machine Config Files

We will export the config files for the two virtual machines. This will later be used for loading into Pacemaker.

# virsh dumpxml test-centos-1 > /var/lib/libvirt/configs/test-centos-1.xml
# virsh dumpxml test-centos-2 > /var/lib/libvirt/configs/test-centos-2.xml

Shut Down KVM (libvirt)

We can shut down KVM now, as Pacemaker will start it in the steps below.

# systemctl stop libvirtd.service
# systemctl disable libvirtd.service

Unmount Gluster File Systems

We need to unmount the Gluster file systems, as we will be mounting them with pacemaker below.

# umount /var/lib/libvirt/images
# umount /var/lib/libvirt/configs

Setup Pacemaker Cluster

Pacemaker Firewall rules

Add the following rules for pacemaker/pcs/corosync to communicate correctly.

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability
# firewall-cmd --permanent --direct --add-rule ipv4 filter IN_public_allow 0 -p igmp -j ACCEPT
# firewall-cmd --direct --add-rule ipv4 filter IN_public_allow 0 -p igmp -j ACCEPT
# firewall-cmd --permanent --add-port=5405/udp
# firewall-cmd --add-port=5405/udp

Disable SELinux

# sed -i.bak "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config

Install Pacemaker Packages

# yum install pacemaker corosync pcs resource-agents

Setup /etc/hosts file

Create the /etc/hosts file containing entries for each hypervisor host in the cluster.

# cat << EOF >> /etc/hosts
192.168.1.201     centos-vm1
192.168.1.202     centos-vm2
EOF

Setup Cluster Auth

Run the following command on ALL cluster nodes to set up user authentication.

# passwd hacluster

Run the following on ONE node to authenticate the cluster nodes.

# pcs cluster auth centos-vm1-cn centos-vm2-cn

Setup Cluster

Run this on ALL hypervisor machines.

# pcs cluster setup --local --name mycluster centos-vm1-cn centos-vm2-cn

Start Cluster Software

Start the cluster. Run the following on ONE node.

# pcs cluster start --all

Verify Cluster Operation

Verify corosync membership

# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 centos-vm1.home.labrats.us (local)
         2          1 centos-vm2.home.labrats.us

Verify pacemaker status. It may take a moment or two for the other node to appear.

# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Sep 15 14:16:51 2015		Last change: Tue Sep 15 14:14:01 2015 by hacluster via crmd on centos-vm2
Stack: corosync
Current DC: centos-vm2 (version 1.1.13-a14efad) - partition with quorum
2 nodes and 0 resources configured

Online: [ centos-vm1 centos-vm2 ]

Full list of resources:


PCSD Status:
  centos-vm1: Offline
  centos-vm2: Offline

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: inactive/disabled

Verify Corosync relationship.

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	= 192.168.101.201
	status	= ring 0 active with no faults

Setup Pacemaker Active/Standby Cluster

We will initially set up an active/standby cluster to get things running and tested. This approach only works in a two-node cluster.

Setup PCSD and Virtual IP

Start PCSD on Primary Node

We need to start PCSD on the primary node, so it can generate the certificate and key files.

# systemctl start pcsd.service
# systemctl status pcsd.service
pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled)
   Active: active (running) since Tue 2015-09-15 13:54:50 MDT; 7s ago
 Main PID: 19779 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─19779 /bin/sh /usr/lib/pcsd/pcsd start
           ├─19783 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─19784 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Sep 15 13:54:50 centos-vm1-cn systemd[1]: Started PCS GUI and remote configuration interface.

Copy PCSD certificate and key files to other node

We need to make sure that both nodes have the same certificate and key files, or our browser will pester us each time the service fails over.

# cd /var/lib/pcsd
# scp -pv pcsd.* root@centos-vm2-cn:/var/lib/pcsd

Start PCSD on Secondary Node

# systemctl start pcsd.service
# systemctl status pcsd.service
pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled)
   Active: active (running) since Tue 2015-09-15 14:01:24 MDT; 3s ago
 Main PID: 21311 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─21311 /bin/sh /usr/lib/pcsd/pcsd start
           ├─21315 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           ├─21316 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─21319 python /usr/lib/pcsd/systemd-notify-fix.py

Sep 15 14:01:24 centos-vm2-cn systemd[1]: Started PCS GUI and remote configuration interface.

Create Virtual IP Resource

Now we create the virtual IP resource.

First we will make a backup of the current cluster config.

# cd /tmp
# pcs cluster cib /tmp/cluster-active_config.orig

Now we will configure the virtual IP in an offline file.

# pcs cluster cib /tmp/VirtualIP_cfg

# pcs -f /tmp/VirtualIP_cfg resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.1.200 cidr_netmask=24 op monitor interval=30
# pcs -f /tmp/VirtualIP_cfg constraint location ClusterIP prefers centos-vm1-cn=200
# pcs -f /tmp/VirtualIP_cfg constraint location ClusterIP prefers centos-vm2-cn=50

Now we will push the configuration to the cluster.

# pcs cluster cib-push /tmp/VirtualIP_cfg
CIB updated

Now we can show the status of the virtual IP.

# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Sep 15 14:26:37 2015		Last change: Tue Sep 15 14:25:45 2015 by root via cibadmin on centos-vm1-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
2 nodes and 1 resource configured

Online: [ centos-vm1-cn centos-vm2-cn ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Stopped

PCSD Status:
  centos-vm1-cn: Online
  centos-vm2-cn: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

Note that the service shows "stopped". This is because we have not configured or disabled fencing yet.

Disable Fencing

To disable fencing, run the following command. We will discuss enabling it later.

# pcs property set stonith-enabled=false

Now we can check the status again, to see if the new service is running.

# pcs status
Cluster name: mycluster
Last updated: Tue Sep 15 14:28:59 2015		Last change: Tue Sep 15 14:28:55 2015 by root via cibadmin on centos-vm1-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
2 nodes and 1 resource configured

Online: [ centos-vm1-cn centos-vm2-cn ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started centos-vm1-cn

PCSD Status:
  centos-vm1-cn: Online
  centos-vm2-cn: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

Enable PCSD at system startup

We can tell the system to start pcsd at boot by running the following command on ALL nodes.

# systemctl enable pcsd.service

Setup GlusterFS and Libvirtd (PCS)

This Gluster setup controls mounting of the Gluster file systems after system boot. It is needed because glusterd is not yet running when the system mounts its normal file systems, so the Gluster volumes cannot be mounted at that point in the boot process.

To bring up the GlusterFS mounts, run the following commands against an offline configuration file.

# pcs cluster cib /tmp/GlusterFS_cfg

# pcs -f /tmp/GlusterFS_cfg resource create gluster-configs Filesystem device="localhost:/gv1" directory="/var/lib/libvirt/configs/" fstype="glusterfs"
# pcs -f /tmp/GlusterFS_cfg resource create gluster-images Filesystem device="localhost:/gv0" directory="/var/lib/libvirt/images/" fstype="glusterfs"
# pcs -f /tmp/GlusterFS_cfg resource create libvirtd systemd:libvirtd
# pcs -f /tmp/GlusterFS_cfg constraint colocation add gluster-configs with gluster-images INFINITY
# pcs -f /tmp/GlusterFS_cfg constraint colocation add libvirtd with gluster-images INFINITY
# pcs -f /tmp/GlusterFS_cfg constraint order set gluster-configs gluster-images libvirtd sequential=true require-all=true setoptions kind=Serialize symmetrical=true

Now we can push the GlusterFS configs to the cluster.

# pcs cluster cib-push /tmp/GlusterFS_cfg
CIB updated

Let's show the cluster status.

# pcs status
Cluster name: mycluster
Last updated: Tue Sep 15 14:51:03 2015		Last change: Tue Sep 15 14:50:17 2015 by root via cibadmin on centos-vm1-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
2 nodes and 4 resources configured

Online: [ centos-vm1-cn centos-vm2-cn ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started centos-vm1-cn
 gluster-configs	(ocf::heartbeat:Filesystem):	Started centos-vm2-cn
 gluster-images	(ocf::heartbeat:Filesystem):	Started centos-vm2-cn
 libvirtd	(systemd:libvirtd):	Started centos-vm2-cn

PCSD Status:
  centos-vm1-cn: Online
  centos-vm2-cn: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Setup KVM (PCS)

We actually handled this in the previous section by setting the libvirtd resource to run co-located with the GlusterFS resources. We also serialized the start order to ensure that the GlusterFS resources are running and have completed startup before the libvirtd resource attempts to start.
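
You can confirm the colocation and ordering constraints created in the previous section with:

# pcs constraint show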

Setup Virtual Machines (PCS)

To bring up the virtual machine resources, run the following commands against an offline configuration file.

# pcs cluster cib /tmp/virtual-machine_cfg

# pcs -f /tmp/virtual-machine_cfg resource create test-centos-1_vm VirtualDomain hypervisor="qemu:///system" config="/var/lib/libvirt/configs/test-centos-1.xml" meta remote-node=test-centos-1
# pcs -f /tmp/virtual-machine_cfg resource create test-centos-2_vm VirtualDomain hypervisor="qemu:///system" config="/var/lib/libvirt/configs/test-centos-2.xml" meta remote-node=test-centos-2
# pcs -f /tmp/virtual-machine_cfg constraint colocation add test-centos-1_vm with libvirtd INFINITY
# pcs -f /tmp/virtual-machine_cfg constraint colocation add test-centos-2_vm with libvirtd INFINITY
# pcs -f /tmp/virtual-machine_cfg constraint order start libvirtd then start test-centos-1_vm kind=Serialize symmetrical=true
# pcs -f /tmp/virtual-machine_cfg constraint order start libvirtd then start test-centos-2_vm kind=Serialize symmetrical=true

We also want to tell the cluster not to attempt to run the GlusterFS resources on the virtual machines.

# pcs -f /tmp/virtual-machine_cfg constraint location gluster-configs avoids test-centos-1
# pcs -f /tmp/virtual-machine_cfg constraint location gluster-images avoids test-centos-1
# pcs -f /tmp/virtual-machine_cfg constraint location gluster-configs avoids test-centos-2
# pcs -f /tmp/virtual-machine_cfg constraint location gluster-images avoids test-centos-2

Now we can push the Virtual Machine configs to the cluster.

# pcs cluster cib-push /tmp/virtual-machine_cfg
CIB updated

Let's show the cluster status.

# pcs status
Cluster name: mycluster
Last updated: Tue Sep 15 15:01:23 2015		Last change: Tue Sep 15 15:01:00 2015 by root via crm_resource on centos-vm2-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
4 nodes and 8 resources configured

Online: [ centos-vm1-cn centos-vm2-cn ]
GuestOnline: [ test-centos-1@centos-vm2-cn test-centos-2@centos-vm2-cn ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started centos-vm1-cn
 gluster-configs	(ocf::heartbeat:Filesystem):	Started centos-vm2-cn
 gluster-images	(ocf::heartbeat:Filesystem):	Started centos-vm2-cn
 libvirtd	(systemd:libvirtd):	Started centos-vm2-cn
 test-centos-1_vm	(ocf::heartbeat:VirtualDomain):	Started centos-vm2-cn
 test-centos-2_vm	(ocf::heartbeat:VirtualDomain):	Started centos-vm2-cn

PCSD Status:
  centos-vm1-cn: Online
  centos-vm2-cn: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Setup FAKE Services (PCS)

We want to run fake (Dummy) services on the virtual machines to test out the pacemaker_remote service. This is not a critical part of the cluster, but it comes in handy as a stand-in for real cluster-managed services, such as DNS or mail.

To bring up the FAKE service resources, run the following commands against an offline configuration file.

# pcs cluster cib /tmp/FAKE_cfg

# pcs -f /tmp/FAKE_cfg resource create FAKE1 ocf:pacemaker:Dummy
# pcs -f /tmp/FAKE_cfg resource create FAKE2 ocf:pacemaker:Dummy

We will also set preference for the FAKE resources, so they only run on the virtual guests.

# pcs -f /tmp/FAKE_cfg constraint location FAKE1 prefers test-centos-1=200
# pcs -f /tmp/FAKE_cfg constraint location FAKE1 avoids centos-vm1-cn
# pcs -f /tmp/FAKE_cfg constraint location FAKE1 avoids centos-vm2-cn
# pcs -f /tmp/FAKE_cfg constraint location FAKE2 prefers test-centos-2=200
# pcs -f /tmp/FAKE_cfg constraint location FAKE2 avoids centos-vm1-cn
# pcs -f /tmp/FAKE_cfg constraint location FAKE2 avoids centos-vm2-cn

Now we can push the FAKE Service configs to the cluster.

# pcs cluster cib-push /tmp/FAKE_cfg
CIB updated

Let's show the cluster status.

# pcs status
Cluster name: mycluster
Last updated: Tue Sep 15 15:20:33 2015		Last change: Tue Sep 15 15:20:23 2015 by root via cibadmin on centos-vm1-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
4 nodes and 10 resources configured

Online: [ centos-vm1-cn centos-vm2-cn ]
GuestOnline: [ test-centos-1@centos-vm2-cn test-centos-2@centos-vm2-cn ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started centos-vm1-cn
 gluster-configs	(ocf::heartbeat:Filesystem):	Started centos-vm2-cn
 gluster-images	(ocf::heartbeat:Filesystem):	Started centos-vm2-cn
 libvirtd	(systemd:libvirtd):	Started centos-vm2-cn
 test-centos-1_vm	(ocf::heartbeat:VirtualDomain):	Started centos-vm2-cn
 test-centos-2_vm	(ocf::heartbeat:VirtualDomain):	Started centos-vm2-cn
 FAKE1	(ocf::pacemaker:Dummy):	Started test-centos-1
 FAKE2	(ocf::pacemaker:Dummy):	Started test-centos-2

PCSD Status:
  centos-vm1-cn: Online
  centos-vm2-cn: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Set Gluster Resource to Clone

One last thing we want to do is set the GlusterFS resources to be cloned. This will start the resources on all hypervisor nodes and will speed up recovery after a node failure.

To clone the GlusterFS resources, run the following commands against an offline configuration file.

# pcs cluster cib /tmp/GlusterFS-Clone_cfg

# pcs -f /tmp/GlusterFS-Clone_cfg resource clone gluster-configs
# pcs -f /tmp/GlusterFS-Clone_cfg resource clone gluster-images

Now we can push the GlusterFS Clone Service configs to the cluster.

# pcs cluster cib-push /tmp/GlusterFS-Clone_cfg
CIB updated

Let's show the cluster status.

# pcs status
Cluster name: mycluster
Last updated: Tue Sep 15 15:24:53 2015		Last change: Tue Sep 15 15:24:48 2015 by root via cibadmin on centos-vm1-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
4 nodes and 16 resources configured

Online: [ centos-vm1-cn centos-vm2-cn ]
GuestOnline: [ test-centos-1@centos-vm2-cn test-centos-2@centos-vm2-cn ]

Full list of resources:

 ClusterIP	(ocf::heartbeat:IPaddr2):	Started centos-vm1-cn
 libvirtd	(systemd:libvirtd):	Started centos-vm2-cn
 test-centos-1_vm	(ocf::heartbeat:VirtualDomain):	Started centos-vm2-cn
 test-centos-2_vm	(ocf::heartbeat:VirtualDomain):	Started centos-vm2-cn
 FAKE1	(ocf::pacemaker:Dummy):	Started test-centos-1
 FAKE2	(ocf::pacemaker:Dummy):	Started test-centos-2
 Clone Set: gluster-configs-clone [gluster-configs]
     Started: [ centos-vm1-cn centos-vm2-cn ]
     Stopped: [ test-centos-1 test-centos-2 ]
 Clone Set: gluster-images-clone [gluster-images]
     Started: [ centos-vm1-cn centos-vm2-cn ]
     Stopped: [ test-centos-1 test-centos-2 ]

PCSD Status:
  centos-vm1-cn: Online
  centos-vm2-cn: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Fencing

There are several ways to configure fencing, including isolating the network, cutting the power, or simply rebooting the machine. Some argue that the fence device should not share power with the device being fenced, since in the event of a power failure the cluster would not be able to fence the failed device and would remain in a partially failed state. This can be solved by configuring layered fencing, but that is left for a future discussion.

Configure Fencing and Constraints

The following commands will configure fencing on an offline file, which will be applied later. We set constraints so that the fencing resource on each node does not run on the node that it would act upon.

# pcs cluster cib /tmp/fencing_cfg

# pcs -f /tmp/fencing_cfg stonith create fence_centos-vm1_ipmi fence_ipmilan pcmk_host_list="centos-vm1-cn" ipaddr="centos-vm1-ipmi" login=fencer passwd=2eznRKdeTu7cEee op monitor interval=60s
# pcs -f /tmp/fencing_cfg constraint location fence_centos-vm1_ipmi avoids centos-vm1-cn

# pcs -f /tmp/fencing_cfg stonith create fence_centos-vm2_ipmi fence_ipmilan pcmk_host_list="centos-vm2-cn" ipaddr="centos-vm2-ipmi" login=fencer passwd=2eznRKdeTu7cEee op monitor interval=60s
# pcs -f /tmp/fencing_cfg constraint location fence_centos-vm2_ipmi avoids centos-vm2-cn

Now we can apply the config.

# pcs cluster cib-push /tmp/fencing_cfg

Finally, we can enable fencing on the cluster.

# pcs property set stonith-enabled=true

Fencing Delay

It may be useful to set a delay for the fencing devices. The default delay is 0, which means that the fence will act immediately upon failure. A delay is useful for riding out a momentary loss of communication between nodes, such as that caused by changes in switch topology.

If you wish to set a delay, you can run the following commands.

# pcs cluster cib /tmp/fencing-delay_cfg
# pcs -f /tmp/fencing-delay_cfg stonith update fence_centos-vm1_ipmi delay=15
# pcs -f /tmp/fencing-delay_cfg stonith update fence_centos-vm2_ipmi delay=15
# pcs cluster cib-push /tmp/fencing-delay_cfg

System Testing

For testing, it is useful to run the crm_mon command, which continuously monitors the cluster state. Use Ctrl-C to exit the program.

# crm_mon

Last updated: Wed Sep 16 16:17:14 2015          Last change: Wed Sep 16 16:11:54 2015 by root via cibadmin on centos-vm2-cn
Stack: corosync
Current DC: centos-vm2-cn (version 1.1.13-a14efad) - partition with quorum
4 nodes and 18 resources configured

Online: [ centos-vm1-cn centos-vm2-cn ]
GuestOnline: [ test-centos-1@centos-vm2-cn test-centos-2@centos-vm2-cn ]

ClusterIP	(ocf::heartbeat:IPaddr2):	Started centos-vm1-cn
libvirtd        (systemd:libvirtd):     Started centos-vm2-cn
test-centos-1_vm        (ocf::heartbeat:VirtualDomain): Started centos-vm2-cn
test-centos-2_vm        (ocf::heartbeat:VirtualDomain): Started centos-vm2-cn
FAKE1   (ocf::pacemaker:Dummy): Started test-centos-1
FAKE2   (ocf::pacemaker:Dummy): Started test-centos-2
 Clone Set: gluster-configs-clone [gluster-configs]
     Started: [ centos-vm1-cn centos-vm2-cn ]
 Clone Set: gluster-images-clone [gluster-images]
     Started: [ centos-vm1-cn centos-vm2-cn ]
fence_centos-vm1_ipmi   (stonith:fence_ipmilan):        Started centos-vm2-cn
fence_centos-vm2_ipmi   (stonith:fence_ipmilan):        Started centos-vm1-cn

Pacemaker_Remote Node Failure

To test a remote node failure, simply log into the node and run the following command.

# killall -9 pacemaker_remoted

You will see in crm_mon that the remote node changes to the FAILED state and is then immediately restarted.

Cluster Node Failure

To test a full cluster node failure, run the following command on that node.

# killall -9 corosync

You will see in crm_mon that this node goes into the "Offline/UNCLEAN" state. If you have fencing enabled without a delay, the node will be rebooted. If you have fencing disabled, or set with a delay, you will see corosync restart and the node rejoin the cluster.

You can do further testing by isolating the node, however care must be taken with a two-node cluster so that the isolation of the two nodes does not cause the nodes to reboot each other, resulting in a cluster without any nodes.

Shut Down PCS Cluster

You will be able to shut down a single node in a cluster using the following command. Shutting down the node gracefully will ensure that all resources are stopped, and that other nodes will restart those resources, as applicable, without enacting fencing and rebooting the node.

# pcs cluster stop

Other Observations While Testing

It has been observed that some resources are restarted on the cluster under failure conditions, even though those resources do not move to another node. I have not been able to determine why this happens, only that it happens some of the time. Further research into this should be done.

Converting to Active/Active Cluster

To convert to an active/active cluster, we should only need to convert the libvirtd resource to a cloned resource, and then assign some location preference to the virtual machines.

Configure Libvirtd Clone

You can run the following commands to configure the libvirtd resource for cloning using an offline config file.

# pcs cluster cib /tmp/libvirtd-clone_cfg
# pcs -f /tmp/libvirtd-clone_cfg resource clone libvirtd

Now you can push the config to the cluster.

# pcs cluster cib-push /tmp/libvirtd-clone_cfg
CIB updated

Configure Resource Preference

You can determine where to normally run the resources by configuring location preferences.

We will do this on an offline config file.

# pcs cluster cib /tmp/vm-preferences_cfg
# pcs -f /tmp/vm-preferences_cfg constraint location test-centos-1_vm prefers centos-vm1-cn=200
# pcs -f /tmp/vm-preferences_cfg constraint location test-centos-1_vm prefers centos-vm2-cn=50
# pcs -f /tmp/vm-preferences_cfg constraint location test-centos-2_vm prefers centos-vm2-cn=200
# pcs -f /tmp/vm-preferences_cfg constraint location test-centos-2_vm prefers centos-vm1-cn=50

Now you can push the config to the cluster.

# pcs cluster cib-push /tmp/vm-preferences_cfg
CIB updated

Resource Stickiness

Stickiness is the cost of moving a resource from its present location. The default is unset (equivalent to 0). If the location preference is higher than the stickiness value, the resource will move to the preferred location, assuming it is available. If the location preference is lower than the stickiness value, the resource will remain in its current location until further action is taken on the resource, or the location becomes unavailable due to maintenance or failure.

To set the default value, run the following command.

# pcs resource defaults resource-stickiness=100

To set the value per resource, the commands are run as shown below.

# pcs resource meta test-centos-1_vm resource-stickiness=250
# pcs resource meta test-centos-2_vm resource-stickiness=250

To unset stickiness, use one of the following forms.

# pcs resource defaults resource-stickiness=
# pcs resource meta test-centos-1_vm resource-stickiness=
# pcs resource meta test-centos-2_vm resource-stickiness=

Retry Start Failures

One of the frustrating issues I have seen with Pacemaker is how fatal a failure to start can be. This can be partially addressed by changing the start-failure-is-fatal default setting. By default it is set to true, meaning a resource that fails to start on a node will not be retried there.

# pcs resource defaults start-failure-is-fatal=false
# pcs resource update <resource> meta failure-timeout="30s"
# pcs resource op delete <resource> start
# pcs resource op add <resource> start interval=0s timeout=90 on-fail="restart"
# pcs resource op delete <resource> monitor
# pcs resource op add <resource> monitor interval=10 timeout=30 on-fail="restart"

After set, the resource will look something like this:

# pcs resource show test-centos-1_vm
 Resource: test-centos-1_vm (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: hypervisor=qemu:///system config=/var/lib/libvirt/configs/test-centos-1.xml 
  Meta Attrs: remote-node=test-centos-1 resource-stickiness=250 failure-timeout=30s 
  Operations: stop interval=0s timeout=90 (test-centos-1_vm-stop-timeout-90)
              start interval=0s timeout=90 on-fail=restart (test-centos-1_vm-name-start-interval-0s-on-fail-restart-timeout-90)
              monitor interval=10 timeout=30 on-fail=restart (test-centos-1_vm-name-monitor-interval-10-on-fail-restart-timeout-30)

Manually Move Resource

You can manually move resources by running a series of commands.

# pcs constraint location test-centos-1_vm prefers centos-vm1-cn=INFINITY
# pcs constraint location test-centos-1_vm prefers centos-vm1-cn=200

The first command sets the location preference to INFINITY, ensuring the move if the location is available. The second command sets the location preference back to its previous value.

Adding Additional Nodes

You can add additional nodes as sketched below; however, you should keep an odd number of nodes to ensure that quorum can be achieved. Quorum requires more than half of the configured nodes to be present. The exception is the two-node cluster, where a single remaining node is allowed to retain quorum.
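
As a rough sketch, assuming a third hypervisor named centos-vm3-cn that has been prepared like the others (packages installed, hacluster password set, networking configured):

# pcs cluster auth centos-vm3-cn
# pcs cluster node add centos-vm3-cn
# pcs cluster start centos-vm3-cn

The new node would also need to be probed into the Gluster pool and given bricks if it is to serve storage.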

Helpful Commands

crm_resource

Move Resource

To move a resource manually from one node to another, run one of the following commands.

# crm_resource --resource  <resource> --move

OR

# crm_resource --resource <resource> --move --node <node>

NOTE: This will set the location preference on the current node to -INFINITY, which will need to be cleared to resume normal operations.

To permit, but not force, the resource to return to normal operation and move back to its preferred node on its own:

# crm_resource --resource <resource> --un-move

pcs resource

Show Resources

This shows the defined resources, and their current configuration values.

# pcs resource show --full

pcs constraint

Show Constraints

To show constraints, use the following command:

# pcs constraint show

To show full constraints, including constraint ID, use the following:

# pcs constraint show --full

Additional Resources

Useful web links: