
How To Create a Linux Cluster in Red Hat/CentOS 7

During the OS upgrade cycle to version 7, the team over at Red Hat made some changes under the hood that affected not only the basic packages but also how clustering is done in Linux. These changes resulted from the migration to the new Pacemaker/Corosync engine, which provides the tokens and heartbeats needed to maintain quorum, and from the consolidation of the userspace programs used to configure and monitor cluster services behind the newly introduced pcs utility.
Fortunately, the basics and scenarios behind our previous Linux clustering tutorials are still sound, and they can be followed with only a few modifications. Rather than making the changes directly to Part 1, Part 2 and Part 3, we will leave those articles unmodified, as they are accurate and can be followed verbatim on Red Hat Enterprise Linux 6 or CentOS Linux 6.
Instead, what is produced here is something similar to what Toki Winter has written on creating an Apache HA cluster. Below you will find a power user's guide to the commands and surface changes brought to you in RHEL/CentOS 7.

Installation

Our first note is on the installation steps. There is now an additional package to add this time around: pcs.
yum install pcs fence-agents-all
Also, note that ccs, ricci and luci are deprecated. All cluster configuration, operation and monitoring is performed either from the command line using pcs or from the desktop using pcs-gui. From version 7 onwards, all Fedora-based distributions, including CentOS Linux and Red Hat Enterprise Linux, use systemd as the init process and for run control. This means that we no longer use service and chkconfig but instead rely on systemctl as the manager for our start-up processes.
What this really means when clustering is that you no longer have to remember to start the individual cman, clvmd and rgmanager processes in the correct sequence. Now, to start or stop the cluster, you enable a single service and issue one pcs command.
chkconfig cman on; chkconfig clvmd on; chkconfig rgmanager on;
service cman start; service clvmd start; service rgmanager start;
becomes simply:
systemctl enable pcsd.service
systemctl start pcsd.service
pcs cluster start --all
The firewall has now been migrated from pure iptables to firewalld, and the rules for the cluster can be added by including the high-availability service in the rules table.
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --add-service=high-availability
firewall-cmd --list-services

Configuration

The configuration utility ccs is no longer available and has been replaced by the pcs command.
pcs now handles both the synchronization of configuration between nodes and the addition of members and services. Previously, configuration syncing was done with ricci, which has been removed. Now you are required to create a common system user called hacluster on every node. This user will be used by both pcs and the pcsd service to manage cluster changes.
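As a minimal sketch of that step, and assuming two placeholder nodes named node1.example.com and node2.example.com (hostnames chosen here for illustration, not taken from this tutorial), you would set a password for hacluster on every node and then authenticate the nodes to each other from one of them:
passwd hacluster
pcs cluster auth node1.example.com node2.example.com -u hacluster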
Configuration has also been split out from the monolithic cluster.conf XML file into two separate files. Cluster configuration in release 7 lives in /etc/corosync/corosync.conf for membership and quorum configuration and in /var/lib/pacemaker/cib/cib.xml for cluster node and resource configuration.
There is no need to edit these manually; as in release 6, all modifications can be done through pcs.
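If you do want to inspect the generated membership configuration, pcs can print it for you rather than you opening the file directly; this is an extra command mentioned as an aside, not part of the original walkthrough:
pcs cluster corosync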
pcs cluster setup <option> <member> ...
replaces
ccs -h <master member> --<option> <member> ...
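Putting the setup commands together, and assuming the same placeholder node names as above plus a cluster called mycluster, a complete two-node setup might look something like:
pcs cluster setup --name mycluster node1.example.com node2.example.com
pcs cluster start --all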
A cluster can also now be destroyed across all of its nodes by issuing the command
pcs cluster destroy --all

Operation

To check cluster status do
pcs status
as opposed to clustat. You can also print the full cluster configuration with
pcs config

Resource Relocation

Resources and services are managed using
pcs resource
This replaces the previous clusvcadm command that was used to relocate, enable and disable service groups. The new way to do this is by issuing a move.
pcs resource move <resource>
To move it back, clear the constraint created by the move:
pcs resource clear <resource>
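As a hedged illustration only (the resource name, IP address and node name below are placeholders, not values used elsewhere in this article), creating a floating IP resource and relocating it might look like:
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 op monitor interval=30s
pcs resource move VirtualIP node2.example.com
pcs resource clear VirtualIP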
Note that resource allocation and movement can also be affected by setting up constraints.
pcs constraint <type> <options>
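For instance, assuming the placeholder VirtualIP resource from the sketch above, a location constraint that makes one node preferred could be added with something like:
pcs constraint location VirtualIP prefers node1.example.com=50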
Starting and stopping is done by calling pcs cluster with the start or stop subcommand and the --all flag.

Start/Stop

pcs cluster start --all
or
pcs cluster start <nodename>
Stopping is performed by swapping out start for stop in the commands above.
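Spelled out, that gives the following, with <nodename> being the node you want to stop:
pcs cluster stop --all
pcs cluster stop <nodename>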
Cluster members can be monitored using
pcs status corosync
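If you only need a condensed view of node membership rather than the corosync ring details, pcs also offers a nodes sub-status; this is an additional command not covered in the rest of this article:
pcs status nodes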

Maintenance

Taking a node down for maintenance no longer requires its graceful exit from the cluster, which in version 6 was done with:
service rgmanager stop
service clvmd stop
service cman stop
The correct procedure is now to mark the node as being in standby mode:
pcs cluster standby <node-undergoing-maintenance>
Once the maintenance is complete you simply unstandby the node:
pcs cluster unstandby <node-exiting-maintenance>
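Putting the maintenance flow together, a minimal pass over one node (the node name and the package update step are assumptions for illustration only) might look like:
pcs cluster standby node2.example.com
yum -y update
pcs cluster unstandby node2.example.com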
For further reading on Pacemaker and for additional examples, try reviewing Clusters From Scratch by ClusterLabs.
