Concepts
In this article, we will explore how to configure Pacemaker and STONITH (Shoot The Other Node In The Head) for high availability in an Azure environment for SAP workloads. Pacemaker is an advanced, scalable, and open-source cluster resource manager, while STONITH is a technique used to ensure the isolation of failed nodes in a cluster.
Before we dive into the configuration, let’s understand the importance of using Pacemaker and STONITH in an Azure environment for SAP workloads. SAP workloads are typically critical and require high availability to ensure continuous business operations. Pacemaker helps ensure that the workloads are highly available by managing and monitoring cluster resources, while STONITH ensures that a faulty node is safely fenced off from the cluster, preventing any potential data corruption.
Now, let’s proceed with the step-by-step configuration.
Step 1: Create Azure Virtual Machines
First, we need to create multiple Azure Virtual Machines (VMs) that will form our cluster. These VMs should be located in the same Azure Virtual Network (VNet) and should have the necessary SAP software installed. Ensure that the VMs are running the supported operating system for SAP workloads.
Step 2: Install Required Software
On each VM, install the required software: Pacemaker, Corosync, and the fence agents that provide STONITH. STONITH is a cluster feature implemented by fence agents, not a package of its own. These packages can be installed using the package manager of your operating system. For example, on SUSE Linux Enterprise Server (SLES), you can use the following commands:
sudo zypper install pacemaker corosync
sudo zypper install fence-agents
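After installation, it can help to confirm that the cluster tools are actually on the PATH of every node. The following is a minimal sketch; the helper name missing_tools is hypothetical, and the binary names checked (corosync, crm, crm_mon, fence_azure_arm) are assumptions about what a SLES node with the crm shell and the Azure fence agent provides. Some of these live in /usr/sbin, so run the check as root or adjust PATH.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names; adjust to the tools your distribution ships.
missing=$(missing_tools corosync crm crm_mon fence_azure_arm)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All cluster tools found."
fi
```

Run it on each VM; any name it prints points to a package that still needs to be installed on that node.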
Step 3: Configure Pacemaker
Once the software is installed, we need to configure the cluster. Pacemaker relies on Corosync for cluster membership and messaging, and Corosync is configured through a configuration file, located at /etc/corosync/corosync.conf on SLES.
Open the configuration file using a text editor and configure the following parameters:
- Set the cluster name:
totem {
    cluster_name:
}
- Specify the cluster interfaces:
nodelist {
    node {
        ring0_addr:
        name:
        nodeid: 1
    }
    node {
        ring0_addr:
        name:
        nodeid: 2
    }
}
- Configure the quorum policy:
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
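- Define the resources to be managed by Pacemaker. Unlike the settings above, resources are not part of corosync.conf. They are stored in Pacemaker's own cluster configuration and, on SLES, are typically created with the crm shell (crm configure). For example, you can define a resource for the SAP application: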
primitive sap_app ocf:sap:
op stop timeout="180" interval="0" \
op start timeout="180" interval="0" \
op monitor timeout="30" interval="60"
These are just examples, and you should refer to the specific documentation for your SAP resource agent and configuration details.
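The corosync.conf fragments above can be assembled into one file. The sketch below prints a minimal two-node configuration to standard output so it can be reviewed before it is copied to /etc/corosync/corosync.conf. The function name render_corosync_conf, the cluster name, the node names, and the IP addresses are all hypothetical. The version: 2 line is added because Corosync expects it in the totem section. Transport and timeout settings are deliberately left out, so treat the output as a starting point rather than a complete Azure configuration.

```shell
#!/bin/sh
# Render a minimal two-node corosync.conf to stdout.
# Arguments: cluster name, node1 name, node1 IP, node2 name, node2 IP.
render_corosync_conf() {
    cat <<EOF
totem {
    version: 2
    cluster_name: $1
}
nodelist {
    node {
        ring0_addr: $3
        name: $2
        nodeid: 1
    }
    node {
        ring0_addr: $5
        name: $4
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
EOF
}

# Hypothetical values for illustration only.
render_corosync_conf sapcluster node1 10.0.0.6 node2 10.0.0.7
```

Generating the file from one place keeps both nodes identical, which matters because Corosync expects the same nodelist and quorum settings on every member.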
Step 4: Configure STONITH
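Now, let’s configure STONITH to ensure safe node isolation. STONITH mechanisms in general include power fencing, IPMI, and virtual power fencing. In an Azure environment, the two common options for SAP clusters are the Azure fence agent (fence_azure_arm), which powers off or restarts a failed VM through the Azure API, and an SBD device backed by an iSCSI target server or an Azure shared disk.
The Azure fence agent is configured as a Pacemaker STONITH resource of type stonith:fence_azure_arm. It needs an identity, either a managed identity or a service principal, that is allowed to power off and start the cluster VMs. Fencing is not set up with an Azure PowerShell cmdlet; it is part of the Pacemaker cluster configuration.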
Again, please refer to the Azure documentation for the specific STONITH mechanism you want to implement.
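As an illustration of the Azure fence agent option, the sketch below prints, without executing, a crm command that defines a fence_azure_arm STONITH resource. It assumes the VMs use a managed identity (msi=true) that already has permission to power the VMs off and on. The resource name rsc_st_azure, the helper names, the subscription ID, the resource group, and the node and VM names are placeholders, and the monitor values are examples, not recommendations. The pcmk_host_map parameter maps each cluster node name to its Azure VM name, with pairs separated by semicolons.

```shell
#!/bin/sh
# Join "cluster-node-name:azure-vm-name" pairs with ';' for pcmk_host_map.
build_host_map() {
    map=""
    for pair in "$@"; do
        map="${map:+$map;}$pair"
    done
    printf '%s\n' "$map"
}

# Print (do not run) a crm command for a fence_azure_arm STONITH resource.
# Arguments: subscription ID, resource group, then node:vm pairs.
print_fence_command() {
    sub_id=$1
    res_group=$2
    shift 2
    host_map=$(build_host_map "$@")
    cat <<EOF
sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \\
    params msi=true subscriptionId="$sub_id" resourceGroup="$res_group" \\
    pcmk_host_map="$host_map" \\
    op monitor interval=3600 timeout=120
EOF
}

# Hypothetical values for illustration only.
print_fence_command "my-subscription-id" "my-resource-group" \
    node1:node1-vm node2:node2-vm
```

Printing the command first makes it easy to check the host map before the resource is created; a wrong mapping would make the agent try to fence a VM that does not exist.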
Step 5: Start the Cluster
Once the Pacemaker and STONITH configurations are complete, we can start the cluster by starting the corosync service:
sudo systemctl start corosync
And then, start the pacemaker service:
sudo systemctl start pacemaker
Step 6: Test High Availability
To verify high availability, test failover by simulating a failure on one of the nodes. You can do this by powering off one of the VMs or by intentionally stopping the pacemaker service or the SAP application on one of the nodes.
Observe whether the resources move to the remaining node. Monitor the cluster status using commands such as crm_mon or a web-based GUI such as Hawk on SLES.
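A gentler first test than powering off a VM is to put one node into standby, check that its resources move, and bring it back online. The sketch below only prints the commands for such a run so they can be reviewed and executed by hand. The function name failover_test_plan and the node name node1 are hypothetical, and the commands assume the crm shell is installed. Note that a standby test exercises resource migration only; it does not prove that fencing works.

```shell
#!/bin/sh
# Print the commands for a controlled failover test of one node.
# Nothing is executed; review the plan and run each line by hand.
failover_test_plan() {
    node=$1
    cat <<EOF
sudo crm_mon -r -1
sudo crm node standby $node
sudo crm_mon -r -1
sudo crm node online $node
sudo crm_mon -r -1
EOF
}

# Hypothetical node name for illustration only.
failover_test_plan node1
```

Comparing the three crm_mon snapshots shows where each resource ran before the standby, during it, and after the node returned.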
Conclusion
By configuring Pacemaker and STONITH in an Azure environment for SAP workloads, you can ensure high availability and fault tolerance. Pacemaker manages and monitors cluster resources, while STONITH isolates failed nodes to prevent data corruption. Follow the steps outlined in this article to properly configure Pacemaker and STONITH, and test the high availability of your SAP workloads in an Azure environment.
Please note that this article provides a high-level overview of the configuration process. For detailed configuration specifics, consult the Microsoft Azure documentation and the specific documentation for your SAP resource agent.
Answer the Questions in Comment Section
Which resource agent is used to configure STONITH for Pacemaker in Azure?
- a) pacemaker_resource
- b) fence_azure_arm
- c) azure_stonith_agent
- d) azurehdf_stonith
Correct answer: b) fence_azure_arm
True or False: STONITH stands for “Storage or Network Infrastructure Termination with Host Isolation Technology.”
Correct answer: False
Which command can be used to verify the status of the STONITH resource in Pacemaker?
- a) crm_mon
- b) stonith_status
- c) pacemaker_status
- d) cluster_status
Correct answer: a) crm_mon
True or False: In Azure, STONITH is a built-in feature and requires no additional configuration.
Correct answer: False
When configuring STONITH in Azure, which of the following is a valid fencing mechanism?
- a) Azure fence agent
- b) Azure Load Balancer
- c) Azure Site Recovery
- d) Azure Kubernetes Service
Correct answer: a) Azure fence agent
True or False: Pacemaker automatically configures STONITH for high availability of SAP workloads in Azure.
Correct answer: False
Which component of Azure is commonly used as the STONITH device for Pacemaker?
- a) Azure Load Balancer
- b) Azure Virtual Network
- c) Azure Service Fabric
- d) Azure shared disk (used as an SBD device)
Correct answer: d) Azure shared disk (used as an SBD device)
True or False: STONITH is primarily used to prevent split-brain scenarios in a Pacemaker cluster.
Correct answer: True
What is the purpose of STONITH fencing in a Pacemaker cluster?
- a) To ensure only authorized access to the cluster resources.
- b) To enforce resource distribution across multiple nodes.
- c) To isolate and take down unresponsive or failed nodes.
- d) To synchronize the time across all cluster nodes.
Correct answer: c) To isolate and take down unresponsive or failed nodes.
Multiple Select: Which of the following actions can be triggered by a STONITH device in Pacemaker? (Select all that apply)
- a) Power off a node
- b) Reboot a node
- c) Reset a node
- d) Migrate a resource to another node
Correct answer: a) Power off a node, b) Reboot a node, c) Reset a node
Thank you for the detailed explanation on configuring Pacemaker and STONITH!
I’m configuring Pacemaker on Ubuntu for my SAP workloads on Azure. Any specific recommendations?
How do you ensure STONITH is correctly configured in a cluster?
Is it possible to use Pacemaker without fencing?
Can someone explain the role of corosync in the Pacemaker setup?
We faced issues with quorum in our two-node Pacemaker cluster. Any advice?
This blog post is very informative about configuring Pacemaker and STONITH for AZ-120. Thanks for sharing!
Can anyone explain the role of STONITH in Pacemaker?