Concepts

In this article, we will explore how to configure Pacemaker and STONITH (Shoot The Other Node In The Head) for high availability in an Azure environment for SAP workloads. Pacemaker is an open-source cluster resource manager that starts, stops, monitors, and relocates the services in a cluster. STONITH is a fencing technique that forcibly isolates a failed or unresponsive node so that it can no longer access shared resources.

Before we dive into the configuration, let’s understand the importance of using Pacemaker and STONITH in an Azure environment for SAP workloads. SAP workloads are typically critical and require high availability to ensure continuous business operations. Pacemaker helps ensure that the workloads are highly available by managing and monitoring cluster resources, while STONITH ensures that a faulty node is safely fenced off from the cluster, preventing any potential data corruption.

Now, let’s proceed with the step-by-step configuration.

Step 1: Create Azure Virtual Machines

First, we need to create multiple Azure Virtual Machines (VMs) that will form our cluster. These VMs should be located in the same Azure Virtual Network (VNet) and should have the necessary SAP software installed. Ensure that the VMs are running the supported operating system for SAP workloads.
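As a rough sketch of this step, the script below only prints one Azure CLI `az vm create` command per cluster node so the commands can be reviewed before anything is created. The resource group, VNet, subnet, node names, and the image placeholder are all hypothetical, and a real deployment needs further options (VM size, availability set or zone, disks) that are not covered here.

```shell
#!/bin/sh
# Hypothetical values for illustration only; replace them with your own.
RESOURCE_GROUP="sap-ha-rg"
VNET_NAME="sap-vnet"
SUBNET_NAME="sap-subnet"
IMAGE="IMAGE_URN_PLACEHOLDER"   # placeholder, not a real image name

# Print (do not run) one `az vm create` command per node name given as argument.
# Every node uses the same VNet and subnet, as the step above requires.
print_vm_commands() {
    for node in "$@"; do
        printf 'az vm create --resource-group %s --name %s --image %s --vnet-name %s --subnet %s\n' \
            "$RESOURCE_GROUP" "$node" "$IMAGE" "$VNET_NAME" "$SUBNET_NAME"
    done
}

print_vm_commands sap-node-0 sap-node-1
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be pasted into a shell once the placeholders are replaced.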

Step 2: Install Required Software

On each VM, install the cluster software: Pacemaker, Corosync, and the fence agents used for STONITH. STONITH is not a separate package; on SUSE Linux Enterprise Server (SLES), the fencing agents, including the Azure fence agent, ship in the fence-agents package. These packages can be installed using the package manager of your operating system. For example, on SLES you can use the following commands:

sudo zypper install pacemaker corosync
sudo zypper install fence-agents
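After installing, it can help to confirm that the cluster tools are actually on the PATH of every node. The following is a minimal sketch; the list of command names (`corosync`, `pacemakerd`, `crm`, `crm_mon`) is an assumption about what a typical SLES cluster installation provides, and `missing_commands` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name of every command from the argument list that is not on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumed set of binaries provided by the cluster packages.
missing=$(missing_commands corosync pacemakerd crm crm_mon)
if [ -n "$missing" ]; then
    printf 'Not installed yet:\n%s\n' "$missing"
else
    printf 'All cluster commands found.\n'
fi
```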

Step 3: Configure Pacemaker

Once the software is installed, we need to configure the cluster. Pacemaker relies on Corosync for cluster membership and messaging, so the first part of the configuration goes into the Corosync configuration file, which is located at /etc/corosync/corosync.conf on SLES.

Open the configuration file using a text editor and configure the following parameters:

  • Set the cluster name:

totem {
    cluster_name:
}

  • Specify the cluster interfaces:

nodelist {
    node {
        ring0_addr:
        name:
        nodeid: 1
    }
    node {
        ring0_addr:
        name:
        nodeid: 2
    }
}

  • Configure the quorum policy:

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

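  • Define the resources to be managed by Pacemaker. Resources are not part of corosync.conf; they are stored in Pacemaker’s own cluster configuration and, on SLES, are typically defined with the crm shell (crm configure). For example, you can define a resource for the SAP application: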

primitive sap_app ocf:sap: \
op stop timeout="180" interval="0" \
op start timeout="180" interval="0" \
op monitor timeout="30" interval="60"

These are just examples, and you should refer to the specific documentation for your SAP resource agent and configuration details.
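To make the Corosync fragments above concrete, here is a minimal sketch that renders them with example values into a draft file for review. The cluster name, node names, and IP addresses are hypothetical, and a complete corosync.conf contains further settings (for example the totem version and transport) that this fragment leaves out.

```shell
#!/bin/sh
# All values below are hypothetical examples, not taken from a real cluster.
CLUSTER_NAME="sapcluster"
NODE1_NAME="sap-node-0"; NODE1_ADDR="10.0.0.6"
NODE2_NAME="sap-node-1"; NODE2_ADDR="10.0.0.7"

# Print the totem, nodelist, and quorum sections with the values filled in.
render_corosync_conf() {
    cat <<EOF
totem {
    cluster_name: ${CLUSTER_NAME}
}

nodelist {
    node {
        ring0_addr: ${NODE1_ADDR}
        name: ${NODE1_NAME}
        nodeid: 1
    }
    node {
        ring0_addr: ${NODE2_ADDR}
        name: ${NODE2_NAME}
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
EOF
}

# Write to a scratch file for review rather than straight to /etc/corosync/.
OUT_FILE="${TMPDIR:-/tmp}/corosync.conf.draft"
render_corosync_conf > "$OUT_FILE"
printf 'Draft written to %s\n' "$OUT_FILE"
```

The two_node: 1 setting lets a two-node cluster keep quorum when one node is lost, which is one reason a working fencing device matters so much in this layout.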

Step 4: Configure STONITH

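Now, let’s configure STONITH to ensure safe node isolation. There are various STONITH mechanisms available, such as power fencing, IPMI, or virtual power fencing. Azure VMs have no physical power switch that the cluster can control, so fencing is usually done in one of two ways: with the Azure fence agent (fence_azure_arm), which restarts or deallocates a failed VM through the Azure API, or with an SBD (STONITH Block Device) on shared storage.

For example, with the Azure fence agent, the STONITH resource is defined in the crm shell. The following is a sketch that assumes the VMs use a managed identity (msi=true); the resource name rsc_st_azure and the quoted values are placeholders:

sudo crm configure primitive rsc_st_azure stonith:fence_azure_arm \
params msi=true subscriptionId="your-subscription-id" resourceGroup="your-resource-group" \
op monitor interval=3600 timeout=120

sudo crm configure property stonith-enabled=true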

Again, please refer to the Azure documentation for the specific STONITH mechanism you want to implement.
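One detail that often needs attention when the Azure fence agent (fence_azure_arm) is the chosen mechanism: if the cluster node names differ from the Azure VM names, Pacemaker’s pcmk_host_map parameter maps one to the other as a semicolon-separated list of node:vm pairs. The sketch below builds such a string; `build_host_map` is a hypothetical helper, and the node and VM names are made up.

```shell
#!/bin/sh
# Join "node:vm" pairs into the semicolon-separated form used by pcmk_host_map.
# Returns non-zero if an argument is not of the form name:name.
build_host_map() {
    map=""
    for pair in "$@"; do
        case "$pair" in
            ?*:?*) ;;   # looks like node:vm, accept it
            *) printf 'invalid pair: %s\n' "$pair" >&2; return 1 ;;
        esac
        # Add a ";" separator only when the map already has an entry.
        map="${map:+${map};}${pair}"
    done
    printf '%s\n' "$map"
}

# Hypothetical cluster node names mapped to hypothetical Azure VM names.
build_host_map "sap-node-0:sap-vm-0" "sap-node-1:sap-vm-1"
```

The printed string can then be passed as the pcmk_host_map value when the STONITH resource is defined.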

Step 5: Start the Cluster

Once the Pacemaker and STONITH configurations are complete, we can start the cluster by starting the corosync service:

sudo systemctl start corosync

And then, start the pacemaker service:

sudo systemctl start pacemaker
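The start order matters, because Pacemaker needs the Corosync membership layer to be running first. A small wrapper can make that order explicit; the sketch below defaults to a dry run that only prints the commands, and the `run` and `start_cluster` names are hypothetical.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a single command, depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Start corosync first; only start pacemaker if that step succeeded.
start_cluster() {
    run sudo systemctl start corosync || return 1
    run sudo systemctl start pacemaker
}

start_cluster
```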

Step 6: Test High Availability

To ensure high availability, test the failover scenario by simulating a failure on one of the nodes. You can do this by powering off one of the VMs or by intentionally stopping the pacemaker or SAP application on one of the nodes.

Check whether the resources move to the surviving node and whether the failed node is fenced. Monitor the cluster status with crm_mon or, on SLES, with the Hawk web interface.
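During a failover test it can be handy to extract the list of online nodes from one-shot `crm_mon -1` output instead of reading it by eye. The sketch below parses text from standard input; it assumes the output contains a line of the form "Online: [ node1 node2 ]" (optionally prefixed with "* "), which varies between Pacemaker versions, so treat the pattern and the sample text as assumptions.

```shell
#!/bin/sh
# Read crm_mon-style text on stdin and print the names on the "Online:" line.
online_nodes() {
    sed -n 's/^[ *]*Online: \[ *\(.*[^ ]\) *]$/\1/p'
}

# Count the online nodes found in crm_mon-style text on stdin.
online_count() {
    # Word splitting is intentional: one word per node name.
    set -- $(online_nodes)
    printf '%s\n' "$#"
}

# Made-up sample; on a real node you would pipe in the output of: crm_mon -1
sample='Node List:
  * Online: [ sap-node-0 sap-node-1 ]'
printf '%s\n' "$sample" | online_nodes
printf '%s\n' "$sample" | online_count
```

After powering off one VM, the count should drop by one once the cluster has noticed the failure and fenced the node.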

Conclusion

By configuring Pacemaker and STONITH in an Azure environment for SAP workloads, you can ensure high availability and fault tolerance. Pacemaker manages and monitors cluster resources, while STONITH isolates failed nodes to prevent data corruption. Follow the steps outlined in this article to properly configure Pacemaker and STONITH, and test the high availability of your SAP workloads in an Azure environment.

Please note that this article provides a high-level overview of the configuration process. For detailed configuration specifics, consult the Microsoft Azure documentation and the specific documentation for your SAP resource agent.

Answer the Questions in Comment Section

Which resource agent is used to configure STONITH for Pacemaker in Azure?

  • a) pacemaker_resource
  • b) fence_azure_arm
  • c) azure_stonith_agent
  • d) azurehdf_stonith

Correct answer: b) fence_azure_arm

True or False: STONITH stands for “Storage or Network Infrastructure Termination with Host Isolation Technology.”

Correct answer: False

Which command can be used to verify the status of the STONITH resource in Pacemaker?

  • a) crm_mon
  • b) stonith_status
  • c) pacemaker_status
  • d) cluster_status

Correct answer: a) crm_mon

True or False: In Azure, STONITH is a built-in feature and requires no additional configuration.

Correct answer: False

When configuring STONITH in Azure, which of the following is a valid fencing mechanism?

  • a) Azure fence agent (fence_azure_arm)
  • b) Azure Load Balancer
  • c) Azure Site Recovery
  • d) Azure Kubernetes Service

Correct answer: a) Azure fence agent (fence_azure_arm)

True or False: Pacemaker automatically configures STONITH for high availability of SAP workloads in Azure.

Correct answer: False

Which component of Azure is commonly used as the STONITH device for Pacemaker?

  • a) Azure Load Balancer
  • b) Azure Virtual Network
  • c) Azure Service Fabric
  • d) Azure shared disk (as an SBD device)

Correct answer: d) Azure shared disk (as an SBD device)

True or False: STONITH is primarily used to prevent split-brain scenarios in a Pacemaker cluster.

Correct answer: True

What is the purpose of STONITH fencing in a Pacemaker cluster?

  • a) To ensure only authorized access to the cluster resources.
  • b) To enforce resource distribution across multiple nodes.
  • c) To isolate and take down unresponsive or failed nodes.
  • d) To synchronize the time across all cluster nodes.

Correct answer: c) To isolate and take down unresponsive or failed nodes.

Multiple Select: Which of the following actions can be triggered by a STONITH device in Pacemaker? (Select all that apply)

  • a) Power off a node
  • b) Reboot a node
  • c) Reset a node
  • d) Migrate a resource to another node

Correct answer: a) Power off a node, b) Reboot a node, c) Reset a node

Suzanne Patterson
1 year ago

Thank you for the detailed explanation on configuring Pacemaker and STONITH!

Brajan Živadinović

I’m configuring Pacemaker on Ubuntu for my SAP workloads on Azure. Any specific recommendations?

Daniel Santos
7 months ago

How do you ensure STONITH is correctly configured in a cluster?

Brayden Caldwell
1 year ago

Is it possible to use Pacemaker without fencing?

Merigley Araújo
11 months ago

Can someone explain the role of corosync in the Pacemaker setup?

Svyatoyar Shimchuk
9 months ago

We faced issues with quorum in our two-node Pacemaker cluster. Any advice?

Sushmitha Nair
1 year ago

This blog post is very informative about configuring Pacemaker and STONITH for AZ-120. Thanks for sharing!

Aida Wenting
9 months ago

Can anyone explain the role of STONITH in Pacemaker?
