Fencing with switch

Prerequisites:

  1. A managed switch supporting SNMP
  2. Write access to the switch through SNMP

The idea behind this method is to isolate either the entire node or only its connection to the shared storage. The fence agent instructs the switch to disable one or more ports, which prevents the node from starting a VM or CT on the shared storage, because no route to the shared storage exists from the node any longer. Restoring access to the shared storage requires operator intervention, either directly on the switch or by running the fence command with the option that enables the port(s) again. If the nodes use bonding, you must disable the bridge aggregation on the switch, not the individual ports that are members of the bridge aggregation.

The example shown here uses SNMPv2c without a password. Instead, an ACL configured on the switch only allows members of the cluster VLAN to access the configured fencing group on the switch. The fence agent accepts either the index number or the name of a port.

List the known interfaces on the switch:

fence_ifmib -o list -c <community> -a <IP> -n switch

Disable a specific interface on the switch:

fence_ifmib --action=off -c <community> -a <IP> -n <index|name>

Enable a specific interface on the switch:

fence_ifmib --action=on -c <community> -a <IP> -n <index|name>
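
Because disabling the wrong port cuts off a healthy node, it can help to preview the exact command before running it. The following is only a minimal sketch: the function name fence_port and the variable FENCE_DRY_RUN are hypothetical helpers made up for illustration and are not part of fence_ifmib. The fence_ifmib options are the ones shown in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around fence_ifmib (not part of the fence agent itself).
# By default (FENCE_DRY_RUN=1) it only prints the command it would run.
# Set FENCE_DRY_RUN=0 to actually call fence_ifmib.

fence_port() {
    action="$1"      # "on" or "off"
    community="$2"   # SNMP community, e.g. the one allowed by the switch ACL
    switch_ip="$3"   # management IP of the switch
    port="$4"        # interface index or name, e.g. a bridge aggregation

    # Reject anything other than the two actions described in the text.
    case "$action" in
        on|off) ;;
        *)
            echo "usage: fence_port on|off <community> <IP> <index|name>" >&2
            return 2
            ;;
    esac

    if [ "${FENCE_DRY_RUN:-1}" = "1" ]; then
        # Dry run: show the command without touching the switch.
        echo "fence_ifmib --action=$action -c $community -a $switch_ip -n $port"
    else
        fence_ifmib --action="$action" -c "$community" -a "$switch_ip" -n "$port"
    fi
}
```

For example, after sourcing the function, `fence_port off <community> <IP> <index|name>` prints the command for review, and `FENCE_DRY_RUN=0 fence_port on <community> <IP> <index|name>` enables the port again.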

Example:

<?xml version="1.0"?>
<cluster config_version="74" name="proxmox">
 <cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
 <quorumd allow_kill="0" interval="3" label="proxmox1_qdisk" tko="10" votes="1">
   <heuristic interval="3" program="ping $GATEWAY -c1 -w1" score="1" tko="4"/>
   <heuristic interval="3" program="ip addr | grep eth1 | grep -q UP" score="2" tko="3"/>
 </quorumd>
 <totem token="54000"/>
 <fencedevices>
   <fencedevice agent="fence_ifmib" community="fencing" ipaddr="172.16.3.254" name="hp1910" snmp_version="2c"/>
 </fencedevices>
 <clusternodes>
   <clusternode name="esx1" nodeid="1" votes="1">
     <fence>
       <method name="fence">
         <device action="off" name="hp1910" port="Bridge-Aggregation2"/>
       </method>
     </fence>
   </clusternode>
   <clusternode name="esx2" nodeid="2" votes="1">
     <fence>
       <method name="fence">
         <device action="off" name="hp1910" port="Bridge-Aggregation3"/>
       </method>
     </fence>
   </clusternode>
 </clusternodes>
 <rm>
   <failoverdomains>
     <failoverdomain name="webfailover" ordered="0" restricted="1">
       <failoverdomainnode name="esx1"/>
       <failoverdomainnode name="esx2"/>
     </failoverdomain>
   </failoverdomains>
   <resources>
     <ip address="172.16.3.7" monitor_link="5"/>
   </resources>
   <service autostart="1" domain="webfailover" name="web" recovery="relocate">
     <ip ref="172.16.3.7"/>
   </service>
   <pvevm autostart="1" vmid="109"/>
 </rm>
</cluster>