SRM - vRealize/Aria cluster with NSX-T Load balancer recovery over NSX-T stretched network
A vRealize Operations/Automation cluster can be recovered with SRM without changing its IP addresses when it sits on an NSX-T Federation stretched network behind an NSX-T load balancer. NSX-T Federation doesn’t support the NSX-T load balancer, so I can’t configure the load balancer from the global manager.
Instead, I used the location-specific local NSX-T managers to create an LB service in both locations, with the primary location LB enabled and the secondary location LB disabled. During SRM failover I disable the LB service in the primary location and then enable the LB service in the secondary site, and I do the same in reverse during SRM failback. This can be done by a script during recovery; please refer to my other article about the script.
I created a vROPS cluster to test this because it’s a small appliance compared to vRA and fits in my lab, but the same solution works with vRealize Automation or any other application using an NSX-T load balancer over a stretched network.
I’m not going to talk about installing or configuring NSX-T Federation or SRM; I focus only on recovering the vRealize component with its LB configuration on a stretched network.
Fig 1 shows the design of the LB with the vRealize Operations cluster in my lab.
I have two locations, Site A and Site B, in a primary and secondary configuration. Each site has an uplink interface in a different VLAN that peers with the upstream router using BGP.
Site A – NSX-T local manager named Site-A-Local
Site B – NSX-T local manager named Site-B-Local
Site A uplink – 172.16.50.1, peers with router interface 172.16.50.253
Site B uplink – 172.16.60.1, peers with router interface 172.16.60.253
Site A is the primary site and Site B is the secondary/recovery site. The stretched network 10.10.100.0/24 uses a segment named Stretched-Application-NW that spans Site A and Site B, and the vRealize Operations nodes are connected to this stretched network.
The load balancer VIP is 10.10.100.5 and the two vROPS node IPs are 10.10.100.6 and 10.10.100.7.
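As a quick sanity check on the addressing above, the following minimal Python sketch confirms that the VIP and both node IPs sit inside the stretched network and don't collide. The function and variable names are illustrative only and are not part of any NSX-T or SRM tooling.

```python
import ipaddress

# Values taken from the lab design described above.
STRETCHED_NETWORK = ipaddress.ip_network("10.10.100.0/24")
LB_VIP = ipaddress.ip_address("10.10.100.5")
VROPS_NODES = [
    ipaddress.ip_address("10.10.100.6"),
    ipaddress.ip_address("10.10.100.7"),
]


def check_address_plan(network, vip, nodes):
    """Return a list of problems found in the address plan (empty if none)."""
    problems = []
    addresses = [vip, *nodes]
    for addr in addresses:
        if addr not in network:
            problems.append(f"{addr} is outside {network}")
    if len(set(addresses)) != len(addresses):
        problems.append("duplicate address in plan")
    return problems


if __name__ == "__main__":
    issues = check_address_plan(STRETCHED_NETWORK, LB_VIP, VROPS_NODES)
    print("address plan OK" if not issues else issues)
```

The same check can be extended with any other address you plan to place on the segment before you create it in NSX-T.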
Fig 2, the clustered vROPS VMs are running in Site A and connected to the Stretched-Application-NW overlay segment on the 10.10.100.0/24 network.
Fig 3, the SRM placeholder VMs are in Site B with no disk or network attached.
Fig 4, the load balancer virtual IP (VIP) 10.10.100.5 in Site A is reachable via the Site A uplink 172.16.50.1. The trace command shows the path; we will check the path again after the SRM failover.
Fig 5, the Site-A-GM global manager is active and Site-B-GM is in standby mode. The two local managers are Site-A-Local and Site-B-Local.
Fig 6, stretched T0 in Active/Active mode with BGP peering in both locations. Site-A is primary and Site-B is secondary.
Fig 7, stretched T1 used only for distributed routing, with no services running. The data plane will continue to work after a failure of primary Site A.
Fig 8, the Stretched-Application-NW segment is stretched across Site A and Site B. The vROPS nodes are connected to it, and the segment is attached to my stretched T1.
Site A Load Balancer Configuration
Fig 9, a standalone T1 named Site-A-T1-LB is created in the Site-A-Local manager to host the load balancer service in the primary site. Site-A-T1-LB is standalone and not attached to any T0.
Fig 10, a service interface with the IP 10.10.100.1/24 is created on the standalone T1 Site-A-T1-LB and connected to the Stretched-Application-NW segment to give the load balancer connectivity.
Fig 11, Site-A-T1-LB is configured with a static route so the LB can reach any network, using the next hop 10.10.100.253, which is the default gateway for the Stretched-Application-NW segment.
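For anyone scripting the service interface and static route rather than clicking through the UI, here is a rough sketch of what the request bodies might look like. The field names (`segment_path`, `subnets`, `ip_addresses`, `prefix_len`, `network`, `next_hops`, `admin_distance`) follow my understanding of the NSX-T Policy API and should be verified against the API guide for your version. The segment path is hypothetical, and using 0.0.0.0/0 for "any network" is an assumption. Nothing here is sent to NSX-T; the code only builds and prints the bodies.

```python
import ipaddress
import json


def service_interface_body(cidr, segment_path):
    """Body for a Tier-1 service interface (field names assumed from the Policy API)."""
    iface = ipaddress.ip_interface(cidr)
    return {
        "segment_path": segment_path,
        "subnets": [{
            "ip_addresses": [str(iface.ip)],
            "prefix_len": iface.network.prefixlen,
        }],
    }


def static_route_body(destination, next_hop, interface_cidr):
    """Body for a Tier-1 static route; the next hop must be on the interface subnet."""
    subnet = ipaddress.ip_interface(interface_cidr).network
    if ipaddress.ip_address(next_hop) not in subnet:
        raise ValueError(f"next hop {next_hop} is not on {subnet}")
    return {
        "network": destination,
        "next_hops": [{"ip_address": next_hop, "admin_distance": 1}],
    }


# Hypothetical policy path for the stretched segment; check the real path in your manager.
SEGMENT_PATH = "/infra/segments/Stretched-Application-NW"

interface = service_interface_body("10.10.100.1/24", SEGMENT_PATH)
# 0.0.0.0/0 is an assumed destination standing in for "any network".
route = static_route_body("0.0.0.0/0", "10.10.100.253", "10.10.100.1/24")

if __name__ == "__main__":
    print(json.dumps({"interface": interface, "static_route": route}, indent=2))
```

The next-hop check is there because a static route whose gateway is not on the service interface subnet would leave the load balancer without a return path.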
Fig 12, the load balancer service Site-A-Application-LB is created and attached to the standalone T1 named Site-A-T1-LB in primary Site A.
Fig 13, 14, 15 show the virtual IP, profile, pools and monitors for the vROPS cluster load balancer, created as per the VMware documentation.
Site B Load Balancer Configuration
Fig 16, a standalone T1 named Site-B-T1-LB is created in Location B in the same way as the standalone Site-A-T1-LB in Site A, but the service interface and static route are not created, to avoid an IP conflict. They will be created on Site-B-T1-LB during the SRM recovery.
Please note: if you want to use a different service interface IP in the recovery site, you can create it and keep it on Site-B-T1-LB. Then you don't need to remove and recreate the service interface and static route on each failover/failback; you only detach/attach the standalone T1 and the LB.
In my case I want to use the same service interface IP in both locations, so I remove the service interface from the primary site and create it in the recovery site during failover/failback.
Fig 17, the load balancer Site-B-Application-LB is created in Location B and not attached to Site-B-T1-LB, so the load balancer status is disabled in Site B. I need to attach it during the recovery, after detaching Site-A-T1-LB from the Site-A-Application-LB load balancer.
Fig 18, 19, 20, the load balancer VIP, pool, profile and monitors are created the same way as in Location A, as per the VMware documentation.
Site Recovery Manager Setup
Fig 21, SRM with vSphere Replication is configured in both locations and paired.
Fig 22, vSphere Replication is enabled for the vROPS cluster nodes to replicate them to Location B.
Fig 23, a network mapping is created that maps the primary Stretched-Application-NW segment to the same Stretched-Application-NW segment in recovery Site B.
Fig 24, protection group with vSphere Replication from Site A to Site B.
Before running the disaster recovery I need to detach the standalone Site-A-T1-LB from the Site A load balancer service named Site-A-Application-LB, and also remove the static route and service interface from Site-A-T1-LB. Then I create the same service interface and static route on Site-B-T1-LB and attach it to the Site B load balancer service named Site-B-Application-LB in the recovery site. In this article I have done this reconfiguration manually, but I have also created a script and tested calling it automatically from the SRM appliance during recovery; you can refer to my other article.
Fig 25/26, the T1 is detached from the LB in Location A, and the static route and service interface are deleted from the Location A standalone T1.
Fig 27, the service interface and static route are created on the Location B standalone T1, which is then attached to the LB. I created a script to do this reconfiguration and have included it in a separate article.
Fig 28, service interface created on Site-B-T1-LB in Site B.
Fig 29, 30, static route created on Site-B-T1-LB in Site B, which is my recovery site.
Fig 31, Site-B-T1-LB attached to the Site-B-Application-LB service in the recovery site.
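The manual reconfiguration above is order-sensitive, so it can help to write it down as an explicit plan before automating it. The sketch below is not the script from my other article; it is a minimal illustration that only builds the ordered list of steps, for failover (A to B) or failback (B to A). The `SITES` table, the `build_plan` function and the `same_service_ip` flag are hypothetical names. The flag models the two options described earlier: reusing the same service interface IP in both sites, or keeping a different one permanently in the recovery site.

```python
# Object names are taken from the lab design in this article.
SITES = {
    "A": {"manager": "Site-A-Local", "t1": "Site-A-T1-LB", "lb": "Site-A-Application-LB"},
    "B": {"manager": "Site-B-Local", "t1": "Site-B-T1-LB", "lb": "Site-B-Application-LB"},
}


def build_plan(source, target, same_service_ip=True):
    """Return ordered (manager, action, object) steps to move the LB from source to target."""
    src, dst = SITES[source], SITES[target]
    steps = [(src["manager"], "detach T1 from LB service", f'{src["t1"]} / {src["lb"]}')]
    if same_service_ip:
        # The shared IP must be released on the source before it is created on the target.
        steps += [
            (src["manager"], "delete static route", src["t1"]),
            (src["manager"], "delete service interface", src["t1"]),
            (dst["manager"], "create service interface", dst["t1"]),
            (dst["manager"], "create static route", dst["t1"]),
        ]
    steps.append((dst["manager"], "attach T1 to LB service", f'{dst["t1"]} / {dst["lb"]}'))
    return steps


if __name__ == "__main__":
    for number, (manager, action, obj) in enumerate(build_plan("A", "B"), start=1):
        print(f"{number}. [{manager}] {action}: {obj}")
```

Calling `build_plan("B", "A")` gives the failback order, and `same_service_ip=False` reduces the plan to the detach and attach steps only.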
To simulate a failure of primary Site A, I disabled the uplink in Site A.
Fig 32, uplink interface disabled in Site A to bring the primary site down.
Fig 33, 34, after the uplink in primary Site A is disabled, the BGP neighbour in Site A is no longer in the Established state and only the BGP peer in Site B is established, so the networks are now learned from recovery Site B only.
Fig 35, the disaster recovery plan is running in SRM to recover the vROPS nodes in recovery Site B.
Fig 36, the vROPS nodes are recovered in Site B, and SRM attached the vROPS VMs to the same Stretched-Application-NW segment in Site B.
Fig 37, the tracert command shows the VIP is reachable via 172.16.60.1, which is the uplink in recovery Site B.
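If you want to turn this path check into something repeatable, a small helper can tell you which site is serving the VIP from the hop list of a trace. This is only a sketch: the function name is illustrative, the hop list in the example is hypothetical, and you would paste in the hop IPs from your own tracert output.

```python
# Uplink addresses from the lab design in this article.
UPLINKS = {"172.16.50.1": "Site A", "172.16.60.1": "Site B"}


def site_serving_vip(hops):
    """Return the site whose uplink appears first in a list of traceroute hop IPs, or None."""
    for hop in hops:
        if hop in UPLINKS:
            return UPLINKS[hop]
    return None


if __name__ == "__main__":
    # Hypothetical hop list; replace with the hops from your own trace.
    example_hops = ["192.168.1.1", "172.16.60.1", "10.10.100.5"]
    print(site_serving_vip(example_hops))
```

Before failover the helper should report Site A for a trace to 10.10.100.5, and after failover it should report Site B.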
Fig 38, the vROPS cluster nodes are up with the same IP addresses in Site B, and the cluster is accessible at the FQDN vrops.corp.local, which points to the same VIP 10.10.100.5, now on the Site B load balancer.