Just 10 months ago we published an article looking for feedback on our plan to make CygNet redundant. We got lots of feedback (thank you!) and have been working hard ever since to lay the groundwork for a top-notch feature. As expected, we had to cut a lot of scope and weren’t able to do it all in this release, but I’m confident that what we completed will get you to parity (or very close) with the scripted solution many of you are currently using. We will continue working in the next release to deliver even more in 2017, but I wanted to take a moment to describe what you can expect in the coming 8.5 release.
First, let’s talk about how our redundancy solution is implemented. The majority of the logic for redundancy has been built into the RSM service. The RSM has historically been a very lightweight service that existed only to make sure other services were running. It now has a database containing information on every service and the role and domain each one should be running in. All your redundant RSMs will synchronize with each other (even across domains) to make sure they agree on their various roles. They will regularly check that their services are running on the right domain and restart them if not. To make this possible, each RSM must be uniquely named so that it can be available on multiple domains at a time without conflicting with the others. We recommend that you keep the site name, but change the .RSM extension to reflect the host each one is running on. So you might have names like CYGNET.RSM_PNP (Production network, primary host) or CYGNET.RSM_PNB (Production network, backup host).
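To make the naming recommendation concrete, here is a minimal Python sketch. It is not part of CygNet: the helper names `rsm_service_name` and `assert_unique`, and the one-letter host codes, are assumptions inferred only from the two example names above.

```python
# Hypothetical helpers, not a CygNet API. They follow the convention suggested
# above: keep the site name and vary the .RSM extension per host.

HOST_CODES = {"primary": "P", "backup": "B"}  # assumed, from the PNP/PNB examples


def rsm_service_name(site: str, network_code: str, host_role: str) -> str:
    """Build a name such as CYGNET.RSM_PNP from ("CYGNET", "PN", "primary")."""
    return f"{site}.RSM_{network_code}{HOST_CODES[host_role]}"


def assert_unique(names: list[str]) -> None:
    """Raise if two RSMs would share a name; each RSM must be uniquely named."""
    duplicates = {n for n in names if names.count(n) > 1}
    if duplicates:
        raise ValueError(f"duplicate RSM names: {sorted(duplicates)}")


names = [
    rsm_service_name("CYGNET", "PN", "primary"),
    rsm_service_name("CYGNET", "PN", "backup"),
]
assert_unique(names)
print(names)
```

A check like this could be run against a planned list of RSM names before any configuration files are touched.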
I have a very simple site setup (with only a few services) to demonstrate some of what you can expect to see.
Here you can see that I’m running CExplore on two domains (8331 and 8332). However, my RED.RSM1 service is visible on both domains. (Note that I don’t have two running services with the same name; this is a single running process that is visible to multiple CygNet domains.) You will also notice a new “Domain” column showing which domain each managed service is running on. If that domain doesn’t match the ambient domain, the status is “Running on a different domain” instead of just “Running”.
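The status rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not how CygNet Explorer is implemented; the function name `display_status` and the "Stopped" label are assumptions.

```python
# Illustrative only: mirrors the status text rule described above.

def display_status(running: bool, service_domain: int, ambient_domain: int) -> str:
    """Return the status text for a service as seen from the ambient domain."""
    if not running:
        return "Stopped"  # assumed label; the article does not show this case
    if service_domain != ambient_domain:
        return "Running on a different domain"
    return "Running"


# Viewing from domain 8331: one service on 8331, one failed over to 8332.
print(display_status(True, 8331, 8331))
print(display_status(True, 8332, 8331))
```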
If you were to look at CygNet Host Manager, you would see that the RED.RSM1 service is actually running on three domains.
I did a failover of just the VHS, and now you can see that my RED.RSM1 service is running most of its services on the 8331 domain, but its VHS is now running on the 8332 domain.
The paired RSM in this failover set is RED.RSM2. So if we look at those two services, together they are running the 8331 and 8332 domains, which make up a locally redundant set.
You will notice that both RED.RSM1 and RED.RSM2 are visible to both the 8331 and 8332 domains. You will also notice that they both contain the same set of services.
There is a third domain in this example, the 8333 domain, which is configured to be in a different data center. The site running on the 8333 domain is split across two servers under RED.RSM3 and RED.RSM3A. This is to demonstrate that you can split a site across multiple servers in a redundant environment, even when the paired site is configured differently.
The two windows on the left are showing the 8331 and 8332 domains which are locally redundant in my “SLO” data center. The two on the right are both showing the 8333 domain which is running in my “Atascadero” data center on two different servers. To keep the example simple, I don’t have local redundancy configured in my Atascadero data center although that could easily be added.
For a locally redundant set (8331 and 8332), you can fail over individual services or the entire site. For a data center redundancy set (8331 and 8333), you have to fail over the entire site. So for a locally redundant set, you might choose to always run in a partial failover state (as demonstrated with my VHS) to spread the load across both servers instead of leaving one always idle.
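The difference between the two kinds of failover set can be expressed as a small validation rule. The Python below is a sketch under my own assumptions: `SetKind` and `validate_failover` are hypothetical names, and the service names other than VHS are purely illustrative.

```python
from enum import Enum


class SetKind(Enum):
    LOCAL = "local"
    DATA_CENTER = "data center"


def validate_failover(kind: SetKind, site_services: list[str], selected: list[str]) -> set[str]:
    """Return the services to fail over, or raise if the request breaks the rule."""
    site, chosen = set(site_services), set(selected)
    if not chosen:
        raise ValueError("no services selected")
    if not chosen <= site:
        raise ValueError(f"unknown services: {sorted(chosen - site)}")
    # A data center set must move the whole site; a local set may move a subset.
    if kind is SetKind.DATA_CENTER and chosen != site:
        raise ValueError("a data center failover must include the entire site")
    return chosen


site = ["VHS", "UIS", "PNT"]  # illustrative service list
print(validate_failover(SetKind.LOCAL, site, ["VHS"]))  # partial failover is allowed
try:
    validate_failover(SetKind.DATA_CENTER, site, ["VHS"])
except ValueError as err:
    print(err)
```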
If I perform a data center failover, you will see that the services running on the 8331 and 8333 domains swap.
My live domain (8331) is now running in the Atascadero data center instead of the SLO data center. If I have any hardware failure in the SLO data center, I can still do a local failover, but it will now be between the 8332 and 8333 domains.
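One way to picture the swap is as a mapping from domain to data center. The Python sketch below is a mental model of the example, not a description of the RSM’s internal database; the function names are hypothetical.

```python
# Mental model only: which data center each domain is currently running in.
location = {8331: "SLO", 8332: "SLO", 8333: "Atascadero"}


def data_center_failover(location: dict[int, str], live: int, remote: int) -> dict[int, str]:
    """Swap where the live and remote domains run, leaving the input untouched."""
    swapped = dict(location)
    swapped[live], swapped[remote] = location[remote], location[live]
    return swapped


def domains_in(location: dict[int, str], data_center: str) -> list[int]:
    """List the domains that share a data center, i.e. a possible local pair."""
    return sorted(d for d, dc in location.items() if dc == data_center)


after = data_center_failover(location, live=8331, remote=8333)
print(after[8331])               # where the live domain now runs
print(domains_in(after, "SLO"))  # the domains left for a local failover in SLO
```

After the swap, the two domains remaining in SLO are 8332 and 8333, which matches the local failover pair described above.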
The actual failover process is pretty simple. You pick which services you want to fail over, and the RSMs take care of the rest. If you are doing a soft failover, the RSMs will first make sure the services are fully synchronized. They do this by telling the active service in the set to freeze and stop all work, which basically puts the service into a read-only mode. The standby service is then told to keep replicating until the services are an exact match. The next step is to stop both the active and standby services, then restart the standby services on the active domain. Once those are up and running, the previously active services are told to start on the standby domain. If you are experiencing an emergency, you can issue a hard failover, which will skip the sync step and complete the failover whether or not the services are ready.
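The sequence above can be summarized as an ordered list of steps, with the hard failover skipping the synchronization steps. This Python sketch only restates the order described in this article; `failover_steps` is a hypothetical function, and nothing here calls a real CygNet API.

```python
def failover_steps(services: list[str], active: int, standby: int, hard: bool = False) -> list[str]:
    """Return the ordered steps for a soft (default) or hard failover."""
    names = ", ".join(services)
    steps = []
    if not hard:
        # Soft failover: synchronize first so that no data is lost.
        steps.append(f"freeze active {names} on {active} (read-only)")
        steps.append(f"replicate standby {names} on {standby} until an exact match")
    steps.append(f"stop active and standby {names}")
    steps.append(f"start former standby {names} on {active}")
    steps.append(f"start former active {names} on {standby}")
    return steps


for step in failover_steps(["VHS"], active=8331, standby=8332):
    print(step)
```

Passing `hard=True` returns only the last three steps, which reflects the trade-off of a hard failover: it completes faster but does not wait for the standby to catch up.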
I could show many more examples and scenarios, but I think this is enough to give you a taste of what to expect. This may seem very complicated, but don’t worry, we have a number of tools to simplify things.
The primary tool is a sample set of dashboard screens that have been implemented in Studio so that you can tweak and modify them to your liking. These screens should work out of the box once you have configured your system for redundancy.
This is an overview screen showing the three failover sets I have configured. The grid up top gives a quick summary of whether each one is a local or data center failover set and whether or not it is failover ready. If I select one, I can get a more complete status showing which domains are in the set, whether or not the services are running, the status of replication (the arrow will be green or red) and where the data is coming from (if this set does not contain my live domain).
To perform a failover, go to the ‘Execute Failover’ tab, pick a failover set, select the services you want to fail over and then press the Execute Failover button.
There are a lot more details I could go into, but hopefully this is enough to give you a taste of how the feature works.
Another important piece in this release is that we have replaced the old CConfigFileUpdater utility with a new utility called ConfigFileMgr to manage your configuration files. The utility was demonstrated at the users group in April, but has come a long way since then. It allows you to edit local or remote configuration files en masse and even provides basic validation for your changes. By remote files, I mean that if you can communicate with a CygNet service, you can view and edit its configuration file. I think you will all find it useful for basic CygNet management, even if you don’t want to configure redundancy.
There is also an editor where you can configure the role of each domain and which RSM services are redundant. It is available via the right-click menu in CygNet Explorer. It can also be instantiated within Studio from any of your screens.
As I said earlier, this is not a comprehensive list of the changes made to support redundancy, nor is it intended to be a guide for you to configure it. There is a lot of great documentation on this feature that can be found in the CygNet help file under the root “Redundancy” entry. Once 8.5 is released in the coming days, I encourage you to take a look and check this feature out.
If you have more questions than answers at this point, we want to hear them! Just leave a reply and I’ll do my best to respond. You can also join our online training course for redundancy on 1/24/2017. Check out this blog article with details and a registration link.
Thanks for reading!