Network service outages are inevitable. Even cloud platforms and content delivery networks (CDNs) with 100% uptime SLAs experience outages. The key question is: how do you handle a network service outage? Will you be knocked offline for lack of redundant services, or will you fail over seamlessly to another provider and maintain a good user experience? And how will the failover process work on the back end: automated or manual?
Most midsize and large organizations have redundant systems in place to survive an outage. However, they may not have an automated mechanism to redirect traffic to these redundant systems when a core service goes down.
IBM NS1 Connect Filter Chain™ technology uses DNS to automatically reroute traffic between service providers during a network service disruption. With predefined rules, NS1 Connect monitors the network’s status and switches endpoints as necessary. The rules and priorities are set upfront, and the process thereafter is automated.
On the NS1 platform, filter chain configurations are applied to individual records within DNS zones. These filter chains determine how NS1 handles queries against each record, specifying which answers to return. Different filter chains use unique logic to process queries and can be combined to achieve specific outcomes based on operational or business needs.
To provide guidance on directing failover traffic, we have created a quick guide on building active-active, active-passive, and manual failover systems using filter chains.
Active-active failover
In this scenario, NS1 or third-party data sources monitor the status of individual endpoints in your application delivery infrastructure. When an outage is detected, NS1 automatically routes traffic to secondary systems, which are likely already part of your load balancing system. The first filter in the chain, “Up”, checks whether each service provider’s endpoint is operational. The second filter, “Shuffle” or “Weighted Shuffle”, distributes traffic across the providers that remain after the “Up” filter removes any endpoint that is down.
Finally, the “Select First N” filter specifies the number of answers to provide for inbound queries, with the default being one.
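To make the order of operations concrete, here is a minimal Python sketch of the active-active chain. It simulates the filter logic locally and does not call the NS1 Connect API. The function names, the hostnames, and the `up` and `weight` fields are assumptions made for illustration only.

```python
import random


def up(answers):
    # Up: keep only answers whose endpoint is currently marked healthy.
    return [a for a in answers if a["up"]]


def weighted_shuffle(answers, rng):
    # Weighted Shuffle: random order in which heavier answers tend to come first.
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def select_first_n(answers, n=1):
    # Select First N: return at most n answers from the front of the list.
    return answers[:n]


def resolve(answers, rng, n=1):
    # Run the chain in order: Up -> Weighted Shuffle -> Select First N.
    return select_first_n(weighted_shuffle(up(answers), rng), n)


# Hypothetical endpoints; in a real deployment "up" would come from monitoring.
answers = [
    {"host": "cdn-a.example.com", "up": True, "weight": 70},
    {"host": "cdn-b.example.com", "up": True, "weight": 30},
]
rng = random.Random(42)
print(resolve(answers, rng))   # either provider; cdn-a is chosen more often
answers[0]["up"] = False       # simulate an outage at cdn-a
print(resolve(answers, rng))   # only cdn-b can be returned now
```

Because both providers already share traffic, an outage only shrinks the pool that the shuffle draws from. Nothing has to be started or promoted.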
Active-passive failover
In this case, NS1 or third-party data sources monitor your application delivery infrastructure’s status and route traffic to secondary systems in the event of a primary system outage. Unlike active-active failover, the secondary systems are spun up only when needed, as a redundant option. The filters in this chain include “Up” to check the status of underlying services and “Priority” to prioritize active systems over passive or backup systems.
The “Select First N” filter specifies the number of answers to deliver, typically one.
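The active-passive chain can be sketched the same way. This is a local Python simulation under stated assumptions, not NS1 Connect code: the hostnames and field names are hypothetical, and the sketch assumes that a lower `priority` number means a more preferred answer.

```python
def up(answers):
    # Up: drop answers whose endpoint is marked down.
    return [a for a in answers if a["up"]]


def priority(answers):
    # Priority: order answers by rank (assumption: lower number = preferred).
    return sorted(answers, key=lambda a: a["priority"])


def select_first_n(answers, n=1):
    # Select First N: return at most n answers from the front of the list.
    return answers[:n]


def resolve(answers, n=1):
    # Run the chain in order: Up -> Priority -> Select First N.
    return select_first_n(priority(up(answers)), n)


# Hypothetical primary and backup endpoints.
answers = [
    {"host": "primary.example.com", "up": True, "priority": 1},
    {"host": "backup.example.com", "up": True, "priority": 2},
]
print(resolve(answers)[0]["host"])   # primary.example.com
answers[0]["up"] = False             # monitoring reports the primary as down
print(resolve(answers)[0]["host"])   # backup.example.com
```

While the primary is healthy, the backup is never returned. It appears in answers only after the “Up” step has removed the primary.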
Manual failover
When failover decisions depend on additional information and need to be made by a person, the filter chain serves as the implementation mechanism. For the first filter, “Up”, an operator manually defines which services are up and which are down. The second filter, “Priority”, prioritizes active systems over passive or backup systems. The “Select First N” filter specifies the number of answers to deliver, usually one.
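The difference from the automated chains is where the up/down status comes from. The following Python sketch illustrates that idea with an operator-maintained status table. The table, the `set_status` helper, and the hostnames are hypothetical and are not part of NS1 Connect.

```python
# Operator-maintained status table (hypothetical); no health check updates it.
manual_status = {"primary.example.com": True, "backup.example.com": True}

ANSWERS = [
    {"host": "primary.example.com", "priority": 1},
    {"host": "backup.example.com", "priority": 2},
]


def set_status(host, is_up):
    # The human decision point: an operator marks a service up or down.
    if host not in manual_status:
        raise KeyError(f"unknown host: {host}")
    manual_status[host] = is_up


def resolve(n=1):
    live = [a for a in ANSWERS if manual_status[a["host"]]]   # Up
    ranked = sorted(live, key=lambda a: a["priority"])        # Priority
    return ranked[:n]                                         # Select First N


print(resolve()[0]["host"])                # primary.example.com
set_status("primary.example.com", False)   # operator declares the primary down
print(resolve()[0]["host"])                # backup.example.com
```

The chain itself is unchanged from the active-passive case. Only the source of truth for the “Up” step moves from monitoring to a person.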
Multi-cloud or multi-CDN availability
In scenarios where service availability is more nuanced, the filter chain can be used to steer traffic based on advanced analytics data from NS1 Connect. The filters in this chain include “Pulsar Availability Threshold” to set a percentage threshold, based on availability metrics, that a service must meet to be used, “Weighted Shuffle” to distribute traffic based on weights, and “Pulsar Performance Sort” to direct traffic to the fastest available service.
The “Select First N” filter determines the number of answers to deliver, typically one.
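As a rough illustration of how these steps combine, the Python sketch below simulates the chain locally. It does not use Pulsar data or the NS1 Connect API. The availability and latency figures, field names, and hostnames are invented for the example, and the sort step is a plain stable sort by latency, which may differ from how the real “Pulsar Performance Sort” filter behaves.

```python
import random


def availability_threshold(answers, min_pct):
    # Keep only services whose measured availability meets the threshold.
    return [a for a in answers if a["availability_pct"] >= min_pct]


def weighted_shuffle(answers, rng):
    # Random order in which heavier answers tend to come first.
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered


def performance_sort(answers):
    # Fastest first. The sort is stable, so in this sketch the shuffle
    # only decides the order of answers with equal latency.
    return sorted(answers, key=lambda a: a["latency_ms"])


def resolve(answers, rng, min_pct=99.0, n=1):
    eligible = availability_threshold(answers, min_pct)
    shuffled = weighted_shuffle(eligible, rng)
    return performance_sort(shuffled)[:n]   # Select First N


# Hypothetical measurements for three CDNs.
answers = [
    {"host": "cdn-a.example.com", "availability_pct": 99.9, "latency_ms": 40, "weight": 50},
    {"host": "cdn-b.example.com", "availability_pct": 97.0, "latency_ms": 25, "weight": 30},
    {"host": "cdn-c.example.com", "availability_pct": 99.5, "latency_ms": 32, "weight": 20},
]
print(resolve(answers, random.Random(7))[0]["host"])   # cdn-c.example.com
```

Here cdn-b is the fastest service but falls below the 99% threshold, so the chain returns cdn-c, the fastest of the services that remain eligible.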
For more information on using filter chains to improve performance and resilience and to reduce costs, explore the resource below.
Guard against outages with resilient, redundant network services