From a Non-Fault-Tolerant Architecture to High Availability

Projects of this kind are essentially infrastructure projects, strongly linked to IP interconnection. The main objective is to achieve a high availability platform without affecting or changing its functional flow.

Chaos engineering lets you find out, in a safe way, how you cope with abnormal conditions, helping you avoid future outages and provide a much better user experience. Learn more in our blog post Chaos Engineering for Resilient Software.

Below is a high-level plan based on the benefits of virtualized environments, in which two players interact:

  • Customer: the network operator who owns the architecture and needs a topology change.
  • Supplier: the platform vendor.

High-Level for High-Availability – The Plan

  1. Servers Cloning (Customer)
  2. Extended IP assignment for cloned machines (Customer)
  3. Firewall rules adaptation according to the productive environment (Customer)
  4. Prerequisites Delivery (Customer)
  5. Prerequisites Validation (Supplier)
  6. Cloned Pre-production Servers Configuration (Supplier)
  7. Internal interconnectivity tests (Supplier)
  8. External tests (Customer/Supplier)
  9. MOP – Method of Procedure (Supplier)
  10. Swap Productive Platform -> Pre-production Platform (Customer/Supplier)
  11. Post activity control process (Supplier)

Detailed Plan

Servers Cloning (Customer)

The customer clones the current productive environment using virtualization. From this point on, the cloned environment is under the responsibility of the supplier for its adaptation.

The action is graphically represented as:

[Figure: Servers Cloning (Customer)]

Prerequisites Delivery (Customer)

Extended IP assignment for cloned machines (Customer)

Firewall rules adaptation according to the productive environment (Customer)

The customer delivers the prerequisites requested by the supplier so that validation can proceed. These prerequisites must include the application of ALL new firewall policies, because the origins of the connections will change (for example, the source IP of a connector goes from IP1 to IP1.1/IP1.2).
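The source IP change described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Rule structure, the rules_to_add function, and the addresses (taken from the RFC 5737 documentation ranges) are hypothetical and do not come from any real firewall product. The idea is to list which rules would have to be added so that every new source IP is allowed wherever the old one already was.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical, simplified firewall rule: allow source -> destination:port."""
    source: str
    destination: str
    port: int


def rules_to_add(current_rules, new_sources):
    """Return the rules needed so each new source IP is allowed wherever
    the original source IP was already allowed."""
    existing = set(current_rules)
    additions = []
    for rule in current_rules:
        # new_sources maps an old source IP to the IPs that replace it.
        for new_ip in new_sources.get(rule.source, []):
            candidate = rule._replace(source=new_ip)
            if candidate not in existing and candidate not in additions:
                additions.append(candidate)
    return additions


# Assumed example data: IP1 is 192.0.2.10, IP1.1 and IP1.2 are .11 and .12.
current = [Rule("192.0.2.10", "198.51.100.5", 443)]
mapping = {"192.0.2.10": ["192.0.2.11", "192.0.2.12"]}

for rule in rules_to_add(current, mapping):
    print(rule)
```

The sketch only computes additions. Whether the rule for the original source IP is kept or removed after the swap is a decision for the customer's firewall team.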

Prerequisites Validation (Supplier)

The supplier validates the delivered infrastructure and prerequisites. If a prerequisite is not correctly configured, the supplier raises a claim for analysis and correction.

Cloned Servers Configuration. Pre-Productive Platform (Supplier)

With the prerequisites delivered, the supplier proceeds to set up the high availability processes. At this stage, the additional floating IPs requested will be used. The process allows the high availability (HA) configuration and the interconnection test of all the solution modules. Note that these additional floating IPs must also be among those allowed in the customer’s firewalls.
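A quick way to check the last point is to compare the floating IPs against the entries the firewall allows. The sketch below uses Python's standard ipaddress module. The function name, the idea of an exported allow-list, and the addresses are assumptions made for illustration only.

```python
import ipaddress


def floating_ips_not_allowed(floating_ips, allowed_entries):
    """Return the floating IPs that no allowed entry covers.

    allowed_entries may contain single addresses or CIDR blocks.
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowed_entries]
    missing = []
    for ip_text in floating_ips:
        address = ipaddress.ip_address(ip_text)
        if not any(address in network for network in networks):
            missing.append(ip_text)
    return missing


# Assumed example data (RFC 5737 documentation addresses).
floating = ["192.0.2.20", "192.0.2.21"]
allowed = ["192.0.2.20", "198.51.100.0/24"]

print(floating_ips_not_allowed(floating, allowed))
```

An empty result would suggest that every floating IP is covered. Any address returned is a candidate for a new firewall request before the tests begin.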

Internal tests (Supplier)

The supplier performs high availability tests among the cloned servers, testing not only the infrastructure but also the business functionality.
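The infrastructure side of these tests can start with a basic reachability check between modules. The following is a minimal sketch, assuming each module exposes a TCP port. The function names and the shape of the endpoints dictionary are hypothetical, and a successful TCP connection says nothing about the business behavior, which still needs functional tests.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def connectivity_report(endpoints, timeout=2.0):
    """Check every named endpoint and return a name -> reachable mapping.

    endpoints maps a module name to a (host, port) tuple.
    """
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Running connectivity_report from each cloned server against the floating IPs, before and after stopping the active node, gives a rough view of whether the failover keeps the modules reachable.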

External Tests (Customer/Supplier)

In this step, the customer must allow orders from its BSS to be sent to the new environment to test the functionality. This way, the customer can check the new infrastructure at a functional level. The customer’s BSS must be interconnected to the additional floating IP assigned, as shown in the diagram below.

[Figure: External Tests (Customer/Supplier)]

MOP – Method of Procedure (Supplier)

Once all the tests are certified, the supplier sends the appropriate method of procedure. The document details the steps needed to start up the new high availability infrastructure.

The maintenance window consists of the environment swap, from the productive platform to the pre-productive platform, moving the current productive IP address to the new infrastructure.

Pros, Cons and Plan Requirements

  1. Infrastructure is the main component of the project (strongly related to IP interconnection). 100% of the task scope is related to IP addressing changes, high availability rules, and new firewall rules.
  2. The presented plan helps assure the new configurations at the network level.
  3. All newly assigned IPs used for the cloned machines must belong to the same network segment as the productive environment. This allows the team (customer/supplier) to test everything before the maintenance window, helping ensure the success of the task.
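The network segment requirement in the last point is easy to verify before the IPs are handed over. Here is a small sketch with Python's standard ipaddress module. The function name, the productive interface 192.0.2.10/24, and the candidate addresses are assumptions chosen for illustration.

```python
import ipaddress


def outside_segment(productive_interface, candidate_ips):
    """Return the candidate IPs that are not in the productive network segment.

    productive_interface is an address with its prefix, e.g. "192.0.2.10/24".
    """
    segment = ipaddress.ip_interface(productive_interface).network
    return [ip for ip in candidate_ips if ipaddress.ip_address(ip) not in segment]


# Assumed example data (RFC 5737 documentation addresses).
candidates = ["192.0.2.11", "192.0.2.12", "198.51.100.7"]
print(outside_segment("192.0.2.10/24", candidates))
```

Any address the function returns would sit in a different segment from the productive platform and would need to be reassigned before the maintenance window.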

Based on the guidelines above, this plan lets us move from a non-fault-tolerant architecture to high availability while minimizing operational risks.
