The purpose of migrating
The most common reason for migrating a Qbox cluster is to change the server size of the nodes. You can also perform a migration to move from one region to another, to replace deprecated hardware, or to carry out any similar operation in which avoiding downtime is the highest priority.
If you wonder why you would need to increase the node size of a cluster instead of simply adding a node, have a look at this article: Qbox Help Center: Choosing a node size
How migrations work
Since it's typically impractical to "resize" clusters on a cloud server, it's necessary to replace the cluster. To preserve uptime, a Qbox migration links the nodes in the existing cluster to nodes in a new cluster, which are provisioned in the size that you specify. This linking increases the speed and reliability of transferring the data shards from the source nodes to the destination nodes.
During the data transfer phase of the migration, the original nodes will continue to serve requests to the end-user application while the process simultaneously transfers shards to the new nodes. The total processing time for this phase depends on the size of the dataset.
Here's a quick example: Consider a 3-node cluster that you want to migrate without changing the number of nodes. The migration would temporarily require 6 nodes in total: the 3 original nodes plus the 3 new nodes that replace them.
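The node count in this example can be worked out with a short calculation. The following is a minimal sketch, not part of any Qbox tooling. The function name `peak_node_count` and its parameters are hypothetical, and it assumes that the source and destination clusters run side by side until the original nodes are destroyed.

```python
def peak_node_count(source_nodes: int, destination_nodes: int) -> int:
    """Return how many nodes exist at once while both clusters are linked."""
    if source_nodes < 1 or destination_nodes < 1:
        raise ValueError("each cluster needs at least one node")
    return source_nodes + destination_nodes


# 3 original nodes + 3 replacement nodes
print(peak_node_count(3, 3))
```

The same calculation applies if the node count changes between the source and the destination, since both sets of nodes exist at the same time.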
After all of the data has transferred, but before we unlink and destroy the original nodes, the system must redirect the flow of user requests to the new nodes as seamlessly as possible. This can be done by either of the following methods:
- A DNS update, for users connecting to node hostnames.
- An endpoint change, for users connecting to the node private IP addresses.
NOTE: Each node in a Qbox cluster is a server that maintains both public and private IP addresses. Each publicly addressable hostname, such as xxxxxxxxxxxxxxxxnnn.qbox.io, corresponds to the public IP of one node by way of a DNS record.
Once all of the data has transferred, Qbox updates the DNS record for each node's hostname, whether you connect by hostname or by private IP.
Following the DNS update, migrations are configured to wait for up to 24 hours (since DNS propagation is generally unpredictable), or until the user confirms that they've done one of the following:
- Updated the /etc/hosts file on their application servers to force hostnames to resolve to the new nodes, or temporarily pointed to the public IPs of the new nodes.
- Updated their application code to point to the new nodes' private IPs.
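For users who take the /etc/hosts route, the entries can be generated and the result checked with a few lines of Python. This is a minimal sketch under stated assumptions: the hostnames and IP addresses are placeholders from reserved documentation ranges, and `hosts_entries` and `resolves_to` are hypothetical helper names, not part of any Qbox tooling. The only library call used for name resolution is `socket.gethostbyname` from the standard library.

```python
import socket
from typing import Callable, Dict, List


def hosts_entries(new_public_ips: Dict[str, str]) -> List[str]:
    """Format one /etc/hosts line ("IP<TAB>hostname") per node hostname."""
    return [f"{ip}\t{hostname}" for hostname, ip in sorted(new_public_ips.items())]


def resolves_to(hostname: str, expected_ip: str,
                resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the hostname currently resolves to the expected IPv4 address."""
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # Resolution failed (for example, unknown host or no network).
        return False


# Placeholder values: replace with your real node hostnames and new public IPs.
NEW_PUBLIC_IPS = {
    "node-a.example.com": "203.0.113.10",
    "node-b.example.com": "203.0.113.11",
}

# Print the lines that would be appended to /etc/hosts.
for line in hosts_entries(NEW_PUBLIC_IPS):
    print(line)
```

After editing /etc/hosts on an application server, calling `resolves_to("node-a.example.com", "203.0.113.10")` on that server should return `True` once the override is in effect. The `resolver` parameter exists only so that the check can be exercised without a network.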
Total runtime / expected downtime
The maximum total runtime of any migration is the time needed for data transfer (which varies with the size of the dataset) plus the maximum DNS propagation wait of 24 hours (which can be skipped once you confirm that your application points to the new nodes). This means that a small migration that skips the DNS wait period could finish within an hour.
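The runtime bound described here can be written as a small calculation. The sketch below is illustrative only: `max_migration_runtime` is a hypothetical function name, and the 45-minute transfer time is an invented figure, not a measurement.

```python
from datetime import timedelta

MAX_DNS_WAIT = timedelta(hours=24)  # maximum DNS propagation wait


def max_migration_runtime(transfer_time: timedelta,
                          skip_dns_wait: bool = False) -> timedelta:
    """Upper bound on migration runtime: data transfer plus the optional DNS wait."""
    if transfer_time < timedelta(0):
        raise ValueError("transfer_time cannot be negative")
    return transfer_time if skip_dns_wait else transfer_time + MAX_DNS_WAIT


# Hypothetical 45-minute transfer, with and without the DNS wait.
transfer = timedelta(minutes=45)
print(max_migration_runtime(transfer))                      # 1 day, 0:45:00
print(max_migration_runtime(transfer, skip_dns_wait=True))  # 0:45:00
```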
The primary benefit of the node migration process is minimal downtime. However, a small amount of downtime is currently unavoidable. Clusters with 3 or more nodes will experience around 10-20 seconds of downtime. Clusters with fewer than 3 nodes can experience up to 90 seconds of downtime. The downtime is due to process restarts, which are required at various points in the migration. Read the article to learn more about failover and rolling restarts: