High availability / failover

For Qbox clusters that have multiple nodes, requests are rerouted to other nodes when a node fails. For example, if node 1 goes down at the process level, its requests will route to either node 2 or node 3.

Node failures or timeouts can also result from a complete crash of the virtual machine or the host hardware. This type of failure is less common and is usually the result of resource strain, such as overloading a cluster that's too small for the request or data volume.

In the case of a full VM timeout, simple request failover through the proxy isn't possible, so we recommend that you retry the request. Some Elasticsearch language clients offer a request "retry count" option, which resends failed requests to other available nodes.
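As a rough illustration of the retry approach, here is a minimal Python sketch that uses only the standard library. The names `send_with_retry`, `send`, `nodes`, `max_retries`, and `backoff_seconds` are hypothetical and are not part of any Qbox or Elasticsearch API; `send` stands in for whatever function your application uses to issue a request to a single node. If your language client has a built-in retry option, prefer that and check its documentation for the exact setting.

```python
import itertools
import time


def send_with_retry(send, nodes, max_retries=3, backoff_seconds=0.0):
    """Call send(node) on successive nodes until one call succeeds.

    `send` is a hypothetical callable that issues one request to one node
    and raises ConnectionError or TimeoutError when that node is unreachable.
    """
    if not nodes:
        raise ValueError("at least one node is required")

    last_error = None
    node_cycle = itertools.cycle(nodes)  # rotate through nodes in order

    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        node = next(node_cycle)
        try:
            return send(node)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            # Wait a little longer after each failed attempt (0 by default).
            time.sleep(backoff_seconds * attempt)

    # Every attempt failed; surface the most recent error to the caller.
    raise last_error
```

With three nodes and the default settings, a request that fails on the first node is retried on the second, then the third, and then the first again before the error is raised.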

In all node-loss scenarios, our automatic systems will immediately begin recovery procedures and send alerts to our support engineers.

NOTE: 3 nodes are necessary for a cluster to have effective failover, and we recommend a minimum of 3 for production environments.

Two nodes are still better than one, but a 2-node cluster will only benefit from data redundancy. In a Qbox cluster, a majority (or quorum) of nodes must be available and responsive to continue serving requests. Since a majority of 2 is 2, 3 nodes is the minimum necessary to prevent downtime during a node failure.
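The quorum arithmetic can be checked with a short Python sketch. It assumes the usual majority rule (more than half of the nodes); the function names `quorum` and `tolerated_failures` are made up for this illustration.

```python
def quorum(total_nodes):
    """Smallest number of nodes that forms a majority."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes):
    """How many nodes can be lost while a majority is still available."""
    return total_nodes - quorum(total_nodes)


for n in (1, 2, 3):
    print(f"{n} node(s): quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

A 2-node cluster needs both nodes for a majority, so it tolerates no failures; a 3-node cluster needs 2, so it can lose 1 node and keep serving requests.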
