How to achieve 100% availability for your project?
25 July 2018


Before trying to maximize a project's uptime, weigh the cost of redundancy against the cost of downtime. This matters most for companies that other businesses depend on: B2B solutions, API services, delivery services. For them, even a few minutes of unavailability will at the very least flood the call center with complaints from dissatisfied customers. For other kinds of companies, for example a small online store or a business whose clients work from 9:00 to 18:00, even several hours of unavailability may cost less than maintaining a full-fledged backup site. We suggest thinking through and testing the points below.
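The trade-off above comes down to simple arithmetic: expected yearly downtime cost versus the yearly cost of a backup site. Below is a minimal sketch; the hourly loss, expected downtime hours and backup site price are hypothetical placeholders, and the model assumes the backup site eliminates all of the expected downtime.

```python
from dataclasses import dataclass


@dataclass
class DowntimeProfile:
    loss_per_hour: float   # hypothetical: lost revenue + call-center load + penalties per hour
    hours_per_year: float  # hypothetical: expected downtime per year without a backup site


def annual_downtime_cost(p: DowntimeProfile) -> float:
    """Expected yearly cost of downtime for a single-site setup."""
    return p.loss_per_hour * p.hours_per_year


def backup_site_pays_off(p: DowntimeProfile, backup_cost_per_year: float) -> bool:
    """True if the downtime a backup site would prevent costs more than the site itself.

    Simplification: assumes the backup site removes all of the expected downtime.
    """
    return annual_downtime_cost(p) > backup_cost_per_year


# Hypothetical figures for the two kinds of companies described above.
api_service = DowntimeProfile(loss_per_hour=6000, hours_per_year=10)
small_shop = DowntimeProfile(loss_per_hour=50, hours_per_year=10)

print(annual_downtime_cost(api_service), backup_site_pays_off(api_service, 24000))  # 60000 True
print(annual_downtime_cost(small_shop), backup_site_pays_off(small_shop, 24000))    # 500 False
```

With the same backup site price, the API service clearly benefits, while the small shop would pay far more for redundancy than it loses to outages.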

Hosting the entire project in one data center / one cloud availability zone

Cloud hosting has planted a misconception in many people's minds: since the cloud is not tied to specific hardware, cloud infrastructure supposedly cannot fail. Three 24-hour Amazon Web Services outages, recent incidents at Cloud4y and Digital Ocean, and the data loss at Cloudmouse have shown that keeping a project and its data in a single data center is a guaranteed way to get hours of downtime with no way to deploy the project elsewhere.

We often see client configurations where several servers in the same data center back each other up in case one of them fails. In our experience, however, network problems that make several racks, or the whole data center, unavailable at once happen much more often than failures of individual servers, and this must be taken into account as well.
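The practical consequence is that redundant sites should live in different data centers, and traffic should go to whichever one is actually reachable. Here is a minimal sketch of that selection logic, assuming each site exposes an HTTP health-check endpoint; the `example.com` URLs are hypothetical, and in production this role is usually played by DNS failover or a load balancer rather than a script.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Health check: True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def pick_site(sites: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first site in priority order that passes its health check."""
    for site in sites:
        try:
            if is_healthy(site):
                return site
        except Exception:
            # A probe that crashes is treated the same as a failed probe.
            continue
    return None


# Hypothetical health-check URLs, one per data center.
SITES = [
    "https://dc1.example.com/health",
    "https://dc2.example.com/health",
    "https://dc3.example.com/health",
]
```

Calling `pick_site(SITES, http_probe)` returns the first reachable site, or `None` if all of them are down. If all three URLs pointed at servers in the same data center, a single network outage would make every probe fail at once, which is exactly the scenario described above.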

No switchover plan and no regular switchovers to the backup site

Even the best monitoring cannot guarantee that the backup site will be ready to take over when it is actually needed. In our experience, the first switchover to the backup site will end in an incident, and so will several after it. Some companies told us it took about five switchovers before they were convinced that the backup site was fully ready to accept traffic after an incident. Once the switchover procedure has been worked out and documented, switch to the backup site regularly to make sure everything still works.
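A documented switchover plan can be expressed as an ordered list of steps, each with its own check, so that a drill stops at the first step that fails and shows exactly where the plan broke. Below is a minimal sketch; the step names and the 90-day drill interval are hypothetical, and real actions (switching DNS, promoting a database replica) would be plugged in as the `action` and `verify` callables.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # hypothetical: e.g. switch DNS, promote a DB replica
    verify: Callable[[], bool]  # hypothetical: e.g. check the backup site answers


@dataclass
class DrillReport:
    completed: List[str] = field(default_factory=list)
    failed_step: str = ""

    @property
    def ok(self) -> bool:
        return not self.failed_step


def run_switchover(steps: List[Step]) -> DrillReport:
    """Run the switchover plan in order and stop at the first failed step."""
    report = DrillReport()
    for step in steps:
        try:
            step.action()
            passed = step.verify()
        except Exception:
            passed = False  # an exception during a step counts as a failure
        if not passed:
            report.failed_step = step.name
            break
        report.completed.append(step.name)
    return report


def drill_overdue(last_drill: date, today: date, max_interval_days: int = 90) -> bool:
    """True if the last switchover drill is older than the allowed interval (hypothetical policy)."""
    return today - last_drill > timedelta(days=max_interval_days)
```

Each failed drill points to a concrete step to fix in the plan and its documentation, and `drill_overdue` can be wired into monitoring so that "switch regularly" does not quietly turn into "never".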

Running the same version on the main and backup sites

Even a tested backup site located in another data center does not guarantee that it can quickly take over the load. This follows from the nature of redundancy itself: a new version of the code that puts a fatal load on the production environment will put exactly the same load on the backup site, and the project will become completely unavailable. The simple remedy is a mechanism for rolling back to the previous version, but in the business race for releases this is not always possible, and then it is worth considering an additional backup site that runs the previous version.
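Both remedies rely on always knowing which release came before the current one. Below is a minimal sketch of that bookkeeping, assuming one possible policy: the backup site is pinned to release N-1 so that a bad release never reaches both sites at once. The class and function names and the version strings are hypothetical.

```python
from typing import Dict, List, Optional


class ReleaseHistory:
    """Keeps deployed versions in order so the previous one is always known."""

    def __init__(self) -> None:
        self._versions: List[str] = []

    def deploy(self, version: str) -> None:
        self._versions.append(version)

    @property
    def current(self) -> Optional[str]:
        return self._versions[-1] if self._versions else None

    @property
    def previous(self) -> Optional[str]:
        return self._versions[-2] if len(self._versions) >= 2 else None

    def rollback(self) -> str:
        """Drop the current release and return the version that is live again."""
        if len(self._versions) < 2:
            raise RuntimeError("no earlier release to roll back to")
        self._versions.pop()
        return self._versions[-1]


def site_versions(history: ReleaseHistory, backup_lags: bool = True) -> Dict[str, Optional[str]]:
    """Versions per site; with backup_lags=True the backup runs N-1 when one exists."""
    backup = history.previous if backup_lags and history.previous else history.current
    return {"primary": history.current, "backup": backup}


history = ReleaseHistory()
history.deploy("1.0")
history.deploy("1.1")
print(site_versions(history))  # {'primary': '1.1', 'backup': '1.0'}
```

With this policy, switching traffic to the backup site doubles as a rollback. The cost is that the backup site never runs the newest features, so the business has to accept serving the previous release during an incident.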


