Disaster recovery monitoring

Maintaining business as usual is the minimum requirement for any revenue-driven organization. You need to make sure all your IT systems are online to keep both your customers and your internal users happy.

The problem

Unfortunately for all of us, things break and disaster strikes, whether within our control or completely outside it. The sad fact of the matter is that your customers don’t care if you’ve had an underwater fibre cable damaged, as Yahoo suffered in 2014; all they care about is the 11 days of downtime that resulted.

At the same time, customers don’t care if your servers or codebase have gone haywire; they only care when it affects them. If you’re expecting thanks for your hours slaving away, being on call and staying dedicated to keeping things going, don’t hold your breath.

In everyday terms, many problems stem from routine configuration changes, which are rarely tracked, or are neglected altogether, unless thorough revisioning is in place. When tracking fails, even your best-laid disaster recovery plans can get old, fast. So what can you do?

The various strategies behind disaster recovery

Some of you out there may have your own methods for dealing with disaster recovery, so let’s see how they compare to a monitored proactive strategy.

No Strategy

Whilst the dangers of this ‘strategy’ speak for themselves, there are many out there, particularly in high-demand environments and some SMBs, that do not have failsafes in place. In these instances a reboot is likely to be the usual course of action. This is great when it works, but when it doesn’t, hours and days can be lost in the blink of an eye.

Laid-back

The in-the-middle stance on disaster recovery. There is likely to be a loose recovery plan here, with steps to follow and maybe even some logs; however, the detail is limited and the focus is still on retroactive action once a problem has already occurred.

Super Organised

You may think that being the epitome of organisation and planning is the best way to go. However, in high availability environments this is often too much to do, and the person-hours required can be excessive. While Logstash and other logging systems have a purpose, there is a point where logging everything just becomes overkill.

So, Now What?

Your plans for disaster recovery, auditing, and maintenance may be great in principle, but they are often in vain when disaster strikes. Proactively monitoring your systems can help you immensely. Those with no strategy gain the peace of mind that their systems are being scanned regularly and that they will be warned of any changes that could affect a recovery. Those who are laid-back can benefit from high availability software, with the assurance that if the primary server goes down there is always a remote server on which the monitoring software will kick in. This means that for your customers everything critical remains intact, and for all intents and purposes your IT health is kept in the finest condition. For the super organised too, monitoring software gives you the single-pane-of-glass view you require, meaning you never lose sight of your monitored IT systems. Many monitoring systems also incorporate logging and revisioning, which makes life a whole lot easier when it comes to tracking down what went wrong, why, and how to fix it.
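To make the idea of being "warned of any changes that could affect a recovery" a little more concrete, here is a minimal Python sketch of a configuration drift check. It is only an illustration under stated assumptions, not how Opsview Monitor or any other product works: the function names `snapshot` and `drift` are hypothetical, and the check simply compares SHA-256 hashes of files against a previously recorded baseline.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each existing file path to the SHA-256 hex digest of its contents.

    Hypothetical helper: paths that do not exist are simply left out.
    """
    result = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            result[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def drift(baseline, current):
    """Compare two snapshots and report files added, removed, or changed."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(
            p for p in set(baseline) & set(current) if baseline[p] != current[p]
        ),
    }
```

In this sketch you would record a baseline with `snapshot` after a known-good deployment, run it again on a schedule, and raise an alert whenever `drift` returns a non-empty list. A real monitoring system would typically also record who made the change and when.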

Closing Thoughts

Every business is vulnerable to serious problems. Whether the cause is an unforeseen incident or human error, downtime and service degradation will occur. Being proactive and anticipating disaster gives you a great leg-up in the constant slugfest against stringent SLAs and owner demands.

Disaster recovery monitoring enables you to see the light, providing you with the increased visibility you need to meet those ever-so-demanding SLAs. If you’re running a high availability configuration and have a dynamic environment where changes happen frequently, then disaster recovery monitoring is what you need.

by Opsview Staff,
Administrator
