User Error: Why Automation Could Save Firms From IT Outages
IT outages are all too common. Google, Instagram and TSB, some of the world’s biggest brands working across different sectors, have fallen foul of downtime in recent years, wreaking havoc for their customers, their reputations and their balance sheets.
While many assume that outages can’t be helped (due to an electrical fault, for example) or are malicious in nature, this is often not the case. According to Gartner, “the undisputed #1 cause of network outages is human error.” A 2016 report by the Ponemon Institute broadly supports this assertion: it found that human error was the second most common cause of system failure, and therefore of business downtime, accounting for around 22% of all incidents.
This presents a significant problem for firms. Businesses are built on people, who are essential to commercial success. Yet could the workforce be costing firms through carelessness or user error?
What mistakes are being made?
The biggest issue we see when it comes to human error is consistency. Many CIOs find that their teams are stretched too thin, constantly firefighting and plugging gaps. This reactive approach is costly: instead of anticipating issues that might arise (expiring certificates, for example), teams are reacting to problems as they occur, which means things get missed. Additionally, for large organizations monitoring thousands of devices, it’s hard to ensure that every device gets the same time and attention as the next. Whether because of changing personnel or shifting daily priorities, it’s very easy for people to overlook something that causes an issue later down the line. This is not necessarily the fault of the individual; rather, businesses should be thinking about how to get the best value from their workforce, instead of asking staff to do repetitive, monotonous tasks.
How can comprehensive monitoring and automation save the day?
Monitoring and automation help to provide a single source of truth and can be used to mitigate risks across IT infrastructure. For a start, they vastly reduce the likelihood of issues being missed by ensuring consistency and reducing tedious manual configuration. Combined, they can detect, assess and fix the health of a vast array of assets, and learn to flag potential anomalies lurking across the system. For example, a disk running out of space may not bring down a server on its own; it can slow services down, but it is unlikely to cause irreparable damage. Monitoring and automation can work hand-in-hand to fix this, first by identifying the issue and then by triggering an automated response that resolves it (by extending the disk). An issue like this, which can cascade if left alone, is a perfect example of why identification and remediation at the source is the way forward for businesses. The system will also flag the error, telling the team exactly what is wrong and why, instead of a small flaw being missed and escalating into something much bigger.
Automation is also more organized. While people can be technically brilliant, they are not infallible. Lost audit trails or documents are not uncommon, whereas automation provides a clear timeline of what happened and when. This accountability also works at scale, whereas with humans there will always be a tipping point between quality and quantity.
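To make the detect-then-remediate loop concrete, here is a minimal Python sketch for the disk example. It assumes a 90% usage threshold and a hypothetical `extend_volume` hook standing in for whatever remediation a real platform would run. It is not any particular product’s API. Every action is written to a timestamped log, which is the audit trail described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class AuditLog:
    """Append-only, timestamped record of what the automation saw and did."""
    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.entries.append(f"{stamp} {message}")


def check_and_remediate(
    host: str,
    used_bytes: int,
    total_bytes: int,
    extend_volume: Callable[[str], bool],  # hypothetical remediation hook
    log: AuditLog,
    threshold: float = 0.90,  # assumed alert threshold (90% full)
) -> str:
    """Detect a filling disk, try an automated fix, and escalate if it fails.

    Returns "ok", "remediated" or "escalate".
    """
    usage = used_bytes / total_bytes
    if usage < threshold:
        return "ok"  # healthy: nothing to do, nothing to log
    log.record(f"{host}: disk {usage:.0%} full (threshold {threshold:.0%}), extending volume")
    if extend_volume(host):
        log.record(f"{host}: volume extended")
        return "remediated"
    # The automated fix failed, so hand it to people with the context attached.
    log.record(f"{host}: extension failed, escalating to the on-call team")
    return "escalate"


if __name__ == "__main__":
    log = AuditLog()
    # Simulated reading; on a real host, shutil.disk_usage("/") returns (total, used, free).
    status = check_and_remediate("web-01", used_bytes=95, total_bytes=100,
                                 extend_volume=lambda host: True, log=log)
    print(status)  # remediated
    print("\n".join(log.entries))
```

Because a remediation that fails returns `"escalate"` instead of failing silently, people are only pulled in when the automated fix does not work, and they arrive with the log already written.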
What benefits does this provide?
Monitoring and automation provide three clear benefits to firms. Firstly, they save time. While neither automation nor humans are ever likely to guarantee a 100% error-free service, automation is quicker because it doesn’t get distracted. It completes task after task to the same standard, and monitoring should (if configured correctly) identify issues before they snowball, leading to significant time savings down the line. Even if the initial problems are missed, automation provides a clear timeline of activity, so issues can easily be traced back to their source, which is not always possible when humans are overseeing thousands of jobs per day. In theory, this also shortens fix times, meaning firms are exposed to less downtime.
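As an illustration of how a clear activity timeline speeds up root-cause analysis, the Python sketch below filters an automation log down to what happened on one host shortly before an incident. The `Event` record, the sample events and the one-hour default window are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    at: datetime  # when the automation acted
    host: str     # which device it acted on
    action: str   # what it did


def trace_back(events: Iterable[Event], host: str, incident_at: datetime,
               window: timedelta = timedelta(hours=1)) -> List[Event]:
    """Return the host's events in the `window` before an incident, oldest first."""
    start = incident_at - window
    relevant = (e for e in events if e.host == host and start <= e.at <= incident_at)
    return sorted(relevant, key=lambda e: e.at)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)  # hypothetical incident time
    events = [
        Event(t0 - timedelta(minutes=5), "db-01", "restarted service"),
        Event(t0 - timedelta(minutes=40), "db-01", "applied config change"),
        Event(t0 - timedelta(minutes=10), "web-01", "rotated logs"),
        Event(t0 - timedelta(hours=3), "db-01", "patched OS"),
    ]
    for e in trace_back(events, "db-01", incident_at=t0):
        print(e.at.time(), e.action)
```

Running it lists only the config change at 11:20 and the restart at 11:55. The patch from three hours earlier and the activity on `web-01` are filtered out, which narrows the search for the source of the problem.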
This, of course, has a knock-on financial effect. Less downtime is good for the balance sheet, as operations are not disrupted and trading opportunities are maximized. Gartner famously calculated that downtime costs in the region of $300,000 per hour; add reputational costs and fickle customers, and brands face a significant bill for lost IT time. The old adage rings true: time is money, so for brands looking to manage risk, monitoring and automation could help.
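The arithmetic behind that figure is simple. This Python sketch prices an outage at the $300,000-per-hour estimate quoted above. The 90-minute and 30-minute outage durations are hypothetical values, chosen only to show what faster automated fixes could be worth. Reputational costs are deliberately left out.

```python
# Gartner's estimate quoted above, in US dollars per hour of downtime.
# Reputational costs and lost customers are NOT included.
HOURLY_DOWNTIME_COST = 300_000


def downtime_cost(minutes_down: float, hourly_cost: float = HOURLY_DOWNTIME_COST) -> float:
    """Direct cost of an outage: duration in hours multiplied by the hourly rate."""
    return minutes_down / 60 * hourly_cost


if __name__ == "__main__":
    # Hypothetical outage durations, before and after automated remediation.
    manual = downtime_cost(90)     # 450000.0
    automated = downtime_cost(30)  # 150000.0
    print(f"Saving per incident: ${manual - automated:,.0f}")  # Saving per incident: $300,000
```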
Finally, more efficient monitoring and automation also improve the IT team’s working experience. In the age of digital transformation, IT teams can add value by working on exciting projects that can transform a brand, instead of firefighting issues that cause stress and worry. This more engaging work also makes the role more enjoyable, encouraging staff to stay and boosting productivity.
Time to trust machines?
Dependable configuration, monitoring and automation provide efficiencies that can benefit firms and help them eliminate age-old issues such as tool sprawl, since automation should be able to do everything in one function, stopping others from implementing bad practice. Automation can breathe new life into monitoring, reducing vulnerabilities that have plagued businesses for too long. However, the final complement to this risk management is a single-pane-of-glass monitoring system that shows any identified errors clearly.
By monitoring all aspects of devices and systems (hardware and software, on-premises and in the cloud), organizations will have the full picture of system health at all times. It is too difficult to maintain, operate and gain insights from multiple separate tools; it is much easier to have a single tool that automation feeds into.
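One way to picture a single pane of glass is as a merge of several status feeds into one view per host. The Python sketch below assumes three hypothetical feeds (on-premises, cloud and automation) and a simple ok/warning/critical scale. When feeds disagree about a host, the worst status wins, so a problem reported by any one source is never hidden by another.

```python
from typing import Dict, Iterable, Mapping

# Assumed status levels, ordered from healthiest to worst.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def single_pane(feeds: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Merge per-host statuses from several feeds into one view; the worst status wins."""
    view: Dict[str, str] = {}
    for feed in feeds:
        for host, status in feed.items():
            current = view.get(host)
            if current is None or SEVERITY[status] > SEVERITY[current]:
                view[host] = status
    return view


if __name__ == "__main__":
    on_prem = {"db-01": "ok", "file-01": "warning"}  # hypothetical feeds
    cloud = {"web-01": "ok", "db-01": "critical"}
    automation = {"web-01": "warning"}
    for host, status in sorted(single_pane([on_prem, cloud, automation]).items()):
        print(f"{host:<8} {status}")
```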
Humans should no longer need to interact directly with servers and organizational processes; they should be adding value elsewhere, with automation trusted to take on the repetitive jobs. In today’s world, where skilled workers are scarce and retention rates are dropping, firms need to get the best value from their employees. Automation can help achieve this, replacing the reactive approach with a dynamic, value-adding culture and consigning the ‘computer says no’ approach to the trash, once and for all.