Gartner predicts that by 2021, 60% of IT monitoring investments will include a focus on business-relevant metrics, up from less than 20% in 2017.
The invisible perimeter
Today’s infrastructure is everywhere. It ranges from the database servers and virtual machines running in your datacenter, to containers and serverless functions running in the cloud, to specialized devices running at the edge or in the fog.
Modern IT infrastructure is an extremely complex system of interconnected technologies, each of which has the potential to run into issues or fail outright. And with more components being added to these stacks as technology evolves, new opportunities for outages arise. In fact, between 2017 and 2018, instances of outages or severe service degradation increased from 25% to 31%, and if we look at on-premises data centers, that number rises to 48% (source: Uptime Institute 2018 8th Annual Data Center Survey). What’s more alarming is that 80% of these outages could have been prevented; they were caused principally by human error, power outages, and network and configuration issues.
In this ever-more-complex system of devices and services, how do you know when a problem occurs, where it occurs, and what’s causing it? How do you organize effective demand and capacity management?
Bridging many sources
Analysts including Gartner, Forrester, and IDC have all developed their own set of essential metrics. The following is a list of observable metrics and events that we have found to be critical when monitoring the infrastructure stack. These sources can be split into three groups:
- METRICS: Numbers describing a particular process or activity measured over intervals of time.
- EVENTS: Immutable records of discrete events that happen over time. Event logs exist in plaintext, structured text, or binary.
- TRACES: Data that follows a request through the system, showing which component or line of code is failing and giving visibility at the individual-user level into events that have occurred.
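To make the three groups concrete, here is a minimal sketch of what one record of each type might look like. The field names and types are illustrative assumptions for this document, not a real monitoring schema:

```python
from dataclasses import dataclass, field
import time
import uuid


@dataclass
class Metric:
    """A number describing a process or activity, measured over time."""
    name: str
    value: float
    timestamp: float = field(default_factory=time.time)


@dataclass
class Event:
    """An immutable record of a discrete occurrence."""
    message: str
    severity: str
    timestamp: float = field(default_factory=time.time)


@dataclass
class TraceSpan:
    """One step of a request's path through the system."""
    trace_id: str      # shared by every span belonging to the same request
    operation: str
    duration_ms: float


# One sample record of each type:
cpu = Metric(name="cpu.utilization", value=0.83)
disk_full = Event(message="disk /var 95% full", severity="warning")
span = TraceSpan(trace_id=str(uuid.uuid4()), operation="db.query",
                 duration_ms=42.5)
```

Note how the trace span carries a shared `trace_id`: correlating spans by that identifier is what lets a monitoring system reconstruct an individual user’s request across components.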
Having a solution that provides a holistic view of the infrastructure alongside detailed views of individual components is vital if an organization wants to proactively tackle infrastructure issues and reduce the mean time to detection, investigation, and restoration. It’s also an essential piece of future planning. Knowing how the infrastructure has performed in the past and how it is performing in real time provides invaluable insights that reduce complexity when integrating new technologies and building new experiences for users and employees.
Technology Monitoring - Dealing with volume, velocity, and variety
Our technology monitoring solution is built on two important principles:
- centralized and observable data;
- artificial intelligence and machine learning enabled.
A centralized data lake that handles metrics, events, and traces from any of your infrastructure components removes blind spots from the system and, as a result, reduces mean time to resolution, because teams can more quickly identify the problem, fix it, and move forward. The downside of centralizing data is, of course, the three Vs of big data: volume, velocity, and variety. To assist your teams as much as possible, we rely on machine learning algorithms and on automating repetitive manual tasks, giving teams back the bandwidth to do the things AI and ML can’t: creative problem solving, upgrading existing technologies, and planning for the future.
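As a rough illustration of the kind of automated analysis this enables, the sketch below flags metric samples that deviate sharply from their recent baseline using a simple rolling z-score. Production platforms use far more sophisticated models; the function, window size, and threshold here are toy assumptions:

```python
import statistics


def detect_anomalies(series, window=20, threshold=3.0):
    """Return the indices of points that deviate more than `threshold`
    standard deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# A steady CPU-utilization series with one sudden spike at index 32:
cpu = [0.30, 0.31, 0.29, 0.30] * 8 + [0.95] + [0.30, 0.31] * 5
print(detect_anomalies(cpu))  # -> [32]
```

Even this toy version shows the appeal of automating the task: the algorithm scans every series continuously and surfaces only the handful of points worth a human’s attention.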