Application outages are costly: estimates put the average cost of downtime at USD 50,000 to 500,000 per hour. As businesses digitize, applications grow more complex, and Site Reliability Engineers (SREs) can need hours or even days to identify and resolve issues.
To address this challenge, we have introduced the Probable Root Cause feature as part of Intelligent Incident Remediation from Instana®. When an incident is created, the feature uses causal AI to automatically analyze call statistics, topology, and surrounding information. It quickly identifies the probable source of the application failure, so SREs can resolve the incident by addressing the root cause directly, saving the business time and money.
The quality of a root cause analysis depends on the data, assumptions, and methods behind it.
The Data
Instana captures 100% of call traces, providing detailed information about infrastructure and application activity. It maintains metrics at one-second granularity, along with events, dynamic topology, and other relevant data points. This granularity enables Instana to use causal AI to identify root causes accurately.
The Assumptions
Many IT management tools assume that the application’s topology is always available at a granular level. However, due to the specialized nature of IT processes, this assumption may not always hold true. Instana’s use of causal AI and a versatile algorithm allows it to identify root causes even with limited data granularity and partial topology information.
The Method
Using causal AI, Instana identifies root causes by combining data from several sources, such as calls, metrics, events, and topology. The approach also shows why a given entity was flagged as a probable cause, which makes the result easier to trust.
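To make the idea of combining call statistics with topology concrete, here is a minimal sketch in Python. It is not Instana's algorithm and does not perform causal inference. It is a simple topology-aware heuristic: a service with a high error rate whose own dependencies look healthy is ranked as a more probable origin than one whose dependencies are also failing. The service names, the numbers, the `ServiceStats` type, the `rank_probable_causes` function, and the 5% threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    """Call statistics for one service (hypothetical shape, not an Instana type)."""
    calls: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0


def rank_probable_causes(stats, calls_to, threshold=0.05):
    """Rank unhealthy services, most probable origin first.

    stats:     service name -> ServiceStats
    calls_to:  service name -> list of services it calls; entries may be
               missing, which is treated as "no known dependencies"
    threshold: error rate above which a service counts as unhealthy (assumed)
    """
    unhealthy = {name for name, s in stats.items() if s.error_rate > threshold}
    ranked = []
    for name in unhealthy:
        # Unhealthy dependencies suggest this service is a victim, not the origin.
        bad_deps = [dep for dep in calls_to.get(name, []) if dep in unhealthy]
        ranked.append((len(bad_deps), -stats[name].error_rate, name))
    # Fewest unhealthy dependencies first, then highest error rate.
    ranked.sort()
    return [name for _, _, name in ranked]


# Hypothetical data: web calls cart and catalogue, cart calls redis.
stats = {
    "web": ServiceStats(calls=1000, errors=120),
    "cart": ServiceStats(calls=800, errors=200),
    "redis": ServiceStats(calls=800, errors=400),
    "catalogue": ServiceStats(calls=500, errors=2),
}
# Partial topology: no entry for redis or catalogue.
calls_to = {"web": ["cart", "catalogue"], "cart": ["redis"]}

ranking = rank_probable_causes(stats, calls_to)
print(ranking)
```

With this data, `redis` ranks first because it is unhealthy and has no unhealthy dependencies, while `cart` and `web` rank lower because each calls something that is also failing. The ranking key also serves as a basic explanation of why an entity was flagged.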
An example use case with Stan the SRE
Let’s consider Stan, an SRE at a company using the robot-shop application monitored by Instana. Stan receives an alert about a performance issue in the application and uses the Probable Root Cause feature to quickly identify and resolve the problem, saving time and effort in the incident investigation process.
A vision for the future
Instana plans to further enhance its root cause analysis capabilities, focusing on explainability and detailed analysis of application faults to provide actionable insights for remediation.
Learn more about IBM Instana’s probable root cause capabilities and the intelligent remediation pipeline.