OBSERVABILITY MATURITY SELF-ASSESSMENT

1. What is your primary job title?
• CIO / Chief Information Officer
• CTO / Chief Technology Officer
• VP of Engineering
• VP of IT Operations
• Director of Engineering / Platform Engineering
• Director of IT Operations / SRE Manager
• Cloud Architect / Principal Engineer
• Other (please specify)

2. How many employees does your organization have?
• 500 to 999
• 1,000 to 4,999
• 5,000 to 14,999
• 15,000 to 49,999
• 50,000 or more

3. Which industry best describes your organization?
• Financial Services / Insurance
• Retail / Consumer Goods / eCommerce
• Healthcare / Life Sciences
• Technology / Software
• Manufacturing / Industrial
• Energy / Utilities
• Media / Entertainment
• Government / Public Sector
• Other (please specify)

4. Which cloud platforms does your organization currently use? (Select all that apply)
• AWS
• Microsoft Azure
• Google Cloud Platform
• On-premises / Private Cloud (primary)
• Hybrid (cloud + on-premises)
• Multi-cloud (two or more public cloud providers)

5. Which observability platforms is your organization currently using? (Select all that apply)
• New Relic
• Datadog
• Coralogix
• Dynatrace
• AppDynamics (Cisco)
• Elastic Observability
• Prometheus / Grafana (open-source stack)
• Splunk
• AWS CloudWatch (primary tool)
• Azure Monitor (primary tool)
• We do not currently use a dedicated observability platform
• Other

6. Approximately what percentage of your production services and applications have observability agents or instrumentation deployed?
• Less than 25%
• 25% to 49%
• 50% to 74%
• 75% to 89%
• 90% or more

7. Which telemetry types does your organization currently collect?
(Select all that apply)
• Infrastructure metrics (CPU, memory, disk, network)
• Application Performance Metrics (APM / response times, error rates, throughput)
• Distributed traces (end-to-end transaction tracing)
• Structured application logs
• Unstructured / raw logs
• Synthetic monitoring (simulated user journeys)
• Real User Monitoring (RUM / browser and mobile)
• Business metrics (revenue, conversion, customer experience KPIs)
• Security events (SIEM integration)

8. How would you rate your organization's observability coverage of the following areas?
Scale: None | Limited | Adequate | Thorough | Comprehensive
• Core production services
• Supporting / internal services
• Third-party integrations and dependencies
• Mobile and web front-end
• Data pipelines and batch processing

9.
Does your organization currently implement Observability-as-Code (defining dashboards, alerts, and SLOs in version-controlled configuration files)?
• Yes - fully implemented with version control and CI/CD integration
• Yes - partially (some elements managed as code, others manual)
• We are planning to implement this in the next 12 months
• No - all observability configuration is managed manually

10. What is the primary barrier to expanding your observability instrumentation coverage?
• Engineering capacity - we don't have the bandwidth to instrument more services
• Skills gap - we lack expertise to implement instrumentation correctly
• Tool cost - licensing costs are limiting our ability to expand coverage
• Organizational complexity - difficult to get alignment across teams
• Legacy systems - older systems are difficult to instrument
• No clear ownership - unclear who is responsible for observability
• We don't believe we have a coverage gap

11. How many active alert rules does your organization currently have configured across all observability tools?
• Fewer than 50
• 50 to 199
• 200 to 499
• 500 to 999
• 1,000 or more
• We do not know

12. Approximately what percentage of alerts your team receives in a typical week are genuinely actionable (require an actual response)?
• Less than 15% - most alerts are noise
• 15% to 30%
• 31% to 50%
• 51% to 75%
• More than 75% - our alerts are well-tuned

13. How does your team currently prioritize incidents and alerts?
• By technical severity only (CPU usage, error rate thresholds)
• By service tier (Tier 1/2/3 classification)
• By customer or business impact (estimated revenue or user impact)
• By SLO/error budget burn rate
• We do not have a formal prioritization framework

14. Does your organization have formally defined Service Level Objectives (SLOs) with associated error budgets?
• Yes - defined, actively monitored, and informing engineering decisions
• Yes - defined, but rarely reviewed or acted upon
• Partially - some services have SLOs, most do not
• We are planning to implement SLOs in the next 12 months
• No - we do not use SLOs

15. What is your organization's current mean time to detect (MTTD) for Priority 1 production incidents?
• We detect before customer impact (proactive)
• Less than 5 minutes
• 5 to 15 minutes
• 15 to 30 minutes
• More than 30 minutes
• We typically find out from customers first

16. What is your organization's current mean time to resolve (MTTR) for Priority 1 production incidents?
• Less than 15 minutes
• 15 to 30 minutes
• 30 to 60 minutes
• 1 to 4 hours
• More than 4 hours

17. Do your alert definitions include documented runbooks or remediation guidance?
• Yes - all alerts link to runbooks
• Yes - most alerts link to runbooks (more than 75%)
• Partially - fewer than half of alerts have runbooks
• No - alerts exist without remediation documentation

18. Can your team trace a customer's end-to-end journey through your production environment in real time?
• Yes - full distributed tracing across all services in the customer journey
• Partially - we can trace some services but have gaps
• Only for specific high-priority journeys (e.g., checkout, login)
• No - we have limited ability to trace customer journeys end-to-end

19. When a Priority 1 incident occurs, can your team immediately quantify the business impact (revenue affected, users impacted)?
• Yes - we have real-time business impact dashboards
• Approximately - we can estimate within 30 minutes
• We can calculate it but it takes more than 30 minutes
• We typically cannot quantify business impact in real time
• We have never attempted to quantify business impact during incidents

20. How are observability dashboards and data consumed by non-engineering business stakeholders?
• Executive dashboards are regularly reviewed by C-suite and VP-level leaders
• Business stakeholders have self-service access to relevant metrics
• Engineering provides on-request reports to business stakeholders
• Business stakeholders rarely or never see observability data
• Business stakeholders are not aware we have observability tooling

21. Does your organization connect observability data to cloud cost management (FinOps)?
• Yes - we have integrated observability and FinOps dashboards
• Partially - some cost data is visible in our observability platform
• No - cost management and observability are handled separately
• We do not have a formal FinOps practice

22. How would you rate your observability practice's contribution to the following business outcomes?
Scale: No Contribution | Minimal Contribution | Moderate Contribution | Strong Contribution | Significant Contribution
• Reducing unplanned downtime costs
• Accelerating new feature delivery
• Improving customer satisfaction scores (CSAT/NPS)
• Informing cloud cost optimization decisions
• Supporting compliance and audit readiness

23. How does your organization currently manage on-call responsibilities?
• Formal on-call rotation with SLA-backed escalation procedures
• On-call rotation exists but escalation procedures are informal
• A small core team handles all incidents informally
• Individual service owners are on-call for their own services only
• There is no formal on-call process

24. Does your organization conduct formal post-incident reviews (blameless retrospectives) after major incidents?
• Yes - after every P1 incident, with documented findings shared broadly
• Yes - but inconsistently, only for the most severe incidents
• Informally, without documented findings or follow-through
• No - we do not have a formal post-incident review process

25. How would you describe the state of observability ownership within your engineering organization?
• Centralized: a dedicated platform engineering or SRE team owns observability
• Federated: ownership is distributed with a central team setting standards
• Siloed: each team manages their own observability independently
• Unclear: ownership is not formally defined
• Outsourced: a third party manages our observability platform

26.
How frequently does your team proactively review observability data outside of incident response?
• Daily - team reviews dashboards and trends every day
• Weekly - formal weekly review meetings using observability data
• Monthly - periodic reviews only
• On-demand - only when investigating an issue
• Rarely or never

27. What percentage of your team would you estimate is actively using your observability platform at least weekly?
• More than 75% - widespread adoption across the engineering org
• 50% to 75%
• 25% to 49%
• Less than 25% - limited to a small subset of the team
• We do not track platform adoption

28. How would you rate the observability-related skills and knowledge within your current engineering team?
Scale: Very Limited | Basic Awareness | Moderate Capability | Strong Capability | Advanced / Highly Skilled
• Instrumentation and agent configuration
• Dashboard design and data visualization
• SLO definition and error budget management
• Distributed tracing implementation
• Observability platform administration and optimization

29. How satisfied is your organization with the current ROI from your observability platform investment?
• Very Dissatisfied - We are not getting value
• Dissatisfied
• Neutral / Moderate Value
• Satisfied
• Very Satisfied - ROI is clear and measurable

30. How much does your organization spend annually on observability platforms and tooling (including licenses, infrastructure, and related tools)?
• Less than $100K
• $100K to $499K
• $500K to $999K
• $1M to $4.9M
• $5M or more
• We do not track observability-specific spend separately

31.
Approximately what percentage of your purchased observability platform capacity (licenses, ingest, etc.) is actively utilized?
• Less than 30% - significant unused capacity
• 30% to 49%
• 50% to 74%
• 75% to 89%
• 90% or more - we are near or at capacity limits

32. Has your organization experienced significant observability data ingest cost overruns in the past 12 months?
• Yes - significant overruns requiring budget reallocation
• Yes - minor overruns managed within existing budget
• No - ingest costs are predictable and within budget
• We do not actively track observability ingest costs

33. How many distinct monitoring and observability tools (including APM, logging, infrastructure monitoring, AIOps) does your organization currently use?
• 1 to 2 tools (consolidated)
• 3 to 4 tools
• 5 to 7 tools
• 8 to 10 tools
• More than 10 tools

34. Which of the following best describes your organization's observability platform strategy over the next 12 to 24 months?
• Consolidate onto fewer platforms (rationalization in progress)
• Stay with current platform mix - optimize in place
• Expand with additional tools or platforms
• Evaluate and potentially replace primary platform
• No formal observability platform strategy defined

35. How is your observability budget expected to change over the next 12 months?
• Significant increase (more than 20%)
• Moderate increase (5% to 20%)
• Flat - approximately the same
• Decrease
• Observability budget is not separately tracked

36. Which observability capabilities are your highest investment priorities for the next 12 months? (Rank top 3)

37. Is your organization currently evaluating or using AI-powered features in your observability platform?
• Yes - actively using AI features with measurable value
• Yes - piloting or evaluating AI features
• No - planning to evaluate in the next 12 months
• No - not a current priority
• We are skeptical about the ROI of AI in IT operations

38. Does your organization currently use or plan to use managed observability services (outsourcing platform management and/or incident response)?
• Yes - we currently use a managed observability service provider
• Actively evaluating managed services
• Planning to evaluate in the next 12 months
• No - we manage observability entirely in-house
• We were not aware this option existed

39. What is the single biggest barrier preventing your organization from advancing its observability maturity?
• Lack of internal expertise and skills
• Insufficient engineering bandwidth
• Tool complexity - existing tools are difficult to use optimally
• Budget constraints
• Organizational silos - lack of cross-team alignment
• No executive sponsor for observability investment
• Legacy technology limiting instrumentation possibilities
• Unclear ROI - difficult to justify investment

40. How prepared is your organization for AI-powered IT operations that require high-quality observability foundations?
• Not at All Prepared: Our observability foundation is minimal or fragmented, not AI supported.
• Slightly Prepared: Some observability tools or practices exist, but they are inconsistent or limited.
• Moderately Prepared: Basic observability practices (metrics, logs, monitoring) are in place.
• Well Prepared: Strong observability established; consistent telemetry, monitoring & visibility.
• Fully Prepared: Our observability maturity is ready to enable AI-powered operations.

41. What does your organization wish it had done differently when implementing your current observability practice?

42. What observability capability, if you had it today, would have the most immediate business impact?

43. Any additional comments on your observability journey that you would like to share?