If you’re interested in this topic (agree or disagree), we’d love to have you join the community.
Observability in Modern On-prem Applications
As a vendor, it’s important to ensure an application is highly operable from an observability standpoint. Many SaaS and on-prem software teams are familiar with these concepts, but delivering Modern On-prem applications means more than just exposing metrics, logs, and traces. When the team operating the software (the end user) is not an expert in its operation, it’s vital to focus on delivering actionable insights.
Bundled Alerting with Prometheus Operator
When end users deploy a tool like Prometheus Operator, vendors can bundle alerting rules with the application as Kubernetes YAML, codifying their recommended thresholds. This is a vast improvement over just exposing metrics, or even publishing “recommended thresholds” in user-facing documentation.
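As a rough illustration of what a bundled threshold might look like, the sketch below builds a PrometheusRule manifest (the custom resource Prometheus Operator uses for alerting rules) and prints it as JSON, which kubectl accepts alongside YAML. The application name, the metric example_app_queue_depth, the alert name, and the threshold are all hypothetical placeholders, and the labels a given Prometheus installation uses to select rules will vary.

```python
import json


def build_alert_rule(app_name, metric, threshold, duration="5m"):
    """Return a PrometheusRule manifest (as a dict) with one vendor-recommended alert."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        # Labels are an assumption: the operator's ruleSelector decides which
        # labels a rule needs in order to be picked up.
        "metadata": {"name": f"{app_name}-alerts", "labels": {"app": app_name}},
        "spec": {
            "groups": [
                {
                    "name": f"{app_name}.rules",
                    "rules": [
                        {
                            "alert": "HighQueueDepth",  # hypothetical alert name
                            "expr": f"{metric} > {threshold}",
                            "for": duration,
                            "labels": {"severity": "warning"},
                            "annotations": {
                                "summary": f"{metric} has been above {threshold} for {duration}"
                            },
                        }
                    ],
                }
            ]
        },
    }


# Hypothetical application and metric, used only for illustration.
manifest = build_alert_rule("example-app", "example_app_queue_depth", 1000)
print(json.dumps(manifest, indent=2))
```

Shipping a manifest like this with the application means the recommended threshold arrives as something the cluster can act on, rather than as a number the operator has to copy out of documentation.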
Composite Health Checks: Application CRD
In a similar vein, the sig-apps Application CRD can define a distributed health check that determines whether an entire distributed application is healthy. This can be included in the CRD, with or without the operator, as a way of codifying which services in an application need to be up for it to be considered “healthy”.
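The idea of a composite health check can be sketched independently of any particular schema. The example below is not the Application CRD’s actual format. It is a minimal illustration, with hypothetical component names and a hypothetical ComponentStatus type, of how a vendor-supplied list of required services could be reduced to a single healthy or unhealthy answer.

```python
from dataclasses import dataclass


@dataclass
class ComponentStatus:
    """Observed state of one component (hypothetical shape, for illustration only)."""
    name: str
    ready_replicas: int
    desired_replicas: int


def application_healthy(required, statuses):
    """Return (healthy, failing).

    healthy is True only when every required component is present, is scaled
    above zero, and has all of its desired replicas ready.
    """
    by_name = {status.name: status for status in statuses}
    failing = []
    for name in required:
        status = by_name.get(name)
        if (
            status is None
            or status.desired_replicas == 0
            or status.ready_replicas < status.desired_replicas
        ):
            failing.append(name)
    return (not failing, failing)


# Hypothetical components: "worker" is degraded and "db" was never observed.
statuses = [ComponentStatus("api", 3, 3), ComponentStatus("worker", 1, 2)]
healthy, failing = application_healthy(["api", "worker", "db"], statuses)
print(healthy, failing)
```

Returning the list of failing components alongside the boolean gives an operator who is not an expert in the application a concrete place to start looking.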
One of the core selling points of Modern On-prem is that end customers won’t need to become experts in each OTS application they’re deploying. When things go wrong, exposing metrics, logs, and traces to an end customer sometimes won’t be enough for them to self-diagnose the issue. In these cases, vendors must give their users a simple way to export the relevant information into a diagnostic bundle that can be sent to the vendor’s team for disconnected troubleshooting.
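A diagnostic bundle can be as simple as a compressed archive of the files a support engineer would ask for. The sketch below packs a set of text artifacts into an in-memory .tar.gz using only the Python standard library. The file names and contents are hypothetical, and a real exporter would gather them from the running application and would likely need to redact secrets first.

```python
import io
import json
import tarfile
import time


def build_diagnostic_bundle(artifacts):
    """Pack a mapping of {path_in_bundle: text} into a .tar.gz and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # Sort paths so the archive layout is predictable.
        for path, text in sorted(artifacts.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = int(time.time())
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical artifacts, used only for illustration.
artifacts = {
    "version.json": json.dumps({"app": "example-app", "version": "1.2.3"}),
    "logs/api.log": "error: connection refused\n",
    "metrics/snapshot.txt": "example_app_queue_depth 1200\n",
}
bundle = build_diagnostic_bundle(artifacts)
print(f"bundle size: {len(bundle)} bytes")
```

Because the result is a single file, it can be moved out of an air-gapped environment by whatever channel the customer allows and unpacked by the vendor without access to the cluster.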