Resolved -
We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.
Nov 18, 22:58 UTC
Monitoring -
Engineering has released a fix. As of 16:47 UTC, recent queries should again be taken into account when we generate recommendations for Adaptive Metrics.
We will continue to monitor for recurrence and provide updates accordingly.
Nov 18, 16:48 UTC
Investigating -
As of 14:49 UTC, we became aware of an issue with Adaptive Metrics. Recent queries may not be taken into account when we generate recommendations.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Nov 18, 16:01 UTC
Resolved -
Due to the bug reported in https://github.com/kubernetes/kubernetes/issues/127370, Kubernetes service endpoints were not updated when pods were stopped or started if more than 1,000 pods matched the service. This caused a temporary outage in Mimir gossiping services, which in turn caused failures to ingest and query metrics for a short time. This issue has been resolved.
Nov 18, 13:30 UTC
Resolved -
This incident has been resolved.
Nov 18, 11:12 UTC
Monitoring -
We experienced an issue affecting access to the Grafana Cloud portal. Our team has identified the root cause and deployed a fix. We are actively monitoring the situation to ensure the issue is fully resolved.
If you encounter any further difficulties, please reach out to our support team. We apologize for the inconvenience and thank you for your patience.
Nov 18, 10:47 UTC
Resolved -
Rollback has been completed as of 17:20 UTC. At this time, we are considering this issue resolved. No further updates.
Nov 15, 17:23 UTC
Identified -
We introduced an erroneous Content Security Policy (CSP) that can break user dashboards. We are rolling back the update now.
Nov 15, 12:21 UTC
Resolved -
This incident has been resolved.
Nov 14, 14:44 UTC
Investigating -
At approximately 13:45 UTC, we became aware of an issue affecting the prod-us-east-0 region. Users may experience a brief interruption, as well as potential data loss, in their Tempo environment. Our team is working towards a resolution and will provide updates as we have more to share.
Nov 14, 14:11 UTC
Resolved -
We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.
Nov 7, 20:03 UTC
Monitoring -
Engineering has released a fix and as of 19:40 UTC, customers should no longer experience query or ingestion failures. We will continue to monitor for recurrence and provide updates accordingly.
Nov 7, 19:54 UTC
Identified -
As of 18:40 UTC, our engineering team became aware of an issue with some unhealthy ingesters within the us-central2 region of Cloud Metrics. Users in this region may experience ingestion and/or query failures.
Nov 7, 19:54 UTC