Resolved -
This incident has been resolved.
Feb 3, 17:56 UTC
Monitoring -
A fix has been implemented, and we are seeing recovery throughout the rollout. We will continue to monitor results.
Feb 3, 17:38 UTC
Identified -
The issue has been identified and we are implementing a fix.
Feb 3, 17:29 UTC
Investigating -
As of 16:40 UTC, we are investigating an issue where IRM pages are not accessible. Users may experience errors or be unable to load IRM-related pages during this time.
Our team is actively working to identify the root cause and restore full functionality as quickly as possible. We will provide updates as more information becomes available.
Feb 3, 16:52 UTC
Resolved -
This incident has been resolved.
Jan 28, 20:25 UTC
Monitoring -
A fix has been implemented, and we are monitoring the results.
Jan 28, 18:24 UTC
Investigating -
We are currently investigating an issue impacting dashboards for users in the prod-us-central-3 region. This is preventing impacted dashboards from loading as expected.
This is also impacting a very small subset of users in the prod-us-central-0 region.
We will provide more details regarding the scope as they become available.
Jan 28, 17:27 UTC
Resolved -
We have observed a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.
Jan 28, 00:22 UTC
Monitoring -
As of 22:55 UTC, we have observed marked improvement with the incident impacting IRM and OnCall. We are still investigating and will continue to monitor and provide updates.
Jan 27, 22:56 UTC
Investigating -
We are currently investigating an issue impacting some customers when accessing Grafana OnCall and IRM. Impacted customers may experience long load times or even timeouts when attempting to access these components. We'll provide more information as it becomes available.
Jan 27, 20:37 UTC
Resolved -
We experienced an increased write error rate for logs in prod-us-west-0 from 06:55 to 07:15 UTC. We have since observed continued stability and are marking this as resolved.
Jan 27, 07:49 UTC
Resolved -
Engineering has released a fix and as of 00:13 UTC, customers should no longer experience issues upgrading from Free to Pro subscriptions. At this time, we are considering this issue resolved. No further updates.
Jan 27, 00:13 UTC
Identified -
Engineering has identified the issue and is currently exploring remediation options. At this time, users will continue to be unable to upgrade from Free to Pro subscriptions.
We will continue to provide updates as more information becomes available.
Jan 26, 21:52 UTC
Investigating -
As of 20:05 UTC, our engineering team became aware of an issue related to subscription plan upgrades. Users experiencing this issue will not be able to upgrade from a Free plan to a Pro subscription.
Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Jan 26, 20:53 UTC
Resolved -
This incident has been resolved.
Jan 23, 18:44 UTC
Monitoring -
We are noticing significant improvement, and things are stabilizing as expected. Our engineering teams will continue to monitor progress.
Jan 23, 16:55 UTC
Investigating -
We are currently investigating an issue impacting email delivery for some services, including alert notifications.
Jan 23, 15:37 UTC
Resolved -
The incident is resolved. We are in contact with customers affected by this change.
Jan 22, 22:29 UTC
Identified -
During the secrets migration in https://status.grafana.com/incidents/47d1q4sphrmj, secrets proxy URLs for some customers were updated in the following regions: prod-us-central-0, prod-us-east-0, and prod-eu-west-2. This was an unexpected breaking change affecting a subset of customers.
This will specifically affect customers who are using secrets on private probes behind a firewall.
We are investigating. If your private probes are impacted, we ask you to update firewall rules for the secrets proxy to allow outbound connections to the updated hosts:
Note that this URL change affects only a small subset of customers; the majority of customers will not need to update firewall rules. For affected customers, private probes will log an error such as: Error during test execution: failed to get secret: Get "https://gsm-proxy-prod-us-east-2.grafana.net/api/v1/secrets/.../decrypt": Forbidden undefined
Jan 21, 21:16 UTC
Completed -
The scheduled maintenance has been completed.
Jan 21, 16:00 UTC
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jan 21, 13:00 UTC
Scheduled -
We will perform planned maintenance on synthetic monitoring secrets on Wednesday, January 21, from 13:00 to 16:00 UTC, in the following regions: prod-us-central-0, prod-us-east-0, and prod-eu-west-2.
During maintenance, synthetic monitoring checks using secrets will continue to run normally, but the secrets will be in a read-only state. Attempts to create, modify, or delete secrets during maintenance will return an error until the maintenance is complete.
This maintenance is required to ensure the reliability of the secrets management system as we prepare for general availability of the feature. We will provide updates here as the maintenance progresses.
Jan 15, 19:25 UTC
Resolved -
We consider this incident resolved, as write latency has not been elevated since the fix was applied. The issue was caused by a latency spike in a downstream dependency, which increased backpressure on the Hosted Traces ingestion path, degraded gateway performance, and resulted in elevated write latency. After the affected gateway services were cleared, the degraded state subsided and normal operation was restored.
Jan 21, 15:21 UTC
Monitoring -
The issue was identified and a fix was applied. After the fix was applied, latency returned to normal, expected levels. We're currently monitoring the component's health before resolving the incident.
Jan 21, 13:35 UTC
Investigating -
We're currently investigating an issue with elevated write latency in the Hosted Traces prod-us-central-0 region, which has been experiencing sustained high write latency since 07:20 UTC. Only a small subset of requests is impacted.
Jan 21, 13:24 UTC