Amazon Web Services (AWS) has confirmed that an application programming interface (API) issue in one of its major US datacentre regions caused an outage that took several of its biggest reference customers offline on the evening of Tuesday 7 December.
Based on social media reports, the service issues appear to have begun around 3pm UK time, with users reporting problems when trying to access a wide range of services hosted in the public cloud giant’s US-East-1 region in Northern Virginia.
The outage is known to have affected users of several of the firm’s major cloud services, including its Elastic Compute Cloud platform, its DynamoDB database offering and its video calling service Chime.
This in turn led to thousands of web users across the US experiencing problems when trying to use the streaming services Netflix and Disney+, while the firm’s e-commerce arm, Amazon.com, was also affected.
Additionally, some users reported problems when trying to operate apps for their internet-connected devices, with autonomous vacuum cleaner brand Roomba confirming via a statement on its service status page that its services were being affected by the outage. Users of Amazon-owned smart doorbell company Ring also suffered technical difficulties.
During the first hour of the outage, Twitter users complained that Amazon’s support accounts and service status page were showing no signs of an outage, despite widespread reports online of users being unable to access various sites and services.
The company later confirmed the issues were affecting its monitoring and incident reporting tools on its service status page, which is why it had been delayed in updating customers about the incident.
The company’s service status page suggests the first sign that an outage had occurred could be seen when issues cropped up with “multiple AWS APIs” in the US-East-1 region, which were later upgraded to “API errors” that were blighting multiple AWS services hosted there.
Several hours into the outage, the company published an update that shed further light on the root cause of the issue, citing an “impairment of several network devices”. Its last update, published at 12.35am UK time, stated it was now working towards the “recovery of any impaired services.”
The same datacentre region experienced another prolonged outage just over a year ago, in late November 2020, linked to a defect in the API of its real-time data-streaming service, Kinesis Data Streams (KDS).