RCA Report: DDOS Attack against IPs in the 41.79.76.0/22 CloudAfrica subnet announced to upstream peers via BGP on ASN37352 on Friday 26th July 2024
Incident Overview
Date/Time of Incident Window Start: Friday 26th July 2024 at 13:36 (SAST / UTC+2)
Date/Time of Incident Window End: Saturday 27th July 2024 at 09:40 (SAST / UTC+2)
Detected By: Automated Monitoring
Initial Symptoms: At 13:36 on Friday 26th July 2024, our networks experienced a sudden loss of Internet connectivity affecting all our Internet peers at the Teraco Isando JB1 and OADC JHB1 data centres. Internal VM, storage and network availability were not impacted.
Summary: A DDOS attack impacted Internet-facing connectivity from our edge firewall and routing devices in Teraco JB1 (Isando) and OADC JHB1 (Isando). Further details of the scale and scope of the DDOS are provided below.
Date of Report: 1st August 2024
Post-Mortem
Incident and Mitigation Details
Incident Timeline
Internet connectivity was lost during the following periods within the DDOS attack window:
Friday 26th July 2024
[13:36] – [15:12]
[16:06] – [16:21]
[19:23] – [19:52]
Saturday 27th July 2024
[03:00] – [03:44]
[07:26] – [09:40]
The total downtime during the DDOS attack window was 5 hours and 18 minutes.
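The total can be cross-checked against the outage windows listed above. The following is a minimal Python sketch for that check; the variable and function names (OUTAGES, total_downtime) are illustrative and not part of any CloudAfrica tooling, and the timestamps are simply copied from the timeline.

```python
from datetime import datetime, timedelta

# Outage windows copied from the timeline above (SAST, UTC+2).
OUTAGES = [
    ("2024-07-26 13:36", "2024-07-26 15:12"),
    ("2024-07-26 16:06", "2024-07-26 16:21"),
    ("2024-07-26 19:23", "2024-07-26 19:52"),
    ("2024-07-27 03:00", "2024-07-27 03:44"),
    ("2024-07-27 07:26", "2024-07-27 09:40"),
]


def total_downtime(windows):
    """Sum the duration of each (start, end) window."""
    fmt = "%Y-%m-%d %H:%M"
    total = timedelta()
    for start, end in windows:
        total += datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return total


total = total_downtime(OUTAGES)
hours, remainder = divmod(int(total.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours} hours and {minutes} minutes")  # prints "5 hours and 18 minutes"
```

The five windows last 96, 15, 29, 44 and 134 minutes respectively, which sum to 318 minutes.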
Our response timeline was as follows:
[13:36]: The initial loss of connectivity triggered alarms from our monitoring systems, and the CloudAfrica team was notified immediately.
[13:40]: Review of network elements (routers, switches and firewalls) showed almost 100% CPU utilisation on our edge firewall devices. This complicated diagnostics because of significantly reduced responsiveness of these network elements.
[14:00]: A DDOS attack was confirmed at approximately 14:00 after analysis of router connection and traffic flow statistics.
[14:10]: Review of options began for blocking the DDOS at edge router level in order to reduce or eliminate the impact on the edge firewalls.
[16:00]: Installation and configuration of BGP black-holing capability began on the edge routers.
[16:50]: Initial implementation of BGP black-holing capability on the edge routers was completed.
[17:00] – (Sat) [09:40]: Tuning of volumetric attack parameters for the BGP black-holing solution on the control VM, and re-installation of one of our firewalls to resolve a load-associated configuration corruption. The DDOS attack was mitigated at 09:40 on Saturday.
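To illustrate what a volumetric attack parameter might look like, the sketch below flags destination addresses inside the attacked subnet whose rate of new inbound connections exceeds a threshold, and returns them as /32 prefixes that could then be considered for black-holing. This is an assumption-laden illustration only: the report does not describe the actual parameters or tooling used on the control VM, so the threshold value, the function name blackhole_candidates and the input format are all hypothetical.

```python
import ipaddress
from collections import Counter

# Subnet named in the report title.
PROTECTED_SUBNET = ipaddress.ip_network("41.79.76.0/22")

# Hypothetical threshold (new connections per second per destination);
# not a value taken from the report.
CPS_THRESHOLD = 50_000


def blackhole_candidates(new_connections, interval_seconds, threshold=CPS_THRESHOLD):
    """Return /32 prefixes in the protected subnet that exceed the threshold.

    new_connections: iterable of destination IP strings, one entry per new
    connection observed during the sampling interval (hypothetical format).
    """
    counts = Counter(new_connections)
    candidates = []
    for dst, count in counts.items():
        addr = ipaddress.ip_address(dst)
        rate = count / interval_seconds
        if addr in PROTECTED_SUBNET and rate > threshold:
            candidates.append(ipaddress.ip_network(f"{addr}/32"))
    return sorted(candidates)


# Usage with made-up sample data and a deliberately low threshold.
sample = ["41.79.76.10"] * 300 + ["41.79.77.20"] * 5 + ["192.0.2.1"] * 1000
print(blackhole_candidates(sample, interval_seconds=1, threshold=100))
```

With destination-based black-holing, all traffic to a flagged address is dropped upstream, so a threshold set too low takes legitimate services offline while one set too high lets attack traffic reach the firewalls. That trade-off is one plausible reason such parameters need iterative tuning.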
Investigation and Findings
Physical Inspection: N/A
Log Analysis: More than 600,000 concurrent live connections and up to 200,000 connection requests per second were observed on the edge routers during the attack window.
Hardware Diagnostics: The Cisco ASA Firepower firewall devices and Cisco edge routers were heavily loaded by the flood of connections generated by the DDOS, which drove CPU utilisation on these devices to near maximum.
Environmental Factors: No abnormal environmental conditions (temperature, humidity) were detected.
Vendor Consultation: N/A
Root Cause
Primary Cause: A significant DDOS attack on one of our subnets overwhelmed the capacity of the edge routers and firewalls at Teraco JB1 and OADC JHB1.
Secondary Cause: N/A
Background:
Contributing Factors: N/A
Impact Assessment
Service Downtime: A total of 5 hours and 18 minutes of lost Internet connectivity was experienced during the DDOS incident window.
Data Loss: No data loss reported.
Performance Degradation: Internet-facing services were significantly disrupted during the outage periods listed in the Incident Timeline. Internal network, VM and storage availability were not impacted.
Services Impacted: Interruption of Internet Connectivity
Corrective and Preventive Measures
Immediate Actions:
We sincerely apologise for the impact on affected customers.
We are continuously taking steps to improve the CloudAfrica Platform and our processes to help ensure such incidents do not occur in the future.
Sincerely,
The CloudAfrica Team.