AWS Virtual Private Gateway Customer VPN Issue
Incident Report for Solve Networks
Postmortem

All times in the following description are in US Central Time.

As a result of subscriber growth, planned maintenance was scheduled to provision new IP blocks for additional Enhanced SIM customers on Verizon. The Verizon change window ran from 11:00 to 11:30PM on 8/24/2023. Testing was carried out before and after the network provisioning change to ensure that previously connected Verizon Enhanced SIMs remained online. Verizon Enhanced connectivity through the packet gateway and Internet egress was unaffected, so the change was deemed successful.

In the 5AM hour on 8/25/2023, a single customer reported VPN connectivity issues. The Support team replied within 3 minutes of the initial customer notification and began triaging the issue. Nearing the 7AM hour, a second customer reported VPN issues with a start time within the Verizon change window. At that point, all customer VPN contacts were contacted individually to confirm the status of, and reachability through, their VPN. All customers reported healthy VPN status and reachability except those using AWS Virtual Private Gateway as their VPN endpoint.

The routes added during the change window were automatically propagated, by design, via BGP for customers using Dynamic Routing. During its investigation, the network engineering team identified an AWS Direct Connect/Virtual Private Gateway prefix count limit that, when exceeded, puts the BGP session for the VPN connection into a down state. In the 9AM hour, a workaround was investigated, developed, and tested to filter the new routes that caused the prefix limit to be exceeded for AWS VPGs.
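
For illustration only, the kind of per-peer route filtering described above can be sketched in Python. This is not the workaround as implemented by Solve Networks. The function name routes_for_peer, the constant ASSUMED_PREFIX_LIMIT and its value, and the example prefixes are all hypothetical. The actual limit is set by AWS and should be taken from current AWS documentation.

```python
from ipaddress import ip_network

# Hypothetical value for illustration only; the real limit is defined by AWS.
ASSUMED_PREFIX_LIMIT = 100


def routes_for_peer(existing, added, prefix_limit=None):
    """Return the list of prefixes to advertise to a single BGP peer.

    existing:     prefixes advertised before the change window
    added:        prefixes introduced by the change window
    prefix_limit: maximum number of prefixes the peer accepts,
                  or None if the peer has no known limit
    """
    existing = [ip_network(p) for p in existing]
    added = [ip_network(p) for p in added]

    if prefix_limit is None:
        # Peers without a limit receive every route, as before the workaround.
        return existing + added

    # Limited peers keep all existing routes; new routes are admitted only
    # while the total stays at or below the limit, the rest are filtered out.
    room = max(prefix_limit - len(existing), 0)
    return existing + added[:room]
```

As a usage example under the same assumptions, a peer that already receives 98 prefixes and is offered 5 new ones would be sent 100 prefixes when called with prefix_limit=ASSUMED_PREFIX_LIMIT, while a peer with no limit would be sent all 103.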

The issue only affected customers using AWS Virtual Private Gateway for VPN termination. Customers using AWS Transit Gateway along with a variety of virtual (AWS/Azure) and on-prem appliances from Cisco, Cradlepoint, Check Point, Fortinet, and Palo Alto were not affected.

This incident has been marked resolved. The workaround will remain in place while AWS VPG requirements are considered for a future implementation to support the rapid growth of the Solve Networks customer base.

Posted Aug 25, 2023 - 15:35 CDT

Resolved
This incident has been resolved. The workaround will remain in place while AWS VPG requirements are considered for a future implementation.
Posted Aug 25, 2023 - 13:14 CDT
Monitoring
A fix has been implemented to reduce the number of advertised prefixes below the AWS Virtual Private Gateway limit, and it is being monitored. Customers are reporting VPN re-establishment.
Posted Aug 25, 2023 - 10:57 CDT
Identified
The issue has been identified as specific to AWS Virtual Private Gateways (AWS Transit Gateways are not affected). During routine private network maintenance, additional IP blocks were provisioned to Solve Networks Enhanced Infrastructure. By design, these routes are automatically propagated via BGP. AWS Direct Connect has a prefix limit; when it is exceeded, the BGP session is taken down. A workaround has been identified and is in development and testing. No other customer VPN appliances/infrastructure had an issue with the routing updates.
Posted Aug 25, 2023 - 09:05 CDT
Update
The issue has been identified as specific to AWS Virtual Private Gateways (AWS Transit Gateways are not affected). During routine private network maintenance, additional IP blocks were provisioned to Solve Networks Enhanced Infrastructure. By design, these routes are automatically propagated via BGP. AWS Direct Connect has a prefix limit; when it is exceeded, the BGP session is taken down. A workaround is being investigated. No other customer VPN appliances/infrastructure had an issue with the routing updates.
Posted Aug 25, 2023 - 08:45 CDT
Investigating
An issue is being investigated with Dynamic Route IPsec VPN setups (utilizing BGP) and Enhanced SIM reachability. Enhanced SIMs are still online and passing traffic, but the ability to reach them over customer IPsec is degraded. At this time, the issue is only confirmed to be present with AWS Native VPN setups.
Posted Aug 25, 2023 - 05:20 CDT
This incident affected: Verizon (Verizon Enhanced Network Infrastructure), AT&T (AT&T Enhanced Network Infrastructure), and Solve Networks (IPsec VPN Service).