AWS NAT Gateway Optimization
Written by: Ryan McIntire, Sr. Technical Manager, FinOps, Vega Cloud
In the cloud world, there comes a time when EC2 instances in a private subnet will inevitably need to reach resources on the Internet. That is where the AWS NAT Gateway comes in. This hosted service provides an exit point that allows instances to reach resources on the Internet while keeping safely within a private subnet.
The beauty of a NAT Gateway is that it is an AWS-managed service, so other than the initial setup, it maintains itself. NAT Gateways also scale automatically with demand, from a base bandwidth of 5 Gbps up to 100 Gbps. In contrast, deploying an instance as a NAT device requires additional care and feeding (operating system updates, etc.). To increase the throughput of a NAT instance, you may need to manually resize the instance to handle additional network traffic.
Since the NAT Gateway is a managed AWS service, offloading the administrative work comes at a cost. The NAT Gateway has a base price of $0.045 per gateway per hour. Additionally, data processed by the NAT Gateway is billed at $0.045 per GB. If you configure your VPC for maximum redundancy, this means deploying a NAT Gateway in each Availability Zone that contains private subnets. As an example, let’s assume you have EC2 instances deployed in a VPC with three private subnets, one per Availability Zone. Let’s also assume that your EC2 workloads generate 1000 GB of traffic per month per AZ. With this in mind, your monthly NAT Gateway costs should break out as follows:
- NGW-AZ1
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 1000 GB per month = $45.00
  - Total monthly cost for AZ1 = $77.85
- NGW-AZ2
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 1000 GB per month = $45.00
  - Total monthly cost for AZ2 = $77.85
- NGW-AZ3
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 1000 GB per month = $45.00
  - Total monthly cost for AZ3 = $77.85
- Total monthly NGW cost for VPC1 = $233.55
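If you would like to plug in your own numbers, the breakdown above can be sketched in a few lines of Python. This is only a minimal model using the rates and the 730-hour month quoted in this article; the function names are ours, and your actual rates may differ by region.

```python
HOURLY_RATE = 0.045     # USD per NAT Gateway per hour (rate quoted in this article)
DATA_RATE = 0.045       # USD per GB processed (rate quoted in this article)
HOURS_PER_MONTH = 730   # average month length used in the breakdown


def ngw_monthly_cost(gb_processed: float) -> float:
    """Monthly cost of one NAT Gateway: hourly charge plus data processing."""
    hourly = HOURLY_RATE * HOURS_PER_MONTH
    data = DATA_RATE * gb_processed
    return round(hourly + data, 2)


def vpc_monthly_cost(gb_per_az: list[float]) -> float:
    """Total for a VPC with one NAT Gateway per Availability Zone."""
    return round(sum(ngw_monthly_cost(gb) for gb in gb_per_az), 2)


per_az = ngw_monthly_cost(1000)
total = vpc_monthly_cost([1000, 1000, 1000])
print(f"Per AZ: ${per_az:.2f}, VPC total: ${total:.2f}")
```

Passing a different list of per-AZ traffic figures to vpc_monthly_cost gives a quick estimate for uneven workloads.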
The total NGW cost across the three AZs is fine, but we can make some improvements. First, we need to learn more about the traffic itself. To accomplish this, we can use VPC flow logs. VPC flow logs capture information about the IP traffic going to and from the network interfaces in your VPC. Logs can be delivered to S3, where you can use a tool like Amazon Athena to query the raw data. In our scenario, let’s assume that half of the data bound for our NAT Gateways is actually S3 reads and writes. Since Amazon S3 uses a public endpoint, traffic bound for S3 needs to leave our AZ via our NGW and then go out to the Internet, where it is routed to the S3 public endpoint. Luckily for us, there is a better way: VPC endpoints!
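To get a rough idea of how this kind of analysis works, here is a small Python sketch that totals bytes per destination from flow log records in the default (version 2) format and estimates the share going to a set of networks. The sample records are made up for illustration, and the CIDR below is a documentation range standing in for the S3 ranges that AWS publishes in its ip-ranges.json file. Keep in mind that flow logs record all traffic on an interface, so in practice you would also filter down to the flows that actually pass through the NAT Gateway.

```python
import ipaddress
from collections import defaultdict

# Field order of the default (version 2) VPC flow log record.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()


def bytes_by_destination(lines):
    """Sum the bytes field per destination address, skipping non-OK records."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        record = dict(zip(FIELDS, parts))
        if record["log_status"] != "OK":
            continue  # header rows, NODATA and SKIPDATA carry no byte counts
        totals[record["dstaddr"]] += int(record["bytes"])
    return dict(totals)


def share_to_networks(totals, cidrs):
    """Fraction of all bytes whose destination falls inside the given CIDRs."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    total = sum(totals.values())
    matched = sum(
        count for addr, count in totals.items()
        if any(ipaddress.ip_address(addr) in net for net in networks)
    )
    return matched / total if total else 0.0


# Made-up sample records, for illustration only.
sample = [
    "2 123456789012 eni-0abc 10.0.1.5 198.51.100.10 44321 443 6 10 6000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.5 203.0.113.7 44322 443 6 10 4000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc - - - - - - - 1700000000 1700000060 - NODATA",
]
PLACEHOLDER_S3_CIDRS = ["198.51.100.0/24"]  # assumption: stand-in for real S3 ranges

totals = bytes_by_destination(sample)
s3_share = share_to_networks(totals, PLACEHOLDER_S3_CIDRS)
print(f"Share of bytes bound for the placeholder S3 range: {s3_share:.0%}")
```

For large log volumes, the same aggregation is better expressed as an Athena query over the logs in S3; the Python version is just easier to experiment with on a small export.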
AWS PrivateLink is a service that lets you create private endpoints in your VPC to connect to AWS services, to services in a provider VPC, or to third-party services, without going over the Internet. For S3, there is an even simpler option: a gateway VPC endpoint. Strictly speaking, a gateway endpoint does not use PrivateLink; it adds a route to your route tables that sends S3 traffic directly to the service. The S3 endpoint is essentially a shortcut from our VPC to S3 that eliminates the need to route traffic out through the NAT Gateway to the Internet. As a bonus, there is no additional charge for using a gateway endpoint, so this traffic no longer incurs NAT Gateway data processing charges.
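As a rough sketch of what deploying the endpoint could look like with boto3, the snippet below builds the arguments for the EC2 create_vpc_endpoint call and wraps the call in a helper. The VPC ID, route table IDs, and region are placeholders, and the helper names are ours. The function that actually calls AWS is defined but not invoked here, since it needs boto3 and valid credentials.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Build the arguments for EC2 CreateVpcEndpoint for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the private subnets that should reach S3 directly.
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(vpc_id, region, route_table_ids):
    """Create the endpoint and return its ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_params(vpc_id, region, route_table_ids)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


# Placeholder identifiers, for illustration only.
params = s3_gateway_endpoint_params(
    "vpc-EXAMPLE", "us-east-1", ["rtb-EXAMPLE1", "rtb-EXAMPLE2", "rtb-EXAMPLE3"]
)
print(params)
```

Associating the endpoint with the route table of each private subnet is what steers S3 traffic away from the NAT Gateway, so all three route tables are listed in our scenario.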
Let’s update our scenario to reflect half of our NGW traffic being rerouted to S3 via our S3 endpoint.
- NGW-AZ1
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 500 GB per month = $22.50
  - Total monthly cost for AZ1 = $55.35
- NGW-AZ2
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 500 GB per month = $22.50
  - Total monthly cost for AZ2 = $55.35
- NGW-AZ3
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 500 GB per month = $22.50
  - Total monthly cost for AZ3 = $55.35
- S3 Gateway Endpoint = $0.00
- Total monthly NGW cost for VPC1 = $166.05
By deploying the S3 VPC endpoint we end up saving roughly 29% over our original scenario.
- Original cost $233.55
- Cost with S3 endpoint $166.05
- Cost savings $67.50, ~29%
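Your own S3 share will rarely be exactly half, so it can help to treat it as a parameter. The sketch below is a simple model under this article's assumptions (three AZs, the quoted rates, and a free gateway endpoint); the function name and the s3_fraction parameter are ours.

```python
HOURLY_RATE = 0.045     # USD per NAT Gateway per hour
DATA_RATE = 0.045       # USD per GB processed by the NAT Gateway
HOURS_PER_MONTH = 730
AZ_COUNT = 3


def vpc_ngw_cost(gb_per_az: float, s3_fraction: float = 0.0) -> float:
    """Monthly NGW cost when s3_fraction of traffic bypasses the NGW via the endpoint."""
    ngw_gb = gb_per_az * (1 - s3_fraction)
    per_az = HOURLY_RATE * HOURS_PER_MONTH + DATA_RATE * ngw_gb
    return round(AZ_COUNT * per_az, 2)


original = vpc_ngw_cost(1000)
with_endpoint = vpc_ngw_cost(1000, s3_fraction=0.5)
saved = round(original - with_endpoint, 2)
saved_pct = round(saved / original * 100, 1)
print(f"${original:.2f} -> ${with_endpoint:.2f}, saving ${saved:.2f} ({saved_pct}%)")
```

Note that the hourly charge is untouched by the endpoint, which is why moving half of the traffic saves less than half of the bill.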
This is definitely an improvement, but I think we can still zero in on some additional savings. Assuming we would like to maximize savings, we can consolidate our three NGWs into a single NGW in AZ1. There are a few caveats that come with consolidating NAT Gateways, though. First, traffic now crosses Availability Zones: outbound traffic from our EC2 instances in AZ2 and AZ3 is billed at $0.010 per GB, and inbound traffic to AZ1 is billed at $0.010 per GB. In our scenario, we are not generating enough traffic for this to become an issue, but a significant upswing in traffic may affect your bottom line. Also, by consolidating into a single NGW, a routing failure or an issue with NGW-AZ1 could potentially landlock your EC2 instances in AZ2 and AZ3. Assuming these are acceptable risks, let’s take a look at the numbers.
- NGW-AZ1
  - Hourly cost $0.045 * 730 hours per month = $32.85
  - Data cost $0.045 * 1500 GB per month = $67.50
  - Inbound traffic from AZ2 $0.010 * 500 GB = $5.00
  - Inbound traffic from AZ3 $0.010 * 500 GB = $5.00
  - Total monthly cost for AZ1 = $110.35
- AZ2
  - Outbound traffic to AZ1 $0.010 * 500 GB = $5.00
- AZ3
  - Outbound traffic to AZ1 $0.010 * 500 GB = $5.00
- S3 Gateway Endpoint = $0.00
- Total monthly NGW cost for VPC1 = $120.35
By consolidating our three NAT Gateways into a single NAT Gateway in AZ1 we make another solid improvement to our bottom line.
- Original cost $233.55
- Cost with S3 endpoint and consolidated NGW $120.35
- Cost savings $113.20, ~48%
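Since the cross-AZ charges grow with traffic while the hourly savings stay fixed, it is worth checking where consolidation stops paying off. The following sketch compares the two designs using the same simplified cost model as the breakdown above (cross-AZ traffic billed at $0.010 per GB in each direction) and computes the break-even point. It is an estimate under those assumptions, not a full pricing model, and the function names are ours.

```python
HOURLY = 0.045 * 730    # USD per NAT Gateway per month
DATA_RATE = 0.045       # USD per GB processed by the NAT Gateway
CROSS_AZ_RATE = 0.010   # USD per GB, billed in each direction


def per_az_design(gb_per_az: float, az_count: int = 3) -> float:
    """One NAT Gateway in every AZ; no cross-AZ traffic."""
    return round(az_count * (HOURLY + DATA_RATE * gb_per_az), 2)


def consolidated_design(gb_per_az: float, az_count: int = 3) -> float:
    """A single NAT Gateway; the other AZs pay cross-AZ charges to reach it."""
    processed = DATA_RATE * gb_per_az * az_count
    # Billed once leaving the source AZ and once entering the NGW's AZ.
    cross_az = 2 * CROSS_AZ_RATE * gb_per_az * (az_count - 1)
    return round(HOURLY + processed + cross_az, 2)


def break_even_gb_per_az() -> float:
    """Per-AZ NGW traffic at which both designs cost the same.

    Each removed gateway saves HOURLY and adds 2 * CROSS_AZ_RATE per GB,
    so the AZ count cancels out of the result.
    """
    return HOURLY / (2 * CROSS_AZ_RATE)


three_ngw = per_az_design(500)
one_ngw = consolidated_design(500)
break_even = break_even_gb_per_az()
print(f"Three NGWs: ${three_ngw:.2f}, one NGW: ${one_ngw:.2f}")
print(f"Break-even at about {break_even:.1f} GB per AZ per month")
```

Under these assumptions, consolidation keeps saving money until each AZ sends roughly 1,640 GB per month through the NAT Gateway; beyond that, the cross-AZ charges outweigh the hourly savings.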
Let’s recap the items we’ve covered. First, NAT Gateways have an inherent hourly cost whether they are used or sitting idle. An idle NGW can rack up almost $400 in hourly charges per year, so it is a good idea to keep track of any unused NAT Gateways in your VPCs. Second, use network flow tools like VPC flow logs to get an idea of traffic on your VPCs. This gives you visibility into the types of traffic and chatty resources on your network. Third, depending on your network needs, you may be able to find savings by consolidating multiple NAT Gateways into a single NAT Gateway, whether in one Availability Zone as in our example or in a transit VPC.
How can Vega help? The FinOps Geniuses at Vega have taken the guesswork out of identifying idle NAT Gateways by building logic into the Vega Optimize engine! Under the Reports section, browse to Vega Optimize > Idle NatGateway. The Optimize engine scours your AWS spend data for any NAT Gateways that have generated cost over the past 30 days but have no network data logged. Any idle NGWs are flagged and added to your dashboard for review.
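To make the idea concrete, here is a simplified Python sketch of the rule described above: flag any gateway that generated cost over the last 30 days but processed no data. This is an illustration only, not Vega's actual implementation; the NgwUsage record, its fields, and the gateway IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NgwUsage:
    """Hypothetical 30-day summary for one NAT Gateway."""
    gateway_id: str
    cost_usd_30d: float
    gb_processed_30d: float


def find_idle_gateways(usages):
    """Return IDs of gateways that cost money but processed no data."""
    return [
        u.gateway_id
        for u in usages
        if u.cost_usd_30d > 0 and u.gb_processed_30d == 0
    ]


# Made-up records, for illustration only.
usages = [
    NgwUsage("nat-EXAMPLE1", cost_usd_30d=77.85, gb_processed_30d=1000.0),
    NgwUsage("nat-EXAMPLE2", cost_usd_30d=32.40, gb_processed_30d=0.0),
]
idle = find_idle_gateways(usages)

# Hourly charges alone for one idle gateway over a year, at $0.045 per hour.
idle_annual_cost = round(0.045 * 24 * 365, 2)
print(idle, idle_annual_cost)
```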
OK, I’ve cleaned up the idle NGWs. Now how do I identify other possible areas of improvement? You can use the Vega Inform engine to visualize your NAT Gateway spend across all of your AWS accounts. Under the Reports section, browse to Vega Inform > Cost Navigator. Set your filters as follows:
Group By: Cloud Provider Identifier
Cloud Provider: AWS
Product Detail Category: NatGateway
These filters give you a daily view of your NAT Gateway spend across your AWS accounts. Based on your results, you can further drill down into a single account by updating your Cloud Provider Identifier filter to the AWS account with the largest NAT Gateway spend. Then you can change the Group By filter to Location (Region) to determine which region is the highest contributor to your NGW cost. From there, you can use VPC flow logs from within your AWS account to get a feel for your network traffic needs.
Contact us at info@vegacloud.io or check out our website www.vegacloud.io!