Three peerings, $10 a month: when VPC Peering beats Transit Gateway

aws networking terraform cost-optimization

We run a small AWS Organization. One ops/internal account holds a handful of shared internal services. Three workload accounts (dev, staging, prod) each have their own VPC and need to reach those services. Single region, us-east-1, three AZs each. The AWS reference architectures all point at Transit Gateway. I priced it out against our actual traffic and went with VPC Peering instead. Here’s the reasoning and the numbers.

The problem

Four AWS accounts under one Organization:

  • ops-internal: shared services VPC, 172.16.0.0/16. Hosts a small set of internal tools — observability, secrets management, internal connectivity control plane. Three services in total.
  • dev: 172.17.0.0/16
  • staging: 172.18.0.0/16
  • prod: 172.19.0.0/16

The workload VPCs need to reach the ops VPC. They do not need to reach each other — dev shouldn’t see prod, prod shouldn’t see dev. Hub-and-spoke, ops VPC at the centre.

Constraints:

  • Small team. Whatever I pick, I’m the one maintaining it.
  • Single region, no expansion plans for at least a year.
  • Cost-sensitive. We’re a startup; every recurring line item on the AWS bill needs to justify itself.
  • All four accounts already under AWS Organizations, so cross-account IAM is straightforward.

That’s the setup. The interesting decision is the connectivity layer.

The options I considered

| Option | What it is | Sweet spot | Cost model |
| --- | --- | --- | --- |
| VPC Peering | Direct 1:1 link between two VPCs | Few VPCs, no transitive routing | $0/hr, data transfer only |
| Transit Gateway | Regional router, hub-and-spoke | Many VPCs, transitive routing, central inspection | $0.05/hr per attachment + $0.02/GB |
| Site-to-Site VPN | IPSec tunnels | Hybrid (on-prem ↔ AWS) | $0.05/hr per connection + data out |
| PrivateLink | NLB-fronted service endpoints | Exposing specific services across accounts | $0.01/hr per endpoint per AZ + $0.01/GB |
| VPC Lattice | Application-layer service mesh | Many services, identity-aware auth | $0.025/hr per service + $0.025/GB |
| Userspace overlay (Tailscale-style) | WireGuard mesh on top of any underlay | App-layer connectivity for participating hosts | Free / self-hosted |

The overlay option is worth a separate note. A WireGuard mesh doesn’t replace VPC peering; it sits on top of whatever underlay you have, and only hosts that join the mesh can use it. For machine-to-machine traffic between EC2 instances that don’t run the agent — Prometheus scrapes, internal API calls — you still need VPC-level connectivity. Ruling it out as the only connectivity layer was easy: too many things would need to be mesh-aware.

Site-to-Site VPN is for hybrid (your data centre to AWS). Using it intra-AWS is overpriced and wrong-shaped — you’d pay for tunnels and customer-gateway operations to solve a problem AWS already solves with peering. I’m including it for completeness.

That leaves four serious contenders: Peering, TGW, PrivateLink, Lattice.

Why I picked VPC Peering

Four VPCs in hub-and-spoke means three peering connections. That’s it. The n*(n-1)/2 scaling warning everyone repeats only bites when every VPC needs to talk to every other VPC. In hub-and-spoke with one hub, it’s n-1.

Tradeoffs I knowingly accepted:

  • No transitive routing. If dev ever needs to reach staging, I’d have to add a fourth peering or rethink. Today it doesn’t, and today’s problem is the only problem I’m solving.
  • Manual route table entries on both sides. Both VPCs in a peering need explicit routes pointing at the pcx-* ID. Forget one side and you get a silent black-hole (more below).
  • CIDRs must not overlap. I planned the address space up front — adjacent /16s under a /14 supernet — so this was a one-time cost.
  • No central egress, inspection, or firewall hop. Acceptable for us; we don’t have a SecOps team mandating a transit inspection point.
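
The address plan from the CIDR bullet can be derived instead of typed by hand. A minimal sketch, assuming the supernet is 172.16.0.0/14 (inferred from the four /16s listed earlier) and using Terraform's built-in cidrsubnet function. The local names and the ordering of the list are hypothetical, not taken from the actual configuration:

```hcl
locals {
  # Assumption: the /14 that contains all four VPC ranges.
  supernet = "172.16.0.0/14"

  # Hypothetical ordering; position i picks the i-th /16 inside the supernet.
  vpc_names = ["ops-internal", "dev", "staging", "prod"]

  # cidrsubnet(prefix, newbits, netnum): adding 2 bits to a /14 yields /16 blocks,
  # and distinct netnum values can never overlap.
  vpc_cidrs = {
    for i, name in local.vpc_names : name => cidrsubnet(local.supernet, 2, i)
  }
}

output "vpc_cidrs" {
  # ops-internal = 172.16.0.0/16, dev = 172.17.0.0/16,
  # staging = 172.18.0.0/16, prod = 172.19.0.0/16
  value = local.vpc_cidrs
}
```

Because every range comes from one supernet and a unique index, an overlap would need two names sharing a list position, which a list cannot express.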

The thing that tipped it: at four VPCs, TGW’s attachment fee alone runs ~$146/month before any data charge. Peering is $0/month at rest. The “TGW scales better” argument is true, but it’s not free — and we’re not at the scale where the operational simplicity is worth $146/month.

The setup walkthrough

Three Terraform highlights. The pattern repeats for each spoke.

Provider aliases for cross-account. The peering connection lives on the requester (ops) side; the accepter resource lives on the workload side. Both need explicit providers — my first cut had the aws_vpc_peering_connection_accepter running under the requester provider by mistake. Terragrunt happily applied; the accepter never ran:

provider "aws" {
  alias  = "ops"
  region = "us-east-1"
  assume_role { role_arn = "arn:aws:iam::${var.ops_account_id}:role/TerragruntExec" }
}

provider "aws" {
  alias  = "workload"
  region = "us-east-1"
  assume_role { role_arn = "arn:aws:iam::${var.workload_account_id}:role/TerragruntExec" }
}

Requester side (ops account). Don’t set peer_region for same-region peerings: that argument is meant for inter-region peerings, which come with different pricing and different option-block rules. Left unset, the peer Region defaults to the Region the request is made in. The auto_accept flag here only matters when both VPCs are in the same account; for cross-account you must declare a separate aws_vpc_peering_connection_accepter resource under the accepter’s provider:

resource "aws_vpc_peering_connection" "ops_to_workload" {
  provider      = aws.ops
  vpc_id        = aws_vpc.ops.id
  peer_vpc_id   = var.workload_vpc_id
  peer_owner_id = var.workload_account_id
  auto_accept   = false
  # no peer_region — same-region peering

  tags = { Name = "ops-to-${var.workload_env}" }
}

Accepter side (workload account):

resource "aws_vpc_peering_connection_accepter" "workload_from_ops" {
  provider                  = aws.workload
  vpc_peering_connection_id = aws_vpc_peering_connection.ops_to_workload.id
  auto_accept               = true

  tags = { Name = "from-ops" }
}

Route table entries — both sides:

resource "aws_route" "ops_to_workload" {
  provider                  = aws.ops
  route_table_id            = aws_route_table.ops_private.id
  destination_cidr_block    = var.workload_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.ops_to_workload.id
}

resource "aws_route" "workload_to_ops" {
  provider                  = aws.workload
  route_table_id            = var.workload_private_rt_id
  destination_cidr_block    = aws_vpc.ops.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.ops_to_workload.id
}

The gotcha that ate 30 minutes of my life. First peering went ACTIVE. Security groups were configured. Traffic black-holed. No error, no ICMP unreachable, just silent packet drops. I’d forgotten the route table entry on the accepter side. The state of the peering and the state of routing are independent; AWS will happily report the peering healthy while none of the actual traffic has anywhere to go.
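
One way to make the missing-route mistake harder to repeat is to drive the routes from a list, so that every private route table gets the entry. This is a sketch, not the configuration used above: it assumes the workload VPC has more than one private route table (common with one table per AZ), and the variable workload_private_rt_ids is hypothetical.

```hcl
variable "workload_private_rt_ids" {
  description = "Hypothetical input: IDs of every private route table in the workload VPC"
  type        = list(string)
}

resource "aws_route" "workload_to_ops_all" {
  provider = aws.workload

  # One route per route table. The IDs must be known at plan time,
  # so pass them in rather than reading them from resources created in the same apply.
  for_each = toset(var.workload_private_rt_ids)

  route_table_id         = each.value
  destination_cidr_block = aws_vpc.ops.cidr_block

  # Taking the ID from the accepter resource makes the route wait for acceptance.
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.workload_from_ops.id
}
```

The same pattern applies on the ops side with the ops route tables and the workload CIDR as the destination.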

Second gotcha: DNS. Internal hostnames resolved to public IPs from the workload accounts until I enabled remote DNS resolution. For cross-account peerings the option has to be set from each side’s account; the requester can’t set the accepter-side flag and vice versa. That means two separate resources, each under the right provider. The pcx-* ID is the same on both sides, but options can’t be set until the peering has been accepted, so both resources take the ID from the accepter resource. That gives Terraform an implicit dependency on the acceptance:

resource "aws_vpc_peering_connection_options" "requester" {
  provider                  = aws.ops
  # Same pcx-* ID as the requester resource; read from the accepter so this waits for acceptance.
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.workload_from_ops.id
  requester { allow_remote_vpc_dns_resolution = true }
}

resource "aws_vpc_peering_connection_options" "accepter" {
  provider                  = aws.workload
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.workload_from_ops.id
  accepter  { allow_remote_vpc_dns_resolution = true }
}

In hindsight, I should have built a peering-pair module on day one — requester, accepter, both route entries, both DNS option blocks, all behind one set of inputs. I inlined the first because “it’s just one peering,” copy-pasted for the second and third, and now I’m three peerings in with the duplication still sitting there. Refactoring would mean a state-move dance I haven’t prioritised. Classic.
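
For what that module could look like from the outside, here is a rough sketch. Everything in it is an assumption: the module path, the input names, the aws.dev provider alias, the var.dev_* variables, and the pre-refactor resource address in the moved block are all hypothetical. The two mechanisms it relies on do exist in Terraform: configuration_aliases for passing aliased providers into a module, and moved blocks (Terraform 1.1 and later) for relocating state without terraform state mv.

```hcl
# modules/peering-pair/versions.tf (hypothetical path)
# Declares that the module expects two aliased AWS providers from its caller.
terraform {
  required_providers {
    aws = {
      source                = "hashicorp/aws"
      configuration_aliases = [aws.ops, aws.workload]
    }
  }
}

# Root module: one call per spoke.
module "peering_dev" {
  source = "./modules/peering-pair"

  providers = {
    aws.ops      = aws.ops
    aws.workload = aws.dev # hypothetical alias for the dev account
  }

  # Hypothetical inputs, mirroring the variables used in the snippets above.
  workload_env           = "dev"
  workload_account_id    = var.dev_account_id
  workload_vpc_id        = var.dev_vpc_id
  workload_vpc_cidr      = var.dev_vpc_cidr
  workload_private_rt_id = var.dev_private_rt_id
}

# Hypothetical addresses: tells Terraform the existing peering now lives in the module,
# so the refactor plans as a move instead of destroy-and-recreate.
moved {
  from = aws_vpc_peering_connection.ops_to_dev
  to   = module.peering_dev.aws_vpc_peering_connection.ops_to_workload
}
```

Each inlined resource would need its own moved block, and a plan that shows only moves, with nothing to add or destroy, is the signal that the refactor is safe to apply.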

Cost comparison with real numbers

Scenario: 3 workload VPCs ↔ 1 shared services VPC, us-east-1, ~500 GB/month cross-VPC traffic, 3 AZs. All prices link to AWS pricing pages so you can re-derive when (not if) they change.

VPC Peering. $0/hr for the connection itself. Data transfer is the only line item. AWS bills cross-AZ transfer twice — $0.01/GB on the sender’s account (out) plus $0.01/GB on the receiver’s account (in), so $0.02/GB combined per GB transferred. At ~500 GB/month across all three peerings: ~$10/month. Lower if you pin services to AZs, but I’ll keep the worst case. (VPC pricing)

Transit Gateway. 4 attachments × $0.05/hr × 730 hr = $146/month for attachments alone. Plus $0.02/GB processed × 500 GB = $10. ~$156/month before anything else. The same ~$10 cross-AZ data transfer applies to TGW too — the real differential is the $146/month in attachment fees, not the data layer. (TGW pricing)

Site-to-Site VPN. $0.05/hr per connection × 730 hr × 3 connections = $109.50/month, plus data transfer out and the operational burden of customer gateways. ~$110/month and the wrong tool for intra-AWS. (VPN pricing)

PrivateLink. Per-endpoint pricing escalates fast at multiple services. Three services × 3 AZs × $0.01/hr × 730 = $65.70/month per consumer VPC. Three workload VPCs consuming = $197/month for endpoints alone, plus $0.01/GB × 500 GB = $5. ~$202/month. (PrivateLink pricing)

VPC Lattice. $0.025/hr per service × 3 services × 730 = $54.75/month, plus $0.025/GB × 500 GB = $12.50. ~$67/month. (Lattice pricing)

| Option | Monthly cost (this scenario) |
| --- | --- |
| VPC Peering | ~$10 |
| VPC Lattice | ~$67 |
| Site-to-Site VPN | ~$110 |
| Transit Gateway | ~$156 |
| PrivateLink | ~$202 |
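
The arithmetic behind these figures is easy to re-run when prices or traffic change. A sketch in Terraform locals, using only the unit prices and quantities quoted in this section; the local names are made up for illustration, and the values can be inspected with terraform console or the output:

```hcl
locals {
  hours_per_month = 730
  traffic_gb      = 500
  workload_vpcs   = 3
  services        = 3
  azs             = 3

  monthly_cost_usd = {
    # Cross-AZ transfer only: $0.01/GB out + $0.01/GB in.
    peering = local.traffic_gb * 0.02

    # $0.025/hr per service + $0.025/GB.
    lattice = local.services * 0.025 * local.hours_per_month + local.traffic_gb * 0.025

    # One $0.05/hr connection per workload VPC; data transfer out not included.
    vpn = local.workload_vpcs * 0.05 * local.hours_per_month

    # One attachment per VPC (3 workloads + ops) + $0.02/GB processed.
    tgw = (local.workload_vpcs + 1) * 0.05 * local.hours_per_month + local.traffic_gb * 0.02

    # One endpoint per service per AZ per consumer VPC + $0.01/GB.
    privatelink = local.workload_vpcs * local.services * local.azs * 0.01 * local.hours_per_month + local.traffic_gb * 0.01
  }
}

output "monthly_cost_usd" {
  value = local.monthly_cost_usd
}
```

Changing traffic_gb shows how little the ranking depends on data volume: the hourly terms dominate every option except peering.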

Break-even with TGW. Data-transfer per GB is roughly the same either way (~$0.02/GB). The decision swings entirely on the attachment fee vs. the operational pain of n-1 peerings (hub-and-spoke) or n*(n-1)/2 (full mesh). My rule of thumb: above ~5 hub-and-spoke VPCs, or any full-mesh requirement, switch to TGW. Below that, peering wins on cost by an order of magnitude and the operational delta is “two more route table entries.”
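
To see where that rule of thumb comes from, it helps to tabulate connection counts against the TGW attachment fee as the VPC count grows. A small sketch, assuming the same $0.05/hr attachment price and 730 hours per month used above; the local name scaling is hypothetical:

```hcl
locals {
  scaling = {
    # n = total number of VPCs, from 2 through 8.
    for n in range(2, 9) : tostring(n) => {
      hub_spoke_peerings = n - 1           # one peering per spoke
      full_mesh_peerings = n * (n - 1) / 2 # every pair of VPCs
      tgw_attachment_usd = n * 0.05 * 730  # fee before any data charge
    }
  }
}

output "scaling" {
  value = local.scaling
}
```

At n = 4 this gives three hub-and-spoke peerings, six for a full mesh, and the $146/month attachment fee quoted earlier.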

When you should NOT use peering

  • More than ~5 VPCs. The n*(n-1)/2 curve makes management painful even in a hub-and-spoke if any spoke needs to reach another spoke.
  • Transitive routing. Peerings don’t transit. If A↔B and B↔C, A still can’t reach C without A↔C. TGW solves this natively.
  • Frequent CIDR changes. Every change is a coordination event across accounts.
  • Central egress or inspection. If compliance requires an inspection VPC or central NAT, you need TGW or a transit VPC pattern.
  • Cross-region at scale. Inter-region peering works, but data costs and operational overhead climb fast. TGW peering across regions is the better shape.
  • Service-level access control instead of network-level. That’s PrivateLink or Lattice territory.

Takeaways

  • Default-to-TGW is a cargo-cult choice for small teams. The AWS reference architectures are written for enterprises with dozens of accounts and dedicated network teams. You are (probably) not them.
  • The “peering doesn’t scale” warning is real but mis-stated. It doesn’t scale past ~5 VPCs in a full mesh. At 4 VPCs in hub-and-spoke, it’s three connections — boring, cheap, done.
  • Always compute connectivity cost on your actual traffic profile. The break-even point is dominated by attachment hours, not data, and AWS’s calculator defaults will steer you toward TGW even when you’d save ~$146/month with peering.
  • Build the peering-pair as a reusable Terraform module on day one. Don’t inline the first one because “it’s just one.” I did, and I’m still paying for it in copy-paste.
  • The silent failure modes are the real cost. Peering plus missing route table entries plus missing DNS option flags will burn an afternoon. Capture those in the module and never think about them again.