
AWS VPC Internals

Scope

This document explains AWS VPC the way an infrastructure engineer should understand it:

  • address space
  • subnets
  • route tables
  • internet gateway
  • NAT gateway
  • security groups
  • NACLs
  • public vs private subnet reality
  • common path analyses

Reference anchors:

  • https://docs.aws.amazon.com/vpc/latest/userguide/how-it-works.html
  • https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html
  • https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html
  • https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html


Big Picture

A VPC is your logically isolated virtual network in AWS.

It gives you:

  • IP space
  • subnets
  • routing domains
  • attachment points to gateways and other networks
  • security boundaries

The clean mental model is:

VPC = address space + subnets + route policy + attachments + security controls

CIDR and Address Planning

A VPC starts with a CIDR block.

Subnets carve that space into smaller segments.

This is not just bookkeeping. Address design affects:

  • scaling headroom
  • availability-zone layout
  • service placement
  • peering/Transit Gateway compatibility
  • on-prem route integration

Bad CIDR decisions age like milk.
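As a rough sketch of what address planning looks like in practice, the following uses Python's standard `ipaddress` module to carve a VPC block into per-AZ subnets and to check a candidate external range for overlap. The VPC range, AZ names, on-prem range, and helper names are all assumptions made up for illustration. The one AWS-specific detail is that every subnet loses five addresses to AWS reservations.

```python
import ipaddress

# Hypothetical VPC range and AZ names, chosen only for illustration.
vpc = ipaddress.ip_network("10.0.0.0/16")
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]

# Carve the VPC block into /20 subnets and assign the first one to each AZ.
subnets = list(vpc.subnets(new_prefix=20))
plan = dict(zip(azs, subnets))

def usable_hosts(net):
    # AWS reserves 5 addresses in every subnet (the first four and the last).
    return net.num_addresses - 5

def overlaps_any(candidate, existing):
    # Return the existing networks that collide with the candidate range.
    return [n for n in existing if candidate.overlaps(n)]

for az, net in plan.items():
    print(az, net, usable_hosts(net))

# Hypothetical on-prem range that someone wants to route to this VPC.
on_prem = ipaddress.ip_network("10.0.32.0/19")
print(overlaps_any(on_prem, plan.values()))
```

With these assumed values each /20 offers 4091 usable addresses, thirteen /20 blocks stay free as headroom, and the on-prem range collides with the third subnet (10.0.32.0/20). That is the kind of conflict that is cheap to catch on paper and expensive to fix later.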


Subnets

A subnet lives in exactly one Availability Zone.

Important implication: subnets are both a routing boundary and an AZ-scoped placement construct.

People say "public subnet" and "private subnet" as if AWS has those object types. It does not.

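A subnet is effectively "public" if its route table has a route (typically the default route, 0.0.0.0/0 or ::/0) that targets an internet gateway. Instances in it still need public addressing and security rules that permit the traffic before they are actually reachable. Otherwise the subnet is effectively private.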

So: public/private is behavior, not a special subnet species.
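To make "public is behavior" concrete, here is a minimal sketch that classifies a subnet purely from its route table entries. The `(destination, target)` pair format, the function name, and the example IDs are simplifications invented for this illustration, not the shape of any real AWS API response. It relies only on the fact that internet gateway IDs start with `igw-`.

```python
def is_public_subnet(routes):
    """Return True if any route in the subnet's route table targets an IGW.

    `routes` is a list of (destination_cidr, target_id) pairs, a simplified
    stand-in for real route table entries.
    """
    return any(target.startswith("igw-") for _dest, target in routes)

# Hypothetical route tables for two subnets in the same VPC.
public_rt = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0example")]
private_rt = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0example")]

print(is_public_subnet(public_rt))   # route to an IGW exists
print(is_public_subnet(private_rt))  # default route goes to a NAT gateway
```

Nothing about the subnet object itself changes between the two cases; only the route table differs. A `True` result still says nothing about whether a given instance has a public address or permissive security rules.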


Route Tables

Each subnet is associated with exactly one route table at a time; a subnet with no explicit association uses the VPC's main route table. A route table tells traffic where to go based on destination prefixes.

Common targets:

  • local VPC routing
  • internet gateway
  • NAT gateway
  • VPC peering
  • Transit Gateway
  • VPN/Direct Connect path
  • VPC endpoints in some patterns

This is the heart of VPC behavior: routing is policy attached to subnet context.
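Route selection works on the most specific matching prefix (longest prefix match). The sketch below models only that rule with the standard `ipaddress` module. The route list format, the function name, and the target IDs are hypothetical, and real route tables have extra tie-breaking rules (for example between static and propagated routes) that this ignores.

```python
import ipaddress

def resolve_target(routes, destination):
    """Pick the most specific route that contains the destination address."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the traffic has nowhere to go
    # Longest prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical route table for a private subnet.
routes = [
    ("10.0.0.0/16", "local"),
    ("10.20.0.0/16", "pcx-0example"),
    ("0.0.0.0/0", "nat-0example"),
]

for ip in ("10.0.5.9", "10.20.1.1", "8.8.8.8"):
    print(ip, "->", resolve_target(routes, ip))
```

Here the in-VPC address stays on the local route, the peered range goes to the peering connection, and everything else falls through to the default route. Tracing a destination through the table this way is usually the first step in any path analysis.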


Internet Gateway (IGW)

The IGW is the path for internet-routable traffic.

For a typical public-instance path, you need:

  • subnet route to IGW
  • instance with public IPv4 or suitable public exposure model
  • security rules allowing the traffic
  • relevant host firewall/app listening state

If one of those is missing, "but it is in a public subnet" means nothing.


NAT Gateway

NAT gateway allows instances in private subnets to initiate outbound connectivity without exposing them directly for inbound initiation from the internet.

Classic pattern:

  • private subnet default route -> NAT gateway
  • NAT gateway placed in public subnet with IGW path

Why it exists:

  • patching
  • package pulls
  • API access
  • safer outbound-only internet access model for internal instances

Why it annoys people:

  • cost
  • AZ design considerations
  • hidden dependency during outages


Security Groups vs NACLs

Security Groups

Stateful. Allow rules only. Attached to ENIs (and through them, instances). Usually your primary control for instance traffic policy.

Network ACLs

Stateless. Subnet-level filtering with numbered allow and deny rules, evaluated in order. Lower-level, coarser, and easier to misuse.

A good default answer: use security groups for most intent; use NACLs when you have a clear reason.
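The stateful/stateless difference is easiest to see on the reply packet. The toy model below is deliberately simplified and every name in it is made up for illustration: it looks only at ports, ignoring protocols, CIDRs, and rule numbering. It shows why a NACL needs an explicit outbound rule for the client's ephemeral port while a security group does not.

```python
def sg_allows_response(inbound_ports, request_dst_port):
    # Stateful: if the inbound request was allowed, the reply is allowed too.
    return request_dst_port in inbound_ports

def nacl_allows_response(inbound_ports, outbound_port_ranges,
                         request_dst_port, client_src_port):
    # Stateless: the request and the reply are evaluated independently.
    request_ok = request_dst_port in inbound_ports
    reply_ok = any(lo <= client_src_port <= hi
                   for lo, hi in outbound_port_ranges)
    return request_ok and reply_ok

# A client connects from ephemeral port 51000 to a server on port 443.
print(sg_allows_response({443}, 443))
print(nacl_allows_response({443}, [], 443, 51000))
print(nacl_allows_response({443}, [(1024, 65535)], 443, 51000))
```

The second call fails even though inbound 443 is open, because nothing permits the reply to leave toward port 51000. Adding an outbound range that covers ephemeral ports fixes it.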


East-West vs North-South Traffic

East-west

Traffic within the VPC or between private network segments.

North-south

Traffic entering/leaving the VPC toward internet or other external domains.

This distinction matters because the troubleshooting path differs. Depending on direction, you may need to check:

  • route table
  • SG
  • NACL
  • gateway
  • DNS
  • endpoint path
  • load balancer involvement


Public and Private Instance Path Examples

Public web server

Need:

  • subnet route to IGW
  • public addressability
  • security group ingress
  • app listening
  • return path intact

Private app server needing outbound updates

Need:

  • route to NAT gateway
  • NAT gateway in public subnet
  • IGW for NAT subnet
  • SG egress policy
  • DNS resolution
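The two requirement lists above can be treated as ordered checklists during troubleshooting. The sketch below encodes them as data and reports what has not been verified yet. The constant and function names are invented for this illustration, and "verified" stands for whatever evidence you gathered by hand or with tooling.

```python
PUBLIC_WEB_SERVER = [
    "subnet route to IGW",
    "public addressability",
    "security group ingress",
    "app listening",
    "return path intact",
]

PRIVATE_APP_OUTBOUND = [
    "route to NAT gateway",
    "NAT gateway in public subnet",
    "IGW for NAT subnet",
    "SG egress policy",
    "DNS resolution",
]

def missing(requirements, verified):
    """Return the requirements not yet verified, in checklist order."""
    return [r for r in requirements if r not in verified]

# Example: only two items have been confirmed for the public web server.
confirmed = {"subnet route to IGW", "security group ingress"}
print(missing(PUBLIC_WEB_SERVER, confirmed))
```

Any non-empty result means the path is not yet proven end to end, regardless of which subnet the instance sits in.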


High Availability Considerations

Subnets are AZ-scoped. NAT gateways are also AZ-scoped resources operationally.

That means resilient design often means:

  • multiple subnets across AZs
  • route planning per AZ
  • avoiding accidental single-AZ egress dependency

People forget this and then discover "private internet access" depended on one zone's NAT.
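A quick way to spot this dependency is to compare each private subnet's AZ with the AZ of the NAT gateway its default route points at. The sketch below does that over hand-written inventory data. The dictionary shapes, function name, and resource IDs are assumptions for illustration; in practice the inputs would come from your own inventory or API queries.

```python
def cross_az_nat_dependencies(private_subnets, nat_gateways):
    """List private subnets whose default route uses a NAT gateway in another AZ.

    private_subnets: {subnet_id: (az, nat_gateway_id)}
    nat_gateways:    {nat_gateway_id: az}
    """
    findings = []
    for subnet_id, (az, nat_id) in sorted(private_subnets.items()):
        nat_az = nat_gateways[nat_id]
        if nat_az != az:
            findings.append((subnet_id, az, nat_id, nat_az))
    return findings

# Hypothetical layout: two private subnets share one NAT gateway in us-east-1a.
nat_gateways = {"nat-0aaa": "us-east-1a"}
private_subnets = {
    "subnet-0a": ("us-east-1a", "nat-0aaa"),
    "subnet-0b": ("us-east-1b", "nat-0aaa"),
}

for finding in cross_az_nat_dependencies(private_subnets, nat_gateways):
    print(finding)
```

In this layout the subnet in us-east-1b is flagged: its outbound internet path disappears if us-east-1a has a problem, even though its own AZ is healthy.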


Common Failure Patterns

Route table wrong

Classic. Check which route table the subnet is actually associated with and where its default route points.

Security group wrong

Also classic. Check the rules on the security groups actually attached to the ENI, in both directions.

NACL denies return path

Because stateless filters enjoy causing suffering. The reply needs its own rule, usually one covering the ephemeral port range.

No public IP / wrong exposure assumption

Public subnet alone is not enough.

NAT gateway route missing

Private instances cannot reach out.

Overlapping CIDRs

Peering/transit/on-prem pain.

DNS issue blamed on network or vice versa

The ancient tradition continues.


Useful Checks

In AWS:

  • subnet association
  • route table entries
  • IGW attachment
  • NAT gateway state
  • SG rules
  • NACL rules
  • ENI/IP assignment

On host:

  • local routes
  • app listen state
  • host firewall
  • DNS resolution
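For the host-side checks, two small probes cover a lot of ground: does the name resolve, and does a TCP connection complete. The sketch below uses only the standard `socket` module. The function names and the example hostname are placeholders, and a failed connection does not by itself tell you whether the cause is routing, a security group, a NACL, or the application.

```python
import socket

def resolves(hostname):
    """Return the resolved addresses, or an empty list if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection can be established within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder target; substitute the endpoint you are debugging.
    target = "example.com"
    print("DNS:", resolves(target))
    print("TCP 443:", tcp_reachable(target, 443))
```

Running the DNS check before the TCP check helps separate name-resolution failures from network-path failures, which are easy to confuse from inside an instance.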


Interview-Level Things to Explain

You should be able to explain:

  • what a VPC is
  • why a subnet is AZ-scoped
  • why "public subnet" is behavioral shorthand
  • how IGW differs from NAT gateway
  • SG vs NACL
  • why route tables are central to traffic path reasoning

Fast Mental Model

A VPC is an isolated IP and routing domain where subnet associations, route tables, gateway attachments, and security controls jointly determine whether packets can move between instances, other networks, and the public internet.
