AWS VPC Internals
Scope
This document explains AWS VPC the way an infrastructure engineer should understand it:
- address space
- subnets
- route tables
- internet gateway
- NAT gateway
- security groups
- NACLs
- public vs private subnet reality
- common path analyses
Reference anchors:
- https://docs.aws.amazon.com/vpc/latest/userguide/how-it-works.html
- https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html
- https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html
- https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html
Big Picture
A VPC is your logically isolated virtual network in AWS.
It gives you:
- IP space
- subnets
- routing domains
- attachment points to gateways and other networks
- security boundaries
The clean mental model is:
CIDR and Address Planning
A VPC starts with a CIDR block.
Subnets carve that space into smaller segments.
This is not just bookkeeping. Address design affects:
- scaling headroom
- availability-zone layout
- service placement
- peering/Transit Gateway compatibility
- on-prem route integration
Bad CIDR decisions age like milk.
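To make the planning concrete, here is a minimal sketch using Python's standard `ipaddress` module. The VPC CIDR, the AZ names, the /20 subnet size, and the helper names `plan_subnets` and `usable_hosts` are all assumptions chosen for illustration, not values from any real environment. The one AWS-specific fact it relies on is that AWS reserves five addresses in every subnet.

```python
import ipaddress

# Hypothetical VPC CIDR and AZ names, for illustration only.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]


def plan_subnets(vpc, azs, new_prefix):
    """Assign one subnet of size /new_prefix to each AZ, in order."""
    candidates = vpc.subnets(new_prefix=new_prefix)
    return {az: next(candidates) for az in azs}


def usable_hosts(subnet):
    """AWS reserves 5 addresses per subnet (the first four and the last)."""
    return subnet.num_addresses - 5


plan = plan_subnets(VPC_CIDR, AZS, 20)
for az, subnet in plan.items():
    print(az, subnet, usable_hosts(subnet))

# Addresses still unallocated in the VPC after this plan: the scaling headroom.
allocated = sum(s.num_addresses for s in plan.values())
headroom = VPC_CIDR.num_addresses - allocated
print("unallocated addresses:", headroom)
```

Running the same arithmetic before creating anything shows how much room is left for new AZs or tiers, which is much cheaper than renumbering later.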
Subnets
A subnet lives in exactly one Availability Zone.
Important implication: subnets are both a routing boundary and an AZ-scoped placement construct.
People say "public subnet" and "private subnet" as if AWS has those object types. It does not.
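A subnet behaves as "public" when its route table has a route (typically the default route, 0.0.0.0/0) whose target is an internet gateway. Instances in it still need public addressing and security rules that permit the traffic before they are actually reachable. Any subnet without such a route is effectively private.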
So: public/private is behavior, not a special subnet species.
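As a rough sketch of that behavioral definition, the function below classifies a subnet purely from its routes. The `(destination, target)` tuple format and the function name `is_public_subnet` are hypothetical simplifications; the only AWS convention assumed is that internet gateway IDs start with `igw-` and NAT gateway IDs start with `nat-`. It looks only at default routes, so a more specific route to an IGW would be missed.

```python
def is_public_subnet(routes):
    """Return True if a default route targets an internet gateway.

    routes: list of (destination_cidr, target_id) tuples (hypothetical format).
    """
    for destination, target in routes:
        if destination in ("0.0.0.0/0", "::/0") and target.startswith("igw-"):
            return True
    return False


# Illustrative route tables with made-up resource IDs.
public_routes = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "igw-0abc")]
private_routes = [("10.0.0.0/16", "local"), ("0.0.0.0/0", "nat-0def")]
isolated_routes = [("10.0.0.0/16", "local")]

for name, routes in [("public", public_routes),
                     ("private", private_routes),
                     ("isolated", isolated_routes)]:
    print(name, is_public_subnet(routes))
```

Note that nothing about the subnet object itself is inspected: swap the default route's target and the same subnet changes category.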
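Route Tables
Each subnet is associated with exactly one route table at a time; a subnet with no explicit association uses the VPC's main route table. A route table tells traffic where to go based on destination prefixes, and when several routes match, the most specific one (longest prefix) wins.
Common targets:
- local VPC routing
- internet gateway
- NAT gateway
- VPC peering
- Transit Gateway
- VPN/Direct Connect path
- VPC endpoints in some patterns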
This is the heart of VPC behavior: routing is policy attached to subnet context.
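The lookup logic can be sketched as a longest-prefix match over a list of routes. This is a simplified model, not AWS's actual implementation: the route table contents and target IDs are made up, and details such as propagated-route priority are ignored.

```python
import ipaddress


def lookup(route_table, dest_ip):
    """Return the target of the most specific route matching dest_ip, or None.

    route_table: list of (destination_cidr, target_id) tuples (hypothetical format).
    """
    dest = ipaddress.ip_address(dest_ip)
    matches = []
    for cidr, target in route_table:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no matching route: the packet is dropped
    # The longest prefix is the most specific route.
    return max(matches, key=lambda m: m[0])[1]


# Illustrative private-subnet route table with made-up IDs.
private_rt = [
    ("10.0.0.0/16", "local"),
    ("10.1.0.0/16", "pcx-1111"),
    ("0.0.0.0/0", "nat-2222"),
]

print(lookup(private_rt, "10.0.5.9"))   # stays inside the VPC
print(lookup(private_rt, "10.1.2.3"))   # goes over the peering connection
print(lookup(private_rt, "8.8.8.8"))    # falls through to the default route
```

Tracing a destination through the table this way is usually the first step in any path analysis, before looking at security groups or NACLs.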
Internet Gateway (IGW)
The IGW is the path for internet-routable traffic.
For a typical public-instance path, you need:
- subnet route to IGW
- instance with public IPv4 or suitable public exposure model
- security rules allowing the traffic
- relevant host firewall/app listening state
If one of those is missing, "but it is in a public subnet" means nothing.
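One way to keep that checklist honest is to evaluate it in path order and report the first thing that blocks traffic. The sketch below is a toy model: the function name `first_blocker` and its boolean inputs are hypothetical, and in practice each input would come from inspecting the route table, the ENI, the security group, and the host.

```python
def first_blocker(default_route_target, has_public_ip, sg_allows_port, app_listening):
    """Return the first reason inbound internet traffic fails, or None if all checks pass."""
    checks = [
        (default_route_target.startswith("igw-"),
         "subnet default route does not target an internet gateway"),
        (has_public_ip, "instance has no public IPv4 address"),
        (sg_allows_port, "security group does not allow the port"),
        (app_listening, "nothing is listening on the host (or host firewall blocks it)"),
    ]
    for ok, reason in checks:
        if not ok:
            return reason
    return None


# An instance in a "public subnet" that still is not reachable.
print(first_blocker("igw-0abc", has_public_ip=False,
                    sg_allows_port=True, app_listening=True))
```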
NAT Gateway
A NAT gateway lets instances in private subnets initiate outbound connections without exposing them to connections initiated from the internet.
Classic pattern:
- private subnet default route -> NAT gateway
- NAT gateway placed in public subnet with IGW path
Why it exists:
- patching
- package pulls
- API access
- safer outbound-only internet access model for internal instances
Why it annoys people:
- cost
- AZ design considerations
- hidden dependency during outages
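Security Groups vs NACLs
Security Groups
Stateful: return traffic for an allowed flow is permitted automatically. Attached to ENIs (and therefore to instances). Usually your primary control for instance traffic policy.
Network ACLs
Stateless: each direction is evaluated separately, so return traffic needs its own rule. Subnet-level filtering. Lower-level, coarser, and easier to misuse.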
A good default answer: use security groups for most intent; use NACLs when you have a clear reason.
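The stateful/stateless difference is easiest to see with a request and its reply. The sketch below is a deliberately small model, not how AWS implements either control: it matches on ports only, ignores protocols, CIDRs, and NACL rule numbering, and all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    direction: str  # "in" or "out", relative to the instance/subnet
    src_port: int
    dst_port: int


class SecurityGroup:
    """Stateful: replies to an accepted inbound flow are allowed automatically."""

    def __init__(self, inbound_ports, outbound_ports=()):
        self.inbound_ports = set(inbound_ports)
        self.outbound_ports = set(outbound_ports)
        self._tracked = set()  # (remote_port, local_port) of accepted inbound flows

    def allows(self, pkt):
        if pkt.direction == "in":
            if pkt.dst_port in self.inbound_ports:
                self._tracked.add((pkt.src_port, pkt.dst_port))
                return True
            return False
        # Outbound: allowed as a reply to a tracked flow, or by an explicit rule.
        if (pkt.dst_port, pkt.src_port) in self._tracked:
            return True
        return pkt.dst_port in self.outbound_ports


class NetworkAcl:
    """Stateless: each direction is checked only against its own rules."""

    def __init__(self, inbound_ranges, outbound_ranges):
        self.inbound_ranges = list(inbound_ranges)
        self.outbound_ranges = list(outbound_ranges)

    def allows(self, pkt):
        ranges = self.inbound_ranges if pkt.direction == "in" else self.outbound_ranges
        return any(lo <= pkt.dst_port <= hi for lo, hi in ranges)


# A client on ephemeral port 50000 talks to a server on 443.
request = Packet("in", src_port=50000, dst_port=443)
reply = Packet("out", src_port=443, dst_port=50000)

sg = SecurityGroup(inbound_ports=[443])
nacl_broken = NetworkAcl(inbound_ranges=[(443, 443)], outbound_ranges=[])
nacl_fixed = NetworkAcl(inbound_ranges=[(443, 443)], outbound_ranges=[(1024, 65535)])

print("SG:", sg.allows(request), sg.allows(reply))
print("NACL without ephemeral rule:", nacl_broken.allows(request), nacl_broken.allows(reply))
print("NACL with ephemeral rule:", nacl_fixed.allows(request), nacl_fixed.allows(reply))
```

The middle case is the usual NACL mistake: the request gets in, the reply is dropped on the way out, and the symptom looks like a timeout rather than a refusal.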
East-West vs North-South Traffic
East-west
Traffic within VPC or between private network segments.
North-south
Traffic entering/leaving the VPC toward internet or other external domains.
This distinction matters because troubleshooting paths differ:
- route table
- SG
- NACL
- gateway
- DNS
- endpoint path
- load balancer involvement
Public and Private Instance Path Examples
Public web server
Need:
- subnet route to IGW
- public addressability
- security group ingress
- app listening
- return path intact
Private app server needing outbound updates
Need:
- route to NAT gateway
- NAT gateway in public subnet
- IGW for NAT subnet
- SG egress policy
- DNS resolution
High Availability Considerations
Subnets are AZ-scoped. A NAT gateway is created in a specific subnet, so it also lives in a single AZ.
That means resilient design often means:
- multiple subnets across AZs
- route planning per AZ
- avoiding accidental single-AZ egress dependency
People forget this and then discover "private internet access" depended on one zone's NAT.
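A quick audit for this dependency can be sketched as follows. The input dictionaries, resource IDs, and the function name `cross_az_nat_dependencies` are hypothetical; in a real account the same data would come from the subnet, route table, and NAT gateway descriptions.

```python
def cross_az_nat_dependencies(private_subnets, nat_gateways):
    """List private subnets whose default route depends on a NAT gateway in another AZ.

    private_subnets: {subnet_id: {"az": str, "default_target": str}}
    nat_gateways:    {nat_gateway_id: az}
    """
    findings = []
    for subnet_id, info in sorted(private_subnets.items()):
        nat_az = nat_gateways.get(info["default_target"])
        if nat_az is None:
            findings.append((subnet_id, "default route does not target a known NAT gateway"))
        elif nat_az != info["az"]:
            findings.append((subnet_id, f"egress depends on NAT gateway in {nat_az}"))
    return findings


# Made-up layout: two AZs, but only one NAT gateway.
nat_gateways = {"nat-a": "us-east-1a"}
private_subnets = {
    "subnet-app-a": {"az": "us-east-1a", "default_target": "nat-a"},
    "subnet-app-b": {"az": "us-east-1b", "default_target": "nat-a"},
}

for subnet_id, problem in cross_az_nat_dependencies(private_subnets, nat_gateways):
    print(subnet_id, "->", problem)
```

Here `subnet-app-b` is flagged: if us-east-1a has a bad day, instances in us-east-1b lose outbound internet access even though their own zone is healthy.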
Common Failure Patterns
Route table wrong
Classic.
Security group wrong
Also classic.
NACL denies return path
Because stateless filters enjoy causing suffering.
No public IP / wrong exposure assumption
Public subnet alone is not enough.
NAT gateway route missing
Private instances cannot reach out.
Overlapping CIDRs
Peering/transit/on-prem pain.
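Overlaps are cheap to detect before they become a problem. Below is a minimal sketch using the standard `ipaddress` module; the network names and CIDRs are invented for illustration, and `find_overlaps` is a hypothetical helper.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks):
    """Return pairs of network names whose CIDR blocks overlap.

    networks: {name: cidr_string}
    """
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Made-up address plan.
networks = {
    "vpc-prod": "10.0.0.0/16",
    "vpc-dev": "10.0.128.0/17",
    "on-prem": "192.168.0.0/16",
}

print(find_overlaps(networks))
```

In this plan `vpc-dev` sits inside `vpc-prod`'s range, so the two cannot be routed to each other cleanly; running a check like this against every planned VPC and on-prem range is worth doing before the first peering request.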
DNS issue blamed on network or vice versa
The ancient tradition continues.
Useful Checks
In AWS:
- subnet association
- route table entries
- IGW attachment
- NAT gateway state
- SG rules
- NACL rules
- ENI/IP assignment
On host:
- local routes
- app listen state
- host firewall
- DNS resolution
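Two of the host-side checks, DNS resolution and listen state, can be scripted with nothing but the standard library. This is a rough sketch with hypothetical helper names; it does not inspect local routes or host firewall rules, and a failed TCP connect cannot by itself tell you which layer dropped the traffic.

```python
import socket


def resolves(name):
    """Return True if the name resolves to at least one address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Checking resolution separately from connectivity helps keep DNS failures from being misread as network failures, and the other way around.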
Interview-Level Things to Explain
You should be able to explain:
- what a VPC is
- why a subnet is AZ-scoped
- why "public subnet" is behavioral shorthand
- how IGW differs from NAT gateway
- SG vs NACL
- why route tables are central to traffic path reasoning
Fast Mental Model
A VPC is an isolated IP and routing domain where subnet associations, route tables, gateway attachments, and security controls jointly determine whether packets can move between instances, other networks, and the public internet.