- networking
- l2
- topic-pack
- tailscale
- vpn

---
Portal | Level: L2: Operations | Topics: Tailscale & Zero Trust Networking, VPN & Tunneling | Domain: Networking
Tailscale & Zero Trust Networking - Primer¶
Why This Matters¶
Tailscale is WireGuard-based mesh networking that makes zero trust practical. It replaces traditional VPNs with identity-based access, no open ports, and automatic key rotation. Every node can connect directly to every other node — no hub-and-spoke bottleneck, no central VPN concentrator to maintain. You install it, authenticate, and your machines reach each other as if they were on the same LAN, regardless of NATs, firewalls, or cloud boundaries.
For DevOps work, Tailscale solves the "how do I reach that machine" problem cleanly: development laptops, cloud VMs, on-prem servers, Kubernetes clusters, and CI runners all join the same tailnet with zero port forwarding, zero firewall rules, and identity-based access control.
Fun fact: Tailscale was founded in 2019 by Avery Pennarun, David Crawshaw, and Brad Fitzpatrick (creator of memcached and LiveJournal). It builds on WireGuard, which was merged into the Linux kernel in March 2020 (version 5.6). WireGuard itself is remarkably small — about 4,000 lines of code, compared to OpenVPN's ~100,000 lines. This simplicity is a security advantage: less code means fewer bugs.
Tailscale vs traditional VPN: Traditional VPNs use a hub-and-spoke model — all traffic routes through a central VPN concentrator, creating a bottleneck and single point of failure. Tailscale creates a peer-to-peer mesh where nodes connect directly. Traffic between two nodes in the same datacenter never leaves the datacenter, even if the Tailscale coordination server is down.
Core Concepts¶
1. How Tailscale Works¶
Tailscale builds on WireGuard (kernel-level VPN protocol) and adds a coordination layer:
- Coordination server: Tailscale's control plane distributes public keys and facilitates NAT traversal. It never sees your traffic — only metadata (which nodes exist, their public keys, endpoints).
- Direct connections: Nodes connect peer-to-peer using WireGuard. Tailscale uses STUN/TURN-like relay servers (DERPs) only as a fallback when direct connections fail.
- DERP relays: If both nodes are behind restrictive NATs, traffic routes through Tailscale's relay network. This is encrypted end-to-end — the relay sees only ciphertext.
- MagicDNS: Every node gets a DNS name: `hostname.tailnet-name.ts.net`. No manual DNS configuration needed.
2. Installation¶
# Linux (Debian/Ubuntu)
curl -fsSL https://tailscale.com/install.sh | sh
# Start and authenticate (opens browser for SSO)
sudo tailscale up
# macOS: brew install --cask tailscale
# Docker: requires NET_ADMIN, NET_RAW caps + /dev/net/tun + state volume
# Kubernetes: deploy the Tailscale operator for service exposure
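The Docker requirements above translate into a run command roughly like the following — a sketch, not a definitive invocation. The image name `tailscale/tailscale` and the `TS_AUTHKEY`/`TS_STATE_DIR` environment variables match the official container image's documentation, but verify them against the image version you deploy:

```
# Sketch: run the official Tailscale container as a sidecar.
# NET_ADMIN/NET_RAW and /dev/net/tun are needed for WireGuard;
# the state volume persists the node identity across restarts.
docker run -d --name=tailscaled \
  --cap-add=NET_ADMIN --cap-add=NET_RAW \
  --device=/dev/net/tun \
  -v tailscale-state:/var/lib/tailscale \
  -e TS_STATE_DIR=/var/lib/tailscale \
  -e TS_AUTHKEY=tskey-auth-xxxxx \
  tailscale/tailscale
```

Without the state volume, the container registers as a brand-new node on every restart, cluttering the tailnet — pair this with an ephemeral pre-auth key (section 10) if you actually want throwaway nodes.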
3. Basic Operations¶
# Check status
tailscale status
# Shows all nodes in your tailnet, their IPs, and connection state
# Check connection to a specific node
tailscale ping hostname
# Shows whether connection is direct or via DERP relay
# See your Tailscale IP
tailscale ip -4
tailscale ip -6
# Connect with specific options
sudo tailscale up --hostname=my-server --accept-routes --accept-dns
# Disconnect (keeps auth, removes from network)
sudo tailscale down
# Log out (deauthenticates)
sudo tailscale logout
# Debug connectivity
tailscale netcheck
# Shows NAT type, latency to DERP relays, UDP connectivity
4. ACL Configuration¶
Name origin: HuJSON stands for "Human JSON" — a superset of JSON that allows comments (`//` and `/* */`) and trailing commas. It was created by the Tailscale team specifically because standard JSON is painful for human-edited config files. The Tailscale ACL file is one of the most common real-world uses of HuJSON.
Tailscale ACLs (Access Control Lists) define who can reach what. They are written in HuJSON (JSON with comments and trailing commas) and managed in the Tailscale admin console or via GitOps.
{
// Groups define collections of users
"groups": {
"group:engineering": ["alice@example.com", "bob@example.com"],
"group:ops": ["carol@example.com"],
},
// Tag owners define who can assign tags to nodes
"tagOwners": {
"tag:server": ["group:ops"],
"tag:k8s-node": ["group:ops"],
"tag:ci-runner": ["group:engineering"],
},
// ACL rules: src -> dst:port
"acls": [
// Ops can reach everything
{"action": "accept", "src": ["group:ops"], "dst": ["*:*"]},
// Engineers can SSH to servers
{"action": "accept", "src": ["group:engineering"], "dst": ["tag:server:22"]},
// Engineers can reach k8s API
{"action": "accept", "src": ["group:engineering"], "dst": ["tag:k8s-node:6443"]},
// CI runners can reach servers on deploy ports
{"action": "accept", "src": ["tag:ci-runner"], "dst": ["tag:server:22,443"]},
// Everyone can reach DNS
{"action": "accept", "src": ["*"], "dst": ["*:53"]},
],
}
Key ACL concepts:
- Default deny: If no rule matches, traffic is blocked.
- Tags: Nodes can be tagged (e.g., tag:server). Tags act as machine identity, independent of the user who authenticated the node.
- Autogroups: autogroup:member (all human users), autogroup:admin (admins), autogroup:owner (owners).
- Tests: ACLs support inline tests to validate rules before applying.
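Those inline tests live in a `tests` section of the same HuJSON policy file. A minimal sketch using the users and tags from the example above (the exact assertions are illustrative):

```
"tests": [
  {
    // carol is in group:ops, which can reach everything
    "src": "carol@example.com",
    "accept": ["tag:server:22"],
  },
  {
    // alice is in group:engineering — SSH to servers yes, CI runners no
    "src": "alice@example.com",
    "accept": ["tag:server:22"],
    "deny": ["tag:ci-runner:22"],
  },
],
```

If any test fails, the policy update is rejected before it takes effect, which is what makes default-deny ACLs safe to iterate on.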
Under the hood: Tailscale's NAT traversal works using a technique called "hole punching." Both nodes behind NATs send UDP packets to each other's public IP:port (discovered via a STUN-like mechanism through the coordination server). When both sides' NAT tables have entries for each other, direct peer-to-peer traffic flows. This works for ~90% of NAT configurations. The remaining ~10% fall back to DERP relays.
5. Subnet Routers¶
Gotcha: ACL default-deny means a new node can reach nothing until a rule matches. If you add a new group or tag and forget to add an ACL rule, users will report "Tailscale is broken" when the network is working correctly — their traffic is just being dropped by policy. Always test ACL changes using the built-in `tests` block before deploying.
A subnet router advertises routes to a non-Tailscale network, allowing tailnet nodes to reach devices that cannot run Tailscale (printers, IoT, legacy servers).
# On the subnet router node:
# Enable IP forwarding
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf
# Advertise the subnet
sudo tailscale up --advertise-routes=10.0.1.0/24,192.168.1.0/24
# On other nodes, accept the advertised routes:
sudo tailscale up --accept-routes
# In the admin console: approve the subnet routes
# (or use auto-approvers in ACLs for tagged nodes)
ACL auto-approvers for subnet routes:
{
"autoApprovers": {
"routes": {
"10.0.1.0/24": ["tag:subnet-router"],
"192.168.1.0/24": ["tag:subnet-router"],
},
},
}
6. Exit Nodes¶
An exit node routes all internet traffic from a tailnet node through another node. Use case: route your laptop's internet through a cloud VM for security or geo-location purposes.
# On the exit node (e.g., a cloud VM):
sudo tailscale up --advertise-exit-node
# On the client:
sudo tailscale up --exit-node=exit-vm
# Stop using exit node:
sudo tailscale up --exit-node=
# Check current exit node
tailscale status
Approve exit nodes in the admin console or via auto-approvers in ACLs.
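Exit-node auto-approval uses a dedicated `exitNode` key, parallel to the `routes` example in section 5. A sketch, assuming a `tag:exit` tag you would first define under `tagOwners`:

```
"autoApprovers": {
  // Nodes tagged tag:exit may advertise themselves as exit nodes
  // without manual approval in the admin console.
  "exitNode": ["tag:exit"],
},
```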
7. MagicDNS¶
MagicDNS gives every node a stable DNS name: hostname.tailnet-name.ts.net. Enabled by default. Short names work within the tailnet (ssh server1). Split DNS routes specific domains to specific resolvers — configure in the admin console under DNS settings.
8. Tailscale Funnel and Serve¶
Serve exposes a local service to your tailnet. Funnel exposes it to the public internet with automatic HTTPS.
# Serve a local port to your tailnet (recent CLI syntax; older
# clients used the form: tailscale serve https / http://localhost:3000)
tailscale serve --bg 3000
# Funnel to the internet (public HTTPS endpoint)
tailscale funnel --bg 3000
# Inspect the current serve/funnel configuration
tailscale serve status
# Stop serving
tailscale serve reset
Funnel requires enabling in ACLs via nodeAttrs with the funnel attribute.
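That nodeAttrs grant looks like this — a sketch in which the `funnel` attribute name matches Tailscale's policy syntax, and `autogroup:member` is an illustrative target:

```
"nodeAttrs": [
  {
    // Allow all human users' nodes to enable Funnel
    "target": ["autogroup:member"],
    "attr": ["funnel"],
  },
],
```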
9. Taildrop (File Transfer)¶
Send files directly between tailnet nodes, no intermediary, no cloud storage.
# Send a file to another node
tailscale file cp report.pdf server1:
# Receive files (check the inbox)
tailscale file get .
# Downloads received files to current directory
Works between any nodes in the tailnet — laptops, phones, servers. Files transfer peer-to-peer over WireGuard.
Debug clue: When a Tailscale connection is slow or unreliable, `tailscale ping <hostname>` is the first diagnostic. If it says "pong from … via DERP(nyc)" instead of "pong from … via <ip>:41641", your traffic is going through a relay instead of direct. Check if UDP port 41641 is blocked by a firewall on either side — that is the most common cause of DERP fallback.
10. Key Management and Auth¶
# Register a node with a pre-auth key (no browser needed)
sudo tailscale up --authkey=tskey-auth-xxxxx
# Ephemeral nodes auto-remove when offline — use for CI, containers, short-lived VMs
# Key expiry: default 180 days, disable for long-lived servers in admin console
# Rotate node keys: sudo tailscale up --force-reauth
11. Operational Patterns¶
- Bastion-free SSH: Install Tailscale on every server. SSH directly via MagicDNS names. ACLs replace firewall rules. No public SSH ports needed.
- Multi-cloud connectivity: Nodes in AWS, GCP, Azure, and on-prem join the same tailnet. Peer-to-peer without VPN gateways or transit costs. Subnet routers bridge to non-Tailscale networks.
- Developer access to staging: Tag staging servers, ACL grants engineers access on specific ports. No VPN client or split-tunnel configuration needed.
- CI/CD pipeline access: CI runners join with ephemeral pre-auth keys, deploy to tagged servers, and auto-deregister when the job completes.
- Kubernetes API access: Advertise the API server's subnet via a Tailscale node. Engineers run `kubectl` directly over the tailnet.
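The CI/CD pattern above, as a job-script sketch — `TS_AUTHKEY` would be an ephemeral, tagged pre-auth key stored as a CI secret, and `CI_JOB_ID` and `staging-server` are illustrative names:

```
# Join the tailnet with an ephemeral pre-auth key (no browser auth).
sudo tailscale up --authkey="${TS_AUTHKEY}" --hostname="ci-${CI_JOB_ID}"

# Deploy over the tailnet using MagicDNS names.
scp ./build/app deploy@staging-server:/srv/app/

# Log out; an ephemeral node is removed from the tailnet automatically.
sudo tailscale logout
```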
What Experienced People Know¶
- Tailscale is a coordination layer on top of WireGuard. The data plane is WireGuard — proven, fast, kernel-level crypto. Tailscale adds key distribution, NAT traversal, and access control.
- Direct connections are the norm. DERP relay is the fallback. Run `tailscale ping` to verify — if it says "via DERP," check NATs and firewalls.
- ACLs are default-deny. Start with minimal access and widen as needed. Test ACL changes with the built-in test framework before applying.
- Subnet routers are the bridge to legacy infrastructure. They do not require installing Tailscale on every device in the subnet.
- Exit nodes are not a VPN replacement for privacy — Tailscale coordinates the connection, and the exit node sees your traffic. Use them for routing, not anonymity.
- Ephemeral nodes are essential for CI/CD and containers. They auto-deregister, keeping the tailnet clean.
- MagicDNS short names can collide with local DNS. If you have a machine named `db` on the tailnet and `db.internal` in corporate DNS, things get confusing. Use explicit naming conventions.

Analogy: Think of Tailscale's coordination server like a phone book — it tells nodes how to find each other (public keys and network addresses), but it never carries the phone calls themselves. Even if the coordination server goes down, existing connections keep working because WireGuard tunnels are peer-to-peer. You just cannot add new nodes or update ACLs until it comes back.
- Key expiry is a feature, not a bug. Servers that must stay connected permanently should have key expiry disabled explicitly in the admin console.
- Funnel gives you a public HTTPS endpoint without port forwarding, DNS records, or TLS certificate management. Good for webhooks, demos, and quick public services.
Wiki Navigation¶
Prerequisites¶
- Networking Deep Dive (Topic Pack, L1)
Related Content¶
- VPN & Tunneling (Topic Pack, L2) — VPN & Tunneling
- VPN Flashcards (CLI) (flashcard_deck, L1) — VPN & Tunneling