Tailscale Footguns

Mistakes that cause outages or wasted hours.


1. Routes advertised but not approved — subnet router silently does nothing

You run tailscale up --advertise-routes=10.0.0.0/24 and expect peers to reach that subnet. Nothing works and there's no error message. The route is in "advertised" state but hasn't been approved in the admin console. Tailscale separates advertising (node opt-in) from approval (admin opt-in) by design. Fix: Go to admin.tailscale.com -> Machines -> the router node -> Edit route settings -> approve each route. Verify with tailscale status --json | jq '.Peer[].PrimaryRoutes' — only approved+active routes appear here, not just advertised ones.
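The same check can be scripted for monitoring. A minimal sketch in Python, assuming the `Peer`/`HostName`/`PrimaryRoutes` fields as they appear in `tailscale status --json` (the sample data here is synthetic):

```python
import json

def active_routes(status_json: str, router_host: str) -> list[str]:
    """Routes that are approved AND active on the named subnet router.

    Advertised-but-unapproved routes never appear in PrimaryRoutes,
    so an empty result for a router you expect to serve a subnet
    points at the missing admin-console approval.
    """
    status = json.loads(status_json)
    for peer in status.get("Peer", {}).values():
        if peer.get("HostName") == router_host:
            return peer.get("PrimaryRoutes") or []
    raise LookupError(f"no peer named {router_host!r} in status output")

# Synthetic status output: one healthy router, one stuck in "advertised".
sample = json.dumps({"Peer": {
    "nodekey:aa": {"HostName": "router-ok", "PrimaryRoutes": ["10.0.0.0/24"]},
    "nodekey:bb": {"HostName": "router-stuck", "PrimaryRoutes": None},
}})
```

Feed it the real output via `subprocess.run(["tailscale", "status", "--json"], ...)` and alert when an expected route is missing.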


2. Subnet router missing IP forwarding — packets arrive and die

The subnet router is approved, clients have --accept-routes, but devices behind the router are unreachable. The issue is net.ipv4.ip_forward=0 — the Linux kernel discards packets not destined for a local address. This is the default on most distributions and is easy to forget. Fix: sysctl -w net.ipv4.ip_forward=1 for immediate effect. Make it permanent: echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/99-tailscale.conf && sysctl -p /etc/sysctl.d/99-tailscale.conf. Also check net.ipv6.conf.all.forwarding if using IPv6 routes.
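This is worth a preflight check in whatever provisions your routers. A small sketch that parses `sysctl` output (Linux `key = value` format assumed) instead of trusting that the config file was applied:

```python
def forwarding_enabled(sysctl_output: str) -> dict[str, bool]:
    """Parse `sysctl net.ipv4.ip_forward net.ipv6.conf.all.forwarding`
    output into {knob: enabled}."""
    state = {}
    for line in sysctl_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without a key = value pair
            state[key.strip()] = value.strip() == "1"
    return state

# Synthetic sysctl output: IPv4 forwarding off, IPv6 forwarding on.
sample = "net.ipv4.ip_forward = 0\nnet.ipv6.conf.all.forwarding = 1"
```

A deploy that sees `False` for a router node can fail loudly instead of shipping a silently broken subnet router.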


3. Key expiry takes down production servers at 3am

Tailscale node keys expire after 180 days by default (the auth keys used to enroll machines have their own, shorter expiry). When a node key expires on a server, tailscaled can no longer authenticate, the node goes offline in the tailnet, and anything routing through it breaks. There's no built-in alerting. Fix: For servers and infrastructure nodes, disable key expiry in the admin console (Machines -> node -> "Disable key expiry"). Set up external monitoring: tailscale status --json | jq '.Peer[] | select(.KeyExpiry != null) | .KeyExpiry' and alert if any expiry is within 30 days. For ephemeral nodes use ephemeral auth keys instead.

Gotcha: Tagged devices (those with ACL tags) have key expiry disabled by default. Untagged devices expire after 180 days. If you use Tailscale for infrastructure, tag your nodes in ACL policy — this gives you both expiry protection and ACL-based access control. When a key expires, the node's advertised routes stay configured on peers but become unreachable (fail-close).
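The 30-day alert above can be a few lines of Python. A sketch assuming `KeyExpiry` is the RFC 3339 timestamp seen in `tailscale status --json` (sample data synthetic):

```python
import json
from datetime import datetime, timedelta, timezone

EXPIRY_WINDOW = timedelta(days=30)

def expiring_soon(status_json: str, now: datetime) -> list[str]:
    """Hostnames whose node key expires within EXPIRY_WINDOW.

    Nodes with key expiry disabled (e.g. tagged devices) omit
    KeyExpiry or carry null, so they are skipped.
    """
    status = json.loads(status_json)
    flagged = []
    for peer in status.get("Peer", {}).values():
        expiry = peer.get("KeyExpiry")
        if not expiry:
            continue
        when = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if when - now < EXPIRY_WINDOW:
            flagged.append(peer.get("HostName", "<unknown>"))
    return flagged

# Synthetic data: one node about to expire, one safe, one tagged (no expiry).
sample = json.dumps({"Peer": {
    "k1": {"HostName": "db-1", "KeyExpiry": "2024-06-10T00:00:00Z"},
    "k2": {"HostName": "web-1", "KeyExpiry": "2025-01-01T00:00:00Z"},
    "k3": {"HostName": "tagged-infra", "KeyExpiry": None},
}})
```

Run it from cron and page on a non-empty result.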


4. ACL blocks traffic with no error message — looks like the service is down

You change an ACL policy and suddenly two services can't talk. Connection attempts silently time out instead of being refused, and neither side logs anything that says "ACL blocked." Teams spend time restarting services and checking firewalls before realizing the tailnet ACL is the issue. Fix: When diagnosing connectivity, run tailscale debug access <dst-ip> <port> tcp to check ACL policy first. Timeouts on tailscale IPs are almost always ACLs. Keep an accept test for each critical service path in the "tests" section of your ACL policy file — a save that breaks one of them is rejected, so regressions surface at policy-save time instead of in production.
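Such tests live in the policy file itself; a hedged example, with illustrative tag names and ports:

```jsonc
{
  // ... existing "acls" section ...
  "tests": [
    {
      "src": "tag:web",
      // Must stay allowed: the app's path to its database.
      "accept": ["tag:db:5432"],
      // Must stay blocked: direct SSH from the web tier.
      "deny": ["tag:db:22"]
    }
  ]
}
```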


5. --accept-routes not set on client — subnet routes are invisible

A subnet router is fully set up and approved, but a specific client can't reach the subnet. The reason: that client was started with tailscale up without --accept-routes, so it ignores all advertised subnet routes. On Linux this flag is opt-in and defaults to false. Fix: tailscale up --accept-routes on the client. To make it persistent across reboots, add it to your startup script or systemd unit. Verify on Linux with ip route show table 52 — Tailscale installs accepted routes into routing table 52, so the subnet should appear there.
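That verification is easy to automate too. A sketch that scans `ip route show table 52` output for the expected subnet (sample output synthetic, format as on Linux):

```python
def subnet_in_table(ip_route_output: str, subnet: str) -> bool:
    """True if `ip route show table 52` output lists the subnet.

    Tailscale on Linux installs accepted routes into table 52, so a
    healthy router plus a missing entry here usually means this
    client lacks --accept-routes.
    """
    return any(line.split()[0] == subnet
               for line in ip_route_output.splitlines()
               if line.strip())

# Synthetic `ip route show table 52` output.
sample = "10.0.0.0/24 dev tailscale0\n100.101.102.103 dev tailscale0"
```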


6. MagicDNS split DNS conflict with internal DNS — breaks existing resolution

You enable MagicDNS and Tailscale's DNS takes over. Queries for your internal domain (e.g., corp.internal) that previously went to an on-prem DNS server now go to Tailscale's 100.100.100.100 resolver and fail. Services that relied on internal DNS break silently for all nodes in the tailnet. Fix: Configure split DNS in the Tailscale admin console (DNS tab) — specify which domains should resolve via which nameservers. This lets Tailscale handle .ts.net queries while routing corp.internal to your internal resolver. Test with dig @100.100.100.100 service.corp.internal vs dig @your-dns service.corp.internal.


7. Taildrop files go to wrong device when multiple sessions are open

You run tailscale file cp ./data.tar.gz laptop: but the file lands on the wrong device because you have multiple nodes registered under "laptop" (old machine, new machine with same hostname). The sender picks one, not necessarily the one you're sitting at. Fix: Use tailscale IPs directly for file transfers when hostname is ambiguous: tailscale file cp ./data.tar.gz 100.x.x.x:. Run tailscale status first to confirm the target IP. Deregister old machines from admin.tailscale.com to prevent duplicate hostnames.
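A small wrapper can refuse to guess. This sketch resolves a hostname to an IP from `tailscale status --json` (fields `HostName`/`TailscaleIPs`; sample data synthetic) and errors out on duplicates:

```python
import json

def ip_for_host(status_json: str, hostname: str) -> str:
    """Resolve hostname -> tailscale IP, refusing to guess when
    several registered machines share the name."""
    status = json.loads(status_json)
    matches = [peer["TailscaleIPs"][0]
               for peer in status.get("Peer", {}).values()
               if peer.get("HostName") == hostname and peer.get("TailscaleIPs")]
    if len(matches) != 1:
        raise LookupError(f"{hostname!r} matches {len(matches)} machines; "
                          "use the IP directly or remove the stale node")
    return matches[0]

# Synthetic tailnet: two machines named "laptop", one "desktop".
sample = json.dumps({"Peer": {
    "k1": {"HostName": "laptop", "TailscaleIPs": ["100.64.0.1"]},
    "k2": {"HostName": "laptop", "TailscaleIPs": ["100.64.0.2"]},
    "k3": {"HostName": "desktop", "TailscaleIPs": ["100.64.0.3"]},
}})
```

Pipe the resolved IP into `tailscale file cp` instead of the bare hostname.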


8. Running tailscale up with new flags resets previous flags

You run tailscale up --advertise-exit-node to add the exit node flag, and the --advertise-routes that was previously configured gets dropped, because tailscale up sets the full desired state rather than merging with existing config. (Recent versions refuse and list the flags you must re-specify; older versions dropped them silently.) Fix: Always specify all flags together: tailscale up --advertise-routes=10.0.0.0/24 --advertise-exit-node. Before changing flags, inspect current state: tailscale status --json | jq '.Self'. Consider a startup script that always sets the full desired state so the configuration is explicit and idempotent.

Default trap: tailscale up uses "desired state" semantics, not "merge" semantics. Every invocation sets the complete configuration. Omitting a flag is the same as explicitly disabling it. This is the opposite of how most CLI tools work and catches everyone at least once. Use tailscale set (added in newer versions) for incremental changes.
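One way to make desired-state semantics work for you instead of against you: keep every flag in one structure and always emit the complete command from it. A sketch (the flag values are hypothetical; run the result via subprocess at boot):

```python
# Hypothetical full desired state -- every flag lives here, so changing
# one setting still re-asserts all the others on the next `tailscale up`.
DESIRED_FLAGS = {
    "--advertise-routes": "10.0.0.0/24",
    "--advertise-exit-node": None,  # boolean flag: present, no value
    "--accept-routes": None,
}

def up_command(flags: dict) -> list[str]:
    """Build the complete `tailscale up` invocation from one source
    of truth, so every invocation is idempotent."""
    cmd = ["tailscale", "up"]
    for flag, value in flags.items():
        cmd.append(flag if value is None else f"{flag}={value}")
    return cmd
```

To change one setting, edit DESIRED_FLAGS and rerun; omission-as-disable can no longer surprise you.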


9. Funnel exposes port before the service is ready — brief public window

You run tailscale funnel 8080 to test a service publicly, then stop the service and restart it. The funnel stays active. Anyone who has the Funnel URL can reach an open port on your machine, or get connection refused errors that reveal the port exists. Funnel is not automatically tied to the service lifecycle. Fix: tailscale funnel --https=443 off when you're done, not just when you stop your service. Check what's currently exposed: tailscale funnel status. Treat Funnel like a public DNS record — clean it up explicitly when no longer needed.


10. Symmetric NAT on both sides forces relay — direct connection never established

Two nodes can't make a direct WireGuard connection because both are behind symmetric NAT (home routers, some cloud NAT gateways). All traffic routes through a DERP relay, adding 50-200ms of latency. tailscale netcheck shows MappingVariesByDestIP: true on both nodes. People assume this is a Tailscale bug. Fix: This is a fundamental NAT limitation. To get direct connections, ensure at least one side has a stable UDP endpoint: open UDP port 41641 inbound on a firewall, use a cloud VM, or configure a persistent public IP. Alternatively, accept relay and focus on choosing the lowest-latency DERP region — check tailscale debug derp-map and tailscale netcheck to see which relays are closest.