Python: Zero to Script for the Terminal Native
- lesson
- python-basics
- bash-comparison
- scripting
- text-processing
- data-structures
- error-handling

---

# Python — Zero to Script for the Terminal Native

Topics: Python basics, Bash comparison, scripting, text processing, data structures, error handling
Strategy: Build-up + parallel (Bash comparison throughout)
Level: L0–L1 (Zero → Foundations)
Time: 60–90 minutes
Prerequisites: None (but you'll need 15 years of Bash muscle memory to appreciate the punchlines)
The Mission¶
You have a syslog file. Messages look like this:
Mar 23 04:12:01 web-prod-03 CRON[28401]: (root) CMD (/usr/bin/certbot renew)
Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password for invalid user admin from 203.0.113.42 port 55142 ssh2
Mar 23 04:12:05 web-prod-03 kernel: [UFW BLOCK] IN=eth0 OUT= MAC=... SRC=198.51.100.7 DST=10.0.1.5
Mar 23 04:12:07 web-prod-03 systemd[1]: Starting Daily apt download activities...
Mar 23 04:12:09 web-prod-03 sshd[28415]: Accepted publickey for deploy from 10.0.1.200 port 42310 ssh2
Your job: write a Python script that reads this file, counts messages by program (sshd,
CRON, kernel, systemd, etc.), identifies the top talkers, and flags any SSH brute-force
attempts. In Bash you'd reach for awk, sort, uniq -c, and a tangle of pipes. Today
you'll do it in Python — and halfway through, you'll realize the Python version is shorter.
By the end of this lesson, you can write a useful script from scratch. Not a toy. A script you'd actually commit to a repo and hand to a colleague.
Part 1: Running the Thing¶
You already know how to make a script executable. Python is no different.
The REPL — Your New Scratch Terminal¶
# You already do this all day:
$ echo "hello"
hello
# Python has the same thing:
$ python3
>>> print("hello")
hello
>>> 2 + 2
4
>>> exit()
The >>> prompt is Python's interactive shell — the REPL (Read-Eval-Print Loop). It's
your bash -c equivalent for testing one-liners. You'll use it constantly in this lesson.
Scripts and Shebangs¶
Same pattern. chmod +x script.py, run it with ./script.py. The env trick in the
shebang finds whichever python3 is in your $PATH — important when you have multiple
Python versions (and you will).
Gotcha: On most systems, `python` is Python 2 (or doesn't exist). Always use `python3`. Python 2 reached end of life on January 1, 2020. If you type `python` and get a `>>>` prompt showing version 2.7, back away slowly.
Try this now — open a terminal:
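For example, a quick version check, assuming `python3` is on your `$PATH`:

```shell
python3 -c "import sys; print(sys.version)"
```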
That's the -c flag, same as bash -c. You just verified your Python version without
entering the REPL.
Part 2: Variables and Types — Not Everything Is a String¶
This is the first real shift. In Bash, everything is a string:
# Bash: these are all strings
name="webserver"
count="42"
is_running="true"
# Want to do math? You need special syntax:
result=$((count + 1))
# Forgot the $((...))? You get string concatenation or an error.
In Python, data has types:
# Python: these are different things
name = "webserver" # str (string)
count = 42 # int (integer)
uptime = 99.7 # float (decimal)
is_running = True # bool (boolean — note: capital T)
last_error = None # NoneType (like null — "nothing here")
No $ prefix. No quoting disasters. No declare -i to make a variable numeric. The
value itself tells Python what type it is.
# Python knows what these are
>>> type("hello")
<class 'str'>
>>> type(42)
<class 'int'>
>>> type(3.14)
<class 'float'>
>>> type(True)
<class 'bool'>
>>> type(None)
<class 'NoneType'>
Type Conversion — Explicit Is Better¶
# Bash: invisible conversion, invisible bugs
port="8080"
echo $((port + 1)) # Works... but port is still a string
# Python: you say what you mean
port = "8080" # It's a string (from a config file, say)
port_int = int(port) # Now it's an integer
print(port_int + 1) # 8081
# What if it's not a number?
int("not_a_number") # ValueError: invalid literal for int()
Python yells at you immediately instead of silently doing the wrong thing. This is a feature, not a bug.
Mental Model: Bash is a text-in, text-out pipeline. Python is a typed data pipeline. In Bash, you convert between types by sending strings through commands (`bc`, `awk`, `printf`). In Python, the data carries its type with it, and you convert explicitly with `int()`, `str()`, `float()`.
Try this now:
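One way to exercise the conversions above from your terminal (an illustrative snippet in the same `python3 -c` style as the rest of this lesson):

```shell
python3 -c "
port = '8080'
print(type(port).__name__)   # str
port_int = int(port)
print(port_int + 1)          # 8081
print(str(port_int) + '!')   # 8080!
"
```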
Part 3: f-Strings — printf That Doesn't Hate You¶
String formatting in Bash is a minefield of quoting rules. Python f-strings are the fix.
# Bash: the quoting gauntlet
host="web-03"
port=8080
echo "Connecting to ${host}:${port}" # Easy case
echo "Status: $(curl -s http://${host}:${port}/health)" # Subshell
printf "%-20s %5d\n" "$host" "$port" # printf for formatting
# Python: f-strings (the f is for "formatted")
host = "web-03"
port = 8080
print(f"Connecting to {host}:{port}") # Variables
print(f"Status: {port + 1}") # Expressions
print(f"{'HOST':<20} {'PORT':>5}") # Alignment
print(f"Uptime: {99.734:.1f}%") # Decimal places
print(f"Size: {1048576:,} bytes") # Thousands separator
Output:

Connecting to web-03:8080
Status: 8081
HOST                  PORT
Uptime: 99.7%
Size: 1,048,576 bytes
The key insight: anything inside {} in an f-string is a Python expression. Variables,
math, function calls, method calls — all valid. No more escaping nested quotes inside
$() inside double quotes inside heredocs.
Name Origin: f-strings were introduced in Python 3.6 (2016) via PEP 498, authored by Eric V. Smith. The older formats — `%` formatting (borrowed from C's `printf`) and `.format()` — still work but are more verbose. f-strings won because they put the value right next to where it appears in the string, the same way `${var}` works in Bash.
Try this now:
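A quick f-string workout you can paste straight into a terminal (illustrative values):

```shell
python3 -c "
host = 'web-03'
port = 8080
uptime = 99.734
print(f'{host}:{port}')           # web-03:8080
print(f'uptime {uptime:.1f}%')    # uptime 99.7%
print(f'{1048576:,} bytes')       # 1,048,576 bytes
"
```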
Part 4: Lists — Arrays That Actually Work¶
Bash arrays are... let's be honest.
# Bash arrays: a masterclass in footguns
servers=("web-01" "web-02" "db-01")
echo ${#servers[@]} # Length: 3
echo ${servers[0]} # First element
servers+=("cache-01") # Append
# Iterate
for server in "${servers[@]}"; do
echo "$server"
done
# Forgot the quotes? Words with spaces split. Forgot [@]? Get first element only.
# Python lists: what arrays should have been
servers = ["web-01", "web-02", "db-01"]
print(len(servers)) # Length: 3
print(servers[0]) # First element: "web-01"
print(servers[-1]) # Last element: "db-01" (negative indexing!)
servers.append("cache-01") # Append
# Iterate
for server in servers:
print(server)
# Slice — get a range
print(servers[1:3]) # ["web-02", "db-01"]
Lists Do What You Expect¶
# Things that are painful or impossible in Bash
ports = [80, 443, 8080, 8443, 9090]
# Sort
sorted_ports = sorted(ports) # New sorted list
ports.sort() # Sort in place
# Filter
high_ports = [p for p in ports if p > 1024] # [8080, 8443, 9090]
# Transform
port_strings = [str(p) for p in ports] # ["80", "443", ...]
# Check membership
if 443 in ports:
print("HTTPS is configured")
# Combine
all_ports = ports + [3306, 5432] # Concatenate
That [p for p in ports if p > 1024] is called a list comprehension. It's Python's
version of a pipeline filter — reads left to right, same as cmd | grep condition. You'll
use these constantly.
Gotcha: Python lists are zero-indexed, same as Bash arrays. But negative indices count from the end: `servers[-1]` is the last element, `servers[-2]` is second to last. Bash doesn't have this. Once you get used to it, you'll wish every language had it.
Try this now:
python3 -c "
logs = ['INFO', 'ERROR', 'INFO', 'WARN', 'ERROR', 'ERROR', 'INFO']
errors = [x for x in logs if x == 'ERROR']
print(f'Found {len(errors)} errors out of {len(logs)} messages')
"
Part 5: Dicts — This Is the One That Changes Everything¶
If you take one thing from this entire lesson, let it be this section.
In Bash, associative arrays exist but they're awkward:
# Bash associative arrays: declare or die
declare -A counts
counts[sshd]=47
counts[cron]=12
counts[kernel]=8
# Iterate (order not guaranteed before Bash 5.2)
for key in "${!counts[@]}"; do
echo "$key: ${counts[$key]}"
done
# No nesting. No default values. No easy serialization.
# Want a dict of dicts? Good luck.
# Python dicts: the data structure that replaces half your scripts
counts = {
"sshd": 47,
"cron": 12,
"kernel": 8,
}
# Access
print(counts["sshd"]) # 47
print(counts.get("nginx", 0)) # 0 (default if key doesn't exist)
# Iterate
for program, count in counts.items():
print(f"{program}: {count}")
# Add / update
counts["nginx"] = 23
counts["sshd"] += 10 # Increment: now 57
# Sort by value (the thing that takes 5 pipes in Bash)
for prog, n in sorted(counts.items(), key=lambda x: x[1], reverse=True):
print(f"{prog:<15} {n:>5}")
The "Aha Moment": Counting Things¶
Here's the Bash version of "count messages by program in a log file":
# Bash: the classic pipeline
awk '{print $5}' /var/log/syslog | cut -d'[' -f1 | sort | uniq -c | sort -rn | head -20
Five commands. One pipeline. Breaks if the log format changes. No error handling. No way to do anything more with the data after printing it.
Here's Python:
from collections import Counter
counts = Counter()
with open("/var/log/syslog") as f:
for line in f:
parts = line.split()
if len(parts) >= 5:
program = parts[4].split("[")[0].rstrip(":")
counts[program] += 1
for program, n in counts.most_common(20):
print(f"{program:<20} {n:>6}")
Same result. But now counts is a data structure you can query, filter, serialize to
JSON, send to an API, or combine with another Counter. The Bash pipeline gave you printed
text. Python gave you data.
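To make that concrete, here is a small sketch with inline sample data standing in for the parsed syslog:

```python
import json
from collections import Counter

# Inline stand-in for the program names parsed out of a real syslog
counts = Counter(["sshd", "sshd", "cron", "kernel", "sshd"])

# Still a live data structure: query it...
print(counts["sshd"])          # 3
print(counts.most_common(1))   # [('sshd', 3)]

# ...or serialize it for an API call, a report, or another tool
print(json.dumps(dict(counts), sort_keys=True))
```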
War Story: A team had a 120-line Bash script that monitored log volume across 15 services. It used nested `for` loops, `declare -A` for per-service counts, `declare -A` for per-severity counts, and a third `declare -A` for per-hour counts. The script had 9 bugs related to uninitialized array keys (Bash returns an empty string for missing keys, which silently breaks arithmetic). The Python rewrite used a dict of Counters — literally `{service: Counter()}` — and was 40 lines. The three hardest bugs in the Bash version were impossible in Python because `Counter()` initializes missing keys to zero automatically.
Flashcard Check #1¶
| Question | Answer |
|---|---|
| How do you get a default value from a Python dict without crashing on a missing key? | `d.get("key", default_value)` — returns the default instead of raising `KeyError`. |
| What's the Bash equivalent of a Python dict? | `declare -A mydict` — an associative array. But it can't nest, has no `.get()`, and requires explicit declaration. |
| What does `Counter.most_common(10)` return? | A list of `(key, count)` tuples, sorted by count descending. The 10 highest. |
| In the log-counting pipeline `awk \| cut \| sort \| uniq -c \| sort -rn`, which step does `Counter` replace? | All of them. `Counter` does the counting (`uniq -c`) and `.most_common()` does the sorting (`sort -rn`). |
Try this now:
python3 -c "
from collections import Counter
words = 'the cat sat on the mat the cat ate the rat'.split()
print(Counter(words).most_common(3))
"
Part 6: Conditionals — No More Bracket Roulette¶
Bash conditionals are a syntax trivia quiz:
# Bash: which brackets? How many? Spaces inside? Outside?
if [ "$status" = "running" ]; then # Single bracket: POSIX, spaces required
if [[ "$status" == "running" ]]; then # Double bracket: Bash extension, glob/regex
if (( count > 10 )); then # Arithmetic context
if [ -f "/etc/config" ]; then # File test
if [ -z "$var" ]; then # Empty string test
# Python: one way. Obvious.
if status == "running":
print("Service is up")
elif count > 10:
print("Above threshold")
else:
print("Something else")
No then. No fi. No semicolons. No brackets. Indentation defines the block — four
spaces per level, enforced by the language.
Truthiness — What Counts as False¶
In Bash, empty strings are the closest thing to "false," and everything else is messy. Python has a clear rule:
# These are all "falsy" (treated as False in an if statement)
False
0
0.0
"" # empty string
[] # empty list
{} # empty dict
None
# Everything else is "truthy"
This means:
servers = []
if servers:
print("We have servers") # Skipped — empty list is falsy
else:
print("No servers found") # This runs
name = ""
if name:
print(f"Hello, {name}") # Skipped — empty string is falsy
Compare that to Bash, where you need [ -z "$var" ] to check for empty, [ -n "$var" ]
for non-empty, and [[ -z "${array[@]}" ]] doesn't even do what you think.
Remember: In Python, empty collections and zero are falsy. Everything else is truthy. If you're checking "does this have anything in it?", just use `if thing:`.
Try this now:
python3 -c "
for val in [0, 1, '', 'hello', [], [1,2], {}, {'a': 1}, None, True]:
print(f'{str(val):<15} → {bool(val)}')
"
Part 7: Loops — for, while, and Friends¶
for Loops¶
# Bash: iterate over a list
for server in web-01 web-02 db-01; do
echo "Checking $server"
done
# Bash: iterate over a range (C-style)
for ((i=0; i<10; i++)); do
echo "$i"
done
# Bash: iterate over command output
for file in $(find /var/log -name "*.log"); do
echo "$file"
done # BUG: breaks on filenames with spaces
# Python: iterate over a list
for server in ["web-01", "web-02", "db-01"]:
print(f"Checking {server}")
# Python: iterate over a range
for i in range(10): # 0 through 9
print(i)
# Python: iterate with index (no manual counter needed)
servers = ["web-01", "web-02", "db-01"]
for i, server in enumerate(servers):
print(f"{i}: {server}")
# Python: iterate over two lists in parallel
hosts = ["web-01", "web-02", "db-01"]
ports = [80, 80, 5432]
for host, port in zip(hosts, ports):
print(f"{host}:{port}")
enumerate() and zip() are the two loop tools you'll use most. enumerate gives you
the index without maintaining a counter variable. zip walks two lists in lockstep —
try doing that cleanly in Bash.
while Loops and File Reading¶
# Bash: read a file line by line (the correct way)
while IFS= read -r line; do
echo "$line"
done < /var/log/syslog
# Python: read a file line by line
with open("/var/log/syslog") as f:
for line in f:
print(line, end="") # end="" because line already has \n
That with keyword is a context manager — it automatically closes the file when
you're done, even if an error occurs. It's Python's version of trap cleanup EXIT,
except it's scoped to one block instead of the whole script.
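You can verify that guarantee yourself: the file object is closed even when the block exits via an exception. A small sketch using a throwaway path in /tmp:

```python
try:
    with open("/tmp/with_demo.txt", "w") as f:
        f.write("partial write\n")
        raise RuntimeError("simulated crash")
except RuntimeError:
    pass

# The exception propagated out of the with block, but cleanup still ran:
print(f.closed)  # True
```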
Try this now:
python3 -c "
for i, color in enumerate(['red', 'green', 'blue']):
print(f'{i}: {color}')
for name, age in zip(['Alice', 'Bob'], [30, 25]):
print(f'{name} is {age}')
"
Part 8: Functions — def, return, and No More Subshell Surprises¶
Bash functions communicate by echoing text and return only exit codes (0–255):
# Bash: function that "returns" a value
get_disk_usage() {
df -h / | tail -1 | awk '{print $5}' | tr -d '%'
}
usage=$(get_disk_usage) # Capture stdout — the only way to "return" data
echo "Usage: ${usage}%"
# Bash: function with arguments
check_port() {
local host="$1"
local port="$2"
nc -z "$host" "$port" 2>/dev/null
return $? # Exit code: 0 = open, 1 = closed
}
# Python: functions return actual values
def get_disk_usage(path="/"):
import shutil
total, used, free = shutil.disk_usage(path)
return round(used / total * 100, 1) # Returns a float, not a string
usage = get_disk_usage() # No subshell. No text parsing.
print(f"Usage: {usage}%")
usage_var = get_disk_usage("/var") # Different path — default args!
# Python: function with arguments and a default
def check_port(host, port, timeout=3):
import socket
try:
sock = socket.create_connection((host, port), timeout=timeout)
sock.close()
return True # Returns a boolean, not an exit code
except (ConnectionRefusedError, TimeoutError):
return False
Key differences:
| Feature | Bash | Python |
|---|---|---|
| Return data | `echo` + capture with `$()` | `return value` |
| Return type | Always a string (via stdout) | Any type: int, str, list, dict, bool |
| Arguments | Positional only: `$1`, `$2` | Named, with defaults |
| Scope | Global by default (need `local`) | Local by default |
| Multiple returns | Impossible (one stdout stream) | `return a, b, c` (returns a tuple) |
# Multiple return values — try this in Bash
def analyze_log(path):
errors = 0
warnings = 0
total = 0
with open(path) as f:
for line in f:
total += 1
if "ERROR" in line:
errors += 1
elif "WARN" in line:
warnings += 1
return total, errors, warnings
total, errors, warnings = analyze_log("/var/log/syslog")
print(f"Total: {total}, Errors: {errors}, Warnings: {warnings}")
Under the Hood: In Bash, `result=$(my_function)` runs the function in a subshell. That means `my_function` can't modify variables in the caller's scope — variable changes inside the `$()` disappear when the subshell exits. This is the #1 source of "why didn't my variable update?" bugs in Bash. Python functions share the same process and can return data directly. No subshells. No surprises.
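A tiny contrast you can run: a Python function can both mutate caller-visible state and return real data, because there is no subshell copy (illustrative names, not from the lesson):

```python
results = {}

def record(name, value):
    # Same process, same dict object: the caller sees this change.
    results[name] = value
    return value * 2

doubled = record("count", 21)
print(results)   # {'count': 21}
print(doubled)   # 42
```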
Try this now:
python3 -c "
def greet(name, greeting='Hello'):
return f'{greeting}, {name}!'
print(greet('ops team'))
print(greet('SRE team', greeting='Hey'))
"
Part 9: File I/O — open(), with, and No More Redirects¶
Reading Files¶
# Bash: read entire file into a variable
content=$(cat /etc/hostname)
# Bash: process line by line
while IFS= read -r line; do
# do something with $line
done < /etc/passwd
# Python: read entire file
with open("/etc/hostname") as f:
content = f.read().strip()
# Python: process line by line (memory-efficient, even for huge files)
with open("/etc/passwd") as f:
for line in f:
parts = line.strip().split(":")
username = parts[0]
shell = parts[-1]
print(f"{username:<20} {shell}")
Writing Files¶
# Python: write to a file
with open("/tmp/config.txt", "w") as f:
f.write("server=web-03\n")
f.write("port=8080\n")
# Or build and write at once
lines = [
"server=web-03",
"port=8080",
"timeout=30",
]
with open("/tmp/config.txt", "w") as f:
f.write("\n".join(lines) + "\n")
The with Statement — Why It Matters¶
# Bash: if the script dies, the file descriptor might not close.
# You rely on the OS to clean up when the process exits.
exec 3>/tmp/output.txt
echo "writing..." >&3
# ... script crashes here ...
exec 3>&- # Never reached
# Python: the with statement guarantees cleanup
with open("/tmp/output.txt", "w") as f:
f.write("writing...\n")
# Even if an exception occurs here, the file is closed
# File is closed here, guaranteed. Period.
The with pattern works for anything that needs cleanup: files, network connections,
database handles, locks. It's trap EXIT but scoped, composable, and impossible to forget.
| Mode | Meaning | Bash equivalent |
|---|---|---|
| `"r"` | Read (default) | `< file` |
| `"w"` | Write (truncates!) | `> file` |
| `"a"` | Append | `>> file` |
| `"r+"` | Read and write | `<> file` (rare in Bash) |
Gotcha: `"w"` mode truncates the file immediately on open — before you write anything. If your script crashes between `open()` and `write()`, you have an empty file. For atomic writes, write to a temp file then rename it.
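A minimal sketch of that temp-file-then-rename pattern (`atomic_write` is a hypothetical helper name; `os.replace` is atomic on POSIX when source and target are on the same filesystem, which is why the temp file goes in the target's directory):

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a temp file next to the target, then rename over it.
    # Readers see either the old file or the new one, never a partial write.
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave the temp file behind on failure
        raise

atomic_write("/tmp/config.txt", "server=web-03\nport=8080\n")
```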
Try this now:
python3 -c "
with open('/tmp/pytest.txt', 'w') as f:
for i in range(5):
f.write(f'Line {i}: this is a test\n')
with open('/tmp/pytest.txt') as f:
print(f.read())
"
Part 10: Error Handling — Why Exceptions Beat Exit Codes¶
In Bash, error handling is set -e and prayer:
set -e
# Script exits on first non-zero exit code
# Unless... it's in a pipe. Or an if condition. Or a subshell.
# set -e has so many exceptions it barely qualifies as error handling.
# "Handle" a specific error? Good luck:
output=$(some_command 2>/dev/null) || {
echo "Failed!"
exit 1
}
Python has try/except — structured, specific, and reliable:
# Catch a specific error
try:
with open("/var/log/syslog") as f:
content = f.read()
except FileNotFoundError:
print("Syslog not found (are you on macOS?)")
content = ""
except PermissionError:
print("Permission denied — run with sudo?")
content = ""
# Catch, log, and continue (the thing set -e can't do)
servers = ["web-01", "web-02", "web-03"]
results = {}
for server in servers:
try:
# pretend this is an API call
results[server] = check_server(server)
except ConnectionError as e:
print(f"WARN: {server} unreachable: {e}")
results[server] = None
# Script continues to the next server instead of dying
The Exception Hierarchy You Actually Need¶
BaseException
├── KeyboardInterrupt        # Ctrl+C — deliberately NOT under Exception
└── Exception
    ├── FileNotFoundError    # File doesn't exist
    ├── PermissionError      # Can't read/write
    ├── ValueError           # Wrong value (int("abc"))
    ├── KeyError             # Dict key doesn't exist
    ├── IndexError           # List index out of range
    ├── TypeError            # Wrong type (len(42))
    ├── ConnectionError      # Network failure
    └── TimeoutError         # Operation timed out
You don't need to memorize these. You need to know that specific exceptions exist and you
can catch them individually. This is the fundamental difference from exit codes: exit code
1 means "something failed." FileNotFoundError means "this specific file doesn't exist."
try / except / else / finally¶
try:
f = open("/var/log/syslog")
data = f.read()
except FileNotFoundError:
print("File missing")
data = ""
else:
# Only runs if NO exception occurred
print(f"Read {len(data)} bytes")
finally:
# ALWAYS runs — cleanup goes here
print("Done, whether it worked or not")
Mental Model: `set -e` is a fire alarm — when something goes wrong, everybody evacuates. `try/except` is a fire extinguisher — you identify what's burning, put it out, and keep working. Real ops scripts need extinguishers, not just alarms.
Flashcard Check #2¶
| Question | Answer |
|---|---|
| What happens when you access a dict key that doesn't exist? | `KeyError` is raised. Use `.get(key, default)` to avoid it. |
| How do you catch multiple exception types in one block? | `except (TypeError, ValueError) as e:` — tuple of exception types. |
| What runs in a `finally` block? | Everything — it runs whether the `try` succeeded or failed. Use it for cleanup. |
| What's the Bash equivalent of `try/except`? | `cmd \|\| handle_error` or `if ! cmd; then handle; fi`. But these only see exit codes, not error types. |
Try this now:
python3 -c "
for val in ['42', 'hello', '3.14', '']:
try:
result = int(val)
print(f'int({val!r}) = {result}')
except ValueError as e:
print(f'int({val!r}) failed: {e}')
"
Part 11: String Methods — Your sed/awk/cut Replacement Kit¶
You've been reaching for sed, awk, cut, tr, and grep your whole career. Python
strings have methods that replace most of those one-liners.
line = " Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password for admin "
# strip — like sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
line.strip() # Removes leading/trailing whitespace
line.lstrip() # Left strip only
line.rstrip() # Right strip only
# split — like awk '{print $N}' or cut -d' ' -f3
parts = line.split() # Split on whitespace (like awk's default)
parts[4] # "sshd[28410]:" — 5th field (0-indexed)
line.split(":") # Split on colons (like cut -d':')
# join — the reverse of split (no Bash equivalent that isn't painful)
", ".join(["web-01", "web-02", "db-01"]) # "web-01, web-02, db-01"
":".join(["usr", "local", "bin"]) # "usr:local:bin"
# startswith / endswith — like grep ^pattern or grep pattern$
"sshd[28410]:".startswith("sshd") # True
"/var/log/syslog".endswith(".log") # False (it's "syslog", no .log)
# replace — like sed 's/old/new/g'
"Hello World".replace("World", "Ops") # "Hello Ops"
# in — like grep (for simple substring matching)
"Failed password" in line # True
"Accepted" in line # False
# upper, lower — like tr '[:lower:]' '[:upper:]'
"warning".upper() # "WARNING"
"CRITICAL".lower() # "critical"
Chaining Methods — The Python Pipeline¶
# Extract program name from a syslog line
# Bash: echo "$line" | awk '{print $5}' | cut -d'[' -f1 | tr -d ':'
# Python:
program = line.split()[4].split("[")[0].rstrip(":")
# "sshd"
One line, no pipes, no subprocesses, and it returns a Python string you can use directly in a dict, a comparison, or an f-string.
Trivia: Python strings are immutable — every method returns a new string. The original is unchanged. This is the opposite of Bash's `sed -i` (in-place edit). It seems wasteful, but it means you can never accidentally corrupt data by modifying it in two places at once. Immutability is a safety feature.
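Two lines make the point:

```python
s = "warning"
print(s.upper())  # WARNING  (a new string)
print(s)          # warning  (the original is untouched)
```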
Try this now:
python3 -c "
csv_line = ' web-prod-03, 192.168.1.50, 8080, running '
host, ip, port, status = [field.strip() for field in csv_line.split(',')]
print(f'Host {host} ({ip}:{port}) is {status.upper()}')
"
Part 12: The Import System — The Standard Library Is Huge¶
In Bash, external tools live in /usr/bin/ and you call them by name. In Python,
libraries are imported.
# Standard library — ships with Python, no install required
import os # OS interactions (env vars, paths, PIDs)
import sys # Python runtime (args, exit, stdin/stdout)
import json # JSON parsing/writing
import re # Regular expressions
import pathlib # File path operations
import subprocess # Run shell commands
import datetime # Dates and times
import collections # Counter, defaultdict, OrderedDict
import socket # Low-level networking
import shutil # File copying, disk usage
import csv # CSV file reading/writing
import hashlib # Hashing (md5, sha256)
import argparse # CLI argument parsing
import logging # Structured logging
import tempfile # Temporary files and directories
# Third-party — install with pip
# pip3 install requests pyyaml
import requests # HTTP requests (the better curl)
import yaml # YAML parsing
Import Styles¶
# Import the whole module
import os
os.environ["HOME"] # Use with prefix
# Import specific things
from collections import Counter
Counter(["a", "b", "a"]) # Use directly, no prefix
# Import with alias
from pathlib import Path as P
P("/etc").exists()
Remember: `from X import *` is the Python equivalent of `source script.sh` — it dumps everything into your namespace. Don't do it. Use explicit imports so you always know where a function came from.
Try this now:
python3 -c "
import sys, os
print(f'Python: {sys.version.split()[0]}')
print(f'PID: {os.getpid()}')
print(f'User: {os.environ.get(\"USER\", \"unknown\")}')
print(f'CWD: {os.getcwd()}')
"
Part 13: The Main Guard — Scripts vs. Modules¶
#!/usr/bin/env python3
def count_programs(path):
"""Count syslog messages by program name."""
from collections import Counter
counts = Counter()
with open(path) as f:
for line in f:
parts = line.split()
if len(parts) >= 5:
program = parts[4].split("[")[0].rstrip(":")
counts[program] += 1
return counts
def main():
import sys
path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
counts = count_programs(path)
for program, n in counts.most_common(20):
print(f"{program:<20} {n:>6}")
if __name__ == "__main__":
main()
That if __name__ == "__main__": block runs only when you execute the file directly
(python3 script.py or ./script.py). If someone imports your file as a module
(from script import count_programs), main() doesn't run automatically.
This is like building a Bash library that sources cleanly: you can reuse the functions without triggering the "main logic" at the bottom.
Under the Hood: When Python runs a file directly, it sets the special variable `__name__` to the string `"__main__"`. When the file is imported, `__name__` is set to the module's name (e.g., `"script"`). This one-line idiom is the entire module system's entry-point mechanism. Every serious Python script uses it.
Part 14: The Mission — Putting It All Together¶
Time to build the syslog analyzer from the mission. Every concept from this lesson goes in.
#!/usr/bin/env python3
"""Syslog analyzer — count messages by program, flag SSH brute-force attempts."""
import sys
from collections import Counter
def parse_syslog_line(line):
"""Extract program name and message from a syslog line.
Returns (program, message) or (None, None) if unparseable.
"""
parts = line.split()
if len(parts) < 6:
return None, None
# Field 5 is "program[PID]:" or "program:"
raw_program = parts[4]
program = raw_program.split("[")[0].rstrip(":")
message = " ".join(parts[5:])
return program, message
def analyze_syslog(path):
"""Read a syslog file and return analysis results as a dict."""
program_counts = Counter()
ssh_failures = []
total_lines = 0
parse_errors = 0
try:
with open(path) as f:
for line in f:
total_lines += 1
program, message = parse_syslog_line(line.strip())
if program is None:
parse_errors += 1
continue
program_counts[program] += 1
# Flag SSH brute-force indicators
if program == "sshd" and "Failed password" in line:
# Extract source IP: "... from 203.0.113.42 port ..."
parts = line.split()
for i, part in enumerate(parts):
if part == "from" and i + 1 < len(parts):
ssh_failures.append(parts[i + 1])
break
except FileNotFoundError:
print(f"ERROR: {path} not found", file=sys.stderr)
sys.exit(1)
except PermissionError:
print(f"ERROR: Permission denied reading {path}", file=sys.stderr)
sys.exit(1)
return {
"total_lines": total_lines,
"parse_errors": parse_errors,
"program_counts": program_counts,
"ssh_failure_ips": Counter(ssh_failures),
}
def print_report(results):
"""Print a formatted summary report."""
print("=" * 50)
print("SYSLOG ANALYSIS REPORT")
print("=" * 50)
print(f"\nTotal lines: {results['total_lines']:>8,}")
print(f"Parse errors: {results['parse_errors']:>8,}")
print(f"Programs seen: {len(results['program_counts']):>8}")
print(f"\n{'--- Top Programs by Message Count ---':^50}")
print(f"{'Program':<25} {'Count':>8} {'%':>7}")
print("-" * 42)
total = results["total_lines"]
for program, count in results["program_counts"].most_common(15):
pct = count / total * 100 if total > 0 else 0
print(f"{program:<25} {count:>8,} {pct:>6.1f}%")
ssh_ips = results["ssh_failure_ips"]
if ssh_ips:
print(f"\n{'--- SSH Brute-Force Suspects ---':^50}")
print(f"{'Source IP':<20} {'Failed Attempts':>15}")
print("-" * 37)
for ip, count in ssh_ips.most_common(10):
flag = " *** ALERT" if count >= 10 else ""
print(f"{ip:<20} {count:>15,}{flag}")
else:
print("\nNo SSH authentication failures detected.")
print("\n" + "=" * 50)
def main():
path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
results = analyze_syslog(path)
print_report(results)
if __name__ == "__main__":
main()
Save it as syslog_analyzer.py, chmod +x, and run it:
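Assuming the file is saved in the current directory:

```shell
chmod +x syslog_analyzer.py
./syslog_analyzer.py                  # defaults to /var/log/syslog
./syslog_analyzer.py ./sample.log     # or point it at any syslog-format file
```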
Sample output:
==================================================
SYSLOG ANALYSIS REPORT
==================================================
Total lines: 12,847
Parse errors: 3
Programs seen: 14
--- Top Programs by Message Count ---
Program Count %
------------------------------------------
sshd 4,231 32.9%
CRON 3,102 24.1%
systemd 2,445 19.0%
kernel 1,893 14.7%
NetworkManager 412 3.2%
snapd 287 2.2%
sudo 198 1.5%
--- SSH Brute-Force Suspects ---
Source IP Failed Attempts
-------------------------------------
203.0.113.42 847 *** ALERT
198.51.100.7 312 *** ALERT
192.0.2.100 23 *** ALERT
10.0.1.200 4
Count the concepts: f-strings, dicts, Counter, with statement, try/except, string methods
(split, strip, rstrip), for loops, enumerate, functions with return values, the main guard,
and sys.argv for CLI arguments. That's this whole lesson in one script.
Exercises¶
Exercise 1: Quick Win (2 minutes)¶
Open a Python REPL and build a dict that maps HTTP status codes to their meanings. Look up at least 5 codes from memory.
Solution
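One possible answer (any five correct codes count; these mappings follow the standard HTTP status registry):

```python
status_codes = {
    200: "OK",
    301: "Moved Permanently",
    403: "Forbidden",
    404: "Not Found",
    500: "Internal Server Error",
    502: "Bad Gateway",
}

for code, meaning in sorted(status_codes.items()):
    print(f"{code}: {meaning}")

# Lookup with a default, as covered in Part 5:
print(status_codes.get(418, "unknown"))
```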
Exercise 2: Port Scanner (10 minutes)¶
Write a script that takes a hostname and a comma-separated list of ports from the command
line, checks which ports are open using socket.create_connection(), and prints results
in a table.
Usage: ./portscan.py web-01 22,80,443,8080
Hints
- `sys.argv[1]` for hostname, `sys.argv[2].split(",")` for ports
- `int(port)` to convert string to integer
- `socket.create_connection((host, port), timeout=2)` — wrap in try/except
- `ConnectionRefusedError` for closed ports, `TimeoutError` for filtered

Solution
#!/usr/bin/env python3
import socket
import sys
def scan_port(host, port, timeout=2):
try:
sock = socket.create_connection((host, port), timeout=timeout)
sock.close()
return "open"
except ConnectionRefusedError:
return "closed"
except TimeoutError:
return "filtered"
except OSError as e:
return f"error: {e}"
def main():
if len(sys.argv) < 3:
print(f"Usage: {sys.argv[0]} <host> <port1,port2,...>")
sys.exit(1)
host = sys.argv[1]
ports = [int(p) for p in sys.argv[2].split(",")]
print(f"Scanning {host}...")
print(f"{'PORT':<8} {'STATE':<12}")
print("-" * 20)
for port in sorted(ports):
state = scan_port(host, port)
print(f"{port:<8} {state:<12}")
if __name__ == "__main__":
main()
Exercise 3: Translate a Bash Pipeline (15 minutes)¶
Translate this Bash one-liner to Python. The script should produce identical output.
This finds all users with real login shells.
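A pipeline with that behavior looks something like this (a reconstruction matching the solution below; exact flags in the original may differ):

```shell
grep -v '^#' /etc/passwd \
  | awk -F: 'NF >= 7 && $7 !~ /nologin|false/ {print $1, $7}' \
  | sort
```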
Solution
#!/usr/bin/env python3
results = []
with open("/etc/passwd") as f:
for line in f:
line = line.strip()
if line.startswith("#") or not line:
continue
parts = line.split(":")
if len(parts) >= 7:
user, shell = parts[0], parts[6]
if "nologin" not in shell and "false" not in shell:
results.append((user, shell))
for user, shell in sorted(results):
print(f"{user} {shell}")
Cheat Sheet¶
| Bash | Python | Notes |
|---|---|---|
| `$var` | `var` | No prefix, no quoting needed |
| `echo "text"` | `print("text")` | Or `print(f"...")` for interpolation |
| `echo "$var"` | `print(f"{var}")` | f-strings handle formatting |
| `${#string}` | `len(string)` | Works on lists, dicts, strings |
| `$((x + 1))` | `x + 1` | Math is native, not a special mode |
| `[[ $a == $b ]]` | `a == b` | No brackets, no quoting |
| `[ -f file ]` | `Path(file).is_file()` | Returns bool, import pathlib |
| `${arr[@]}` | `mylist` | Just use it — no special syntax |
| `${#arr[@]}` | `len(mylist)` | Length of anything |
| `declare -A` | `mydict = {}` | Or `dict()` |
| `for x in ...; do` | `for x in ...:` | Colon, not semicolon-do |
| `while read line` | `for line in f:` | File iteration |
| `func() { ... }` | `def func(): ...` | Indentation, not braces |
| `$1`, `$2` | Named params | `def f(host, port):` |
| `$(cmd)` | `subprocess.run(...)` | But prefer native Python |
| `cmd \| grep \| awk` | `for`/`if`/`split` | Data stays in-process |
| `set -e` | `try/except` | Per-operation, not global |
| `exit 1` | `sys.exit(1)` | Or raise an exception |
| `source file.sh` | `import module` | Namespaced, no pollution |
| `cat file` | `open(f).read()` | Use `with` for safety |
| `>>` (append) | `open(f, "a")` | `"a"` = append mode |
| `sort \| uniq -c` | `Counter()` | `from collections import Counter` |
Takeaways¶
- Python has types. Strings are strings, ints are ints, bools are bools. No more invisible string-to-number conversions. Errors happen loudly and immediately.
- Dicts replace entire pipelines. Counting, grouping, looking things up by key — the things that take `sort | uniq -c | sort -rn` in Bash are one-liners with `Counter` and `dict`.
- f-strings are better than echo/printf. Any expression inside `{}`, with formatting options. No quoting nightmares.
- `try/except` is error handling. `set -e` is error detection. Python lets you catch specific failures, recover, and continue. Bash lets you crash or ignore everything.
- `with` guarantees cleanup. Files close, connections drop, locks release — even when exceptions occur. It's `trap EXIT` that actually works every time.
- The standard library replaces most CLI tools. `json`, `csv`, `re`, `pathlib`, `collections`, `socket` — you can do most ops tasks without installing anything.
What's Next¶
This lesson got you from zero to "I can write a useful script." The next step is learning Python's replacements for the Bash tools you already use — subprocess for shell commands, pathlib for file operations, requests for curl, argparse for getopts, and logging for structured output. That's covered in Python for Ops — The Bash Expert's Bridge.