Python: Zero to Script for the Terminal Native

Topics: Python basics, Bash comparison, scripting, text processing, data structures, error handling
Strategy: Build-up + parallel (Bash comparison throughout)
Level: L0–L1 (Zero → Foundations)
Time: 60–90 minutes
Prerequisites: None (but you'll need 15 years of Bash muscle memory to appreciate the punchlines)


The Mission

You have a syslog file. Messages look like this:

Mar 23 04:12:01 web-prod-03 CRON[28401]: (root) CMD (/usr/bin/certbot renew)
Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password for invalid user admin from 203.0.113.42 port 55142 ssh2
Mar 23 04:12:05 web-prod-03 kernel: [UFW BLOCK] IN=eth0 OUT= MAC=... SRC=198.51.100.7 DST=10.0.1.5
Mar 23 04:12:07 web-prod-03 systemd[1]: Starting Daily apt download activities...
Mar 23 04:12:09 web-prod-03 sshd[28415]: Accepted publickey for deploy from 10.0.1.200 port 42310 ssh2

Your job: write a Python script that reads this file, counts messages by program (sshd, CRON, kernel, systemd, etc.), identifies the top talkers, and flags any SSH brute-force attempts. In Bash you'd reach for awk, sort, uniq -c, and a tangle of pipes. Today you'll do it in Python — and halfway through, you'll realize the Python version is shorter.

By the end of this lesson, you'll be able to write a useful script from scratch. Not a toy: a script you'd actually commit to a repo and hand to a colleague.


Part 1: Running the Thing

You already know how to make a script executable. Python is no different.

The REPL — Your New Scratch Terminal

# You already do this all day:
$ echo "hello"
hello

# Python has the same thing:
$ python3
>>> print("hello")
hello
>>> 2 + 2
4
>>> exit()

The >>> prompt is Python's interactive shell, the REPL (Read-Eval-Print Loop). It's the equivalent of your interactive Bash prompt: a place to test one-liners before they go into a script. You'll use it constantly in this lesson.

Scripts and Shebangs

#!/bin/bash
echo "I am a bash script"

#!/usr/bin/env python3
print("I am a python script")

Same pattern. chmod +x script.py, run it with ./script.py. The env trick in the shebang finds whichever python3 is in your $PATH — important when you have multiple Python versions (and you will).

Gotcha: Don't assume python means Python 3. On older systems it is still Python 2, and on many newer ones the bare python command doesn't exist at all. Always use python3. Python 2 reached end of life on January 1, 2020. If you type python and get a >>> prompt showing version 2.7, back away slowly.

Try this now — open a terminal:

python3 -c "import sys; print(sys.version)"

That's the -c flag, same as bash -c. You just verified your Python version without entering the REPL.


Part 2: Variables and Types — Not Everything Is a String

This is the first real shift. In Bash, everything is a string:

# Bash: these are all strings
name="webserver"
count="42"
is_running="true"

# Want to do math? You need special syntax:
result=$((count + 1))
# Forgot the $((...))? You get string concatenation or an error.

In Python, data has types:

# Python: these are different things
name = "webserver"       # str (string)
count = 42               # int (integer)
uptime = 99.7            # float (decimal)
is_running = True        # bool (boolean — note: capital T)
last_error = None        # NoneType (like null — "nothing here")

No $ prefix. No quoting disasters. No declare -i to make a variable numeric. The value itself tells Python what type it is.

# Python knows what these are
>>> type("hello")
<class 'str'>
>>> type(42)
<class 'int'>
>>> type(3.14)
<class 'float'>
>>> type(True)
<class 'bool'>
>>> type(None)
<class 'NoneType'>

Type Conversion — Explicit Is Better

# Bash: invisible conversion, invisible bugs
port="8080"
echo $((port + 1))    # Works... but port is still a string

# Python: you say what you mean
port = "8080"          # It's a string (from a config file, say)
port_int = int(port)   # Now it's an integer
print(port_int + 1)    # 8081

# What if it's not a number?
int("not_a_number")    # ValueError: invalid literal for int()

Python yells at you immediately instead of silently doing the wrong thing. This is a feature, not a bug.

Mental Model: Bash is a text-in, text-out pipeline. Python is a typed data pipeline. In Bash, you convert between types by sending strings through commands (bc, awk, printf). In Python, the data carries its type with it, and you convert explicitly with int(), str(), float().
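
As a minimal sketch of explicit conversion in practice, the helper below turns a string (say, from a config file) into a port number and falls back to a default when the text isn't a valid integer. The name parse_port and the default of 8080 are assumptions made up for this illustration; try/except is covered properly in Part 10.

```python
def parse_port(text, default=8080):
    """Convert a string to an int, or return default if it isn't numeric.

    Hypothetical helper, for illustration only.
    """
    try:
        return int(text.strip())
    except ValueError:
        return default

print(parse_port("8443"))               # 8443
print(parse_port("not_a_port"))         # 8080
print(str(parse_port("22")) + "/tcp")   # 22/tcp
```

Note the last line: going back to a string is just as explicit as going to an int. Nothing converts behind your back.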

Try this now:

python3 -c "print(type(42), type('42'), type(42.0), type(True))"

Part 3: f-Strings — printf That Doesn't Hate You

String formatting in Bash is a minefield of quoting rules. Python f-strings are the fix.

# Bash: the quoting gauntlet
host="web-03"
port=8080
echo "Connecting to ${host}:${port}"          # Easy case
echo "Status: $(curl -s http://${host}:${port}/health)"  # Subshell
printf "%-20s %5d\n" "$host" "$port"          # printf for formatting

# Python: f-strings (the f is for "formatted")
host = "web-03"
port = 8080
print(f"Connecting to {host}:{port}")                     # Variables
print(f"Status: {port + 1}")                              # Expressions
print(f"{'HOST':<20} {'PORT':>5}")                        # Alignment
print(f"Uptime: {99.734:.1f}%")                           # Decimal places
print(f"Size: {1048576:,} bytes")                         # Thousands separator

Output:

Connecting to web-03:8080
Status: 8081
HOST                  PORT
Uptime: 99.7%
Size: 1,048,576 bytes

The key insight: anything inside {} in an f-string is a Python expression. Variables, math, function calls, method calls — all valid. No more escaping nested quotes inside $() inside double quotes inside heredocs.
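
To make "any expression" concrete, here is a small sketch that puts a dict lookup, a method call, and a division inside the braces. The service dict and its numbers are invented for the example.

```python
# Made-up data for illustration
service = {"name": "sshd", "failures": 847, "total": 4231}

# Dict lookups, method calls, and arithmetic all work inside the braces.
# The .1% format spec multiplies by 100 and appends the percent sign.
line = f"{service['name'].upper()}: {service['failures'] / service['total']:.1%} failed"
print(line)   # SSHD: 20.0% failed
```

One caveat: on Python versions before 3.12, the quotes inside the braces must differ from the quotes around the string, which is why the lookups above use single quotes inside a double-quoted f-string.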

Name Origin: f-strings were introduced in Python 3.6 (2016) via PEP 498, authored by Eric V. Smith. The older formats — % formatting (borrowed from C's printf) and .format() — still work but are more verbose. f-strings won because they put the value right next to where it appears in the string, the same way ${var} works in Bash.

Try this now:

python3 -c "
name = 'sshd'
count = 1847
print(f'{name:<15} {count:>6,} messages')
"

Part 4: Lists — Arrays That Actually Work

Bash arrays are... let's be honest.

# Bash arrays: a masterclass in footguns
servers=("web-01" "web-02" "db-01")
echo ${#servers[@]}               # Length: 3
echo ${servers[0]}                # First element
servers+=("cache-01")             # Append

# Iterate
for server in "${servers[@]}"; do
    echo "$server"
done

# Forgot the quotes? Words with spaces split. Forgot [@]? Get first element only.

# Python lists: what arrays should have been
servers = ["web-01", "web-02", "db-01"]
print(len(servers))               # Length: 3
print(servers[0])                 # First element: "web-01"
print(servers[-1])                # Last element: "db-01" (negative indexing!)
servers.append("cache-01")        # Append

# Iterate
for server in servers:
    print(server)

# Slice — get a range
print(servers[1:3])               # ["web-02", "db-01"]

Lists Do What You Expect

# Things that are painful or impossible in Bash
ports = [80, 443, 8080, 8443, 9090]

# Sort
sorted_ports = sorted(ports)                    # New sorted list
ports.sort()                                    # Sort in place

# Filter
high_ports = [p for p in ports if p > 1024]     # [8080, 8443, 9090]

# Transform
port_strings = [str(p) for p in ports]          # ["80", "443", ...]

# Check membership
if 443 in ports:
    print("HTTPS is configured")

# Combine
all_ports = ports + [3306, 5432]                # Concatenate

That [p for p in ports if p > 1024] is called a list comprehension. It's Python's version of a pipeline filter — reads left to right, same as cmd | grep condition. You'll use these constantly.

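Gotcha: Python lists are zero-indexed, same as Bash arrays. But negative indices count from the end: servers[-1] is the last element, servers[-2] is second to last. Recent Bash versions (4.2 and later) accept ${servers[-1]} too, but you can't count on it in older environments such as the stock Bash 3.2 on macOS. Once you get used to it, you'll wish every language had it.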
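
A short sketch of how negative indices combine with slicing and comprehensions, using the same made-up server names as above:

```python
servers = ["web-01", "web-02", "db-01", "cache-01"]

last = servers[-1]             # "cache-01"
last_two = servers[-2:]        # ["db-01", "cache-01"]
all_but_last = servers[:-1]    # ["web-01", "web-02", "db-01"]

# A comprehension as a filter: keep only the web servers
web_only = [s for s in servers if s.startswith("web-")]

print(last, last_two, all_but_last, web_only)
```

Slices never raise IndexError: servers[-10:] on a four-element list simply returns the whole list.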

Try this now:

python3 -c "
logs = ['INFO', 'ERROR', 'INFO', 'WARN', 'ERROR', 'ERROR', 'INFO']
errors = [x for x in logs if x == 'ERROR']
print(f'Found {len(errors)} errors out of {len(logs)} messages')
"

Part 5: Dicts — This Is the One That Changes Everything

If you take one thing from this entire lesson, let it be this section.

In Bash, associative arrays exist but they're awkward:

# Bash associative arrays: declare or die
declare -A counts
counts[sshd]=47
counts[cron]=12
counts[kernel]=8

# Iterate (order is not guaranteed)
for key in "${!counts[@]}"; do
    echo "$key: ${counts[$key]}"
done

# No nesting. No default values. No easy serialization.
# Want a dict of dicts? Good luck.

# Python dicts: the data structure that replaces half your scripts
counts = {
    "sshd": 47,
    "cron": 12,
    "kernel": 8,
}

# Access
print(counts["sshd"])              # 47
print(counts.get("nginx", 0))     # 0 (default if key doesn't exist)

# Iterate
for program, count in counts.items():
    print(f"{program}: {count}")

# Add / update
counts["nginx"] = 23
counts["sshd"] += 10              # Increment: now 57

# Sort by value (the thing that takes 5 pipes in Bash)
for prog, n in sorted(counts.items(), key=lambda x: x[1], reverse=True):
    print(f"{prog:<15} {n:>5}")

The "Aha Moment": Counting Things

Here's the Bash version of "count messages by program in a log file":

# Bash: the classic pipeline
awk '{print $5}' /var/log/syslog | cut -d'[' -f1 | sort | uniq -c | sort -rn | head -20

Six commands. One pipeline. Breaks if the log format changes. No error handling. No way to do anything more with the data after printing it.

Here's Python:

from collections import Counter

counts = Counter()
with open("/var/log/syslog") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 5:
            program = parts[4].split("[")[0].rstrip(":")
            counts[program] += 1

for program, n in counts.most_common(20):
    print(f"{program:<20} {n:>6}")

Same result. But now counts is a data structure you can query, filter, serialize to JSON, send to an API, or combine with another Counter. The Bash pipeline gave you printed text. Python gave you data.
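
Here is a minimal sketch of what "query, filter, serialize, combine" looks like. The two Counters hold invented numbers standing in for two parsed log files.

```python
import json
from collections import Counter

# Made-up counts, as if parsed from two different days of logs
monday = Counter({"sshd": 47, "CRON": 12, "kernel": 8})
tuesday = Counter({"sshd": 30, "systemd": 5})

combined = monday + tuesday                                     # add counts key by key
noisy = {prog: n for prog, n in combined.items() if n >= 10}   # filter

print(combined["sshd"])                     # 77
print(combined["nginx"])                    # 0 (missing keys count as zero)
print(json.dumps(noisy, sort_keys=True))    # {"CRON": 12, "sshd": 77}
```

None of this re-reads the log file. Once the data is in a Counter, every follow-up question is a one-liner.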

War Story: A team had a 120-line Bash script that monitored log volume across 15 services. It used nested for loops, declare -A for per-service counts, declare -A for per-severity counts, and a third declare -A for per-hour counts. The script had 9 bugs related to uninitialized array keys (Bash returns an empty string for missing keys, which silently breaks arithmetic). The Python rewrite used a dict of Counters — literally {service: Counter()} — and was 40 lines. The three hardest bugs in the Bash version were impossible in Python because Counter() initializes missing keys to zero automatically.
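
The dict-of-Counters shape from that story can be sketched like this. Using defaultdict(Counter) is one way to build it (an assumption here, not necessarily what that team did), and the event list is invented.

```python
from collections import Counter, defaultdict

# Hypothetical (service, severity) pairs, as if already parsed from logs
events = [
    ("api", "ERROR"), ("api", "INFO"), ("db", "WARN"),
    ("api", "ERROR"), ("db", "INFO"),
]

per_service = defaultdict(Counter)    # unseen service -> fresh empty Counter
for service, severity in events:
    per_service[service][severity] += 1    # unseen severity starts at 0

print(per_service["api"]["ERROR"])    # 2
print(per_service["db"]["ERROR"])     # 0
```

There is no initialization step anywhere, which is exactly the class of bug the Bash version kept hitting.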

Flashcard Check #1

Q: How do you get a default value from a Python dict without crashing on a missing key?
A: d.get("key", default_value). It returns the default instead of raising KeyError.

Q: What's the Bash equivalent of a Python dict?
A: declare -A mydict, an associative array. But it can't nest, has no .get(), and requires explicit declaration.

Q: What does Counter.most_common(10) return?
A: A list of (key, count) tuples for the 10 highest counts, sorted by count descending.

Q: In the log-counting pipeline awk | cut | sort | uniq -c | sort -rn, which steps does Counter replace?
A: The last three. Counter does the counting (sort | uniq -c) and .most_common() does the sorting (sort -rn). The awk and cut field extraction becomes split() calls.

Try this now:

python3 -c "
from collections import Counter
words = 'the cat sat on the mat the cat ate the rat'.split()
print(Counter(words).most_common(3))
"

Part 6: Conditionals — No More Bracket Roulette

Bash conditionals are a syntax trivia quiz:

# Bash: which brackets? How many? Spaces inside? Outside?
if [ "$status" = "running" ]; then       # Single bracket: POSIX, spaces required
if [[ "$status" == "running" ]]; then    # Double bracket: Bash extension, glob/regex
if (( count > 10 )); then               # Arithmetic context
if [ -f "/etc/config" ]; then            # File test
if [ -z "$var" ]; then                   # Empty string test

# Python: one way. Obvious.
if status == "running":
    print("Service is up")
elif count > 10:
    print("Above threshold")
else:
    print("Something else")

No then. No fi. No semicolons. No brackets. Indentation defines the block. Four spaces per level is the convention (PEP 8); the language itself only enforces that each block is indented consistently.

Truthiness — What Counts as False

In Bash, empty strings are the closest thing to "false," and everything else is messy. Python has a clear rule:

# These are all "falsy" (treated as False in an if statement)
False
0
0.0
""          # empty string
[]          # empty list
{}          # empty dict
None

# Everything else is "truthy"

This means:

servers = []
if servers:
    print("We have servers")      # Skipped — empty list is falsy
else:
    print("No servers found")     # This runs

name = ""
if name:
    print(f"Hello, {name}")       # Skipped — empty string is falsy

Compare that to Bash, where you need [ -z "$var" ] to check for empty, [ -n "$var" ] for non-empty, and [[ -z "${array[@]}" ]] doesn't even do what you think.

Remember: In Python, None, False, zero, and empty collections are falsy. Almost everything else is truthy. If you're checking "does this have anything in it?", just use if thing:.
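
One trap follows directly from that rule: 0 is falsy, so a truthiness check can't tell "zero" from "not set". The sketch below uses two hypothetical functions to show the difference; the default of 30 is arbitrary.

```python
def effective_timeout(timeout=None):
    """None means 'not set'; 0 is a legitimate value meaning 'don't wait'."""
    if timeout is None:       # explicit None check keeps 0 intact
        return 30
    return timeout

def buggy_timeout(timeout=None):
    if not timeout:           # 0 is falsy, so it gets replaced by the default
        return 30
    return timeout

print(effective_timeout(0))   # 0
print(buggy_timeout(0))       # 30
```

Use if thing: for "is there anything in it?" and if thing is None: for "was it provided at all?".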

Try this now:

python3 -c "
for val in [0, 1, '', 'hello', [], [1,2], {}, {'a': 1}, None, True]:
    print(f'{str(val):<15}{bool(val)}')
"

Part 7: Loops — for, while, and Friends

for Loops

# Bash: iterate over a list
for server in web-01 web-02 db-01; do
    echo "Checking $server"
done

# Bash: iterate over a range (C-style)
for ((i=0; i<10; i++)); do
    echo "$i"
done

# Bash: iterate over command output
for file in $(find /var/log -name "*.log"); do
    echo "$file"
done  # BUG: breaks on filenames with spaces

# Python: iterate over a list
for server in ["web-01", "web-02", "db-01"]:
    print(f"Checking {server}")

# Python: iterate over a range
for i in range(10):      # 0 through 9
    print(i)

# Python: iterate with index (no manual counter needed)
servers = ["web-01", "web-02", "db-01"]
for i, server in enumerate(servers):
    print(f"{i}: {server}")

# Python: iterate over two lists in parallel
hosts = ["web-01", "web-02", "db-01"]
ports = [80, 80, 5432]
for host, port in zip(hosts, ports):
    print(f"{host}:{port}")

enumerate() and zip() are the two loop tools you'll use most. enumerate gives you the index without maintaining a counter variable. zip walks two lists in lockstep — try doing that cleanly in Bash.
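
Two details worth knowing early, sketched with throwaway data: enumerate can start counting from any number (handy for line numbers), and zip stops silently at the shorter input.

```python
lines = ["first", "second", "third"]
numbered = [f"{n}: {text}" for n, text in enumerate(lines, start=1)]
print(numbered)   # ['1: first', '2: second', '3: third']

hosts = ["web-01", "web-02", "db-01"]
ports = [80, 443]                       # one entry short on purpose
pairs = list(zip(hosts, ports))
print(pairs)      # [('web-01', 80), ('web-02', 443)]
```

If a length mismatch should be an error rather than a silent truncation, Python 3.10 and later accept zip(hosts, ports, strict=True), which raises ValueError.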

while Loops and File Reading

# Bash: read a file line by line (the correct way)
while IFS= read -r line; do
    echo "$line"
done < /var/log/syslog

# Python: read a file line by line
with open("/var/log/syslog") as f:
    for line in f:
        print(line, end="")     # end="" because line already has \n

That with keyword is a context manager — it automatically closes the file when you're done, even if an error occurs. It's Python's version of trap cleanup EXIT, except it's scoped to one block instead of the whole script.

Try this now:

python3 -c "
for i, color in enumerate(['red', 'green', 'blue']):
    print(f'{i}: {color}')

for name, age in zip(['Alice', 'Bob'], [30, 25]):
    print(f'{name} is {age}')
"

Part 8: Functions — def, return, and No More Subshell Surprises

Bash functions communicate by echoing text and return only exit codes (0–255):

# Bash: function that "returns" a value
get_disk_usage() {
    df -h / | tail -1 | awk '{print $5}' | tr -d '%'
}

usage=$(get_disk_usage)    # Capture stdout — the only way to "return" data
echo "Usage: ${usage}%"

# Bash: function with arguments
check_port() {
    local host="$1"
    local port="$2"
    nc -z "$host" "$port" 2>/dev/null
    return $?              # Exit code: 0 = open, 1 = closed
}

# Python: functions return actual values
def get_disk_usage(path="/"):
    import shutil
    total, used, free = shutil.disk_usage(path)
    return round(used / total * 100, 1)    # Returns a float, not a string

usage = get_disk_usage()          # No subshell. No text parsing.
print(f"Usage: {usage}%")

usage_var = get_disk_usage("/var") # Different path — default args!

# Python: function with arguments and a default
def check_port(host, port, timeout=3):
    import socket
    try:
        # The with block closes the socket as soon as we leave it
        with socket.create_connection((host, port), timeout=timeout):
            return True           # Returns a boolean, not an exit code
    except OSError:               # Refused, timed out, unreachable, or DNS failure
        return False

Key differences:

| Feature | Bash | Python |
|---|---|---|
| Return data | echo + capture with $() | return value |
| Return type | Always a string (via stdout) | Any type: int, str, list, dict, bool |
| Arguments | Positional only: $1, $2 | Named, with defaults |
| Scope | Global by default (need local) | Local by default |
| Multiple returns | Impossible (one stdout stream) | return a, b, c (returns a tuple) |

# Multiple return values: try this in Bash
def analyze_log(path):
    errors = 0
    warnings = 0
    total = 0
    with open(path) as f:
        for line in f:
            total += 1
            if "ERROR" in line:
                errors += 1
            elif "WARN" in line:
                warnings += 1
    return total, errors, warnings

total, errors, warnings = analyze_log("/var/log/syslog")
print(f"Total: {total}, Errors: {errors}, Warnings: {warnings}")

Under the Hood: In Bash, result=$(my_function) runs the function in a subshell. That means my_function can't modify variables in the caller's scope — variable changes inside the $() disappear when the subshell exits. This is the #1 source of "why didn't my variable update?" bugs in Bash. Python functions share the same process and can return data directly. No subshells. No surprises.

Try this now:

python3 -c "
def greet(name, greeting='Hello'):
    return f'{greeting}, {name}!'

print(greet('ops team'))
print(greet('SRE team', greeting='Hey'))
"

Part 9: File I/O — open(), with, and No More Redirects

Reading Files

# Bash: read entire file into a variable
content=$(cat /etc/hostname)

# Bash: process line by line
while IFS= read -r line; do
    # do something with $line
done < /etc/passwd

# Python: read entire file
with open("/etc/hostname") as f:
    content = f.read().strip()

# Python: process line by line (memory-efficient, even for huge files)
with open("/etc/passwd") as f:
    for line in f:
        parts = line.strip().split(":")
        username = parts[0]
        shell = parts[-1]
        print(f"{username:<20} {shell}")

Writing Files

# Bash: write to a file
echo "server=web-03" > /tmp/config.txt
echo "port=8080" >> /tmp/config.txt

# Python: write to a file
with open("/tmp/config.txt", "w") as f:
    f.write("server=web-03\n")
    f.write("port=8080\n")

# Or build and write at once
lines = [
    "server=web-03",
    "port=8080",
    "timeout=30",
]
with open("/tmp/config.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

The with Statement — Why It Matters

# Bash: if the script dies, the file descriptor might not close.
# You rely on the OS to clean up when the process exits.
exec 3>/tmp/output.txt
echo "writing..." >&3
# ... script crashes here ...
exec 3>&-   # Never reached

# Python: the with statement guarantees cleanup
with open("/tmp/output.txt", "w") as f:
    f.write("writing...\n")
    # Even if an exception occurs here, the file is closed
# File is closed here, guaranteed. Period.

The with pattern works for anything that needs cleanup: files, network connections, database handles, locks. It's trap EXIT but scoped, composable, and impossible to forget.
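
You can also write your own with-compatible helpers. As a rough sketch, here is a pushd/popd equivalent built with contextlib.contextmanager from the standard library; the function name working_directory is made up for this example.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily chdir into path, then go back. Hypothetical helper."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)    # runs even if the with block raises

start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()  # we are in the temp directory here
after = os.getcwd()

print(after == start)         # True
```

Everything before yield is setup, everything in finally is cleanup, and the body of the with block runs in between.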

| Mode | Meaning | Bash equivalent |
|---|---|---|
| "r" | Read (default) | < file |
| "w" | Write (truncates!) | > file |
| "a" | Append | >> file |
| "r+" | Read and write | <> file (rare in Bash) |

Gotcha: "w" mode truncates the file immediately on open — before you write anything. If your script crashes between open() and write(), you have an empty file. For atomic writes, write to a temp file then rename it.
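
The temp-file-then-rename approach can be sketched as follows. The helper name atomic_write and the demo file name are assumptions for illustration; os.replace and tempfile.mkstemp are standard library calls.

```python
import os
import tempfile

def atomic_write(path, text):
    """Write text to a temp file next to path, then rename it into place.

    Hypothetical helper: readers see the old file or the new one, never half.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp_path, path)    # atomic when both are on one filesystem
    except BaseException:
        os.unlink(tmp_path)           # don't leave the temp file behind
        raise

target = os.path.join(tempfile.gettempdir(), "atomic_demo.txt")
atomic_write(target, "server=web-03\nport=8080\n")
with open(target) as f:
    print(f.read(), end="")
```

The temp file is created in the same directory as the target on purpose: a rename is only atomic within a single filesystem. Also note that mkstemp creates files readable only by the owner, so adjust permissions afterwards if other users need to read the result.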

Try this now:

python3 -c "
with open('/tmp/pytest.txt', 'w') as f:
    for i in range(5):
        f.write(f'Line {i}: this is a test\n')

with open('/tmp/pytest.txt') as f:
    print(f.read())
"

Part 10: Error Handling — Why Exceptions Beat Exit Codes

In Bash, error handling is set -e and prayer:

set -e
# Script exits on first non-zero exit code
# Unless... it's in a pipe. Or an if condition. Or a subshell.
# set -e has so many exceptions it barely qualifies as error handling.

# "Handle" a specific error? Good luck:
output=$(some_command 2>/dev/null) || {
    echo "Failed!"
    exit 1
}

Python has try/except — structured, specific, and reliable:

# Catch a specific error
try:
    with open("/var/log/syslog") as f:
        content = f.read()
except FileNotFoundError:
    print("Syslog not found (are you on macOS?)")
    content = ""
except PermissionError:
    print("Permission denied — run with sudo?")
    content = ""

# Catch, log, and continue (the thing set -e can't do)
servers = ["web-01", "web-02", "web-03"]
results = {}
for server in servers:
    try:
        # pretend this is an API call
        results[server] = check_server(server)
    except ConnectionError as e:
        print(f"WARN: {server} unreachable: {e}")
        results[server] = None
        # Script continues to the next server instead of dying

The Exception Hierarchy You Actually Need

BaseException
 ├── KeyboardInterrupt          # Ctrl+C (deliberately not an Exception)
 └── Exception
      ├── OSError               # Anything the operating system refused
      │    ├── FileNotFoundError   # File doesn't exist
      │    ├── PermissionError     # Can't read/write
      │    ├── ConnectionError     # Network failure
      │    └── TimeoutError        # Operation timed out
      ├── LookupError
      │    ├── KeyError            # Dict key doesn't exist
      │    └── IndexError          # List index out of range
      ├── ValueError            # Wrong value (int("abc"))
      └── TypeError             # Wrong type (len(42))

You don't need to memorize these. You need to know that specific exceptions exist and you can catch them individually. This is the fundamental difference from exit codes: exit code 1 means "something failed." FileNotFoundError means "this specific file doesn't exist."
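
An exception is also an object that carries details, not just a name. A small sketch, assuming the path below does not exist on your machine:

```python
import errno

try:
    open("/nonexistent/dir/app.conf")     # assumed to be missing
except OSError as e:    # FileNotFoundError and PermissionError are OSErrors
    kind = type(e).__name__
    is_enoent = (e.errno == errno.ENOENT)
    missing = e.filename
    print(kind)         # FileNotFoundError
    print(is_enoent)    # True
    print(missing)      # /nonexistent/dir/app.conf
```

The errno is the same number a C program would see, so you lose nothing compared to exit codes. You just get the error type and the offending filename on top.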

try / except / else / finally

try:
    with open("/var/log/syslog") as f:   # with closes the file for us
        data = f.read()
except FileNotFoundError:
    print("File missing")
    data = ""
else:
    # Only runs if NO exception occurred
    print(f"Read {len(data)} characters")
finally:
    # ALWAYS runs; cleanup goes here
    print("Done, whether it worked or not")

Mental Model: set -e is a fire alarm — when something goes wrong, everybody evacuates. try/except is a fire extinguisher — you identify what's burning, put it out, and keep working. Real ops scripts need extinguishers, not just alarms.

Flashcard Check #2

Q: What happens when you access a dict key that doesn't exist?
A: KeyError is raised. Use .get(key, default) to avoid it.

Q: How do you catch multiple exception types in one block?
A: except (TypeError, ValueError) as e: with a tuple of exception types.

Q: When does a finally block run?
A: Always, whether the try succeeded or failed. Use it for cleanup.

Q: What's the Bash equivalent of try/except?
A: cmd || handle_error or if ! cmd; then handle; fi. But these only see exit codes, not error types.

Try this now:

python3 -c "
for val in ['42', 'hello', '3.14', '']:
    try:
        result = int(val)
        print(f'int({val!r}) = {result}')
    except ValueError as e:
        print(f'int({val!r}) failed: {e}')
"

Part 11: String Methods — Your sed/awk/cut Replacement Kit

You've been reaching for sed, awk, cut, tr, and grep your whole career. Python strings have methods that replace most of those one-liners.

line = "  Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password for admin  "

# strip — like sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
line.strip()           # Removes leading/trailing whitespace
line.lstrip()          # Left strip only
line.rstrip()          # Right strip only

# split — like awk '{print $N}' or cut -d' ' -f3
parts = line.split()   # Split on whitespace (like awk's default)
parts[4]               # "sshd[28410]:" — 5th field (0-indexed)

line.split(":")         # Split on colons (like cut -d':')

# join — the reverse of split (no Bash equivalent that isn't painful)
", ".join(["web-01", "web-02", "db-01"])   # "web-01, web-02, db-01"
":".join(["usr", "local", "bin"])           # "usr:local:bin"

# startswith / endswith — like grep ^pattern or grep pattern$
"sshd[28410]:".startswith("sshd")     # True
"/var/log/syslog".endswith(".log")     # False (it's "syslog", no .log)

# replace — like sed 's/old/new/g'
"Hello World".replace("World", "Ops")  # "Hello Ops"

# in — like grep (for simple substring matching)
"Failed password" in line              # True
"Accepted" in line                     # False

# upper, lower — like tr '[:lower:]' '[:upper:]'
"warning".upper()     # "WARNING"
"CRITICAL".lower()    # "critical"

Chaining Methods — The Python Pipeline

# Extract program name from a syslog line
# Bash: echo "$line" | awk '{print $5}' | cut -d'[' -f1 | tr -d ':'
# Python:
program = line.split()[4].split("[")[0].rstrip(":")
# "sshd"

One line, no pipes, no subprocesses, and it returns a Python string you can use directly in a dict, a comparison, or an f-string.
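
When the split chain gets too brittle, the re module is the next step up. The pattern below is a sketch that assumes the classic "Mon DD HH:MM:SS host program[pid]: message" layout shown in the mission; it is not a complete syslog grammar, and the name SYSLOG_RE is made up.

```python
import re

# Assumed layout: "Mar 23 04:12:03 host program[pid]: message" (pid optional)
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

line = "Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password for admin"
match = SYSLOG_RE.match(line)
if match:
    print(match["program"], match["pid"])   # sshd 28410
    print(match["message"])                 # Failed password for admin
```

Named groups make each field self-documenting, and a line that doesn't fit the pattern gives you None instead of an IndexError three method calls deep.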

Trivia: Python strings are immutable — every method returns a new string. The original is unchanged. This is the opposite of Bash's sed -i (in-place edit). It seems wasteful, but it means you can never accidentally corrupt data by modifying it in two places at once. Immutability is a safety feature.
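
You can watch immutability at work in a few lines. The sample text is arbitrary.

```python
original = "warning: disk at 91%"
shouted = original.upper()     # returns a new string

print(original)   # warning: disk at 91%
print(shouted)    # WARNING: DISK AT 91%

try:
    original[0] = "W"          # in-place modification is not allowed
except TypeError as e:
    print(f"TypeError: {e}")
```

The practical consequence: a bare line.strip() on its own line does nothing useful. You have to keep the result, as in line = line.strip().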

Try this now:

python3 -c "
csv_line = '  web-prod-03, 192.168.1.50, 8080, running  '
host, ip, port, status = [field.strip() for field in csv_line.split(',')]
print(f'Host {host} ({ip}:{port}) is {status.upper()}')
"

Part 12: The Import System — The Standard Library Is Huge

In Bash, external tools live in /usr/bin/ and you call them by name. In Python, libraries are imported.

# Standard library — ships with Python, no install required
import os                    # OS interactions (env vars, paths, PIDs)
import sys                   # Python runtime (args, exit, stdin/stdout)
import json                  # JSON parsing/writing
import re                    # Regular expressions
import pathlib               # File path operations
import subprocess            # Run shell commands
import datetime              # Dates and times
import collections           # Counter, defaultdict, OrderedDict
import socket                # Low-level networking
import shutil                # File copying, disk usage
import csv                   # CSV file reading/writing
import hashlib               # Hashing (md5, sha256)
import argparse              # CLI argument parsing
import logging               # Structured logging
import tempfile              # Temporary files and directories

# Third-party: install with pip
# pip3 install requests pyyaml
import requests              # HTTP requests (the better curl)
import yaml                  # YAML parsing

Import Styles

# Import the whole module
import os
os.environ["HOME"]           # Use with prefix

# Import specific things
from collections import Counter
Counter(["a", "b", "a"])     # Use directly, no prefix

# Import with alias
from pathlib import Path as P
P("/etc").exists()

Remember: from X import * is the Python equivalent of source script.sh — it dumps everything into your namespace. Don't do it. Use explicit imports so you always know where a function came from.

Try this now:

python3 -c "
import sys, os
print(f'Python: {sys.version.split()[0]}')
print(f'PID:    {os.getpid()}')
print(f'User:   {os.environ.get(\"USER\", \"unknown\")}')
print(f'CWD:    {os.getcwd()}')
"

Part 13: The Main Guard — Scripts vs. Modules

#!/usr/bin/env python3

def count_programs(path):
    """Count syslog messages by program name."""
    from collections import Counter
    counts = Counter()
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 5:
                program = parts[4].split("[")[0].rstrip(":")
                counts[program] += 1
    return counts

def main():
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    counts = count_programs(path)
    for program, n in counts.most_common(20):
        print(f"{program:<20} {n:>6}")

if __name__ == "__main__":
    main()

That if __name__ == "__main__": block runs only when you execute the file directly (python3 script.py or ./script.py). If someone imports your file as a module (from script import count_programs), main() doesn't run automatically.

This is like building a Bash library that sources cleanly: you can reuse the functions without triggering the "main logic" at the bottom.

Under the Hood: When Python runs a file directly, it sets the special variable __name__ to the string "__main__". When the file is imported, __name__ is set to the module's name (e.g., "script"). This two-line idiom is the conventional way to give a file a script entry point while keeping it importable, and most reusable Python scripts use it.


Part 14: The Mission — Putting It All Together

Time to build the syslog analyzer from the mission. Every concept from this lesson goes in.

#!/usr/bin/env python3
"""Syslog analyzer — count messages by program, flag SSH brute-force attempts."""

import sys
from collections import Counter

def parse_syslog_line(line):
    """Extract program name and message from a syslog line.

    Returns (program, message) or (None, None) if unparseable.
    """
    parts = line.split()
    if len(parts) < 6:
        return None, None
    # Field 5 is "program[PID]:" or "program:"
    raw_program = parts[4]
    program = raw_program.split("[")[0].rstrip(":")
    message = " ".join(parts[5:])
    return program, message

def analyze_syslog(path):
    """Read a syslog file and return analysis results as a dict."""
    program_counts = Counter()
    ssh_failures = []
    total_lines = 0
    parse_errors = 0

    try:
        with open(path) as f:
            for line in f:
                total_lines += 1
                program, message = parse_syslog_line(line.strip())

                if program is None:
                    parse_errors += 1
                    continue

                program_counts[program] += 1

                # Flag SSH brute-force indicators
                if program == "sshd" and "Failed password" in line:
                    # Extract source IP: "... from 203.0.113.42 port ..."
                    parts = line.split()
                    for i, part in enumerate(parts):
                        if part == "from" and i + 1 < len(parts):
                            ssh_failures.append(parts[i + 1])
                            break

    except FileNotFoundError:
        print(f"ERROR: {path} not found", file=sys.stderr)
        sys.exit(1)
    except PermissionError:
        print(f"ERROR: Permission denied reading {path}", file=sys.stderr)
        sys.exit(1)

    return {
        "total_lines": total_lines,
        "parse_errors": parse_errors,
        "program_counts": program_counts,
        "ssh_failure_ips": Counter(ssh_failures),
    }

def print_report(results):
    """Print a formatted summary report."""
    print("=" * 50)
    print("SYSLOG ANALYSIS REPORT")
    print("=" * 50)
    print(f"\nTotal lines:   {results['total_lines']:>8,}")
    print(f"Parse errors:  {results['parse_errors']:>8,}")
    print(f"Programs seen: {len(results['program_counts']):>8}")

    print(f"\n{'--- Top Programs by Message Count ---':^50}")
    print(f"{'Program':<25} {'Count':>8} {'%':>7}")
    print("-" * 42)
    total = results["total_lines"]
    for program, count in results["program_counts"].most_common(15):
        pct = count / total * 100 if total > 0 else 0
        print(f"{program:<25} {count:>8,} {pct:>6.1f}%")

    ssh_ips = results["ssh_failure_ips"]
    if ssh_ips:
        print(f"\n{'--- SSH Brute-Force Suspects ---':^50}")
        print(f"{'Source IP':<20} {'Failed Attempts':>15}")
        print("-" * 37)
        for ip, count in ssh_ips.most_common(10):
            flag = " *** ALERT" if count >= 10 else ""
            print(f"{ip:<20} {count:>15,}{flag}")
    else:
        print("\nNo SSH authentication failures detected.")

    print("\n" + "=" * 50)

def main():
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    results = analyze_syslog(path)
    print_report(results)

if __name__ == "__main__":
    main()

Save it as syslog_analyzer.py, chmod +x, and run it:

chmod +x syslog_analyzer.py
./syslog_analyzer.py /var/log/syslog

Sample output (abridged to the top seven of the 14 programs):

==================================================
SYSLOG ANALYSIS REPORT
==================================================

Total lines:     12,847
Parse errors:         3
Programs seen:       14

      --- Top Programs by Message Count ---
Program                      Count       %
------------------------------------------
sshd                         4,231   32.9%
CRON                         3,102   24.1%
systemd                      2,445   19.0%
kernel                       1,893   14.7%
NetworkManager                 412    3.2%
snapd                          287    2.2%
sudo                           198    1.5%

         --- SSH Brute-Force Suspects ---
Source IP             Failed Attempts
-------------------------------------
203.0.113.42                    847 *** ALERT
198.51.100.7                    312 *** ALERT
192.0.2.100                      23 *** ALERT
10.0.1.200                        4

Count the concepts: f-strings, dicts, Counter, with statement, try/except, string methods (split, strip, rstrip), for loops, enumerate, functions with return values, the main guard, and sys.argv for CLI arguments. That's this whole lesson in one script.
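
To see the brute-force check on its own, here is a minimal sketch that pulls the "find the token after from" logic into a helper and runs it against three of the sample lines from The Mission. The helper name extract_failed_ip is hypothetical (the script above does this inline), and the substring test on "sshd" stands in for the parsed program name, so treat it as an illustration rather than a drop-in replacement.

```python
from collections import Counter

# Three of the sample lines from The Mission section.
SAMPLE_LINES = [
    "Mar 23 04:12:03 web-prod-03 sshd[28410]: Failed password for invalid user admin from 203.0.113.42 port 55142 ssh2",
    "Mar 23 04:12:09 web-prod-03 sshd[28415]: Accepted publickey for deploy from 10.0.1.200 port 42310 ssh2",
    "Mar 23 04:12:01 web-prod-03 CRON[28401]: (root) CMD (/usr/bin/certbot renew)",
]

def extract_failed_ip(line):
    """Return the source IP of a failed SSH password attempt, or None.

    Hypothetical helper: the substring check on "sshd" is a simplification
    of comparing the parsed program name.
    """
    if "sshd" not in line or "Failed password" not in line:
        return None
    parts = line.split()
    for i, part in enumerate(parts):
        # The IP is the token right after the first "from".
        if part == "from" and i + 1 < len(parts):
            return parts[i + 1]
    return None

failures = Counter()
for line in SAMPLE_LINES:
    ip = extract_failed_ip(line)
    if ip is not None:
        failures[ip] += 1

print(failures.most_common())  # [('203.0.113.42', 1)]
```

One limitation worth knowing: a login attempt for a user literally named "from" would make the first match land on the wrong token. A regex anchored on "from <ip> port" would be stricter if that matters for your logs.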


Exercises

Exercise 1: Quick Win (2 minutes)

Open a Python REPL and build a dict that maps HTTP status codes to their meanings. Include at least 5 codes from memory, then print them in numeric order.

Solution
codes = {
    200: "OK",
    301: "Moved Permanently",
    404: "Not Found",
    500: "Internal Server Error",
    503: "Service Unavailable",
}
for code, meaning in sorted(codes.items()):
    print(f"{code}: {meaning}")

Exercise 2: Port Scanner (10 minutes)

Write a script that takes a hostname and a comma-separated list of ports from the command line, checks which ports are open using socket.create_connection(), and prints results in a table.

Usage: ./portscan.py web-01 22,80,443,8080

Hints
- `sys.argv[1]` for hostname, `sys.argv[2].split(",")` for ports
- `int(port)` to convert string to integer
- `socket.create_connection((host, port), timeout=2)`, wrapped in try/except
- `ConnectionRefusedError` for closed ports, `TimeoutError` for filtered

Solution
#!/usr/bin/env python3
import socket
import sys

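def scan_port(host, port, timeout=2):
    """Return "open", "closed", "filtered", or an error string for one port."""
    try:
        # with closes the socket as soon as the connection check is done
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except (socket.timeout, TimeoutError):
        # socket.timeout only became an alias of TimeoutError in Python 3.10
        return "filtered"
    except OSError as e:
        return f"error: {e}"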

def main():
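    if len(sys.argv) < 3:
        print(f"Usage: {sys.argv[0]} <host> <port1,port2,...>", file=sys.stderr)
        sys.exit(1)

    host = sys.argv[1]
    try:
        ports = [int(p) for p in sys.argv[2].split(",")]
    except ValueError:
        print(f"ERROR: ports must be integers, got {sys.argv[2]!r}", file=sys.stderr)
        sys.exit(1)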

    print(f"Scanning {host}...")
    print(f"{'PORT':<8} {'STATE':<12}")
    print("-" * 20)
    for port in sorted(ports):
        state = scan_port(host, port)
        print(f"{port:<8} {state:<12}")

if __name__ == "__main__":
    main()
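
A quick way to check a scanner like this without touching a real host is to point it at a listener you control. The sketch below is an assumption-laden test harness, not part of the exercise: it mirrors the scan_port logic so it runs on its own, catches both socket.timeout and TimeoutError so it behaves the same before and after Python 3.10, and uses socket.create_server (Python 3.8+) to open a throwaway listener on the loopback interface.

```python
import socket

def scan_port(host, port, timeout=2):
    """Mirror of the exercise's scan_port, repeated so this sketch runs alone."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except (socket.timeout, TimeoutError):
        return "filtered"
    except OSError as e:
        return f"error: {e}"

# Port 0 asks the OS for any free port; getsockname() reveals which one.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]

state_while_listening = scan_port("127.0.0.1", port)
listener.close()
state_after_close = scan_port("127.0.0.1", port)

print(f"port {port}: {state_while_listening} while listening, "
      f"{state_after_close} after close")
```

While the listener is up the scan should report "open". After it closes, most systems refuse the connection and the scan reports "closed", though the exact result depends on the local network stack.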

Exercise 3: Translate a Bash Pipeline (15 minutes)

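Translate this Bash one-liner to Python. The script should produce the same output on a well-formed /etc/passwd (Python's sorted() compares by code point, which matches sort under LC_ALL=C).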

cat /etc/passwd | grep -v '^#' | awk -F: '$7 !~ /nologin|false/ {print $1, $7}' | sort

This finds all users with real login shells.

Solution
#!/usr/bin/env python3
results = []
with open("/etc/passwd") as f:
    for line in f:
        line = line.strip()
        if line.startswith("#") or not line:
            continue
        parts = line.split(":")
        if len(parts) >= 7:
            user, shell = parts[0], parts[6]
            if "nologin" not in shell and "false" not in shell:
                results.append((user, shell))

for user, shell in sorted(results):
    print(f"{user} {shell}")
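
If you want to check the translation without depending on the contents of your own /etc/passwd, one option is to move the filtering into a function that takes text. The sketch below is a variation on the solution, not a replacement for it: the function name login_users and the sample entries are made up for illustration, and re.search with the same nologin|false pattern stands in for awk's !~ operator.

```python
import re

# Same pattern the awk filter uses: $7 !~ /nologin|false/
NON_LOGIN = re.compile(r"nologin|false")

def login_users(passwd_text):
    """Return sorted 'user shell' lines for accounts with a real login shell."""
    rows = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # malformed entry, skip it
        user, shell = fields[0], fields[6]
        if not NON_LOGIN.search(shell):
            rows.append(f"{user} {shell}")
    return sorted(rows)

# Illustrative entries only, not copied from a real system.
SAMPLE = """\
# comment line
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
deploy:x:1000:1000::/home/deploy:/bin/zsh
sync:x:4:65534:sync:/bin:/bin/false
"""

for row in login_users(SAMPLE):
    print(row)
# deploy /bin/zsh
# root /bin/bash
```

Separating "read the file" from "filter the lines" is a small design choice that pays off quickly: the function can be exercised in the REPL with any string you paste in.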

Cheat Sheet

Bash              Python                Notes
----------------  --------------------  -------------------------------------
$var              var                   No prefix, no quoting needed
echo "text"       print("text")         Or print(f"...") for interpolation
echo "$var"       print(f"{var}")       f-strings handle formatting
${#string}        len(string)           Works on lists, dicts, strings
$((x + 1))        x + 1                 Math is native, not a special mode
[[ $a == $b ]]    a == b                No brackets, no quoting
[ -f file ]       Path(file).is_file()  Returns bool; from pathlib import Path
${arr[@]}         mylist                Just use it, no special syntax
${#arr[@]}        len(mylist)           Length of anything
declare -A        mydict = {}           Or dict()
for x in ...; do  for x in ...:         Colon, not semicolon-do
while read line   for line in f:        File iteration
func() { ... }    def func(): ...       Indentation, not braces
$1, $2            Named params          def f(host, port):
$(cmd)            subprocess.run(...)   But prefer native Python
cmd | grep | awk  for/if/split          Data stays in-process
set -e            try/except            Per-operation, not global
exit 1            sys.exit(1)           Or raise an exception
source file.sh    import module         Namespaced, no pollution
cat file          open(f).read()        Use inside a with block for safety
>> (append)       open(f, "a")          "a" = append mode
sort | uniq -c    Counter()             from collections import Counter
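
A few of these rows are easier to remember once you have typed them. The sketch below strings several of them together; the variable values are made up, and sys.executable (the path of the running interpreter) is used as a file and a command that are likely to exist on any machine running the example.

```python
import subprocess
import sys
from collections import Counter
from pathlib import Path

name = "web-prod-03"                       # $var: no prefix, no quoting
print(f"host={name} length={len(name)}")   # echo "$var" and ${#string}

if Path(sys.executable).is_file():         # [ -f file ]
    print("interpreter found")

# $(cmd): run a command and capture its stdout as text.
# check=True raises CalledProcessError on a non-zero exit status.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True, text=True, check=True,
)
greeting = result.stdout.strip()
print(greeting)

# sort | uniq -c | sort -rn
programs = ["sshd", "CRON", "sshd", "kernel", "sshd", "CRON"]
for program, count in Counter(programs).most_common():
    print(f"{count:>4} {program}")
```

Passing the command as a list, as here, avoids shell quoting entirely, which is usually what you want when arguments come from variables.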

Takeaways

  • Python has types. Strings are strings, ints are ints, bools are bools. No more invisible string-to-number conversions. Errors happen loudly and immediately.

  • Dicts replace entire pipelines. Counting, grouping, looking things up by key — the things that take sort | uniq -c | sort -rn in Bash are one-liners with Counter and dict.

  • f-strings are better than echo/printf. Any expression inside {}, with formatting options. No quoting nightmares.

  • try/except is error handling. set -e is error detection. Python lets you catch specific failures, recover, and continue. Bash lets you crash or ignore everything.

  • with guarantees cleanup. Files close, connections drop, locks release — even when exceptions occur. It's trap EXIT that actually works every time.

  • The standard library replaces most CLI tools. json, csv, re, pathlib, collections, socket — you can do most ops tasks without installing anything.
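
The points about types and try/except can be tried in a few lines. This is a small sketch with made-up input values: it shows a type mismatch failing loudly, then a loop that catches one specific exception, records the bad value, and keeps going.

```python
# Types are strict: mixing str and int raises instead of guessing.
try:
    "5" + 1
except TypeError as exc:
    type_error_seen = True
    print(f"TypeError: {exc}")

# Catch a specific failure, recover, and continue with the next item.
raw_ports = ["22", "80", "https", "443", ""]
ports = []
skipped = []
for raw in raw_ports:
    try:
        ports.append(int(raw))
    except ValueError:
        # int() rejects non-numeric text, including the empty string
        skipped.append(raw)

print(ports)    # [22, 80, 443]
print(skipped)  # ['https', '']
```

Only ValueError is caught here, so any other problem would still surface as a traceback instead of being silently swallowed.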


What's Next

This lesson got you from zero to "I can write a useful script." The next step is learning Python's replacements for the Bash tools you already use — subprocess for shell commands, pathlib for file operations, requests for curl, argparse for getopts, and logging for structured output. That's covered in Python for Ops — The Bash Expert's Bridge.