Portal | Level: L3: Advanced | Topics: K8s Ecosystem | Domain: Kubernetes
Kubernetes Operators & CRDs Drills
Remember: An operator is a CRD + a controller. The CRD defines "what you want" (desired state), the controller does "how to get there" (reconciliation loop). The controller watches for changes to the custom resource and takes action to make reality match the spec. This is the same pattern Kubernetes itself uses — Deployments have a controller that ensures the right number of pods exist.
Gotcha: Deleting a CRD deletes ALL custom resources of that type, including their data. If you uninstall an operator that manages databases, deleting the CRD can cascade-delete all your database resources. Always check `kubectl get <crd-name> --all-namespaces` before removing a CRD, and use finalizers to prevent accidental deletion.
Drill 1: What Is a CRD?
Difficulty: Easy
Q: Explain what a CRD is and how it extends the Kubernetes API. Give an example.
Answer
A **Custom Resource Definition (CRD)** extends the Kubernetes API with new resource types. Once a CRD is created, you can `kubectl get`, `create`, `delete` the custom resource just like built-in resources.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql"]
                version:
                  type: string
                replicas:
                  type: integer
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: ["db"]
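To make the schema more concrete, here is a minimal Go sketch of types that mirror the `spec` properties above, plus a local check equivalent to the `enum` rule. In a real cluster the API server enforces the schema; the `Database` struct, `validateEngine`, and `parseDatabase` are hypothetical names used only for illustration, and the manifest in `main` is a made-up example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatabaseSpec mirrors the spec properties declared in the CRD schema.
type DatabaseSpec struct {
	Engine   string `json:"engine"`
	Version  string `json:"version"`
	Replicas int    `json:"replicas"`
}

// Database is a hypothetical, trimmed-down view of one custom resource.
type Database struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Metadata   struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec DatabaseSpec `json:"spec"`
}

// validateEngine applies the same enum rule as the schema: postgres or mysql.
func validateEngine(spec DatabaseSpec) error {
	switch spec.Engine {
	case "postgres", "mysql":
		return nil
	default:
		return fmt.Errorf("unsupported engine %q", spec.Engine)
	}
}

// parseDatabase decodes a JSON manifest and applies the enum check.
func parseDatabase(manifest []byte) (*Database, error) {
	var db Database
	if err := json.Unmarshal(manifest, &db); err != nil {
		return nil, err
	}
	if err := validateEngine(db.Spec); err != nil {
		return nil, err
	}
	return &db, nil
}

func main() {
	// Made-up manifest for illustration only.
	manifest := []byte(`{
	  "apiVersion": "example.com/v1",
	  "kind": "Database",
	  "metadata": {"name": "my-db"},
	  "spec": {"engine": "postgres", "version": "16", "replicas": 3}
	}`)
	db, err := parseDatabase(manifest)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Printf("%s/%s engine=%s replicas=%d\n",
		db.Kind, db.Metadata.Name, db.Spec.Engine, db.Spec.Replicas)
}
```

A manifest with `"engine": "oracle"` would be rejected by `parseDatabase`, much as the API server would reject it against the CRD's `enum`.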
Drill 2: Reconciliation Loop
Difficulty: Medium
Q: Explain the reconciliation loop pattern that all Kubernetes operators follow. What triggers reconciliation?
Answer
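          ┌──────────────┐
          │    Watch     │
          │  (informer)  │
          └──────┬───────┘
                 │ event
          ┌──────▼───────┐
          │    Queue     │
          │ (work queue) │
          └──────┬───────┘
                 │ dequeue
          ┌──────▼───────┐
┌────────►│  Reconcile   │
│         │ (your code)  │
│         └──────┬───────┘
│                │
│         ┌──────▼───────┐
│         │  Desired ==  │
│   No    │   Actual?    │
├─────────┤              │
│         └──────┬───────┘
│                │ Yes
│         ┌──────▼───────┐
└─────────│   Requeue    │
          │  (periodic)  │
          └──────────────┘

Reconciliation is triggered by create, update, or delete events on the watched custom resource; by changes to resources the controller owns or otherwise watches; by a requeue requested from a previous reconcile (after a delay, or after an error with backoff); and by the informer's initial list and periodic resync.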
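The loop in the diagram can be sketched as a toy model in plain Go. This is not controller-runtime code: `cluster`, `reconcile`, and `run` are hypothetical names, the "cluster" is two in-memory maps, and each reconcile call moves the actual state only one step so that requeueing is visible.

```go
package main

import "fmt"

// cluster is a toy stand-in for real cluster state: desired and actual
// replica counts keyed by custom resource name.
type cluster struct {
	desired map[string]int
	actual  map[string]int
}

// reconcile moves one object a single step toward its desired state and
// reports whether it should be requeued for another pass.
func reconcile(c *cluster, name string) (requeue bool) {
	want, have := c.desired[name], c.actual[name]
	switch {
	case have < want:
		c.actual[name]++ // e.g. create one pod
	case have > want:
		c.actual[name]-- // e.g. delete one pod
	}
	return c.actual[name] != want
}

// run drains the work queue, requeueing keys until they converge.
// It returns the number of reconcile calls made.
func run(c *cluster, queue []string) int {
	calls := 0
	for len(queue) > 0 {
		key := queue[0]
		queue = queue[1:]
		calls++
		if reconcile(c, key) {
			queue = append(queue, key)
		}
	}
	return calls
}

func main() {
	c := &cluster{
		desired: map[string]int{"my-db": 3},
		actual:  map[string]int{"my-db": 0},
	}
	// A watch event for "my-db" puts its key on the queue.
	calls := run(c, []string{"my-db"})
	fmt.Println("reconcile calls:", calls, "actual replicas:", c.actual["my-db"])
}
```

Note that `reconcile` receives only a key and compares desired with actual each time, rather than reacting to what the event said. That level-triggered design is what makes it safe to call repeatedly, including after a missed event or an operator restart.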
Drill 3: Owner References
Difficulty: Medium
Q: What are owner references and why are they critical for operators?
Answer
Owner references create a parent-child relationship between resources. When the parent is deleted, Kubernetes garbage-collects all children.

Why they matter:

1. **Garbage collection**: Delete the Database CR → all Pods, Services, PVCs that list it as owner are cleaned up
2. **Event propagation**: Changes to owned resources trigger reconciliation of the owner, provided the controller watches the owned types (for example with `Owns()` in controller-runtime)
3. **Prevents orphans**: No leaked resources when CRs are deleted

Without owner references, deleting a CR would leave orphaned pods, services, and PVCs.

Drill 4: Finalizers
Difficulty: Medium
Q: What is a finalizer and when would you use one in an operator?
Answer
A finalizer is a string on a resource's `metadata.finalizers` list that prevents deletion until the operator removes it.

Deletion flow:

1. User runs `kubectl delete database my-db`
2. Kubernetes sets `deletionTimestamp` but does NOT delete the resource
3. Operator's reconcile function is called
4. Operator performs cleanup (e.g., drop database, remove external resources, revoke credentials)
5. Operator removes the finalizer from the list
6. Kubernetes deletes the resource

Use finalizers when:

- You create external resources (cloud databases, DNS records, IAM roles)
- You need to run cleanup logic before deletion
- You need to coordinate with external systems

// Inside Reconcile; r (reconciler) and ctx are assumed names.
const finalizer = "databases.example.com/cleanup"
if database.DeletionTimestamp.IsZero() {
	// Not being deleted: add the finalizer once and persist it.
	if !controllerutil.ContainsFinalizer(database, finalizer) {
		controllerutil.AddFinalizer(database, finalizer)
		return ctrl.Result{}, r.Update(ctx, database)
	}
} else if controllerutil.ContainsFinalizer(database, finalizer) {
	// Being deleted: run cleanup (assumed to return an error on failure).
	if err := cleanupExternalResources(database); err != nil {
		return ctrl.Result{}, err // finalizer stays, so reconcile retries
	}
	controllerutil.RemoveFinalizer(database, finalizer)
	return ctrl.Result{}, r.Update(ctx, database) // Kubernetes now deletes it
}
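The deletion flow can also be modeled without any Kubernetes dependencies. The sketch below is a simplified in-memory stand-in for the API server's behavior, not the real implementation; `object`, `store`, `requestDelete`, and `removeFinalizer` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// object is a toy resource carrying only the metadata relevant to deletion.
type object struct {
	name              string
	finalizers        []string
	deletionTimestamp *time.Time
}

// store is a toy API server: a map of live objects.
type store map[string]*object

// requestDelete mimics the API server: with finalizers present it only
// marks the object; without any it removes the object immediately.
func (s store) requestDelete(name string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	if len(obj.finalizers) == 0 {
		delete(s, name)
		return
	}
	if obj.deletionTimestamp == nil {
		now := time.Now()
		obj.deletionTimestamp = &now
	}
}

// removeFinalizer mimics the operator releasing the object after cleanup;
// once the list is empty on a marked object, the store drops it.
func (s store) removeFinalizer(name, finalizer string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	kept := obj.finalizers[:0]
	for _, f := range obj.finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.finalizers = kept
	if obj.deletionTimestamp != nil && len(obj.finalizers) == 0 {
		delete(s, name)
	}
}

func main() {
	const fin = "databases.example.com/cleanup"
	s := store{"my-db": {name: "my-db", finalizers: []string{fin}}}

	s.requestDelete("my-db")
	_, exists := s["my-db"]
	fmt.Println("after delete request, still present:", exists)

	// Operator cleanup would run here, then it releases the object.
	s.removeFinalizer("my-db", fin)
	_, exists = s["my-db"]
	fmt.Println("after finalizer removal, still present:", exists)
}
```

The model also shows why a crashed or uninstalled operator leaves resources stuck in a terminating state: nothing else removes the finalizer, so the object is never dropped.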
Drill 5: Status Subresource
Difficulty: Medium
Q: Why should operators use the status subresource instead of updating the entire CR?
Answer
The status subresource allows separate RBAC and update semantics for `.spec` (desired state) vs `.status` (observed state).

Benefits:

1. **RBAC separation**: Users can update spec, only the operator updates status (through the separate `databases/status` resource)
2. **No accidental overwrites**: Writes to the `/status` endpoint ignore changes to anything except `.status`, and writes to the main resource ignore changes to `.status`, so a status update cannot clobber a user's spec change
3. **Convention**: spec = user intent, status = operator observations

Drill 6: Kubebuilder Scaffold
Difficulty: Easy
Q: How do you scaffold a new operator project using Kubebuilder?
Answer
# Initialize project
kubebuilder init --domain example.com --repo github.com/org/db-operator
# Create API (CRD + Controller)
kubebuilder create api --group database --version v1 --kind Database
# Key files created:
# api/v1/database_types.go: CRD spec/status structs
# controllers/database_controller.go: Reconcile logic (internal/controller/ in Kubebuilder v4 layouts)
# config/crd/bases/: Generated CRD YAML
# Edit the types
vi api/v1/database_types.go
# Regenerate manifests after type changes
make manifests
# Run locally (against current kubeconfig)
make run
# Build and push image
make docker-build docker-push IMG=registry.example.com/db-operator:v1
# Deploy to cluster
make deploy IMG=registry.example.com/db-operator:v1
Drill 7: Operator Maturity Levels
Difficulty: Easy
Q: What are the 5 operator capability levels? Give an example of each.
Answer
| Level | Name | Capabilities | Example |
|-------|------|-------------|---------|
| 1 | Basic Install | Automated install, CRD, operator lifecycle | Operator deploys app via CR |
| 2 | Seamless Upgrades | Version upgrades, patch management | Operator upgrades Postgres 15→16 |
| 3 | Full Lifecycle | Backup, restore, failure recovery | Automated backup to S3, PITR |
| 4 | Deep Insights | Metrics, alerts, log processing | Custom Grafana dashboards, SLO monitoring |
| 5 | Auto Pilot | Auto-scaling, auto-tuning, anomaly detection | Auto-adjusts buffer pool, auto-failover |

Most open-source operators are Level 2-3. Fully automated (Level 5) is rare.

Drill 8: Debug a Stuck Operator
Difficulty: Hard
Q: Your custom operator is running but CRs stay in "Pending" state and never transition to "Running". How do you debug?
Answer
# 1. Check operator pod logs
kubectl logs -n operator-system deploy/my-operator-controller-manager -f
# Look for errors, panics, or RBAC denied messages
# 2. Check if operator is watching the right namespace
kubectl get deploy -n operator-system -o yaml | grep -A5 WATCH_NAMESPACE
# 3. Check RBAC — does the operator SA have the right permissions?
kubectl auth can-i get databases --as=system:serviceaccount:operator-system:my-operator-sa
kubectl auth can-i create pods --as=system:serviceaccount:operator-system:my-operator-sa
kubectl auth can-i update databases/status --as=system:serviceaccount:operator-system:my-operator-sa
# 4. Check events on the CR
kubectl describe database my-db
# Look at Events section
# 5. Check if the reconcile function is being called
# Add debug logging in the reconcile function
# Or check controller-runtime metrics:
kubectl port-forward -n operator-system svc/my-operator-metrics 8443:8443
curl -k https://localhost:8443/metrics | grep reconcile
# 6. Common issues:
# - Missing RBAC for status subresource updates
# - Reconcile returning error but not logging it
# - Watching wrong group/version/kind
# - Leader election stuck (previous pod still holds lease)
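For step 5, instead of reading the `grep` output by eye, the counters can be summed with a small helper. This is a rough sketch: `sumMetric` is a hypothetical function, it assumes Prometheus text-format lines without trailing timestamps, and the sample values in `main` are made up. The metric names `controller_runtime_reconcile_total` and `controller_runtime_reconcile_errors_total` are the ones controller-runtime exposes.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up the values of every sample of the named metric in
// Prometheus text-format output, ignoring comments and other metrics.
// Assumes each sample line ends with its value (no timestamp).
func sumMetric(metricsText, name string) float64 {
	var total float64
	scanner := bufio.NewScanner(strings.NewReader(metricsText))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The sample name ends at the first '{' (labels) or space (no labels).
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		fields := strings.Fields(line)
		value, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue
		}
		total += value
	}
	return total
}

func main() {
	// Made-up sample in the shape of a /metrics response.
	sample := `# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="database",result="success"} 0
controller_runtime_reconcile_total{controller="database",result="error"} 12
controller_runtime_reconcile_errors_total{controller="database"} 12
`
	total := sumMetric(sample, "controller_runtime_reconcile_total")
	errs := sumMetric(sample, "controller_runtime_reconcile_errors_total")
	fmt.Printf("reconciles=%v errors=%v\n", total, errs)
}
```

A total of zero suggests reconcile is never called (wrong watch, namespace, or leader election), while a total that matches the error count suggests every reconcile is failing, which points back to the logs and RBAC checks in steps 1 and 3.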