
Troubleshooting

Application Issues

Pod stuck in CrashLoopBackOff

kubectl logs -n grokdevops deploy/grokdevops --previous
kubectl describe pod -n grokdevops -l app.kubernetes.io/name=grokdevops

Common causes:
- Image not imported into k3s (docker save <image> | sudo k3s ctr images import -)
- Wrong image tag in values file
- Port conflict
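To rule out the first cause, you can check whether the image reference the chart uses is actually present in the k3s containerd store. The sketch below uses two hypothetical helpers, normalize_image_ref and image_imported, which are not part of this repository. containerd typically lists imported images under fully qualified names, so a short name such as grokdevops:dev appears as docker.io/library/grokdevops:dev. The helper expands short names the same way before comparing. The tag dev is only an example.

```shell
# Hypothetical helper: expand a short image name the way Docker/containerd do,
# e.g. "grokdevops:dev" -> "docker.io/library/grokdevops:dev".
normalize_image_ref() {
  local ref
  ref=$1
  # Append the default ":latest" tag when the last path component has no tag or digest.
  case ${ref##*/} in
    *:* | *@*) ;;
    *) ref="$ref:latest" ;;
  esac
  case $ref in
    */*)
      # A first component containing "." or ":" (or "localhost") is a registry host.
      case ${ref%%/*} in
        *.* | *:* | localhost) ;;
        *) ref="docker.io/$ref" ;;
      esac
      ;;
    *) ref="docker.io/library/$ref" ;;
  esac
  printf '%s\n' "$ref"
}

# Hypothetical helper: succeed if the image appears in a "ctr images ls -q"
# listing (one reference per line) read from stdin.
image_imported() {
  local want
  want=$(normalize_image_ref "$1")
  grep -Fxq -- "$want"
}

# Usage (grokdevops:dev is an example tag):
#   docker save grokdevops:dev | sudo k3s ctr images import -
#   sudo k3s ctr images ls -q | image_imported grokdevops:dev && echo "image present"
```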

/metrics returns empty or errors

Verify prometheus-client is installed in the container:

kubectl exec -n grokdevops deploy/grokdevops -- pip list | grep prometheus

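Test the endpoint directly. kubectl port-forward runs in the foreground, so start it in the background (or in a second terminal) before running curl:

kubectl port-forward -n grokdevops svc/grokdevops 8000:80 &
sleep 2
curl http://localhost:8000/metrics
kill $!  # stop the background port-forward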
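A scripted version of this check can look for a specific sample instead of relying on someone reading the output. This is a sketch: metric_present is a hypothetical helper that reads Prometheus text exposition format on stdin, skips the # HELP and # TYPE comment lines, and compares the sample name that appears before any labels. The metric name in the usage line is an example. The Python prometheus-client default registry typically exposes process_cpu_seconds_total on Linux, but you can substitute a metric your application defines.

```shell
# Hypothetical helper: succeed if a sample with the given name appears in
# Prometheus text exposition format read from stdin. Comment lines are ignored.
metric_present() {
  awk -v name="$1" '
    /^#/ { next }
    {
      # The sample name ends at the first "{" (labels) or space (value).
      split($0, parts, "[{ ]")
      if (parts[1] == name) found = 1
    }
    END { exit !found }
  '
}

# Usage, with the port-forward above running:
#   curl -fsS http://localhost:8000/metrics | metric_present process_cpu_seconds_total
```

An empty response also fails the check, which covers the "returns empty" symptom.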

Observability Issues

ServiceMonitor not picked up by Prometheus

  1. Check the ServiceMonitor exists:

    kubectl get servicemonitor -n grokdevops
    

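  2. Verify Prometheus selects the ServiceMonitor. Both the ServiceMonitor selector and the namespace selector matter:

    kubectl get prometheus -n monitoring -o yaml | grep -A2 -E 'serviceMonitor(Namespace)?Selector:'

    An empty selector ({}) matches everything. The install script sets serviceMonitorSelectorNilUsesHelmValues=false so that Prometheus matches all ServiceMonitors, not only those labeled with the Helm release name.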

  3. Check Service labels match ServiceMonitor selector:

    kubectl get svc -n grokdevops --show-labels
    kubectl get servicemonitor -n grokdevops -o yaml
    

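Comparing the labels in step 3 by eye can be error-prone. The sketch below uses labels_match, a hypothetical helper. It checks that every key=value pair in the ServiceMonitor's spec.selector.matchLabels is present on the Service, and it ignores matchExpressions. Both sides are rendered into the same comma-separated form with kubectl's go-template output. The ServiceMonitor name grokdevops in the usage comment is an assumption, so check the real name with the command in step 1.

```shell
# Hypothetical helper: succeed if every "key=value" pair in the comma-separated
# selector ($1) also appears in the comma-separated label list ($2).
labels_match() {
  local selector labels pair
  selector=$1
  labels=",$2,"
  # Consume the selector one comma-separated pair at a time.
  while [ -n "$selector" ]; do
    pair=${selector%%,*}
    case $selector in
      *,*) selector=${selector#*,} ;;
      *) selector= ;;
    esac
    [ -z "$pair" ] && continue
    case $labels in
      *",$pair,"*) ;;
      *) printf 'missing label: %s\n' "$pair" >&2; return 1 ;;
    esac
  done
  return 0
}

# Usage (the ServiceMonitor name is an assumption):
#   sel=$(kubectl get servicemonitor -n grokdevops grokdevops \
#     -o go-template='{{range $k, $v := .spec.selector.matchLabels}}{{$k}}={{$v}},{{end}}')
#   lbl=$(kubectl get svc -n grokdevops grokdevops \
#     -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}},{{end}}')
#   labels_match "$sel" "$lbl" && echo "selector matches"
```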
Loki not receiving logs

  1. Check Promtail is running:

    kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail
    

  2. Check Promtail logs:

    kubectl logs -n monitoring -l app.kubernetes.io/name=promtail
    

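  3. Verify the Loki endpoint. The port-forward runs in the foreground, so start it in the background or use a second terminal:

    kubectl port-forward -n monitoring svc/loki 3100:3100 &
    sleep 2
    curl http://localhost:3100/ready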
    

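Loki can take a while to report ready after a restart, so a single curl may fail even when nothing is wrong. The following minimal sketch assumes the port-forward from step 3 is running. retry and loki_ready are hypothetical helpers. loki_ready treats any successful HTTP response from /ready as ready, because curl -f fails on HTTP error statuses. retry calls it a fixed number of times with a pause between attempts.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts; succeed as soon as one attempt succeeds.
retry() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Skip the pause after the final attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical helper: succeed when the readiness URL answers with a non-error status.
loki_ready() {
  curl -fsS -o /dev/null "${1:-http://localhost:3100/ready}"
}

# Usage: poll for up to about a minute.
#   retry 12 5 loki_ready && echo "Loki is ready"
```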
Grafana can't connect to data sources

Verify the data source URLs in Grafana match the service names:
- Prometheus: http://kube-prometheus-stack-prometheus:9090
- Loki: http://loki:3100
- Tempo: http://tempo:3200
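Short Service names like these resolve only from pods in the same namespace as the Service. This works if Grafana is deployed in the monitoring namespace with the rest of the stack. If Grafana runs elsewhere, use the fully qualified form <service>.<namespace>.svc.cluster.local, where cluster.local is the default cluster domain in k3s. The sketch below uses svc_url, a hypothetical helper that builds such URLs. The usage comment tests one URL from a throwaway pod and assumes the curlimages/curl image can be pulled.

```shell
# Hypothetical helper: print the fully qualified in-cluster URL of a Service.
# $1 = Service name, $2 = namespace, $3 = port, $4 = optional path
svc_url() {
  printf 'http://%s.%s.svc.cluster.local:%s%s\n' "$1" "$2" "$3" "${4:-}"
}

# Example check from a throwaway pod in the monitoring namespace:
#   kubectl run -n monitoring url-check --rm -i --restart=Never \
#     --image=curlimages/curl -- curl -sS "$(svc_url loki monitoring 3100 /ready)"
```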

Helm Issues

Helm template rendering fails

helm template grokdevops devops/helm/grokdevops -f devops/helm/values-dev.yaml --debug
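If the chart has more than one values file, rendering each one catches errors that appear only with a particular configuration. This sketch assumes other values-*.yaml files sit next to values-dev.yaml, although this guide names only values-dev.yaml. render_all is a hypothetical helper. The HELM variable exists only so the helper can be exercised without Helm installed. Re-run a failing file with the --debug command above to see the rendered output.

```shell
# Hypothetical helper: render the chart ($1) once per values file (remaining
# args) and report each result; fail if any render fails.
render_all() {
  local chart f failed
  chart=$1
  shift
  failed=0
  for f in "$@"; do
    # HELM defaults to the real helm binary; override it only for testing.
    if "${HELM:-helm}" template grokdevops "$chart" -f "$f" >/dev/null; then
      printf 'ok    %s\n' "$f"
    else
      printf 'FAIL  %s\n' "$f" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Usage from the repository root (the glob is an assumption):
#   render_all devops/helm/grokdevops devops/helm/values-*.yaml
```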

CRDs not found (ServiceMonitor)

The kube-prometheus-stack provides the ServiceMonitor CRD, so it must be installed before deploying with ServiceMonitor enabled:

# Install observability stack first
./devops/scripts/install-observability.sh

# Then deploy application
./devops/scripts/deploy-local.sh
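A deploy wrapper can fail fast with a clear message instead of failing partway through the Helm install. servicemonitors.monitoring.coreos.com is the CRD name the Prometheus Operator registers for ServiceMonitor. require_crd is a hypothetical helper, and the KUBECTL variable can be overridden for testing.

```shell
# Hypothetical helper: succeed only if the named CRD exists in the cluster.
require_crd() {
  if ! "${KUBECTL:-kubectl}" get crd "$1" >/dev/null 2>&1; then
    printf 'CRD %s not found; install the observability stack first\n' "$1" >&2
    return 1
  fi
}

# Usage:
#   require_crd servicemonitors.monitoring.coreos.com && ./devops/scripts/deploy-local.sh
```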

k3s Issues

k3s service won't start

sudo systemctl status k3s
sudo journalctl -u k3s -f
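One cause worth ruling out is another process that is already bound to 6443, the API server port k3s uses by default. The helper below, port_in_use, is a hypothetical sketch. It reads ss -ltnH output (listening TCP sockets, no header) and checks the local-address column. Run it while k3s is stopped, because a running k3s holds that port itself.

```shell
# Hypothetical helper: succeed if any listening socket in "ss -ltnH" output
# (read from stdin) has a local address ending in ":<port>".
port_in_use() {
  awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Usage, with k3s stopped:
#   sudo systemctl stop k3s
#   ss -ltnH | port_in_use 6443 && echo "port 6443 is already in use"
```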

kubectl commands fail

# Check kubeconfig
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl cluster-info
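On a default k3s install, /etc/rancher/k3s/k3s.yaml is readable only by root. The export above therefore works for root, but a regular user typically gets a permission error. A common workaround is to copy the file into the user's own kubeconfig, and copy_kubeconfig below is a hypothetical sketch of that. It overwrites the destination, so if ~/.kube/config already holds other clusters, merge the files by hand instead.

```shell
# Hypothetical helper: copy the k3s kubeconfig ($1) to a user-owned file ($2)
# with owner-only permissions.
copy_kubeconfig() {
  local src dest
  src=${1:-/etc/rancher/k3s/k3s.yaml}
  dest=${2:-$HOME/.kube/config}
  mkdir -p "$(dirname "$dest")" || return 1
  # umask 077 keeps a newly created file private while it is written.
  if [ -r "$src" ]; then
    (umask 077 && cat "$src" > "$dest") || return 1
  else
    # Only the read needs root; the redirection runs as the current user.
    (umask 077 && sudo cat "$src" > "$dest") || return 1
  fi
  chmod 600 "$dest"
}

# Usage:
#   copy_kubeconfig && export KUBECONFIG="$HOME/.kube/config" && kubectl cluster-info
```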

CI Issues

Shellcheck failures

Fix shell script issues locally:

# Install shellcheck
sudo apt-get install shellcheck  # or: brew install shellcheck

# Run on all scripts; in bash, ** only recurses when globstar is enabled, so use find
find devops -name '*.sh' -exec shellcheck {} +

Ansible syntax check fails

cd devops/ansible
ansible-playbook playbooks/bootstrap-k3s.yml --syntax-check

Cleanup Commands

Remove application

./devops/scripts/deploy-local.sh --uninstall

Remove monitoring stack

./devops/scripts/install-observability.sh --uninstall

Remove k3s entirely

/usr/local/bin/k3s-uninstall.sh
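To remove everything in one pass, you can run the three cleanup steps in the reverse of the install order used above: application first, then the monitoring stack, then k3s. In this sketch, run and teardown_all are hypothetical helpers. Setting DRY_RUN to any non-empty value prints the commands instead of executing them, and the && chain stops at the first step that fails.

```shell
# Hypothetical helper: execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: tear down in reverse install order, stopping on failure.
teardown_all() {
  run ./devops/scripts/deploy-local.sh --uninstall &&
    run ./devops/scripts/install-observability.sh --uninstall &&
    run /usr/local/bin/k3s-uninstall.sh
}

# Usage: preview first, then run for real.
#   DRY_RUN=1 teardown_all
#   teardown_all
```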