K8S Ops

← Back to all decks

110 cards — 🟢 21 easy | 🟡 51 medium | 🔴 23 hard

🟢 Easy (21)

1. What is Helm and how does it help manage Kubernetes applications?

Show answer Helm is a package manager for Kubernetes. It gives you the ability to package YAML manifests, distribute them to other users, and apply them in your cluster(s).

As a concept it's quite common and can be found in many platforms and services. Think, for example, of package managers in operating systems: on Fedora/RHEL that would be dnf, on Ubuntu, apt. If you don't use Linux, a different question should be asked, and it's "why?", but that's another topic :)

2. What is the role of Helm and how does it simplify Kubernetes deployments?

Show answer * Role of Helm: Helm is a package manager for Kubernetes applications, simplifying deployment and management.
* It uses charts (packages of pre-configured Kubernetes resources) to define, install, and upgrade applications.
* Helm streamlines the release process and promotes reusability.

Remember: Helm = K8s package manager. Chart=package, Release=instance, Repo=store.

Example: `helm install my-app bitnami/nginx --set service.type=LoadBalancer`

3. What are some use cases for using Helm template files?

Show answer * Deploy the same application across multiple different environments
* CI/CD

4. What is Heapster in Kubernetes?

Show answer Heapster is a performance monitoring and metrics collection system for data collected by the Kubelet. This aggregator is natively supported and runs like any other pod within a Kubernetes cluster, which allows it to discover and query usage data from all nodes within the cluster. Note: Heapster has been deprecated and replaced by the Metrics Server.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.

5. Which command lists all Pods in a Kubernetes cluster?

Show answer `kubectl get pods` (for the current namespace) or `kubectl get pods -A` (for all namespaces) will list all pods and their status.

6. What is kubectl and how is it used to manage Kubernetes clusters?

Show answer Kubectl is a CLI (command-line interface) that is used to run commands against Kubernetes clusters. As such, it controls the Kubernetes cluster manager through different create and manage commands on the Kubernetes components. It is the primary tool for interacting with Kubernetes clusters.

7. What is the Operator Framework?

Show answer An open source toolkit used to manage Kubernetes-native applications, called Operators, in an automated and efficient way.

8. Describe shortly, and at a high level, what happens when you run `kubectl get nodes`

Show answer 1. Your user is getting authenticated
2. Request is validated by the kube-apiserver
3. Data is retrieved from etcd

Remember: get=summary list, describe=detailed+events. "get=glance, describe=deep dive."

9. What is Container resource monitoring?

Show answer Container resource monitoring refers to the activity that collects the metrics and tracks the health of containerized applications and microservices environments. It helps to improve health and performance and also makes sure that they operate smoothly. Common tools include Prometheus, Grafana, and the Kubernetes Metrics Server.

10. How do you delete a pod in Kubernetes and what happens after?

Show answer `kubectl delete pod pod_name`

Gotcha: delete sends SIGTERM, 30s grace, then SIGKILL. `--grace-period=0 --force` skips.

Remember: Deleting a Deployment also removes its ReplicaSets and Pods.

11. Which command will list all the object types in a cluster?

Show answer `kubectl api-resources` lists all resource types (pods, services, deployments, etc.) available in the cluster. Shows NAME, SHORTNAMES, APIVERSION, NAMESPACED, and KIND columns. Use `--namespaced=true` to filter namespaced resources only. Combine with `kubectl explain <resource>` to explore field schemas for any listed resource.

12. What does '<unknown>/50%' mean in HPA status?

Show answer It means the HPA cannot read CPU metrics. Usually because metrics-server is not installed, not healthy, or the deployment doesn't have CPU resource requests defined. HPA needs requests to calculate percentage-based utilization.

13. What are the prerequisites for HPA to work?

Show answer 1) metrics-server must be installed and healthy; 2) The target deployment must have resource requests (at minimum CPU requests for CPU-based scaling); 3) The metrics API must be accessible (/apis/metrics.k8s.io/v1beta1).
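The prerequisites above can be sketched as a minimal setup (the deployment name `web` and all numbers are illustrative):

```yaml
# The target deployment's container must declare CPU requests,
# otherwise percentage-based utilization is undefined:
#   resources:
#     requests:
#       cpu: 100m
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```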

14. What does exit code 137 mean for a Kubernetes container?

Show answer Exit code 137 = 128 + 9 (SIGKILL). Most commonly the container was killed by the kernel OOM killer because it exceeded its memory limit. Check with: `kubectl describe pod <pod-name>` and look under 'Last State' for 'Reason: OOMKilled'.

15. What must be set on pods for HPA CPU-based scaling to work?

Show answer Resource requests (resources.requests.cpu) must be defined. HPA computes utilization as currentUsage / request, so without requests the metric is undefined and HPA cannot function.

16. What is the default HPA scale-down stabilization window?

Show answer 300 seconds (5 minutes). The controller looks back over this window and picks the highest (most conservative) replica count recommendation to prevent flapping.

Example: `kubectl scale deployment web --replicas=5`. HPA for automatic scaling.
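The window can also be tuned per HPA via the `behavior` field (a sketch; the numbers are illustrative):

```yaml
# autoscaling/v2 HPA spec fragment
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # the 5-minute default, made explicit
    policies:
    - type: Percent
      value: 50            # remove at most 50% of replicas per period
      periodSeconds: 60
```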

17. How do you verify that metrics-server is running and providing data?

Show answer Run `kubectl top nodes` and `kubectl top pods`. If they return metrics, the server is working. Also check `kubectl get apiservices | grep metrics` and `kubectl -n kube-system get pods -l k8s-app=metrics-server`.

18. What does a Kubernetes liveness probe determine?

Show answer Whether the container is still alive. If the liveness probe fails, Kubernetes kills and restarts the container.

19. What happens when a readiness probe fails?

Show answer The pod is removed from Service endpoints so it stops receiving traffic, but it is NOT restarted. Traffic is routed to other healthy pods.

20. What are the four probe mechanisms Kubernetes supports?

Show answer httpGet (HTTP GET returning 2xx/3xx), tcpSocket (TCP port is open), exec (command exits 0), and grpc (gRPC health check, Kubernetes 1.24+).
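As container spec fragments, the four mechanisms look roughly like this (paths, ports, and commands are illustrative):

```yaml
livenessProbe:
  httpGet:                # success = HTTP 2xx/3xx response
    path: /healthz
    port: 8080
readinessProbe:
  tcpSocket:              # success = TCP port accepts a connection
    port: 5432
startupProbe:
  exec:                   # success = command exits 0
    command: ["cat", "/tmp/started"]
# grpc (1.24+) requires the app to implement the gRPC health protocol:
# livenessProbe:
#   grpc:
#     port: 9090
```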

21. What is the purpose of a startup probe?

Show answer It tells Kubernetes the container is still booting. While the startup probe is running, liveness and readiness probes are disabled. Once it succeeds, it never runs again and the other probes take over.
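A sketch of that interplay, assuming an app that may need up to 5 minutes to boot (values are illustrative):

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30    # 30 * 10s = up to 300s allowed for startup
  periodSeconds: 10
livenessProbe:            # held off until the startup probe succeeds
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
```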

🟡 Medium (51)

1. Would you use Helm, Go or something else for creating an Operator?

Show answer Depends on the scope and maturity of the Operator. If it mainly covers installation and upgrades, Helm might be enough. If you want to go for lifecycle management, insights and auto-pilot, that's where you'd probably use Go.

2. Why is there no `kubectl get containers` command in Kubernetes?

Show answer Because a container is not a Kubernetes object. The smallest object unit in Kubernetes is a Pod. In a single Pod you can find one or more containers.

3. What components does the Operator Framework consist of?

Show answer 1. Operator SDK - allows developers to build operators
2. Operator Lifecycle Manager - helps to install, update and generally manage the lifecycle of all operators
3. Operator Metering - Enables usage reporting for operators that provide specialized services

4. It is said that Helm is also a templating engine. What does that mean?

Show answer It is useful for scenarios where you have multiple similar applications: there are minor differences in their configuration files and most values are the same. With Helm you can define a common blueprint for all of them, and the values that are not fixed become placeholders. This is called a template file and it looks similar to the following:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{ .Values.name }}
spec:
  containers:
  - name: {{ .Values.container.name }}
    image: {{ .Values.container.image }}
    ports:
    - containerPort: {{ .Values.container.port }}
```

The values themselves live in a separate file (values.yaml):

```yaml
name: some-app
container:
  name: some-app-container
  image: some-app-image
  port: 1991
```

5. What components does the Operator consist of?

Show answer 1. CRD (Custom Resource Definition) - You are familiar with Kubernetes resources like Deployment, Pod, Service, etc. A CRD is also a resource, but one that you or the developer of the operator defines.
2. Controller - Custom control loop which runs against the CRD

6. How does Helm support release management?

Show answer Helm allows you to upgrade, remove and roll back to previous versions of charts. In Helm 2 this was handled by a server-side component known as "Tiller". Tiller was removed in Helm 3 due to security concerns.

7. How does a Kubernetes Operator work using the control loop pattern?

Show answer It uses the same control loop pattern used by Kubernetes in general: it watches for changes in the application state. The difference is that it uses a custom control loop.

In addition, it makes use of CRDs (Custom Resource Definitions), so it basically extends the Kubernetes API.

8. Run a command to view all nodes of the cluster

Show answer `kubectl get nodes`

Note: You might want to create an alias (`alias k=kubectl`) and get used to `k get no`

9. How do you list deployed releases?

Show answer `helm ls` or `helm list` shows all deployed Helm releases in the current namespace. Add `--all-namespaces` or `-A` to see releases across all namespaces. The output shows NAME, NAMESPACE, REVISION, STATUS, CHART, and APP VERSION. Use `--filter <name>` to search for specific releases.

10. How to display the resources usages of pods?

Show answer `kubectl top pod` shows real-time CPU and memory usage per pod. Requires metrics-server to be installed in the cluster. Add `--containers` to see per-container metrics within pods. Use `--sort-by=cpu` or `--sort-by=memory` to find resource-hungry pods quickly.
Gotcha: if metrics-server is not installed, this command returns an error.

11. Explain the role of kubeconfig in connecting to a Kubernetes cluster.

Show answer Role of kubeconfig:
* kubeconfig is a file that stores cluster information, user credentials, and context.
* It is used by the kubectl command-line tool to interact with a Kubernetes cluster.
* kubeconfig allows users to switch between different clusters and contexts.

Remember: Default: `~/.kube/config`. Override: KUBECONFIG env var or --kubeconfig flag.

Gotcha: Merge: `KUBECONFIG=f1:f2 kubectl config view --merge --flatten > merged`

12. Why do we need Operators?

Show answer The process of managing stateful applications in Kubernetes isn't as straightforward as managing stateless applications, where reaching the desired state and performing upgrades are handled the same way for every replica. In stateful applications, upgrading each replica might require different handling due to the stateful nature of the app; each replica might be in a different state. As a result, we often need a human operator to manage stateful applications. A Kubernetes Operator is supposed to assist with this.

Operators also help with automating a standard process on multiple Kubernetes clusters.

13. Explain the need for Kustomize by describing actual use cases

Show answer * You have a Helm chart of an application used by multiple teams in your organization, and there is a requirement to add an annotation to the app specifying the name of the team owning it
* Without Kustomize you would need to copy the files (the chart template in this case) and modify them to include the specific annotations you need
* With Kustomize you don't need to copy the entire repo or files
* You are asked to apply a change/patch to some app without modifying the original files of the app
* With Kustomize you can define a kustomization.yml file that describes these customizations, so you don't need to touch the original app files
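The annotation use case above can be sketched with a kustomization.yml like this (the team name and paths are illustrative):

```yaml
# kustomization.yml - adds the owning-team annotation without
# touching the original app manifests
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base               # the original, unmodified manifests
commonAnnotations:
  owning-team: payments
```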

14. What is kubeconfig? What do you use it for?

Show answer A kubeconfig file is a file used to configure access to Kubernetes when used in conjunction with the kubectl command-line tool (or other clients).
Use kubeconfig files to organize information about clusters, users, namespaces, and authentication mechanisms.

15. What monitoring solutions are you familiar with in regards to Kubernetes?

Show answer There are many monitoring solutions for Kubernetes: some open source, some in-memory, some commercial. Here is a short list:

* metrics-server: in-memory open source monitoring
* datadog: $$$
* prometheus: open source monitoring solution

16. Why do we need Helm? What would be the use case for using it?

Show answer Sometimes when you would like to deploy a certain application to your cluster, you need to create multiple YAML files/components: Secret, Service, ConfigMap, etc. This can be a tedious task. So it makes sense to ease the process by introducing something that allows us to share this bundle of YAMLs every time we would like to add an application to our cluster. This something is called Helm.

A common scenario is having multiple Kubernetes clusters (prod, dev, staging). Instead of individually applying different YAMLs in each cluster, it makes more sense to create one Chart and install it in every cluster.

Another scenario is that you would like to share what you've created with the community, so people and companies can easily deploy your application in their clusters.

17. Create a list of all nodes in JSON format and store it in a file called "some_nodes.json"

Show answer `kubectl get nodes -o json > some_nodes.json` exports all node objects in JSON format. Use `-o jsonpath='{.items[*].metadata.name}'` for just the names. Other output formats: `-o yaml`, `-o wide` (extra columns), `-o name` (just resource names). Pipe to `jq` for filtering: `kubectl get nodes -o json | jq '.items[].status.conditions'`.

18. After creating a service, how to check it was created?

Show answer `kubectl get svc` lists all services in the current namespace showing NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, and PORT(S). Add `-o wide` for selector details or `--all-namespaces` for a cluster-wide view. Use `kubectl describe svc <service-name>` for endpoint details and to verify pods are being targeted correctly.

19. Is it possible to override values in values.yaml file when installing a chart?

Show answer Yes. You can pass another values file:
`helm install [RELEASE_NAME] [CHART_NAME] --values=override-values.yaml`

Or directly on the command line: `helm install [RELEASE_NAME] [CHART_NAME] --set some_key=some_value`

20. Describe in detail what is the Operator Lifecycle Manager

Show answer It's part of the Operator Framework, used for managing the lifecycle of operators. It basically extends Kubernetes so a user can use a declarative way to manage operators (installation, upgrade, ...).

21. Explain the purpose and usage of Operators in the Kubernetes ecosystem.

Show answer * Operators in Kubernetes: Operators are custom controllers that extend Kubernetes functionality.
* They automate complex operational tasks for managing applications.
* Operators use custom resources to define application-specific behaviors.

22. Discuss the use of Helm charts for application packaging in Kubernetes.

Show answer Helm Charts: Helm Charts are packages of pre-configured Kubernetes resources.
They include templates for deployments, services, ConfigMaps, and other resources needed for an application.
Helm Charts simplify the deployment and versioning of Kubernetes applications.
Helm Charts encapsulate the complexity of deploying applications in Kubernetes, making it easy to share and reproduce application deployments. They provide a standardized and reusable way to package, deploy, and manage applications across different environments.

23. Check if there are any limits on one of the pods in your cluster

Show answer `kubectl describe pod <pod-name> | grep -i limits` shows resource limits (CPU, memory) set on the pod's containers. You can also use `kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'` for structured output.
Gotcha: pods without limits can consume unbounded resources and affect other workloads on the same node.

24. What does the following output of `kubectl get rs` mean?

Show answer The replicaset `web` has 2 replicas. It seems that the containers inside the Pod(s) are not yet running, since the value of READY is 0. It might be normal, since it takes time for some containers to start running, or it might be due to an error. Running `kubectl describe po POD_NAME` or `kubectl logs POD_NAME` can give us more information.

25. Describe in high-level how Kustomize works

Show answer 1. You add a kustomization.yml file in the folder of the app you would like to customize.
2. You define the customizations you would like to perform.
3. You run `kustomize build APP_PATH`, where APP_PATH is the folder where your kustomization.yml resides.

26. Explain "Helm Charts"

Show answer A Helm chart is a bundle of YAML files: a bundle that you can consume from repositories, or create your own and publish to the repositories.

27. Perhaps a general question but, you suspect one of the pods is having issues, you don't know what exactly. What do you do?

Show answer Start by inspecting the pod's status. We can use the command `kubectl get pods` (add `--all-namespaces` for pods in system namespaces).

If we see "Error" status, we can keep debugging by running the command `kubectl describe pod [name]`. In case we still don't see anything useful we can try stern for log tailing.

In case we find out there was a temporary issue with the pod or the system, we can try restarting the pod with the following `kubectl scale deployment [name] --replicas=0`

Setting the replicas to 0 will shut down the process. Now start it with `kubectl scale deployment [name] --replicas=1`

28. How to execute the command "ls" in an existing pod?

Show answer `kubectl exec some-pod -it -- ls` runs the `ls` command inside a running pod's container. For a shell: `kubectl exec -it pod-name -- /bin/sh`. In multi-container pods, specify the container: `kubectl exec -it pod-name --container=sidecar -- /bin/sh`.
Gotcha: exec requires the container to have the binary installed — distroless images may lack common tools.

29. What does the openshift-operator-lifecycle-manager namespace include?

Show answer It includes:

* catalog-operator - Resolves and installs ClusterServiceVersions and the resources they specify.
* olm-operator - Deploys applications defined by ClusterServiceVersion resources.

30. How to view revision history for a certain release?

Show answer `helm history RELEASE_NAME` shows the revision history including REVISION number, STATUS, CHART version, and DESCRIPTION. This lets you see what changed between deployments and identify which revision to rollback to.
Example: `helm history my-app` then `helm rollback my-app 3` to revert to revision 3.

31. Explain how you would monitor and scale a critical production application in Kubernetes.

Show answer **Monitoring:* •
• Use monitoring tools like Prometheus, Grafana, or Kubernetes-native solutions.
• Set up alerts based on key metrics, including resource utilization and application health.
• Monitor pod and node status, and track events.
**Scaling:* •
• Utilize Horizontal Pod Autoscaling (HPA) based on metrics like CPU or custom metrics.
• Consider Vertical Pod Autoscaling for adjusting resource limits dynamically.
• Implement Cluster Autoscaler for scaling the node pool based on demand.
• Effective monitoring involves selecting appropriate tools and setting up alerts.

32. What does the following command do?

Show answer It exposes a ReplicaSet by creating a service called 'replicaset-svc'. The exposed port is 2017 (this is the port used by the application) and the service type is NodePort which means it will be reachable externally.

33. How to upgrade a release?

Show answer `helm upgrade RELEASE_NAME CHART_NAME` updates a deployed release with new chart values or a new chart version. Add `--set key=value` for inline overrides or `-f values.yaml` for file-based config. Use `--dry-run` to preview changes before applying.
Gotcha: always pin chart versions in production to avoid unexpected upgrades.

34. What is Helm, and how is it used in Kubernetes?

Show answer * Helm: Helm is a package manager for Kubernetes applications.
* It simplifies the deployment and management of Kubernetes applications by packaging them into charts.
* Charts are pre-configured Kubernetes resource definitions that can be easily deployed and versioned.
* Helm allows users to define, install, and upgrade even the most complex Kubernetes applications with a single command.
* Charts encapsulate all the required Kubernetes resources and configurations, making it easier to share and reproduce application deployments across different environments.

35. Explain the Helm Chart Directory Structure

Show answer
```
someChart/        # the name of the chart
  Chart.yaml      # meta information about the chart
  values.yaml     # values for template files
  charts/         # chart dependencies
  templates/      # template files :)
```
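For reference, a minimal Chart.yaml might look like this (all values are illustrative):

```yaml
apiVersion: v2            # chart API version (v2 = Helm 3)
name: someChart
description: An example application chart
version: 0.1.0            # version of the chart itself
appVersion: "1.0.0"       # version of the packaged application
```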

36. Are there any tools, projects you are using for building Operators?

Show answer This one is based more on a personal experience and taste...

* Operator Framework
* Kubebuilder
* Controller Runtime
...

37. What does the "ErrImagePull" status of a Pod means?

Show answer Kubernetes wasn't able to pull the image specified for running the container(s). This can happen if the client isn't authenticated to the registry, for example.
More details can be obtained with `kubectl describe po <pod-name>`.

38. How do you search for charts?

Show answer `helm search hub <keyword>` searches Artifact Hub for charts across all repositories. `helm search repo <keyword>` searches only locally-added repositories.
Example: `helm search hub prometheus` finds Prometheus charts from multiple publishers. Add `--max-col-width 80` for readable descriptions and `--version <version>` to filter by chart version.

39. How do you debug a failing pod?

Show answer Check the events, logs, container status, resource limits, readiness/liveness probes, and images. If needed: kubectl describe, kubectl logs, and verify networking, secrets, configmaps, and node health.

Remember: Debug flow: Get→Describe→Logs→Exec. Mnemonic: "GDLE."

40. Why does a deployment get stuck in 'Progressing' status?

Show answer A deployment gets stuck when new pods fail to become Ready within the progressDeadlineSeconds (default 600s). Common causes: readiness probe failure, ImagePullBackOff, OOMKilled on startup, or missing dependencies. Check: kubectl rollout status, describe pod events.

41. How do you right-size memory limits for a Kubernetes deployment?

Show answer 1) Set requests based on steady-state usage (observe via kubectl top or Prometheus); 2) Set limits 1.5-2x requests to allow for spikes; 3) Monitor with container_memory_working_set_bytes metric; 4) Set alerts at 80% of limit. Never set limits = requests for variable workloads.
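Applied to a container spec, the guidance above looks roughly like this (the numbers are illustrative):

```yaml
resources:
  requests:
    memory: 256Mi   # steady-state usage observed via kubectl top / Prometheus
  limits:
    memory: 512Mi   # ~2x requests to absorb spikes; alert at ~410Mi (80%)
```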

42. What's the difference between ImagePullBackOff and ErrImagePull?

Show answer ErrImagePull is the first failure to pull an image. ImagePullBackOff means Kubernetes tried, failed, and is now waiting with exponential backoff before retrying. Common causes: wrong tag, private registry without credentials, or image not imported into k3s.

43. What formula does the HPA use to compute the desired replica count?

Show answer desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)). For example, if current CPU is 90% and target is 70%, the scale factor is 90/70 = 1.28, so replicas increase by roughly 28%.
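The worked example can be checked with a little integer arithmetic (a sketch; the HPA controller itself works on raw metric values and tolerances):

```shell
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current=4; usage=90; target=70
desired=$(( (current * usage + target - 1) / target ))  # integer ceiling
echo "$desired"   # 4 * 90/70 = 5.14..., which rounds up to 6
```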

44. What are the four metric types supported by HPA v2 and when would you use each?

Show answer Resource (CPU/memory from metrics-server), Pods (per-pod app metrics like RPS via custom metrics adapter), External (cloud service metrics like SQS queue depth via external adapter), and Object (metrics from a specific Kubernetes object like Ingress RPS via custom adapter).

45. Why should VPA and HPA not target the same metric?

Show answer They will conflict. HPA adjusts replica count based on per-pod utilization, while VPA adjusts resource requests on individual pods. If both act on CPU, HPA might scale out while VPA simultaneously changes the request denominator, causing an unstable feedback loop.

46. Why is memory generally a poor primary metric for HPA scaling?

Show answer Many applications (JVM, Python) allocate memory and never release it even after load drops. Memory utilization stays high regardless of current demand, so HPA never scales down. CPU is preferred because it correlates better with active request load.

47. When HPA is configured with multiple metrics, how does it decide the replica count?

Show answer It evaluates each metric independently and takes the maximum desired replica count across all metrics. The most demanding metric wins. This means combining metrics with very different response characteristics can lead to unexpected scaling behavior.

48. Why is checking database connectivity in a liveness probe dangerous?

Show answer If the database goes down, all pods fail liveness simultaneously, Kubernetes restarts them all, they hit the database in a thundering herd on reconnection, and the cycle repeats: a cascading restart storm.

49. What does failureThreshold control, and how does it interact with periodSeconds?

Show answer failureThreshold is the number of consecutive probe failures before Kubernetes takes action. Combined with periodSeconds, it sets the detection window: e.g., failureThreshold=3 and periodSeconds=10 means 30 seconds of failures before restart or endpoint removal.
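The example's 30-second window as a probe spec fragment (the path and port are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10     # probe every 10 seconds
  failureThreshold: 3   # act after 3 consecutive failures (~30s window)
```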

50. How do readiness probes interact with rolling deployments?

Show answer New pods must pass their readiness probe before receiving traffic and before old pods are terminated. If new pods never become ready (e.g., broken config), the rollout stalls and old pods continue serving — preventing a bad deploy from causing downtime.

51. Before startup probes existed, how did operators handle slow-starting containers, and why was that approach fragile?

Show answer They set a large initialDelaySeconds on the liveness probe. This was fragile because if the app started faster, detection of a truly dead container was delayed; if it started slower, the liveness probe would kill it during boot.

🔴 Hard (23)

1. Do you have experience with deploying a Kubernetes cluster? If so, can you describe the process in high-level?

Show answer 1. Create multiple instances you will use as Kubernetes nodes/workers, plus an instance to act as the control plane (master). The instances can be provisioned in a cloud or as virtual machines on bare metal hosts.
2. Provision a certificate authority that will be used to generate TLS certificates for the different components of a Kubernetes cluster (kubelet, etcd, ...), and generate a certificate and private key for each component.
3. Generate kubeconfigs so the different clients of Kubernetes can locate the API servers and authenticate.
4. Generate an encryption key that will be used for encrypting the cluster data.
5. Create an etcd cluster.

2. After running `kubectl run database --image mongo` you see the status is "CrashLoopBackOff". What could have gone wrong, and what do you do to confirm?

Show answer CrashLoopBackOff means the Pod is starting, crashing, starting...and so it repeats itself.
There are many different reasons to get this error - lack of permissions, init-container misconfiguration, persistent volume connection issue, etc.

One way to check why it happened is to run `kubectl describe po <pod-name>` and look at the exit code in the container's last state:

```
Last State:  Terminated
  Reason:    Error
  Exit Code: 100
```

Another way to check what's going on is to run `kubectl logs <pod-name>`. This provides the logs from the containers running in that Pod.

3. How would you approach version upgrades of Kubernetes in a production environment?

Show answer **Version Upgrades in Production:**
• Conduct thorough testing in a staging environment before production.
• Follow Kubernetes documentation and release notes for upgrade procedures.
• Use tools like kubeadm for streamlined upgrade processes.
• Ensure backups and have a rollback plan in case of issues.

Remember: Version skew: the kubelet must never be newer than the API server and may lag it by up to two minor versions (three since v1.28). Upgrade the control plane first.

4. You are managing multiple Kubernetes clusters. How do you quickly change between the clusters using kubectl?

Show answer `kubectl config use-context <context-name>` switches between clusters by changing the active context in your kubeconfig. List available contexts with `kubectl config get-contexts`. Each context combines a cluster, user, and namespace. Consider using `kubectx` for faster switching in multi-cluster environments.
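A context ties those three pieces together in the kubeconfig file; a minimal sketch (cluster and user names hypothetical):

```yaml
apiVersion: v1
kind: Config
current-context: prod        # what `use-context` switches
contexts:
- name: prod
  context:
    cluster: prod-cluster    # hypothetical cluster entry
    user: prod-admin         # hypothetical user entry
    namespace: default
- name: staging
  context:
    cluster: staging-cluster
    user: staging-admin
```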

5. You try to run a Pod but it's in "Pending" state. What might be the reason?

Show answer One possible reason is that the scheduler, which is supposed to schedule Pods onto nodes, is not running. To verify it, you can run `kubectl get po -A | grep scheduler` or check directly in the `kube-system` namespace. More commonly, no node has enough free CPU/memory, or taints and nodeSelector rules prevent scheduling — `kubectl describe pod <pod-name>` shows the reason under Events.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.

6. Describe how the monitoring solution you are working with monitors Kubernetes

Show answer Common Kubernetes monitoring solutions:

- **Prometheus + Grafana**: Most popular open-source stack. Prometheus scrapes metrics via ServiceMonitors; Grafana provides dashboards. Alertmanager handles alerts
- **metrics-server**: Lightweight, in-memory. Powers `kubectl top` but no persistence or alerting
- **Datadog/New Relic/Dynatrace**: Commercial SaaS platforms with auto-discovery, APM, and built-in dashboards
- **ELK/Loki**: Log aggregation (Elasticsearch or Loki + Grafana for unified metrics/logs)

Key things to monitor: node resources, pod status, API server latency, etcd health, PV usage, network policies.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
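For the Prometheus + Grafana stack, scraping is typically declared with a ServiceMonitor; this sketch assumes the Prometheus Operator CRDs are installed, and all names are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-metrics        # hypothetical name
  labels:
    release: prometheus    # must match the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: web             # scrape Services carrying this label
  endpoints:
  - port: metrics          # named port on the Service
    interval: 30s
```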

7. Explain how you would manage configuration drift in a Kubernetes environment.

Show answer Managing Configuration Drift:
* Regularly audit configurations using tools like kube-score.
* Use version control for configuration files to track changes.
* Implement GitOps practices for declarative cluster configuration.
* Leverage Helm or Kustomize for consistent application configuration.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
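As a GitOps sketch, an Argo CD Application with self-heal enabled automatically reverts drifted resources to what Git declares (repo URL, paths, and names are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web                                     # hypothetical app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git   # hypothetical repo
    targetRevision: main
    path: k8s/web
  destination:
    server: https://kubernetes.default.svc
    namespace: web
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual (drifted) changes
```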

8. Describe the steps involved in troubleshooting a pod that is not starting in Kubernetes.

Show answer **Troubleshooting Steps:**
• Check Pod Status: Use `kubectl get pods` to check the pod's current status.
• Pod Events: Use `kubectl describe pod <pod-name>` to view events and identify issues.
• Logs: Examine container logs using `kubectl logs <pod-name>`.
• Resource Constraints: Verify resource requests and limits in the pod spec.
• Pod Configuration: Review the pod's configuration, including ConfigMaps and Secrets.
• Network Issues: Check network policies and connectivity.
• Image Availability: Ensure the container image is accessible and correct.
• Health Probes: Inspect readiness and liveness probe configuration.

9. How do you debug a crash-looping pod?

Show answer Systematic approach:

**1. Logs first**:
```bash\nkubectl logs pod-name\nkubectl logs pod-name --previous # Previous crash\n```

**2. Events/describe**:
```bash\nkubectl describe pod pod-name\n```
Look for: OOMKilled, image pull errors, failed mounts, probe failures.

**3. Container command/env**:
* Wrong entrypoint?
* Missing environment variables?
* Bad config mounted?

**4. Resource limits**:
* Memory limit too low → OOMKilled
* CPU throttling causing timeouts

**5. Debug container** (if logs don't help):
```bash\nkubectl debug pod-name -it --image=busybox\n```

Most crashes are: missing config, wrong image tag, resource limits, or dependency not ready.

Remember: Debug flow: Get→Describe→Logs→Exec. Mnemonic: "GDLE."

10. Kubernetes cluster randomly drops traffic under load. Nodes look healthy. What's the root cause?

Show answer Most likely conntrack table exhaustion from NAT/service explosion.

The problem:
- Every Service + Pod combination creates conntrack entries
- kube-proxy uses iptables NAT for service routing
- Each connection = conntrack entry
- Default nf_conntrack_max often too low (65536)

Why traffic drops silently:
- New connections can't create conntrack entries
- Packets dropped in kernel, no error to application
- No log by default (must enable)
- Appears as random timeouts

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
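A node-level mitigation is raising the conntrack ceiling, e.g. with a sysctl drop-in (values illustrative); kube-proxy can also manage this via `conntrack.maxPerCore` in its configuration:

```
# /etc/sysctl.d/90-conntrack.conf -- illustrative values
net.netfilter.nf_conntrack_max = 1048576
```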

11. Why do container logs sometimes bring down Kubernetes nodes?

Show answer Uncontrolled log growth exhausts disk space or inodes, causing kubelet failure.

The cascade:

1. Application logs to stdout/stderr
2. Container runtime captures to JSON files
3. Files grow without bound (misconfigured rotation)
4. Disk fills OR inodes exhausted
5. kubelet can't function (needs disk for pods)
6. Node goes NotReady
7. Pods are rescheduled, bringing their log volume to other nodes
8. Repeat on next node

Specific failure modes include a full disk (DiskPressure eviction storms) and inode exhaustion even while space remains; mitigate with kubelet log rotation and eviction thresholds.

Example: `kubectl logs -f <pod>` follows live logs; `kubectl logs <pod> --previous` shows the crashed container's logs.

Remember: Multi-container: `--container=name`. Errors without it on multi-container pods.
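A sketch of the kubelet settings that keep this cascade in check (values illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 50Mi     # rotate each container log at 50 MiB
containerLogMaxFiles: 3       # keep at most 3 rotated files per container
evictionHard:
  nodefs.available: "10%"     # evict pods before the node disk fills
  nodefs.inodesFree: "5%"     # guard against inode exhaustion too
```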

12. An engineer on your team runs a Pod but the status they see is "CrashLoopBackOff". What does it mean? How do you identify the issue?

Show answer The container failed to run (for any of various reasons) and Kubernetes tries to run the Pod again after an increasing delay (= the BackOff time).

Some reasons for it to fail:
- Misconfiguration - misspelling, unsupported value, etc.
- Resource not available - nodes are down, PV not mounted, etc.

Some ways to debug:

1. `kubectl describe pod POD_NAME`
   - Focus on `State` (which should be Waiting, CrashLoopBackOff) and `Last State`, which should tell what happened before (i.e., why it failed)
2. `kubectl logs mypod`
   - This should provide the output of the crashed container
   - For a specific container, add `-c CONTAINER_NAME`

13. You encounter a performance issue in a Kubernetes cluster. How do you diagnose and resolve it?

Show answer **Diagnosing and Resolving Performance Issues:**
• Resource Utilization: Check CPU, memory, and storage usage for nodes and pods.
• Logs and Events: Analyze container logs and Kubernetes events for anomalies.
• Network: Examine network policies, traffic, and potential bottlenecks.
• Pod Placement: Review node placement and resource allocation for pods.
• Kubernetes Components: Inspect the health and performance of Kubernetes control plane components.
• Application Code: Review application code for performance bottlenecks.
• Scaling: Consider scaling resources based on demand.

14. Users unable to reach an application running on a Pod on Kubernetes. What might be the issue and how to check?

Show answer Troubleshoot Kubernetes application connectivity layer by layer:

1. **Pod status**: `kubectl get pods` / `kubectl describe pod` — check for CrashLoopBackOff, Pending, ImagePullBackOff
2. **Pod health**: `kubectl exec -it pod -- curl localhost:PORT` — verify app responds inside container
3. **Service & endpoints**: `kubectl get svc` / `kubectl get endpoints` — confirm selector matches pod labels
4. **Network policies**: Check if ingress/egress rules are blocking traffic
5. **DNS**: `kubectl exec <pod> -- nslookup service-name` — verify CoreDNS resolution
6. **Ingress/LB**: Check ingress rules, TLS config, and controller logs
7. **External access**: Verify NodePort, LoadBalancer provisioning, or DNS pointing to cluster

15. What happens when a readiness probe fails on a Kubernetes pod?

Show answer The pod is removed from service endpoints, so it stops receiving traffic. Unlike a liveness probe failure (which restarts the container), a readiness failure just takes the pod out of the load balancer. The pod keeps running.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.

16. What are the top 3 causes of CrashLoopBackOff?

Show answer 1) Application error on startup (bad config, missing env var, import error); 2) OOMKilled (memory limit too low); 3) Liveness probe failing too aggressively (app healthy but probe times out). Always check 'kubectl logs --previous' to see the crash reason.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.

17. Explain the selectPolicy field in HPA behavior and how Max vs Min affect scaling aggressiveness.

Show answer selectPolicy determines which policy to apply when multiple policies are defined. Max picks whichever policy allows the largest change (most aggressive scaling). Min picks the smallest change (most conservative). Disabled prevents scaling in that direction entirely. For example, with both a Percent(100%) and Pods(5) policy on scaleUp with selectPolicy: Max, the HPA uses whichever allows adding more pods.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
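A sketch of the example described, using the autoscaling/v2 API (workload name and metric values hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target: {type: Utilization, averageUtilization: 70}
  behavior:
    scaleUp:
      selectPolicy: Max      # pick whichever policy allows the bigger step
      policies:
      - type: Percent
        value: 100           # may double the replica count...
        periodSeconds: 60
      - type: Pods
        value: 5             # ...or add 5 pods, whichever is larger
        periodSeconds: 60
```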

18. How can PodDisruptionBudget conflict with HPA scale-down, and what is the best practice to avoid it?

Show answer PDB enforces minAvailable during voluntary disruptions. HPA sets the desired replica count, but if scaling down would violate PDB constraints during node drains or spot terminations, evictions are blocked. The HPA controller itself does not check PDB. Best practice: set HPA minReplicas to at least what PDB requires as minimum available.

Example: `kubectl scale deployment web --replicas=5`. HPA for automatic scaling.
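A minimal PDB sketch; with this in place, the matching HPA should use minReplicas >= 2 (names hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  minAvailable: 2        # keep HPA minReplicas at least this high
  selector:
    matchLabels:
      app: web           # must match the workload's pod labels
```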

19. Why can HPA not scale to zero, and what are the alternatives?

Show answer HPA requires minReplicas >= 1. It cannot scale to zero because with zero pods there are no metrics to evaluate for scale-up decisions. For scale-to-zero capability (cost savings on idle workloads), use KEDA (Kubernetes Event-Driven Autoscaling), Knative Serving, or a custom controller that can scale from zero based on external signals like queue depth or incoming HTTP requests.

Example: `kubectl scale deployment web --replicas=5`. HPA for automatic scaling.
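A KEDA sketch that scales a worker from zero on an external metric; this assumes the KEDA operator is installed, and all names, addresses, and thresholds are hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler          # hypothetical name
spec:
  scaleTargetRef:
    name: worker               # target Deployment
  minReplicaCount: 0           # scale to zero when idle
  maxReplicaCount: 10
  triggers:
  - type: prometheus           # external signal, not pod metrics
    metadata:
      serverAddress: http://prometheus.monitoring:9090   # hypothetical
      query: sum(rate(http_requests_total[1m]))
      threshold: "10"
```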

20. How should liveness and readiness probe endpoints differ in what they check, and why?

Show answer Liveness should only verify the process is alive and responsive (no dependency checks) — it answers "should I restart?" Readiness should check dependencies, cache warmth, and load — it answers "can I serve traffic?" Using the same endpoint for both causes dependency failures to trigger unnecessary restarts.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
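A sketch of the split, borrowing the `/livez` and `/readyz` endpoint convention (port and timings illustrative):

```yaml
livenessProbe:
  httpGet:
    path: /livez       # process-only check, no dependency calls
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz      # checks DB connectivity, cache warmth, load
    port: 8080
  periodSeconds: 5
```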

21. A pod shows RESTARTS=5 and readiness probe failures in kubectl describe. Walk through how you would debug this.

Show answer 1) kubectl describe pod to read probe failure events and identify which probe is failing. 2) Check lastState for exit codes (e.g., 137=OOMKilled). 3) kubectl exec into the pod and curl the probe endpoint manually to see the actual response. 4) kubectl logs --previous to read logs from the crashed container. 5) Verify probe port matches the container port and the endpoint returns the expected status code.

Remember: Debug flow: Get→Describe→Logs→Exec. Mnemonic: "GDLE."

22. Why can JVM garbage collection cause liveness probe failures, and how do you mitigate it?

Show answer Full GC pauses can stop the JVM for several seconds. If timeoutSeconds is set too low (e.g., 1s) and a GC pause exceeds that, the probe times out and counts as a failure. Mitigation: increase timeoutSeconds to exceed worst-case GC pause duration, use a startup probe for slow JVM boot, and tune GC to reduce pause times.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
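The mitigation expressed as probe settings, with illustrative values:

```yaml
livenessProbe:
  httpGet:
    path: /livez         # hypothetical endpoint
    port: 8080
  timeoutSeconds: 5      # must exceed the worst-case GC pause
  periodSeconds: 10
  failureThreshold: 3
```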

23. What is the constraint on successThreshold for liveness and startup probes, and why does it matter for readiness?

Show answer For liveness and startup probes, successThreshold must be 1 (Kubernetes ignores other values). Only readiness probes can require multiple consecutive successes before the pod is added back to endpoints. This matters because you may want a readiness probe to confirm stability (e.g., successThreshold=2) before resuming traffic after a transient failure.

Remember: `kubectl explain <resource>` for fields. `kubectl api-resources` for all resource types.
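For example (endpoint and values illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /readyz        # hypothetical endpoint
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 2    # require 2 consecutive passes before re-adding to endpoints
```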