Argo Workflows — Primer

Why This Matters

Kubernetes CronJobs are fine for simple periodic tasks. But when you need to run a sequence of steps, fan out to parallel jobs, pass data between steps, retry specific failures, or build a machine learning pipeline, CronJobs fall apart. You're left stitching together Jobs with shell scripts and hoping the intermediate state doesn't get lost.

Who made it: Argo Workflows was created at Applatix (later acquired by Intuit) in 2017. It became a CNCF incubating project in 2020 and graduated in 2022. The name "Argo" references the ship from Greek mythology that carried Jason and the Argonauts — fitting for a tool that orchestrates journeys through complex pipelines.

Argo Workflows is a Kubernetes-native workflow engine that turns DAGs and pipelines into first-class Kubernetes resources. Every step runs as a Pod. Progress is visible in a UI. Artifacts flow between steps via S3 or GCS. Failures are retried with backoff. The entire workflow is auditable in etcd.

For platform engineers, Argo Workflows replaces Jenkins pipelines, Airflow DAGs, and ad-hoc shell scripts for batch compute, ML training pipelines, data processing, and CI/CD tasks that don't fit neatly into a linear pipeline. Understanding it means you can design workflows that are parallelizable, resumable, and observable.

Core Concepts

1. Installation

kubectl create namespace argo
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/latest/download/install.yaml

# Wait for rollout
kubectl -n argo rollout status deploy/workflow-controller
kubectl -n argo rollout status deploy/argo-server

# Install CLI
curl -sLO https://github.com/argoproj/argo-workflows/releases/latest/download/argo-linux-amd64.gz
gunzip argo-linux-amd64.gz && chmod +x argo-linux-amd64
sudo mv argo-linux-amd64 /usr/local/bin/argo

# Port-forward UI
kubectl -n argo port-forward svc/argo-server 2746:2746
# Open: https://localhost:2746

2. The Workflow Resource

A Workflow is a single run of a pipeline. It contains an entrypoint (the first template to run) and a set of templates (reusable step definitions).

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
  namespace: argo
spec:
  entrypoint: greet
  templates:
    - name: greet
      container:
        image: alpine:3.18
        command: [echo]
        args: ["Hello, Argo Workflows!"]
        resources:
          requests:
            memory: 64Mi
            cpu: 100m
Submit and inspect:

argo submit workflow.yaml -n argo --watch
argo list -n argo
argo get @latest -n argo
argo logs @latest -n argo

3. Template Types

Container Template

Runs a single container (most common):

- name: build-image
  container:
    image: docker:24-dind
    command: [docker, build]
    args: ["-t", "myapp:{{workflow.parameters.tag}}", "."]
    volumeMounts:
      - name: docker-sock
        mountPath: /var/run/docker.sock

Script Template

Inline script (avoids building a custom image for simple logic):

- name: generate-report
  script:
    image: python:3.11-slim
    command: [python]
    source: |
      import json, datetime
      report = {
          "timestamp": datetime.datetime.now(datetime.UTC).isoformat(),
          "status": "ok",
          "records_processed": 42
      }
      print(json.dumps(report))

Resource Template

Create, patch, or delete Kubernetes resources:

- name: create-configmap
  resource:
    action: create
    manifest: |
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: pipeline-output
        namespace: argo
      data:
        result: "{{inputs.parameters.result}}"
- name: wait-for-job
  resource:
    action: get
    successCondition: status.succeeded > 0
    failureCondition: status.failed > 3
    manifest: |
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: external-job
        namespace: argo

Suspend Template

Pause the workflow until manually resumed:

- name: wait-for-approval
  suspend:
    duration: "1h"    # auto-resume after 1h if not manually resumed earlier; omit duration to suspend indefinitely

# Resume a suspended workflow
argo resume my-workflow-xxxxx -n argo
# Or resume a specific node
argo resume my-workflow-xxxxx -n argo --node-field-selector=displayName=wait-for-approval

4. Steps vs DAG

Steps define a linear sequence with optional parallelism at each step level:

- name: pipeline
  steps:
    - - name: fetch-data          # Step 1: sequential
        template: fetch
    - - name: transform-a         # Step 2: parallel (same dash level)
        template: transform
        arguments:
          parameters:
            - name: shard
              value: "a"
      - name: transform-b
        template: transform
        arguments:
          parameters:
            - name: shard
              value: "b"
    - - name: load                # Step 3: after both transforms complete
        template: load

DAG defines tasks with explicit dependencies — more flexible for complex graphs:

- name: ml-pipeline
  dag:
    tasks:
      - name: preprocess
        template: preprocess-data
      - name: train-model
        dependencies: [preprocess]
        template: train
      - name: evaluate
        dependencies: [train-model]
        template: evaluate
      - name: deploy-if-good
        dependencies: [evaluate]
        template: deploy
        when: "{{tasks.evaluate.outputs.parameters.accuracy}} >= 0.95"
      - name: notify-failure
        dependencies: [evaluate]
        template: notify
        when: "{{tasks.evaluate.outputs.parameters.accuracy}} < 0.95"

Use steps when the pipeline is inherently sequential with parallel bursts. Use DAG when dependencies are complex or when you need conditional branching based on upstream outputs.

Gotcha: In the steps syntax, parallelism is controlled entirely by list nesting: each outer list item (introduced by - -) is a sequential step group, and the items inside one group run in parallel. This is easy to get wrong: dropping one dash merges a step into the previous group and silently makes it parallel, while an extra - - turns an intended-parallel step into a sequential one. Use DAG when readability matters more than brevity.
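For instance, these two fragments differ only in the nesting of one list item, but the first runs both steps in parallel while the second runs them one after the other (the step names are illustrative):

```yaml
# Parallel: one outer item containing two steps (same group)
steps:
  - - name: cleanup-cache
      template: cleanup
    - name: cleanup-logs
      template: cleanup

# Sequential: two outer items, one step each (two groups)
steps:
  - - name: cleanup-cache
      template: cleanup
  - - name: cleanup-logs
      template: cleanup
```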

5. Artifacts — Passing Data Between Steps

Artifacts let steps exchange files without requiring shared volumes. Argo Workflows supports S3, GCS, HDFS, and HTTP as artifact backends.

Configure the artifact repository:

# In the workflow-controller-configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  artifactRepository: |
    s3:
      bucket: my-argo-artifacts
      endpoint: s3.amazonaws.com
      insecure: false
      accessKeySecret:
        name: s3-credentials
        key: accessKey
      secretKeySecret:
        name: s3-credentials
        key: secretKey

Using artifacts in a workflow:

- name: generate-dataset
  container:
    image: python:3.11-slim
    command: [python, /scripts/generate.py]
  outputs:
    artifacts:
      - name: dataset
        path: /tmp/dataset.parquet

- name: train-model
  inputs:
    artifacts:
      - name: dataset
        from: "{{steps.generate-dataset.outputs.artifacts.dataset}}"
        path: /data/dataset.parquet
  container:
    image: pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
    command: [python, /scripts/train.py]
  outputs:
    artifacts:
      - name: model
        path: /tmp/model.pt
    parameters:
      - name: accuracy
        valueFrom:
          path: /tmp/accuracy.txt

6. Parameters — Inputs, Outputs, and Expressions

Workflow-level parameters (passed at submit time):

spec:
  arguments:
    parameters:
      - name: image-tag
        value: latest      # default, overridden at submit
      - name: environment
        value: staging
  entrypoint: deploy-pipeline

argo submit workflow.yaml \
  -p image-tag=v1.2.3 \
  -p environment=production \
  -n argo

Passing outputs as inputs between steps:

steps:
  - - name: get-version
      template: detect-version
  - - name: build
      template: build-image
      arguments:
        parameters:
          - name: tag
            value: "{{steps.get-version.outputs.parameters.version}}"

Output parameters from a file:

- name: detect-version
  script:
    image: alpine/git:2.43.0   # alpine:3.18 does not ship git
    command: [sh]
    source: |
      # assumes the repository was checked out into the working
      # directory, e.g. via an input artifact
      git describe --tags --abbrev=0 > /tmp/version.txt
      cat /tmp/version.txt
  outputs:
    parameters:
      - name: version
        valueFrom:
          path: /tmp/version.txt

7. Retry Strategies

- name: flaky-api-call
  retryStrategy:
    limit: "5"
    retryPolicy: "Always"      # Always | OnFailure | OnError | OnTransientError
    backoff:
      duration: "10s"
      factor: "2"              # exponential backoff: 10s, 20s, 40s, 80s, 160s
      maxDuration: "3m"
  container:
    image: curlimages/curl
    command: [curl, -sf, "https://api.example.com/data"]

Retry policies:

- Always: retry on any failure or error (non-zero exit code or pod-level error)
- OnFailure: retry only when the main container exits with a non-zero code
- OnError: retry only on pod/infrastructure errors (pod deleted, node drained), not on application failures
- OnTransientError: retry only on errors Argo classifies as transient (API server timeouts and similar blips)
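Recent Argo versions also let you gate retries with an expression evaluated against the previous attempt, which is useful when only specific exit codes are worth retrying. A sketch (the exit code 111 is an arbitrary example):

```yaml
- name: selective-retry
  retryStrategy:
    limit: "3"
    # retry only when the last attempt exited with a known-retryable code
    expression: "asInt(lastRetry.exitCode) == 111"
  container:
    image: curlimages/curl
    command: [curl, -sf, "https://api.example.com/data"]
```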

8. Parallelism and Fan-out with withItems / withParam

withItems — static list fan-out:

- name: process-shards
  steps:
    - - name: process
        template: process-shard
        arguments:
          parameters:
            - name: shard
              value: "{{item}}"
        withItems:
          - shard-001
          - shard-002
          - shard-003
          - shard-004

withParam — dynamic fan-out from a prior step's JSON output:

- name: discover-shards
  template: list-shards
# outputs.result: '["shard-001","shard-002","shard-003"]'

- name: process-all
  template: process-shard
  arguments:
    parameters:
      - name: shard
        value: "{{item}}"
  withParam: "{{steps.discover-shards.outputs.result}}"
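The outputs.result referenced above is simply the captured stdout of the producing step; for a script template, a producer might look like this (the shard names are illustrative):

```yaml
- name: list-shards
  script:
    image: python:3.11-slim
    command: [python]
    source: |
      import json
      # stdout of a script template becomes outputs.result;
      # withParam expects it to be a JSON list
      print(json.dumps([f"shard-{i:03d}" for i in range(1, 4)]))
```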

Control parallelism to avoid overwhelming downstream systems:

spec:
  parallelism: 5      # global: at most 5 pods running at once in this workflow

- name: process-shard
  parallelism: 3      # template-level: at most 3 concurrent instances of this step

9. WorkflowTemplate — Reusable Definitions

WorkflowTemplate is a namespace-scoped reusable template library (ClusterWorkflowTemplate is the cluster-scoped variant). You reference templates from it in ad-hoc Workflows or CronWorkflows.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: build-templates
  namespace: argo
spec:
  templates:
    - name: docker-build
      inputs:
        parameters:
          - name: image
          - name: tag
          - name: dockerfile
            value: Dockerfile
      container:
        image: gcr.io/kaniko-project/executor:latest
        command: [/kaniko/executor]
        args:
          - --dockerfile={{inputs.parameters.dockerfile}}
          - --destination={{inputs.parameters.image}}:{{inputs.parameters.tag}}
          - --cache=true
          - --cache-repo={{inputs.parameters.image}}-cache

Reference from a Workflow:

spec:
  entrypoint: ci-pipeline
  templates:
    - name: ci-pipeline
      steps:
        - - name: build
            templateRef:
              name: build-templates
              template: docker-build
            arguments:
              parameters:
                - name: image
                  value: ghcr.io/myorg/myapp
                - name: tag
                  value: "{{workflow.parameters.tag}}"

10. CronWorkflow

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-etl
  namespace: argo
spec:
  schedule: "0 2 * * *"    # 2am UTC daily
  timezone: "UTC"
  concurrencyPolicy: Forbid  # Allow | Forbid | Replace
  startingDeadlineSeconds: 300
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  workflowSpec:
    entrypoint: etl-pipeline
    arguments:
      parameters:
        - name: date
          value: "{{workflow.creationTimestamp.Y}}-{{workflow.creationTimestamp.m}}-{{workflow.creationTimestamp.d}}"
    templates:
      - name: etl-pipeline
        dag:
          tasks:
            - name: extract
              template: run-extract
            - name: transform
              dependencies: [extract]
              template: run-transform
            - name: load
              dependencies: [transform]
              template: run-load

11. RBAC

Service account for workflows (least-privilege):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: argo-workflow-sa
  namespace: argo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-workflow-role
  namespace: argo
rules:
  - apiGroups: [""]
    resources: [pods]
    verbs: [get, watch, patch]
  - apiGroups: [""]
    resources: [pods/log]
    verbs: [get, watch]
  - apiGroups: [argoproj.io]
    resources: [workflowtaskresults]
    verbs: [create, patch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-workflow-rb
  namespace: argo
subjects:
  - kind: ServiceAccount
    name: argo-workflow-sa
    namespace: argo
roleRef:
  kind: Role
  name: argo-workflow-role
  apiGroup: rbac.authorization.k8s.io

In WorkflowSpec:

spec:
  serviceAccountName: argo-workflow-sa

12. Argo Events Integration

Argo Events provides event-driven triggers for Workflows. An EventSource captures events (webhooks, S3 notifications, Kafka messages), and a Sensor maps them to WorkflowTemplate submissions.

apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: github-webhook
  namespace: argo-events
spec:
  webhook:
    push:
      port: "12000"
      endpoint: /push
      method: POST
---
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: github-push-sensor
  namespace: argo-events
spec:
  template:
    serviceAccountName: argo-events-sa
  dependencies:
    - name: push-event
      eventSourceName: github-webhook
      eventName: push
  triggers:
    - template:
        name: trigger-ci
        k8s:
          operation: create
          source:
            resource:
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: ci-triggered-
                namespace: argo
              spec:
                workflowTemplateRef:
                  name: ci-pipeline
                arguments:
                  parameters:
                    - name: branch
                      value: ""   # overwritten by the parameter mapping below
          parameters:
            - src:
                dependencyName: push-event
                dataKey: body.ref
              dest: spec.arguments.parameters.0.value

13. When to Use Argo Workflows vs Alternatives

| Tool | Best for | Avoid when |
| --- | --- | --- |
| Argo Workflows | Complex DAGs, ML pipelines, multi-step batch, fan-out with artifact passing | Simple periodic tasks |
| Kubernetes CronJob | Simple, single-step periodic tasks | Multi-step, artifact passing, fan-out |
| Tekton | CI/CD pipelines, strong Kubernetes CRD model, Tekton Hub integration | Non-CI workflows, small teams |
| Airflow | Python-native DAGs, large data engineering teams, existing Airflow investment | Kubernetes-native environments without Python expertise |
| GitHub Actions / GitLab CI | Code-centric CI triggered by Git events | Cluster-internal workflows, non-Git triggers |

Quick Reference

# Submit
argo submit workflow.yaml -n argo
argo submit workflow.yaml -n argo --wait
argo submit workflow.yaml -n argo -p image-tag=v1.2.3

# Monitor
argo list -n argo
argo get my-workflow-xxxxx -n argo
argo watch @latest -n argo
argo logs my-workflow-xxxxx -n argo
argo logs my-workflow-xxxxx -n argo -f    # follow

# Control
argo suspend my-workflow-xxxxx -n argo
argo resume my-workflow-xxxxx -n argo
argo retry my-workflow-xxxxx -n argo      # retry failed workflow
argo retry my-workflow-xxxxx -n argo --restart-successful  # retry all nodes
argo terminate my-workflow-xxxxx -n argo  # stop immediately
argo delete my-workflow-xxxxx -n argo

# CronWorkflow
argo cron list -n argo
argo cron suspend nightly-etl -n argo
argo cron resume nightly-etl -n argo

# WorkflowTemplate
argo template list -n argo
argo template get build-templates -n argo

# Garbage collect old workflows
argo delete --completed -n argo
argo delete --older 7d -n argo
