

GraphQL - Primer

Why GraphQL Matters

REST APIs have a fundamental mismatch problem: the server decides what data each endpoint returns, but the client decides what it actually needs. This collision produces two chronic symptoms.

Over-fetching: A GET /users/42 endpoint returns a 40-field user object. The mobile app needed three fields. You just wasted bandwidth, serialization CPU, and cache space on 37 unused fields — on every request, from every user.

Under-fetching: The same app needs the user's name, their last five orders, and each order's shipping status. That's three round trips — /users/42, /orders?user=42, and /orders/X/status — before the first screen can render. Each round trip adds latency and a new failure mode.

GraphQL solves both: clients declare exactly what they need in a single request and get exactly that back.
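Revisiting the under-fetching example, one query replaces all three round trips (field and argument names here are illustrative, not from a real schema):

```graphql
query OrderScreen {
  user(id: "42") {
    name
    orders(last: 5) {
      id
      shippingStatus
    }
  }
}
```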

For DevOps engineers, GraphQL appears at multiple layers: as the API protocol for internal microservices, as the query interface for major public APIs (GitHub API v4, Shopify), as a schema that schema registries track and lint, and as a gateway layer where tools like Apollo Router, Hasura, or AWS AppSync sit in front of your services.

Understanding GraphQL's operational properties — how queries execute, where performance cliffs hide, what breaks when schemas change — is now a required skill for API gateway work and backend service management.


Schema Definition Language (SDL)

The schema is the contract. Everything GraphQL does flows from it.

Scalar Types

# Built-in scalars
String
Int
Float
Boolean
ID          # serialized as String, semantically an opaque identifier

# Custom scalars (declared in schema, implemented in resolvers)
scalar DateTime
scalar JSON
scalar URL

Object Types

type User {
  id: ID!             # ! means non-nullable
  email: String!
  name: String
  createdAt: DateTime!
  orders: [Order!]!   # non-null list of non-null Orders
}

type Order {
  id: ID!
  status: OrderStatus!
  items: [LineItem!]!
  total: Float!
  user: User!
}

enum OrderStatus {
  PENDING
  PROCESSING
  SHIPPED
  DELIVERED
  CANCELLED
}

The Root Types

type Query {
  user(id: ID!): User
  users(filter: UserFilter, limit: Int = 20, offset: Int = 0): [User!]!
  order(id: ID!): Order
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateOrder(id: ID!, input: UpdateOrderInput!): Order!
  cancelOrder(id: ID!): Order!
}

type Subscription {
  orderStatusChanged(orderId: ID!): Order!
  newOrderCreated: Order!
}

Input Types

By convention, mutations take a single input type rather than a long list of inline arguments. This keeps read and write schemas cleanly separated, enables reuse, and lets you add fields to the input without changing the mutation signature.

input CreateUserInput {
  email: String!
  name: String!
  role: UserRole = USER
}

input UserFilter {
  email: String
  role: UserRole
  createdAfter: DateTime
}

Interfaces and Unions

interface Node {
  id: ID!
}

type Product implements Node {
  id: ID!
  name: String!
  price: Float!
}

union SearchResult = User | Order | Product

type Query {
  node(id: ID!): Node          # fetch any Node by ID
  search(query: String!): [SearchResult!]!
}
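On the client side, union and interface results are disambiguated with inline fragments. A query against the search field above might look like:

```graphql
query Search {
  search(query: "widget") {
    __typename
    ... on User { email }
    ... on Order { status }
    ... on Product { name price }
  }
}
```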

Resolvers and the Resolver Chain

A resolver is a function that fulfills one field in the schema. The execution engine walks the query tree and calls a resolver for each field.

Query
 └─ user(id: "42")         ← Query.user resolver called
     ├─ id                 ← User.id resolver (trivial — returns user.id)
     ├─ email              ← User.email resolver
     └─ orders             ← User.orders resolver (DB query or service call)
         └─ [0]
             ├─ id
             └─ status

In JavaScript/TypeScript (Apollo Server pattern):

const resolvers = {
  Query: {
    user: async (parent, { id }, context) => {
      return context.db.users.findById(id);
    },
    users: async (parent, { filter, limit, offset }, context) => {
      return context.db.users.findAll({ filter, limit, offset });
    },
  },

  User: {
    // Default resolver: returns parent.orders if it exists
    // Override to fetch lazily:
    orders: async (parent, args, context) => {
      return context.db.orders.findByUserId(parent.id);
    },
  },

  Mutation: {
    createUser: async (parent, { input }, context) => {
      await context.auth.requireRole('ADMIN');
      return context.db.users.create(input);
    },
  },
};

Resolver arguments:

  • parent (or root): the resolved value of the parent field
  • args: arguments from the query for this field
  • context: shared request context — DB connections, auth token, DataLoader instances
  • info: query AST and field path — used for advanced optimizations
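The "walk the query tree" idea can be sketched as a toy executor (a massively simplified illustration, not a real GraphQL engine). Each property on the parent stands in for that field's resolver: a function is called, a plain value is read directly, mirroring how default resolvers fall back to reading the field off the parent.

```javascript
// Toy field-by-field executor. selection maps field names to `true` (leaf)
// or a nested selection object.
async function execute(node, selection) {
  const out = {};
  for (const [field, sub] of Object.entries(selection)) {
    // A "resolver" here is either a function on the parent or a plain property
    const raw = typeof node[field] === 'function' ? await node[field]() : node[field];
    if (sub === true) {
      out[field] = raw;                       // leaf field: return the value
    } else if (Array.isArray(raw)) {
      out[field] = await Promise.all(raw.map((item) => execute(item, sub)));
    } else {
      out[field] = await execute(raw, sub);   // nested object: recurse
    }
  }
  return out;
}
```

Running it over a shape like the diagram above resolves `user`, then its scalar fields, then `orders` lazily, exactly in query order.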


The N+1 Problem and DataLoader

This is the most important performance concept in GraphQL. Understand it before you run GraphQL in production.

The problem:

query {
  orders(limit: 100) {
    id
    status
    user {       # ← Each order's User requires a separate DB lookup
      name
      email
    }
  }
}

If you resolve User.name naively — one DB query per order — 100 orders generates 101 queries: 1 to fetch orders, then 1 per order to fetch the user. This is N+1.

SELECT * FROM orders LIMIT 100;                    -- 1 query
SELECT * FROM users WHERE id = 'u1';               -- 100 queries
SELECT * FROM users WHERE id = 'u2';
...
SELECT * FROM users WHERE id = 'u100';

DataLoader fixes it by batching and caching:

import DataLoader from 'dataloader';

// Create a loader per request (not global — per-request scoping prevents
// cross-request cache pollution)
const userLoader = new DataLoader(async (userIds) => {
  // Called ONCE per batch, not once per ID
  const users = await db.users.findByIds(userIds);
  // Must return results in the same order as userIds
  return userIds.map(id => users.find(u => u.id === id) ?? null);
});

// In resolver:
Order: {
  // Looks like a single-ID lookup, but DataLoader batches the calls
  user: async (parent, args, context) => {
    return context.loaders.user.load(parent.userId);
  },
}

Result:

SELECT * FROM orders LIMIT 100;                    -- 1 query
SELECT * FROM users WHERE id IN ('u1','u2',...);   -- 1 batched query

DataLoader uses microtask batching: it collects all .load() calls within the same tick of the event loop, then fires one batch query.

Rules for DataLoader:

  • One instance per request, per resource type
  • Never share DataLoader instances across requests (cache poisoning)
  • The batch function must return results in the same order as input IDs
  • Optionally disable caching for mutation contexts
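The microtask-batching mechanism is small enough to sketch in plain JavaScript. This is an illustrative miniature, not the real dataloader package: it collects every load() made in the same tick, then invokes the batch function once with all keys.

```javascript
// Minimal DataLoader-style loader: batches same-tick loads, caches per instance.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];        // pending { key, resolve, reject } entries
    this.scheduled = false;
    this.cache = new Map(); // repeated keys reuse the same promise
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush after the current tick so sibling resolvers can enqueue first
        queueMicrotask(() => this.flush());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    try {
      // Called once per batch; results must align index-for-index with keys
      const results = await this.batchFn(batch.map((e) => e.key));
      batch.forEach((e, i) => e.resolve(results[i]));
    } catch (err) {
      batch.forEach((e) => e.reject(err));
    }
  }
}
```

Two loads issued in the same tick produce exactly one batch-function call, which is the whole trick behind collapsing 101 queries into 2.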


Schema-First vs Code-First Development

Schema-first: You write the SDL, then generate or implement resolvers. The schema is the authoritative contract.

# schema.graphql — written by humans first
type Query {
  product(id: ID!): Product
}

Tools: Apollo Studio, GraphQL Code Generator, graphql-tools.

Pros: Schema is readable, versionable, and shareable as a contract before implementation starts. Forces API design upfront.

Cons: Schema and implementation can drift if codegen is not enforced.

Code-first: You write resolver classes/functions with decorators, and the schema is derived from code.

// TypeGraphQL (TypeScript)
@ObjectType()
class Product {
  @Field(() => ID)
  id: string;

  @Field()
  name: string;
}

@Resolver()
class ProductResolver {
  @Query(() => Product, { nullable: true })
  async product(@Arg('id') id: string): Promise<Product | null> {
    return db.products.findById(id);
  }
}

Tools: TypeGraphQL, Pothos (a spiritual successor to GraphQL Nexus), Strawberry (Python).

Pros: Types stay in sync with resolvers by construction. Better IDE support.

Cons: Schema is implicit — harder to share as a contract without running the code.

For DevOps gateway scenarios: Schema-first + schema registry wins. You need the SDL to exist independently of any single implementation, so tooling (linters, breaking change detectors, federation routers) can reason about it.


Pagination

Two strategies. Neither is obviously right — know the tradeoffs.

Offset Pagination

type Query {
  users(limit: Int!, offset: Int!): [User!]!
}
# Page 1
query { users(limit: 20, offset: 0) { id name } }
# Page 2
query { users(limit: 20, offset: 20) { id name } }

Pros: Simple. Works with SQL LIMIT/OFFSET natively. Easy to jump to arbitrary pages.

Cons: Unstable under concurrent writes. If a row is inserted before offset 20 between page 1 and page 2 fetches, you'll see a duplicate or skip an item. Performance degrades at high offsets (database scans all skipped rows).

Cursor Pagination (Relay Connection Spec)

The Relay spec is the de facto standard for GraphQL pagination. It uses opaque cursors (usually base64-encoded row IDs or timestamps) and a consistent connection wrapper.

type UserConnection {
  edges: [UserEdge!]!
  pageInfo: PageInfo!
  totalCount: Int
}

type UserEdge {
  node: User!
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type Query {
  users(first: Int, after: String, last: Int, before: String): UserConnection!
}
# First page
query {
  users(first: 20) {
    edges { node { id name } cursor }
    pageInfo { hasNextPage endCursor }
  }
}

# Next page
query {
  users(first: 20, after: "cursor_from_previous_endCursor") {
    edges { node { id name } cursor }
    pageInfo { hasNextPage endCursor }
  }
}

Pros: Stable under writes. Efficient at any depth (no offset scan). Standard shape that clients and tooling understand.

Cons: Verbose. Cannot jump to arbitrary pages. Requires cursor storage or computation.

Production rule: Use cursor pagination for any list that can grow. Use offset only for small, stable, admin-facing lists.
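The cursor mechanics can be sketched in-memory. This hypothetical example uses base64-encoded row IDs as opaque cursors over a sorted array, standing in for an indexed `WHERE id > :after` query in a real database:

```javascript
// Relay-style cursor pagination over a sorted in-memory list (illustrative).
const encodeCursor = (id) => Buffer.from(`user:${id}`).toString('base64');
const decodeCursor = (cursor) =>
  Buffer.from(cursor, 'base64').toString('utf8').replace('user:', '');

function paginateUsers(users, { first, after }) {
  // Start just past the `after` cursor (or at the beginning)
  let start = 0;
  if (after) {
    const afterId = decodeCursor(after);
    start = users.findIndex((u) => u.id === afterId) + 1;
  }
  const slice = users.slice(start, start + first);
  return {
    edges: slice.map((u) => ({ node: u, cursor: encodeCursor(u.id) })),
    pageInfo: {
      hasNextPage: start + first < users.length,
      hasPreviousPage: start > 0,
      startCursor: slice.length ? encodeCursor(slice[0].id) : null,
      endCursor: slice.length ? encodeCursor(slice[slice.length - 1].id) : null,
    },
  };
}
```

Because each page is anchored to a row identity rather than a numeric offset, inserts before the cursor cannot cause duplicates or skips.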


Error Handling Patterns

GraphQL returns HTTP 200 even for application errors. This is by design — the response may be partially successful.

{
  "data": {
    "user": null,
    "orders": [{ "id": "o1" }]
  },
  "errors": [
    {
      "message": "User not found",
      "locations": [{ "line": 2, "column": 3 }],
      "path": ["user"],
      "extensions": {
        "code": "NOT_FOUND",
        "userId": "42"
      }
    }
  ]
}

Error patterns:

  1. Nullable fields for recoverable errors: Make fields that can fail nullable (user: User not user: User!). Resolvers return null and add to the errors array.

  2. Error unions for expected failures (preferred in typed APIs):

union UpdateOrderResult = Order | OrderNotFound | InsufficientPermissions

type OrderNotFound {
  message: String!
  orderId: ID!
}

type InsufficientPermissions {
  message: String!
  requiredRole: String!
}

type Mutation {
  updateOrder(id: ID!, input: UpdateOrderInput!): UpdateOrderResult!
}

  3. Extensions for machine-readable error codes: Always include extensions.code — clients should not parse message strings.

  4. Never expose internal errors: Catch resolver exceptions and translate to client-safe messages. Log full stack traces server-side.
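The last rule can be sketched as a resolver wrapper (a hypothetical helper; Apollo Server also offers its own error-formatting hooks for this purpose):

```javascript
// Wrap a resolver so internal exceptions become client-safe errors with a
// machine-readable extensions.code; the real error stays in server logs.
function safeResolver(fn) {
  return async (...args) => {
    try {
      return await fn(...args);
    } catch (err) {
      console.error(err);                                   // full detail server-side
      const wrapped = new Error('Internal server error');   // client-safe message
      wrapped.extensions = { code: 'INTERNAL_SERVER_ERROR' };
      throw wrapped;
    }
  };
}
```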


Security: Query Complexity, Depth Limiting, Persisted Queries

A naive GraphQL server is a DoS target. Clients control the query shape — a malicious or misconfigured client can send a query that fans out exponentially.

Depth Limiting

# Depth attack: resolves User → orders → user → orders → user → ...
query Deep {
  user(id: "1") {
    orders {
      user {
        orders {
          user {
            orders { id }
          }
        }
      }
    }
  }
}

import depthLimit from 'graphql-depth-limit';

const server = new ApolloServer({
  validationRules: [depthLimit(5)],  // reject queries deeper than 5 levels
});
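A depth check is just a recursive walk over the query tree. This toy version counts nesting on a plain-object stand-in for the parsed query (the real graphql-depth-limit operates on the GraphQL AST):

```javascript
// Compute the maximum nesting depth of a selection tree, where leaves are
// `true` and nested selections are objects. The root object counts as depth 1.
function maxDepth(selection, depth = 1) {
  const children = Object.values(selection).filter(
    (v) => typeof v === 'object' && v !== null
  );
  if (children.length === 0) return depth; // all leaves: this level is the max
  return Math.max(...children.map((c) => maxDepth(c, depth + 1)));
}
```

A guard then simply rejects any query whose computed depth exceeds the limit before resolvers ever run.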

Complexity Scoring

Assign cost weights to fields and reject queries that exceed a budget.

import { createComplexityLimitRule } from 'graphql-validation-complexity';

const ComplexityLimit = createComplexityLimitRule(1000, {
  // Paginated lists multiply complexity by requested count
  listFactor: 10,
  introspectionListFactor: 2,
});

const server = new ApolloServer({
  validationRules: [depthLimit(7), ComplexityLimit],
});
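A cost estimator in the same spirit can be sketched as follows (the weights and the `list`/`fields` shape are illustrative, not the graphql-validation-complexity internals):

```javascript
// Toy query-cost estimator: scalars cost 1, object fields cost 2 plus their
// children, and list fields multiply their children's cost by listFactor.
function estimateCost(selection, { listFactor = 10 } = {}) {
  let cost = 0;
  for (const sub of Object.values(selection)) {
    if (sub === true) {
      cost += 1;                                                        // scalar leaf
    } else if (sub.list) {
      cost += listFactor * estimateCost(sub.fields, { listFactor }) + 2; // list field
    } else {
      cost += estimateCost(sub, { listFactor }) + 2;                     // object field
    }
  }
  return cost;
}
```

A budget check (`estimateCost(query) <= 1000`) then rejects pathological fan-out before execution, which is exactly what the validation rule above does against the real AST.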

Persisted Queries

Instead of accepting arbitrary query strings, clients register query hashes ahead of time. The server only executes pre-approved queries.

# Client sends:
POST /graphql
{ "extensions": { "persistedQuery": { "version": 1, "sha256Hash": "abc123..." } } }

# Server looks up the hash in its registry, executes the known-safe query
# Unknown hashes → 404

Benefits: blocks arbitrary query injection, enables better caching (GET requests with hash), reduces payload size.

Automatic Persisted Queries (APQ) in Apollo: first request sends hash only, server returns PersistedQueryNotFound, client re-sends with full query + hash, server caches it. Subsequent requests use hash only.

Other Security Controls

  • Disable introspection in production: Introspection reveals your full schema to attackers. Disable it or gate it behind authentication.
  • Rate limit at the query level: Not just HTTP requests — a single heavy query counts more than a simple lookup.
  • Field-level authorization: Check permissions inside resolvers, not just at the gateway. Defence in depth.
  • Input validation: Validate scalar inputs (email format, string length) — GraphQL type system enforces type but not business constraints.

GraphQL in Production: Federation, Schema Registry, Caching

Federation

Large organizations run multiple GraphQL services. Federation stitches them into a single unified graph without a central service owning all schemas.

Apollo Federation model:

# users-service schema (subgraph)
type User @key(fields: "id") {
  id: ID!
  email: String!
  name: String!
}

# orders-service schema (subgraph)
extend type User @key(fields: "id") {
  id: ID! @external
  orders: [Order!]!      # orders-service extends User with its domain
}

type Order {
  id: ID!
  status: OrderStatus!
}

The Apollo Router (or Apollo Gateway) composes these subgraphs into a single supergraph schema and routes query fragments to the correct service.

Query planning: The router splits a federated query into sub-queries, fans out to relevant subgraphs in parallel or sequence, and merges results.

# Client sends to router:
query {
  user(id: "42") {
    name           # → routed to users-service
    orders { id }  # → routed to orders-service
  }
}

Schema Registry

A schema registry tracks schema versions across services. Key functions:

  • Breaking change detection: Flag removals/renames before deployment
  • Schema linting: Enforce conventions (naming, deprecation, pagination)
  • Composition validation: Ensure subgraphs compose without conflicts
  • Changelog: Schema diff between versions

Apollo Studio (GraphOS), GraphQL Hive, and self-hosted options (graphql-inspector) provide this.

Breaking vs non-breaking changes:

| Change                | Impact       |
|-----------------------|--------------|
| Add field             | Non-breaking |
| Add nullable argument | Non-breaking |
| Deprecate field       | Non-breaking |
| Remove field          | BREAKING     |
| Change field type     | BREAKING     |
| Add non-null argument | BREAKING     |
| Rename field          | BREAKING     |
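The first two breaking cases are mechanical to detect. A toy checker over flattened field maps ("Type.field" → type string), standing in for what tools like graphql-inspector do against real SDL:

```javascript
// Flag removed fields and changed field types between two schema versions.
function diffSchemas(oldFields, newFields) {
  const breaking = [];
  for (const [field, type] of Object.entries(oldFields)) {
    if (!(field in newFields)) {
      breaking.push(`removed ${field}`);          // clients selecting it break
    } else if (newFields[field] !== type) {
      breaking.push(`changed type of ${field}`);  // includes nullability changes
    }
  }
  return breaking; // fields only in newFields are additions: non-breaking
}
```

Running this in CI against the registry's last published version is the essence of breaking-change detection.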

Caching

GraphQL's flexible queries make HTTP caching harder than REST (POST requests aren't cached by default).

Response caching strategies:

  1. Persisted queries + GET: Convert queries to GET requests with hash. CDN can cache GET responses.

  2. @cacheControl directive (Apollo):

type Query {
  product(id: ID!): Product @cacheControl(maxAge: 300)
}

type Product {
  id: ID!
  name: String! @cacheControl(maxAge: 3600)
  currentPrice: Float! @cacheControl(maxAge: 60, scope: PUBLIC)
}

  3. DataLoader as request-level cache: DataLoader's per-request cache prevents redundant fetches within a single query execution.

  4. Redis for resolver-level caching: Cache expensive resolver results with TTLs.

Query: {
  expensiveReport: async (_, args, context) => {
    const cacheKey = `report:${JSON.stringify(args)}`;
    const cached = await context.redis.get(cacheKey);
    if (cached) return JSON.parse(cached);
    const result = await computeExpensiveReport(args);
    await context.redis.setex(cacheKey, 300, JSON.stringify(result));
    return result;
  },
},

Key Takeaways

  • GraphQL solves over-fetching and under-fetching by letting clients declare their data needs. The schema is the contract.
  • Resolvers form a tree that mirrors the query shape. Each field is resolved independently.
  • The N+1 problem is GraphQL's most common production pitfall. DataLoader solves it via per-request batching and caching.
  • Cursor pagination (Relay spec) is stable under concurrent writes and performs consistently at scale. Prefer it over offset pagination.
  • Security requires explicit controls: depth limits, complexity scoring, and persisted queries. Never leave introspection enabled in production.
  • Federation enables multiple teams to own subgraphs while exposing a unified API. A schema registry enforces compatibility across teams.
  • HTTP caching is non-trivial with GraphQL — persisted queries + GET, @cacheControl directives, and application-level caching (DataLoader, Redis) are the main tools.

Wiki Navigation

Prerequisites

  • Graphql Flashcards (CLI) (flashcard_deck, L1) — GraphQL