
Level: L2 | Domain: DevOps

GraphQL - Street-Level Ops

Hands-on patterns, diagnosis commands, and gotcha-avoidance for GraphQL in production.


Quick Diagnosis Commands

Introspection: Is the server alive and what does it know?

# Basic introspection: list every named type in the schema
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"query":"{ __schema { types { name kind } } }"}' | jq '.data.__schema.types[]'

# List all query fields available at root
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { queryType { fields { name description } } } }"}' \
  | jq '.data.__schema.queryType.fields[]'

# Inspect a specific type
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ __type(name: \"User\") { name fields { name type { name kind ofType { name kind } } } } }"
  }' | jq '.'

# Full schema introspection (standard introspection query — paste into any GraphQL client)
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query IntrospectionQuery { __schema { queryType { name } mutationType { name } subscriptionType { name } types { ...FullType } directives { name description locations args { ...InputValue } } } } fragment FullType on __Type { kind name description fields(includeDeprecated: true) { name description args { ...InputValue } type { ...TypeRef } isDeprecated deprecationReason } inputFields { ...InputValue } interfaces { ...TypeRef } enumValues(includeDeprecated: true) { name description isDeprecated deprecationReason } possibleTypes { ...TypeRef } } fragment InputValue on __InputValue { name description type { ...TypeRef } defaultValue } fragment TypeRef on __Type { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name } } } } } } }"
  }' | jq '.data' > schema-snapshot.json  # keep the {"__schema": ...} wrapper that schema tools expect

Check for errors in response

# Any response with errors
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ user(id: \"badid\") { id name } }"}' \
  | jq 'if .errors then .errors else "no errors" end'

# Extract error codes from extensions
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ user(id: \"missing\") { id } }"}' \
  | jq '.errors[]?.extensions.code'  # []? avoids a jq error when there are no errors

Measure resolver latency with Apollo tracing

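# Request the legacy apollo-tracing extension, if the server still emits it
# (Apollo Server 2 with tracing enabled; removed in Apollo Server 3).
# Resolver durations are reported in nanoseconds.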
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -H "X-Apollo-Tracing: 1" \
  -d '{"query":"{ users(limit: 10) { id orders { id } } }"}' \
  | jq '.extensions.tracing.execution.resolvers | sort_by(.duration) | reverse | .[0:10]'
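Grouping the same tracing data by parent type and field makes N+1 patterns easier to spot than a top-10 list, because a field resolved once per parent row shows up with a high call count. A minimal sketch, assuming the response follows the apollo-tracing format (`parentType`, `fieldName`, `duration` in nanoseconds); `summarizeTracing` is a hypothetical helper name.

```javascript
// Group apollo-tracing resolver entries by "ParentType.field" and report
// call count plus total and max time in milliseconds (durations are nanoseconds).
function summarizeTracing(response) {
  const resolvers = response?.extensions?.tracing?.execution?.resolvers ?? [];
  const byField = new Map();
  for (const r of resolvers) {
    const key = `${r.parentType}.${r.fieldName}`;
    const entry = byField.get(key) ?? { field: key, calls: 0, totalMs: 0, maxMs: 0 };
    const ms = r.duration / 1e6;
    entry.calls += 1;
    entry.totalMs += ms;
    entry.maxMs = Math.max(entry.maxMs, ms);
    byField.set(key, entry);
  }
  // Most expensive fields first; many calls with a small maxMs hints at N+1
  return [...byField.values()].sort((a, b) => b.totalMs - a.totalMs);
}
```

Save the curl output without the jq filter and pass the parsed JSON in, for example `summarizeTracing(JSON.parse(fs.readFileSync('trace.json', 'utf8')))`.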

Check deprecated fields usage

# Find all deprecated fields in schema (fields is null for scalars, enums,
# unions and input types, so default it to an empty list before iterating)
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { types { name fields(includeDeprecated: true) { name isDeprecated deprecationReason } } } }"}' \
  | jq '.data.__schema.types[] | .name as $type | (.fields // [])[] | select(.isDeprecated) | {type: $type, name, deprecationReason}'
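The jq filter covers object and interface fields only; deprecated enum values are just as likely to break clients and are easy to miss. A minimal sketch that walks a saved introspection result (either `{ "data": { "__schema": ... } }` or `{ "__schema": ... }`) and lists both. `listDeprecated` is a hypothetical helper, and it only sees what the introspection query requested with `includeDeprecated: true`.

```javascript
// List deprecated fields and enum values from an introspection result.
function listDeprecated(introspection) {
  const schema = introspection?.data?.__schema ?? introspection?.__schema;
  if (!schema) throw new Error('No __schema found in introspection result');
  const found = [];
  for (const type of schema.types) {
    // fields is only non-null for OBJECT and INTERFACE types
    for (const field of type.fields ?? []) {
      if (field.isDeprecated) {
        found.push({ kind: 'field', name: `${type.name}.${field.name}`, reason: field.deprecationReason });
      }
    }
    // enumValues is only non-null for ENUM types
    for (const value of type.enumValues ?? []) {
      if (value.isDeprecated) {
        found.push({ kind: 'enumValue', name: `${type.name}.${value.name}`, reason: value.deprecationReason });
      }
    }
  }
  return found;
}
```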

Send a mutation via curl

curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "query": "mutation CreateUser($input: CreateUserInput!) { createUser(input: $input) { id email } }",
    "variables": {
      "input": {
        "email": "ops@example.com",
        "name": "Ops Bot"
      }
    }
  }' | jq '.'

Send a subscription test (wscat for WebSocket)

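# Install wscat: npm install -g wscat
# -s graphql-transport-ws selects the graphql-ws protocol, which uses
# connection_init / connection_ack / subscribe messages
wscat -c wss://api.example.com/graphql \
  -s graphql-transport-ws \
  -H "Authorization: Bearer $TOKEN"

# Then send these at the > prompt, one at a time.
# Wait for {"type":"connection_ack"} before subscribing: a subscribe sent
# before the ack gets the socket closed with 4401 Unauthorized.
# > {"type":"connection_init","payload":{}}
# > {"id":"1","type":"subscribe","payload":{"query":"subscription { orderStatusChanged(orderId: \"o42\") { id status } }"}}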

Gotcha: N+1 in Production

Symptom: GraphQL query takes 2-10s. DB CPU spikes. Query logs show hundreds of nearly identical SELECT * FROM users WHERE id = ? queries fired sequentially.

Diagnosis:

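# Temporarily log every statement (each N+1 query is fast, so a 100 ms threshold
# would hide them). A plain SET only lasts for that one psql session, so change
# the server setting instead (requires superuser):
psql -c "ALTER SYSTEM SET log_min_duration_statement = 0;" -c "SELECT pg_reload_conf();"
# Revert when done:
psql -c "ALTER SYSTEM RESET log_min_duration_statement;" -c "SELECT pg_reload_conf();"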

# Or check current active queries
psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
         FROM pg_stat_activity
         WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'
         ORDER BY duration DESC;"

Fix: one DataLoader per entity type, created fresh in each request's context so its cache never outlives the request:

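// context.js: request context factory (assumes db is a knex instance)
import DataLoader from 'dataloader';
import { db } from './db.js'; // hypothetical module exporting your knex instance

export function createContext(req) {
  return {
    db,
    auth: req.auth,
    // Fresh loaders per request: the DataLoader cache must not be shared across users
    loaders: {
      // All user lookups in one tick become a single WHERE id IN (...) query.
      // whereIn returns plain rows (db.raw on Postgres returns a result object).
      user: new DataLoader(async (ids) => {
        const rows = await db('users').whereIn('id', ids);
        // Key by string so GraphQL ID strings match numeric DB ids
        const byId = new Map(rows.map(r => [String(r.id), r]));
        // Results must line up with ids; a missing user becomes null
        return ids.map(id => byId.get(String(id)) ?? null);
      }),
      // One-to-many: each user id maps to an array of orders (possibly empty)
      ordersByUserId: new DataLoader(async (userIds) => {
        const rows = await db('orders').whereIn('user_id', userIds);
        const byUser = new Map(userIds.map(uid => [String(uid), []]));
        for (const r of rows) byUser.get(String(r.user_id))?.push(r);
        return userIds.map(uid => byUser.get(String(uid)));
      }),
    },
  };
}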

Rule: every resolver that loads a related record by ID (e.g. db.findById(parent.someId)) must go through a DataLoader. In code review, flag any .findById( call in a resolver that does not use context.loaders.
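DataLoader's core trick is to collect every `load()` call made in the same tick and hand the keys to the batch function as one array. The sketch below re-implements only that part (no caching, no `maxBatchSize`) to show why one resolver pass over N users collapses into a single query. `createBatchLoader` is a hypothetical name, not DataLoader's API; like DataLoader, it fails only the keys whose result is an `Error`.

```javascript
// Minimal batching loader: load() calls made in the same tick are
// coalesced into one call to batchFn(keys). No caching, no size cap.
function createBatchLoader(batchFn) {
  let queue = []; // pending { key, resolve, reject } for the current tick

  function dispatch() {
    const batch = queue;
    queue = [];
    Promise.resolve()
      .then(() => batchFn(batch.map((item) => item.key)))
      .then((results) => {
        // Contract: one result per key, in the same order as the keys
        if (!Array.isArray(results) || results.length !== batch.length) {
          throw new Error('batchFn must return an array with one result per key');
        }
        batch.forEach((item, i) => {
          if (results[i] instanceof Error) item.reject(results[i]); // fail only this key
          else item.resolve(results[i]);
        });
      })
      .catch((err) => batch.forEach((item) => item.reject(err))); // whole batch failed
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        if (queue.length === 0) {
          // First key this tick: dispatch after pending promise callbacks have run
          Promise.resolve().then(() => process.nextTick(dispatch));
        }
        queue.push({ key, resolve, reject });
      });
    },
  };
}
```

In a resolver the real DataLoader is used the same way, e.g. `orders: (user, _args, ctx) => ctx.loaders.ordersByUserId.load(user.id)`: every `User.orders` resolved in one pass ends up in one batch.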


Gotcha: Schema Breaking Changes Deployed Without Warning

Symptom: A schema change ships and existing clients break immediately. GraphQL returns Cannot query field "legacyUserName" on type "User".

Diagnosis:

# Compare schema before/after with graphql-inspector
npx @graphql-inspector/cli diff \
  https://api.example.com/graphql \
  ./schema-new.graphql

# Output example:
# ✖ Field 'User.legacyUserName' was removed  [BREAKING]
# ⚠ Field 'User.name' is deprecated [NON_BREAKING]

Fix workflow:
1. Mark the old field deprecated first: legacyUserName: String @deprecated(reason: "Use 'name' instead")
2. Give clients at least one release cycle to migrate.
3. Only then remove the field.
4. Gate all schema changes through a schema registry that blocks breaking changes in CI.

# .github/workflows/schema-check.yml
- name: Check schema for breaking changes
  run: |
    npx @graphql-inspector/cli diff \
      ${{ secrets.PROD_GRAPHQL_ENDPOINT }} \
      ./schema.graphql \
      --fail-on-breaking
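The simplest breaking change to detect is a field that exists in the old schema but not in the new one. A deliberately naive sketch over two introspection snapshots (either `{ "data": { "__schema": ... } }` or `{ "__schema": ... }`) that also reports whether the removed field had been deprecated first. `findRemovedFields` is a hypothetical helper; graphql-inspector also catches type, argument and enum changes, so keep it as the real CI gate.

```javascript
// Return every object/interface field removed between two snapshots, noting
// whether it was deprecated first (the intended workflow) or removed cold.
function findRemovedFields(oldIntrospection, newIntrospection) {
  const schemaOf = (x) => x?.data?.__schema ?? x?.__schema;
  const fieldMap = (schema) => {
    const map = new Map();
    for (const type of schema.types) {
      if (type.name.startsWith('__')) continue; // skip introspection types
      for (const field of type.fields ?? []) map.set(`${type.name}.${field.name}`, field);
    }
    return map;
  };
  const before = fieldMap(schemaOf(oldIntrospection));
  const after = fieldMap(schemaOf(newIntrospection));
  const removed = [];
  for (const [name, field] of before) {
    if (!after.has(name)) removed.push({ name, wasDeprecated: Boolean(field.isDeprecated) });
  }
  return removed;
}
```

A removal with `wasDeprecated: false` is exactly the "Cannot query field" incident above.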

Gotcha: Subscription Memory Leaks

Symptom: Server memory grows slowly over hours. Restart fixes it temporarily. Active subscription count in metrics climbs but never drops after clients disconnect.

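Root cause: Subscription resolvers open event listeners or PubSub subscriptions, but nothing tears them down. Cleanup belongs in the async iterator that subscribe returns (its return() method, or a finally block in an async generator), and either that cleanup is missing or the transport layer never calls return() when the client disconnects.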

Fix:

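// Put cleanup in a finally block: when the client disconnects, the server
// calls the generator's return(), which runs finally once the generator resumes
Subscription: {
  orderStatusChanged: {
    subscribe: async function* (parent, { orderId }, context) {
      const channel = `order:${orderId}:status`;

      // Register listener (context.pubsub stands in for your PubSub client)
      const listener = await context.pubsub.subscribe(channel);

      try {
        for await (const event of listener) {
          yield { orderStatusChanged: event };
        }
      } finally {
        // Runs on disconnect or generator completion. Make sure this releases
        // only this client's listener, not every subscriber on the channel.
        // Caveat: a return() that arrives while the loop waits for the next
        // event is queued, so on a quiet channel this can run late.
        await context.pubsub.unsubscribe(channel);
        console.log(`Cleaned up subscription for order ${orderId}`);
      }
    },
    resolve: (payload) => payload.orderStatusChanged,
  },
},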
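One more catch with the generator pattern: when the client disconnects, the server calls the generator's `return()`, but if the generator is parked on an `await` waiting for the next event (which is most of the time), that call is queued until another event arrives. The sketch below contrasts it with an iterator whose `return()` unsubscribes immediately. `Channel`, `iterate` and `viaGenerator` are hypothetical stand-ins for your PubSub; some PubSub iterators (graphql-subscriptions' among them) already unsubscribe in their own `return()`, in which case returning that iterator directly from `subscribe` avoids the delay.

```javascript
// Hypothetical in-memory channel standing in for a real PubSub
class Channel {
  constructor() { this.listeners = new Set(); }
  publish(event) { for (const fn of this.listeners) fn(event); }
  get listenerCount() { return this.listeners.size; }
}

// Async iterator over a channel whose return() unsubscribes at once,
// even while a consumer is still waiting inside next()
function iterate(channel) {
  const buffered = []; // events that arrived before anyone asked for them
  const waiting = [];  // resolvers of next() calls still waiting for an event
  let closed = false;
  const listener = (event) => {
    const resolve = waiting.shift();
    if (resolve) resolve({ value: event, done: false });
    else buffered.push(event);
  };
  channel.listeners.add(listener);

  return {
    next() {
      if (closed) return Promise.resolve({ value: undefined, done: true });
      if (buffered.length > 0) return Promise.resolve({ value: buffered.shift(), done: false });
      return new Promise((resolve) => waiting.push(resolve));
    },
    return() {
      if (!closed) {
        closed = true;
        channel.listeners.delete(listener); // cleanup happens right here
        buffered.length = 0;
        // Settle any pending next() so the consumer's loop can finish
        for (const resolve of waiting.splice(0)) resolve({ value: undefined, done: true });
      }
      return Promise.resolve({ value: undefined, done: true });
    },
    [Symbol.asyncIterator]() { return this; },
  };
}

// The generator pattern for comparison
async function* viaGenerator(channel, onCleanup) {
  try {
    for await (const event of iterate(channel)) yield event;
  } finally {
    onCleanup(); // runs only after the generator resumes from a yield
  }
}
```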

Monitor:

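// Expose subscription count as a Prometheus gauge
// In your GraphQL server setup:
import promClient from 'prom-client';

// Gauges go up and down; the _total suffix is reserved for counters
const activeSubscriptions = new promClient.Gauge({
  name: 'graphql_active_subscriptions',
  help: 'Number of active GraphQL subscriptions',
  labelNames: ['operation'],
});

// inc() when a subscription starts, dec() in its cleanup (the finally block):
// activeSubscriptions.inc({ operation: 'orderStatusChanged' });
// activeSubscriptions.dec({ operation: 'orderStatusChanged' });

// Alert if the gauge grows monotonically over 30 minutes
// with no corresponding client disconnects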

Gotcha: Introspection Enabled in Production

Symptom: Security scan flags the GraphQL endpoint. __schema queries return full schema including internal types, field descriptions, and deprecated fields with business logic clues.

Fix:

// Apollo Server v4
import { ApolloServer } from '@apollo/server';

const server = new ApolloServer({
  schema,
  introspection: process.env.NODE_ENV !== 'production',  // disable in prod
});

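// Or: set introspection: true and allow it only for admin users via a plugin
// (goes in the ApolloServer config; needs import { GraphQLError } from 'graphql')
plugins: [
  {
    async requestDidStart() {
      return {
        async didResolveOperation({ document, contextValue }) {
          // Checks for __schema / __type at the top level of each operation or
          // fragment; __typename is allowed, inline fragments are not inspected
          const hasIntrospection = document.definitions.some(
            def => def.selectionSet?.selections.some(
              sel => sel.kind === 'Field' && ['__schema', '__type'].includes(sel.name.value)
            )
          );
          if (hasIntrospection && !contextValue.user?.isAdmin) {
            // ForbiddenError no longer exists in Apollo Server 4
            throw new GraphQLError('Introspection is restricted to admins', {
              extensions: { code: 'FORBIDDEN' },
            });
          }
        },
      };
    },
  },
],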
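A top-level check misses `__schema` nested in an inline fragment such as `{ ... on Query { __schema { types { name } } } }`. A sketch of a slightly stricter check that also descends into inline fragments, written against the graphql-js AST shape (`kind`, `selectionSet.selections`, `name.value`) so it could replace the inline check in `didResolveOperation`; `usesIntrospection` is a hypothetical helper name.

```javascript
// True if any operation or fragment selects __schema or __type at the root,
// including inside inline fragments. __typename is allowed because clients
// request it routinely.
const INTROSPECTION_ROOT_FIELDS = new Set(['__schema', '__type']);

function selectsIntrospection(selectionSet) {
  return (selectionSet?.selections ?? []).some((sel) => {
    if (sel.kind === 'Field') return INTROSPECTION_ROOT_FIELDS.has(sel.name.value);
    if (sel.kind === 'InlineFragment') return selectsIntrospection(sel.selectionSet);
    return false; // FragmentSpread: the fragment definition is checked on its own
  });
}

function usesIntrospection(document) {
  return document.definitions.some((def) => selectsIntrospection(def.selectionSet));
}
```

With the `graphql` package installed, `usesIntrospection(parse('{ ... on Query { __schema { types { name } } } }'))` returns true.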

Pattern: Persisted Queries Setup

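Automatic persisted queries (APQ) cut request size: the client sends a SHA-256 hash instead of the full query text and only resends the text when the server has not seen that hash yet. APQ does not block arbitrary queries on its own, because any client can register a new hash; rejecting unknown operations requires a safelist of pre-registered operations.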

// Apollo Client (frontend)
import { ApolloClient, InMemoryCache, createHttpLink } from '@apollo/client';
import { createPersistedQueryLink } from "@apollo/client/link/persisted-queries";
import { sha256 } from 'crypto-hash';

const persistedQueriesLink = createPersistedQueryLink({ sha256 });
const httpLink = createHttpLink({ uri: '/graphql' });

const client = new ApolloClient({
  cache: new InMemoryCache(),
  link: persistedQueriesLink.concat(httpLink),
});

// Apollo Server 4: APQ is on by default (hashes are kept in the server's cache)
import { ApolloServer } from '@apollo/server';
import responseCachePlugin from '@apollo/server-plugin-response-cache';

const server = new ApolloServer({
  schema,
  plugins: [
    // Optional and separate from APQ: caches whole responses, but only for
    // operations whose @cacheControl hints give a maxAge > 0
    responseCachePlugin(),
  ],
  // persistedQueries: false would turn APQ off
});
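The APQ handshake itself is small. A sketch of both sides in plain Node.js (CommonJS, using only `node:crypto`), assuming an in-memory `Map` as the hash store; `resolveQuery`, `sendWithApq` and the `'hash mismatch'` error string are hypothetical, while `PersistedQueryNotFound` mirrors the error named in the status table below.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical in-memory hash -> query store
const apqStore = new Map();
const sha256 = (text) => createHash('sha256').update(text).digest('hex');

// Server side: turn one request body into query text, or an APQ error
function resolveQuery(body) {
  const hash = body.extensions?.persistedQuery?.sha256Hash;
  if (!hash) return { query: body.query }; // ordinary request, no APQ
  if (body.query !== undefined) {
    // Full text plus hash: verify, then register the pair for later requests
    if (sha256(body.query) !== hash) return { error: 'hash mismatch' };
    apqStore.set(hash, body.query);
    return { query: body.query };
  }
  const stored = apqStore.get(hash);
  return stored !== undefined ? { query: stored } : { error: 'PersistedQueryNotFound' };
}

// Client side: send only the hash; on a miss, resend once with the full text
function sendWithApq(query, send) {
  const extensions = { persistedQuery: { version: 1, sha256Hash: sha256(query) } };
  const first = send({ extensions });
  if (first.error !== 'PersistedQueryNotFound') return first;
  return send({ query, extensions });
}
```

The registration branch is also why APQ is not a safelist: any client that sends the full text gets its query stored.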

Pattern: DataLoader for Batching Across Services (not just DB)

DataLoader works for any async resource — HTTP calls to microservices, Redis lookups, or S3 fetches.

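import DataLoader from 'dataloader';

// Batch HTTP calls to a downstream service
const productLoader = new DataLoader(async (productIds) => {
  const response = await fetch('https://catalog-service/batch', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ids: productIds }),
  });
  // Throwing here rejects every key in this batch
  if (!response.ok) throw new Error(`catalog-service returned ${response.status}`);
  const products = await response.json();
  const map = new Map(products.map(p => [p.id, p]));
  // Preserve order: DataLoader requires one result per key, at the same index
  return productIds.map(id => map.get(id) ?? new Error(`Product ${id} not found`));
}, { maxBatchSize: 100 }); // cap IDs per request if the service limits batch size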

Pattern: Structured Error Formatting

import { ApolloServer } from '@apollo/server';
import { unwrapResolverError } from '@apollo/server/errors';
import { GraphQLError } from 'graphql';

// Custom error classes. Apollo Server 4 dropped ApolloError: extend GraphQLError
// and put the machine-readable code in extensions.
class NotFoundError extends GraphQLError {
  constructor(resource, id) {
    super(`${resource} with id ${id} not found`, {
      extensions: { code: 'NOT_FOUND', resource, id },
    });
  }
}

class AuthorizationError extends GraphQLError {
  constructor(message = 'Not authorized') {
    super(message, { extensions: { code: 'FORBIDDEN' } });
  }
}

// Format errors before they leave the server
const server = new ApolloServer({
  schema,
  formatError: (formattedError, error) => {
    // Log the full internal error (error is typed unknown in Apollo Server 4)
    console.error('GraphQL error:', {
      message: error?.message,
      stack: error?.stack,
      path: formattedError.path,
    });

    // Never expose database errors to clients. DatabaseError stands in for your
    // driver's error class; unwrapResolverError returns what the resolver threw.
    if (unwrapResolverError(error) instanceof DatabaseError) {
      return {
        message: 'Internal server error',
        extensions: { code: 'INTERNAL_SERVER_ERROR' },
      };
    }

    return formattedError;
  },
});
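Masking only known database errors lets anything unexpected through. An allowlist variant inverts the rule: pass through errors whose `extensions.code` is known to be client-safe and mask everything else. A framework-free sketch; `SAFE_CODES` and `maskError` are hypothetical names, and the list must include `PERSISTED_QUERY_NOT_FOUND`, otherwise APQ clients never learn to resend the full query.

```javascript
// Allowlist error masking: only errors with a known, client-safe code reach
// the client unchanged; anything else is replaced with a generic error.
const SAFE_CODES = new Set([
  'NOT_FOUND',
  'FORBIDDEN',
  'UNAUTHENTICATED',
  'BAD_USER_INPUT',
  'GRAPHQL_PARSE_FAILED',
  'GRAPHQL_VALIDATION_FAILED',
  'PERSISTED_QUERY_NOT_FOUND',
]);

function maskError(formattedError) {
  const code = formattedError.extensions?.code;
  if (SAFE_CODES.has(code)) return formattedError;
  // Keep path so clients can tell which field failed; drop message and details
  return {
    message: 'Internal server error',
    path: formattedError.path,
    extensions: { code: 'INTERNAL_SERVER_ERROR' },
  };
}
```

Wire it in after logging, e.g. `formatError: (formattedError, error) => { log(error); return maskError(formattedError); }`.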

Pattern: Rate Limiting Per Operation

import rateLimit from 'express-rate-limit';
import { parse, getOperationAST } from 'graphql';

// Separate limiters keep separate counters (keyed by client IP by default)
const standardLimiter = rateLimit({ windowMs: 60_000, max: 1000 });
const expensiveOpLimiter = rateLimit({ windowMs: 60_000, max: 10 });

// Tighter limits for expensive operations. Operation names are chosen by the
// client, so a renamed query escapes this list; treat it as a first line of defense.
const expensiveOps = new Set(['GenerateReport', 'ExportAllUsers', 'BulkSearch']);

// Middleware: extract the operation name and apply the matching rate limit.
// Mount it after express.json() so req.body is already parsed.
const graphqlRateLimiter = (req, res, next) => {
  try {
    const { query, operationName } = req.body ?? {};
    const document = parse(query);
    // Pass operationName so documents with several operations resolve correctly
    const operation = getOperationAST(document, operationName);
    const name = operation?.name?.value ?? 'anonymous';
    if (expensiveOps.has(name)) {
      return expensiveOpLimiter(req, res, next);
    }
  } catch {
    // Malformed query: fall through and let GraphQL report the error
  }
  return standardLimiter(req, res, next);
};
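Both limiters behave roughly like a fixed window per client key. A dependency-free sketch of that idea with a separate budget per tier, showing how ten expensive calls and a thousand ordinary ones per minute can coexist for the same client. `createFixedWindowLimiter` and the tier names are hypothetical, the clock is injectable so the logic can be tested, and old windows are never evicted, so this is not production-ready.

```javascript
// Fixed-window rate limiter: each (client, tier) key gets `max` hits per window.
function createFixedWindowLimiter(limits, now = () => Date.now()) {
  const windows = new Map(); // "client:tier" -> { start, count }

  return function allow(clientId, tier) {
    const { windowMs, max } = limits[tier]; // unknown tiers throw here
    const key = `${clientId}:${tier}`;
    const t = now();
    let w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 }; // start a new window for this key
      windows.set(key, w);
    }
    w.count += 1;
    return w.count <= max;
  };
}
```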

Pattern: Schema Validation in CI with graphql-inspector

# Install
npm install -g @graphql-inspector/cli

# Compare local schema against production
graphql-inspector diff \
  "https://api.example.com/graphql" \
  "./schema.graphql"

# Validate client operations against the schema (documents first, then schema);
# --deprecated makes the run fail if any operation uses a deprecated field
graphql-inspector validate \
  "./src/**/*.graphql" \
  "./schema.graphql" \
  --deprecated

# Schema coverage: how often your operations use each type and field
graphql-inspector coverage \
  "./src/**/*.graphql" \
  "./schema.graphql"

Quick Reference: HTTP Status Codes with GraphQL

| Scenario | HTTP status | Notes |
|---|---|---|
| Successful query (even with partial errors) | 200 | Check .errors in body |
| Malformed JSON body | 400 | |
| Missing required Content-Type | 415 | |
| Auth failure (before execution) | 401 | |
| Method not allowed (GET on mutation) | 405 | |
| Server crashed during execution | 500 | Only if error is unhandled |
| Persisted query not found (first request) | 200 + PersistedQueryNotFound error | Client resends with full query |
