MySQL / MariaDB Operations — Street-Level Ops¶
Quick Diagnosis Commands¶
# Is MySQL alive?
mysqladmin -u root -p ping
mysqladmin -u root -p status
# What's running right now?
mysql -e "SHOW FULL PROCESSLIST\G" | less
# Key health metrics at a glance
mysqladmin -u root -p extended-status | grep -E \
"Threads_connected|Slow_queries|Questions|Com_select|Com_insert|Com_update|Com_delete|Innodb_buffer_pool_read_requests|Innodb_buffer_pool_reads"
# Check replication status (replica)
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Running|Seconds|Error|Master_Log"
# Disk space (MySQL data dir)
du -sh /var/lib/mysql/
df -h /var/lib/mysql
One-liner:
SHOW FULL PROCESSLIST is the MySQL equivalent of ps aux. When the database is slow, run this first. Look for queries in the "Sending data" state with high Time values; those are your blockers.
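A minimal awk filter can surface those blockers automatically. This is a sketch with a hypothetical helper name, assuming the tab-separated output `mysql -e` produces, with the standard column order Id, User, Host, db, Command, Time, State, Info:

```shell
# Hypothetical helper: flag queries stuck in "Sending data" past a cutoff.
# Assumes tab-separated `mysql -e "SHOW FULL PROCESSLIST"` output with
# columns: Id User Host db Command Time State Info.
slow_senders() {  # usage: mysql -e "SHOW FULL PROCESSLIST" | slow_senders 10
  awk -F'\t' -v cutoff="${1:-10}" \
    'NR > 1 && $7 == "Sending data" && $6+0 > cutoff { print $1, $6 "s", $8 }'
}
```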
Gotcha: Replica Lag Grows Monotonically¶
Seconds_Behind_Master keeps growing. The SQL thread is falling behind.
Rule: The replica SQL (applier) thread is single-threaded by default until MySQL 8.0.27. Parallel replication requires explicit configuration.
-- Check if SQL thread is running
SHOW SLAVE STATUS\G
-- Slave_SQL_Running: Yes = running but slow
-- Slave_SQL_Running: No = stopped (check Last_SQL_Error)
-- Enable parallel replication (MySQL 5.7+)
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; -- parallelizes within a database, unlike DATABASE
START SLAVE SQL_THREAD;
-- Monitor slave thread activity
SELECT * FROM performance_schema.replication_applier_status_by_worker;
-- If lag is from a huge transaction, wait it out
-- Check what the SQL thread is currently executing:
SELECT * FROM performance_schema.replication_applier_status_by_coordinator\G
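SET GLOBAL does not survive a restart; to make the parallel-replication settings permanent, mirror them in my.cnf. A sketch (slave_preserve_commit_order is an extra safeguard not shown above):

```ini
[mysqld]
# MySQL 5.7 names; 8.0.26+ also accepts replica_parallel_* aliases
slave_parallel_workers = 4
slave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = 1   # commit on the replica in binlog order (needs log_slave_updates on 5.7)
```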
Gotcha: "Too Many Connections" Error¶
Application gets ERROR 1040 (HY000): Too many connections.
Rule: MySQL has a hard ceiling (max_connections). The server reserves one extra connection for a SUPER-privileged user, so connect as root from the server itself to fix it.
-- Check current state (connect as root from the server itself)
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
-- Immediate fix: increase limit (runtime, no restart needed)
SET GLOBAL max_connections = 500;
-- Persistent fix in my.cnf:
-- max_connections = 500
-- See what the active (non-sleeping) connections are doing
SELECT user, host, db, command, time, state
FROM information_schema.processlist
WHERE command != 'Sleep'
ORDER BY time DESC;
-- Kill sleeping connections older than 60 seconds (generate KILL statements)
SELECT CONCAT('KILL ', id, ';')
FROM information_schema.processlist
WHERE command = 'Sleep' AND time > 60 AND user != 'replication';
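To actually run the generated statements, one option is to do the filtering in the shell instead, so the kill list can be reviewed before piping it back into mysql. A sketch with a hypothetical helper name, assuming plain tab-separated SHOW PROCESSLIST output:

```shell
# Sketch: emit KILL statements for idle connections from tab-separated
# `mysql -e "SHOW PROCESSLIST"` output (columns: Id User Host db Command Time State Info).
# Review the output first, then append `| mysql` to execute it.
kill_sleepers() {  # usage: mysql -e "SHOW PROCESSLIST" | kill_sleepers 60
  awk -F'\t' -v cutoff="${1:-60}" \
    'NR > 1 && $5 == "Sleep" && $6+0 > cutoff && $2 != "replication" { print "KILL " $1 ";" }'
}
```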
With ProxySQL: check if the connection pool is exhausted:
-- On ProxySQL admin port 6032
SELECT hostgroup, srv_host, status, ConnUsed, ConnFree, ConnOK, ConnERR
FROM stats_mysql_connection_pool;
Gotcha: Replication Broke — Duplicate Key Error¶
Last_SQL_Error: Could not execute Write_rows event on table mydb.users;
Duplicate entry '12345' for key 'PRIMARY'
The replica has a row the primary doesn't (out-of-band write to replica, or replica was used for a write that also went to primary).
-- Option 1: Skip one event (quick fix, may mask data divergence)
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
-- With GTID: you can't use skip_counter; inject an empty transaction instead
STOP SLAVE;
SET GTID_NEXT = 'uuid:transaction_id'; -- the GTID shown in Last_SQL_Error
BEGIN; COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START SLAVE;
-- Option 2: Verify replica is actually in sync (then fix root cause)
pt-table-checksum --host=primary --databases=mydb --tables=users
-- If drifted, resync: pt-table-sync --execute h=primary,D=mydb,t=users h=replica
-- Option 3: If drift is large, rebuild the replica from a fresh backup
Default trap:
sql_slave_skip_counter skips the next N events, not transactions. A multi-statement transaction may have multiple events. Skipping 1 event might leave the transaction half-applied. With GTID replication, the empty-transaction injection method is safer because it operates at the transaction level.
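With a multi-threaded applier, the failing GTID is typically embedded in the worker's error message. A small extractor sketch; the exact message wording varies by version, so treat the pattern as an assumption:

```shell
# Hypothetical extractor: pull a GTID out of an applier error message, assuming
# the message embeds it as: ... transaction 'uuid:N' ... (wording varies by version).
failing_gtid() {
  grep -o "transaction '[^']*'" | head -n 1 | cut -d"'" -f2
}
# mysql -N -e "SELECT LAST_ERROR_MESSAGE FROM performance_schema.replication_applier_status_by_worker" | failing_gtid
```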
Pattern: Emergency Schema Change on a Live Table¶
A plain ALTER TABLE acquires a metadata lock and, depending on the MySQL version and the operation, can block all reads and writes on the table for its duration. On a 100 GB table this can take 30+ minutes.
# Use pt-online-schema-change instead
pt-online-schema-change \
--alter "ADD COLUMN metadata JSON NULL AFTER updated_at" \
--host=primary.db.example.com \
--user=root --password=secret \
--chunk-size=1000 \
--max-load "Threads_running=25" \
--critical-load "Threads_running=50" \
--execute \
D=mydb,t=users
# MySQL 8.0+ Online DDL (for some operations — check docs for which ones are instant)
ALTER TABLE users ADD COLUMN metadata JSON NULL, ALGORITHM=INSTANT;
ALTER TABLE users ADD INDEX idx_email (email), ALGORITHM=INPLACE, LOCK=NONE;
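Which operations qualify for INSTANT varies by 8.0 minor version, so one defensive pattern is to try the cheapest algorithm and fall back. A sketch with a hypothetical helper name; when an ALGORITHM clause is given explicitly, the server rejects unsupported operations outright instead of silently downgrading to a locking table copy:

```shell
# Sketch: try ALGORITHM=INSTANT first; if the server rejects it for this
# operation, retry with INPLACE, LOCK=NONE (which also fails loudly rather
# than falling back to a table-copying, locking ALTER).
alter_online() {  # usage: alter_online mydb "ALTER TABLE users ADD COLUMN metadata JSON NULL"
  mysql "$1" -e "$2, ALGORITHM=INSTANT;" 2>/dev/null \
    || mysql "$1" -e "$2, ALGORITHM=INPLACE, LOCK=NONE;"
}
```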
Pattern: Identifying and Killing Blocking Transactions¶
-- Who is blocking writes?
SELECT
r.trx_mysql_thread_id AS waiting_thread,
SUBSTRING(r.trx_query, 1, 80) AS waiting_query,
b.trx_mysql_thread_id AS blocking_thread,
SUBSTRING(b.trx_query, 1, 80) AS blocking_query,
b.trx_started AS blocking_started,
TIMESTAMPDIFF(SECOND, b.trx_started, NOW()) AS blocking_seconds
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
ORDER BY blocking_seconds DESC;
-- Kill the blocker
KILL 54321; -- blocking_thread value
-- In MySQL 8.0: use performance_schema for more detail
SELECT * FROM performance_schema.data_lock_waits\G
Scenario: Database Won't Start After Crash¶
# Step 1: Read the error log
tail -100 /var/log/mysql/error.log
# Common errors:
# "InnoDB: Corruption in the InnoDB tablespace" -> force recovery
# "Table './mydb/users' is marked as crashed" -> MyISAM crash
# "Can't create test file" -> permission issue on data dir
# Step 2: Permission issue fix
ls -la /var/lib/mysql/
chown -R mysql:mysql /var/lib/mysql/
chmod 750 /var/lib/mysql/
systemctl start mysql
# Step 3: InnoDB crash recovery
# Edit /etc/mysql/mysql.conf.d/mysqld.cnf
# Add: innodb_force_recovery = 1
systemctl start mysql
# If it starts: dump immediately
mysqldump --all-databases > /backup/emergency_$(date +%Y%m%d_%H%M%S).sql
# Then rebuild: remove innodb_force_recovery, restore from dump
# Step 4: MyISAM table repair
mysqlcheck -u root -p --repair --all-databases
# Or specific table:
mysql -e "REPAIR TABLE mydb.legacy_table;"
Scenario: Slow Application — Database CPU at 100%¶
-- Step 1: See what's running
SHOW FULL PROCESSLIST;
-- Look for long-running queries in the State column
-- Step 2: Find the top offenders via performance_schema
SELECT digest_text,
count_star AS exec_count,
ROUND(sum_timer_wait / 1e12, 2) AS total_sec,
ROUND(avg_timer_wait / 1e12, 4) AS avg_sec,
ROUND(sum_rows_examined / count_star) AS avg_rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE last_seen > NOW() - INTERVAL 10 MINUTE
ORDER BY sum_timer_wait DESC
LIMIT 10;
-- Step 3: EXPLAIN the bad queries
EXPLAIN FORMAT=JSON SELECT ...;
-- Step 4: Kill the worst offenders immediately
SELECT CONCAT('KILL QUERY ', id, ';')
FROM information_schema.processlist
WHERE time > 30 AND user != 'replication'
ORDER BY time DESC;
Emergency: Disk Full — MySQL Stopped Writing¶
# MySQL logs: "ERROR 28 from storage engine" or "ERROR 1030 (HY000): Got error 28"
# Step 1: Identify what's consuming space
du -sh /var/lib/mysql/*/ # per-database directories
ls -lSh /var/lib/mysql/*/*.ibd | head -20 # largest InnoDB tablespace files (-S sorts largest first; file-per-table lives in per-database subdirs)
# Step 2: Free space immediately
# Purge old binary logs (only if replicas are caught up)
mysql -e "SHOW SLAVE HOSTS;" # check replicas exist
mysql -e "PURGE BINARY LOGS BEFORE NOW() - INTERVAL 1 DAY;"
# Or purge to a specific file
mysql -e "PURGE BINARY LOGS TO 'mysql-bin.000080';"
# Step 3: If a table has grown unexpectedly (e.g., a log table)
mysql -e "TRUNCATE TABLE mydb.application_logs;"
# Or: DELETE in batches (TRUNCATE is faster but non-transactional)
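The batched DELETE alternative can be sketched like this (hypothetical helper; the table name and 30-day retention window are illustrative):

```shell
# Sketch of batched deletion: LIMIT-ed passes keep undo log growth and
# replica lag bounded, unlike one giant DELETE.
purge_logs_batched() {  # usage: purge_logs_batched mydb application_logs
  db="$1"; tbl="$2"
  while :; do
    rows=$(mysql -N "$db" -e \
      "DELETE FROM $tbl WHERE created_at < NOW() - INTERVAL 30 DAY LIMIT 10000;
       SELECT ROW_COUNT();")
    if [ "$rows" -eq 0 ]; then break; fi
    sleep 1   # give replicas a chance to catch up between batches
  done
}
```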
# Step 4: Reclaim space from deleted rows (InnoDB doesn't shrink files automatically)
mysql -e "OPTIMIZE TABLE mydb.large_table;"
# Or use pt-online-schema-change with a no-op alter to rebuild the file
# Step 5: Increase disk or add volume and symlink
Useful One-Liners¶
# Quick replica health check
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Running|Seconds|Error"
# Find tables with most rows
mysql -e "SELECT table_schema, table_name, table_rows, ROUND(data_length/1024/1024,1) AS data_mb
FROM information_schema.tables ORDER BY data_length DESC LIMIT 20;"
# Show current variables (buffer pool, connections)
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer%';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads%';"
# List indexes on a table
mysql -e "SHOW INDEX FROM mydb.orders\G"
# Show running queries older than 5 seconds
mysql -e "SELECT id,user,host,db,time,state,left(info,80) FROM information_schema.processlist WHERE time > 5 ORDER BY time DESC;"
# Replica: inject an empty transaction for a failing GTID (substitute the exact
# GTID from the applier error; do not use the whole Retrieved_Gtid_Set)
mysql -e "STOP SLAVE; SET GTID_NEXT='uuid:transaction_id'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC'; START SLAVE;"
# Check table sizes
mysql -e "SELECT CONCAT(table_schema,'.',table_name) AS tbl,
ROUND((data_length+index_length)/1024/1024,1) AS total_mb
FROM information_schema.tables ORDER BY total_mb DESC LIMIT 20;"
# Flush slow query log without restart
mysql -e "FLUSH SLOW LOGS;"
Debug clue:
If Innodb_buffer_pool_reads is high relative to Innodb_buffer_pool_read_requests, InnoDB is reading from disk instead of the buffer pool. The hit ratio should be above 99%. Calculate it as (1 - reads / read_requests) * 100. If it falls below 99%, the buffer pool is too small. Increase innodb_buffer_pool_size; the general rule is 70-80% of total RAM on a dedicated database server.
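That hit-ratio calculation can be scripted directly against the extended-status output. A sketch that parses mysqladmin's table-style output:

```shell
# Sketch: compute the InnoDB buffer pool hit ratio from
# `mysqladmin extended-status` output (| Variable_name | Value | rows).
buffer_pool_hit_ratio() {  # usage: mysqladmin -u root -p extended-status | buffer_pool_hit_ratio
  awk '
    /Innodb_buffer_pool_read_requests/ { req = $4 }
    /Innodb_buffer_pool_reads /        { reads = $4 }
    END { if (req > 0) printf "%.2f%%\n", (1 - reads / req) * 100 }
  '
}
```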