This post walks through troubleshooting a nearly full disk on an AWS EC2 Linux instance, identifying Docker and logs as the cause, and cleaning them up safely.
Symptoms and first checks
When disk usage is high, commands can fail, services can crash, and the OS may become unstable. The first indication here was df -h showing the root filesystem / at 98% usage on a 25 GB volume.
Key command to start with:
df -h
Focus on the line for / (often /dev/xvda1 on EC2) and check the Use% column.
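On this instance, the relevant line looked roughly like the following (values illustrative, consistent with the 25 GB root volume at 98% described above):
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       25G   24G  1.1G  98% /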
Step 1: Find which top-level directory is big
Use du to see which top-level paths under / use the most space:
sudo du -xhd1 / | sort -h
In this case, the notable output looked like:
- /var ≈ 16G
- /home ≈ 5.4G
- /usr ≈ 2.9G
Because /var was far larger than everything else, that became the primary target.
Step 2: Drill into /var
Next, inspect usage inside /var:
sudo du -xhd1 /var | sort -h
The important lines were:
- /var/log ≈ 2.2G
- /var/lib ≈ 14G
So, two areas to investigate: /var/lib (huge) and /var/log (large but smaller).
Step 3: Identify Docker as the culprit in /var/lib
To see what was big inside /var/lib:
sudo du -xhd1 /var/lib | sort -h
The result showed:
- /var/lib/docker ≈ 14G
- everything else in /var/lib was tiny
This indicated Docker data was the main consumer.
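To confirm where Docker stores its data (it defaults to /var/lib/docker unless data-root has been changed), you can ask Docker directly:
docker info --format '{{ .DockerRootDir }}'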
Check Docker’s own view of disk usage
Use Docker’s built-in disk usage summary:
docker system df
Example output pattern:
- Images: a few GB
- Containers: small
- Local Volumes: tiny
- Build Cache: large and fully reclaimable (≈ 8.5G in this case)
When build cache is large and fully reclaimable, it is usually safe to delete on dev/CI hosts.
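For a per-image, per-container, and per-volume breakdown rather than just totals, add the verbose flag:
docker system df -v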
Step 4: Clean up Docker disk usage
There are a few levels of Docker cleanup, from conservative to aggressive. Choose based on how critical the instance is and how easily images/containers can be recreated.
Option A: Conservative cleanup
Keeps most images, removes unused/temporary data:
docker system prune
docker builder prune
- docker system prune removes:
  - stopped containers
  - unused networks
  - dangling images
  - build cache
- docker builder prune explicitly cleans build cache (which was ~8.5G in this case).
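Both commands prompt for confirmation by default. On a CI box you may prefer a non-interactive variant; the sketch below skips the prompt and either drops build cache older than 48 hours or keeps it under a size budget (the 48h and 2GB values are arbitrary examples):
docker system prune -f
docker builder prune -f --filter "until=48h"
docker builder prune -f --keep-storage 2GB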
Option B: Aggressive cleanup (for dev/CI or easily reproducible environments)
docker system prune -a --volumes
This removes:
- all stopped containers
- all unused networks
- all images not used by any running container
- all unused volumes
Use this only if you are sure you can easily rebuild whatever you delete.
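Before running the aggressive prune, review what would be affected:
docker ps -a       # all containers, including stopped ones
docker images      # all images
docker volume ls   # local volumes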
Optional: Trim oversized Docker container logs
If container logs in /var/lib/docker/containers are huge but containers must stay running, you can truncate logs:
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'
This zeros the log files but does not stop containers.
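To see which container logs are the largest before truncating:
sudo sh -c 'du -h /var/lib/docker/containers/*/*-json.log | sort -h | tail'
To keep them bounded going forward, you can cap the json-file logging driver in /etc/docker/daemon.json (a sketch; the 10m/3 values are arbitrary, and the limits only apply to containers created after the Docker daemon is restarted, which briefly stops running containers unless live-restore is enabled):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
sudo systemctl restart docker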
After any Docker cleanup, re-check:
df -h
sudo du -xhd1 /var/lib/docker | sort -h
Step 5: Clean up /var/log safely
Even after Docker cleanup, /var/log can still be several gigabytes because of system and application logs.
Inspect what is big inside /var/log
sudo du -xhd1 /var/log | sort -h
Common heavy hitters:
- /var/log/journal (systemd journal)
- Large *.log files (e.g., messages, secure, app logs)
- Rotated/compressed logs (*.gz, *.1, etc.)
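To list the largest entries under /var/log directly:
sudo du -ah /var/log | sort -h | tail -20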
Trim classic log files
To shrink large current .log files while keeping them:
sudo find /var/log -maxdepth 1 -type f -name "*.log" -size +50M -exec truncate -s 0 {} \;
To delete older, compressed log archives you no longer need:
sudo rm /var/log/*.gz
sudo rm /var/log/*/*.gz
This only affects historical logs, not live logging.
Systemd journal cleanup
First check how much space the journal is using (about 2 GB in this case):
sudo journalctl --disk-usage
To keep only recent logs by time, for example last 2 days:
sudo journalctl --vacuum-time=2d
To cap journal size by space, e.g., 200M:
sudo journalctl --vacuum-size=200M
Re-check:
sudo journalctl --disk-usage
df -h
Optional: Make journal limits permanent
Edit /etc/systemd/journald.conf and set values like:
SystemMaxUse=200M
SystemMaxFileSize=50M
MaxRetentionSec=7day
Then restart journald:
sudo systemctl restart systemd-journald
Step 6: Verify overall disk health
After Docker and log cleanup, confirm that root usage is back to a safe level:
df -h
sudo du -xhd1 / | sort -h
sudo du -xhd1 /var | sort -h
Aim to keep root usage under 80–85% for comfort and to avoid future issues.
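If you want an early warning before the next incident, a minimal cron-able check like the sketch below (the 85% threshold is an arbitrary choice) prints a warning when root usage crosses the line:
#!/bin/sh
# Warn when / usage exceeds 85%
USAGE=$(df --output=pcent / | tail -1 | tr -dc '0-9')
if [ "$USAGE" -ge 85 ]; then
  echo "WARNING: / is at ${USAGE}% on $(hostname)"
fi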
Step 7: If needed, increase the EBS root volume
If, after all cleanups, the root volume is still tight, consider increasing its size:
- In the EC2 console, find the root volume (often attached as /dev/xvda).
- Use Modify volume to increase its size (e.g., from 25 GiB to 50 GiB).
- On the instance, verify the new size at the block level: lsblk
- Grow the partition (example for /dev/xvda1): sudo growpart /dev/xvda 1
- Grow the filesystem: for ext4, sudo resize2fs /dev/xvda1; for XFS, sudo xfs_growfs / (if unsure which filesystem is in use, see the check after this list)
- Confirm with df -h that / now has more total space and a lower percentage used.
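If you are unsure whether the root filesystem is ext4 or XFS, check before choosing the grow command:
df -Th /
lsblk -f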
Step 8: Check NGINX status
As part of troubleshooting or verification, you might want to check if NGINX is running.
Test the configuration:
sudo nginx -t
If syntax is OK and the test is successful, manage the service:
sudo systemctl status nginx
sudo systemctl start nginx
sudo systemctl enable nginx # optional, start at boot
sudo systemctl status nginx # confirm active (running)
If status shows inactive (dead) but nginx -t passes, starting the service should bring it up normally.
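As a final end-to-end check, you can hit the server locally (assuming NGINX listens on port 80 on this host):
curl -I http://localhost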
Summary checklist
For future incidents on similar EC2 instances:
- Check disk usage: df -h.
- Find biggest top-level dirs: sudo du -xhd1 / | sort -h.
- If /var is large, drill into /var and /var/lib, /var/log.
- If /var/lib/docker is large:
  - docker system df
  - docker system prune and/or docker builder prune
  - Optionally docker system prune -a --volumes for aggressive cleanup
  - Truncate container logs if they are huge.
- If /var/log is large:
  - Truncate or remove old logs.
  - Use journalctl --vacuum-time or --vacuum-size for the journal.
- Re-check df -h; if still tight, plan to increase the EBS volume and expand the filesystem.
- Verify critical services like NGINX with nginx -t and systemctl status nginx.