This post walks through troubleshooting a nearly full disk on an AWS EC2 Linux instance, identifying Docker and logs as the cause, and cleaning them up safely.
Symptoms and first checks
When disk usage is high, commands can fail, services can crash, and the OS may become unstable. The first indication here was df -h showing the root filesystem / at 98% usage on a 25 GB volume.
Key command to start with:
df -h
Focus on the line for / (often /dev/xvda1 on EC2) and check the Use% column.
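On this instance, the relevant line looked roughly like the following (values illustrative, consistent with the 25 GB root volume at 98% described above):
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       25G   24G  1.1G  98% /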
Step 1: Find which top-level directory is big
Use du to see which top-level paths under / use the most space:
sudo du -xhd1 / | sort -h
In this case, the notable output looked like:
- /var ≈ 16G
- /home ≈ 5.4G
- /usr ≈ 2.9G
Because /var was far larger than everything else, that became the primary target.
Step 2: Drill into /var
Next, inspect usage inside /var:
sudo du -xhd1 /var | sort -h
The important lines were:
- /var/log ≈ 2.2G
- /var/lib ≈ 14G
So, two areas to investigate: /var/lib (huge) and /var/log (large but smaller).
Step 3: Identify Docker as the culprit in /var/lib
To see what was big inside /var/lib:
sudo du -xhd1 /var/lib | sort -h
The result showed:
- /var/lib/docker ≈ 14G
- everything else in /var/lib was tiny
This indicated Docker data was the main consumer.
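To confirm where Docker stores its data (it defaults to /var/lib/docker unless data-root has been changed), you can ask Docker directly:
docker info --format '{{ .DockerRootDir }}'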
Check Docker’s own view of disk usage
Use Docker’s built-in disk usage summary:
docker system df
Example output pattern:
- Images: a few GB
- Containers: small
- Local Volumes: tiny
- Build Cache: large and fully reclaimable (≈ 8.5G in this case)
When build cache is large and fully reclaimable, it is usually safe to delete on dev/CI hosts.
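For a per-image, per-container, and per-volume breakdown rather than just totals, add the verbose flag:
docker system df -v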
Step 4: Clean up Docker disk usage
There are a few levels of Docker cleanup, from conservative to aggressive. Choose based on how critical the instance is and how easily images/containers can be recreated.
Option A: Conservative cleanup
Keeps most images, removes unused/temporary data:
docker system prune
docker builder prune
- docker system prune removes:
  - stopped containers
  - unused networks
  - dangling images
  - build cache
- docker builder prune explicitly cleans build cache (which was ~8.5G in this case).
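Both commands prompt for confirmation by default. On a CI box you may prefer a non-interactive variant; the sketch below skips the prompt and either drops build cache older than 48 hours or keeps it under a size budget (the 48h and 2GB values are arbitrary examples):
docker system prune -f
docker builder prune -f --filter "until=48h"
docker builder prune -f --keep-storage 2GB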
Option B: Aggressive cleanup (for dev/CI or easily reproducible environments)
docker system prune -a --volumes
This removes:
- all stopped containers
- all unused networks
- all images not used by any running container
- all unused volumes
Use this only if you are sure you can easily rebuild whatever you delete.
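Before running the aggressive prune, review what would be affected:
docker ps -a       # all containers, including stopped ones
docker images      # all images
docker volume ls   # local volumes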
Optional: Trim oversized Docker container logs
If container logs in /var/lib/docker/containers are huge but containers must stay running, you can truncate logs:
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'
This zeros the log files but does not stop containers.
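To see which container logs are the largest before truncating:
sudo sh -c 'du -h /var/lib/docker/containers/*/*-json.log | sort -h | tail'
To keep them bounded going forward, you can cap the json-file logging driver in /etc/docker/daemon.json (a sketch; the 10m/3 values are arbitrary, and the limits only apply to containers created after the Docker daemon is restarted, which briefly stops running containers unless live-restore is enabled):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
sudo systemctl restart docker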
After any Docker cleanup, re-check:
df -h
sudo du -xhd1 /var/lib/docker | sort -h
Step 5: Clean up /var/log safely
Even after Docker cleanup, /var/log can still be several gigabytes because of system and application logs.
Inspect what is big inside /var/log
sudo du -xhd1 /var/log | sort -h
Common heavy hitters:
- /var/log/journal (systemd journal)
- Large *.log files (e.g., messages, secure, app logs)
- Rotated/compressed logs (*.gz, *.1, etc.)
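To list the largest entries under /var/log directly:
sudo du -ah /var/log | sort -h | tail -20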
Trim classic log files
To shrink large current .log files while keeping them:
sudo find /var/log -maxdepth 1 -type f -name "*.log" -size +50M -exec truncate -s 0 {} \;
To delete older, compressed log archives you no longer need:
sudo rm /var/log/*.gz
sudo rm /var/log/*/*.gz
This only affects historical logs, not live logging.
Systemd journal cleanup
First check how much space the journal is using (about 2 GB in this case):
sudo journalctl --disk-usage
To keep only recent logs by time, for example last 2 days:
sudo journalctl --vacuum-time=2d
To cap journal size by space, e.g., 200M:
sudo journalctl --vacuum-size=200M
Re-check:
sudo journalctl --disk-usage
df -h
Optional: Make journal limits permanent
Edit /etc/systemd/journald.conf and set values like:
SystemMaxUse=200M
SystemMaxFileSize=50M
MaxRetentionSec=7day
Then restart journald:
sudo systemctl restart systemd-journald
Step 6: Verify overall disk health
After Docker and log cleanup, confirm that root usage is back to a safe level:
df -h
sudo du -xhd1 / | sort -h
sudo du -xhd1 /var | sort -h
Aim to keep root usage under 80–85% for comfort and to avoid future issues.
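If you want an early warning before the next incident, a minimal cron-able check like the sketch below (the 85% threshold is an arbitrary choice) prints a warning when root usage crosses the line:
#!/bin/sh
# Warn when / usage exceeds 85%
USAGE=$(df --output=pcent / | tail -1 | tr -dc '0-9')
if [ "$USAGE" -ge 85 ]; then
  echo "WARNING: / is at ${USAGE}% on $(hostname)"
fi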
Step 7: If needed, increase the EBS root volume
If, after all cleanups, the root volume is still tight, consider increasing its size:
- In the EC2 console, find the root volume (often attached as /dev/xvda).
- Use Modify volume to increase its size (e.g., from 25 GiB to 50 GiB).
- On the instance, verify the new size at the block level: lsblk
- Grow the partition (example for /dev/xvda1): sudo growpart /dev/xvda 1
- Grow the filesystem: for ext4, sudo resize2fs /dev/xvda1; for XFS, sudo xfs_growfs / (if unsure which filesystem is in use, see the check after this list)
- Confirm with df -h that / now has more total space and a lower percentage used.
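If you are unsure whether the root filesystem is ext4 or XFS, check before choosing the grow command:
df -Th /
lsblk -f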
Step 8: Check NGINX status
As part of troubleshooting or verification, you might want to check if NGINX is running.
Test the configuration:
sudo nginx -t
If syntax is OK and the test is successful, manage the service:
sudo systemctl status nginx
sudo systemctl start nginx
sudo systemctl enable nginx # optional, start at boot
sudo systemctl status nginx # confirm active (running)
If status shows inactive (dead) but nginx -t passes, starting the service should bring it up normally.
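As a final end-to-end check, you can hit the server locally (assuming NGINX listens on port 80 on this host):
curl -I http://localhost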
Summary checklist
For future incidents on similar EC2 instances:
- Check disk usage: df -h.
- Find biggest top-level dirs: sudo du -xhd1 / | sort -h.
- If /var is large, drill into /var and /var/lib, /var/log.
- If /var/lib/docker is large:
  - docker system df
  - docker system prune and/or docker builder prune
  - Optionally docker system prune -a --volumes for aggressive cleanup
  - Truncate container logs if they are huge.
- If /var/log is large:
  - Truncate or remove old logs.
  - Use journalctl --vacuum-time or --vacuum-size for the journal.
- Re-check df -h; if still tight, plan to increase the EBS volume and expand the filesystem.
- Verify critical services like NGINX with nginx -t and systemctl status nginx.