blog.thms.uk

Clearing up the Mastodon Database using pg_repack

On my Mastodon instance, I recently ran out of disk space during a server upgrade which crashed my instance. Ouch!

In this instance I could thankfully clear out enough space clearing log files as well as an old backup I still had flying around. But even after this, I had only a few GB of disk space left. I wanted to get ahead of this before this became a problem again.

On PostgreSQL the database disk space can balloon over time, as updates to the statuses table (and others) lead to loads of 'dead' rows which are no longer used, but not always reclaimed by PostgreSQL. Although these rows are dead, they still occupy disk space and will degrade the performance of your queries. This post will run through how to use pg_repack to delete these dead rows, and reclaim the disk space occupied by them.

Pre-requisite.

In order for pg_repack to do its thing, it mostly makes a copy of your existing tables (omitting the wasted space) and then removes the original, one by one. (Yes, this is an extreme simplification, but it gets the point across.)

As such, pg_repack is not a tool you can use when you have already run out of disk space. As a guideline, if your free disk space is smaller than your largest table you'll likely run into issues.

To figure out whether you have enough disk space to use pg_repack, you'll therefore want to figure out both your free disk space, and the size of your largest table.

If you already have those figures, feel free to skip the rest of this paragraph, but otherwise don't: Attempting to run pg_repack without enough disk space can cause serious issues.

So, let's firstly determine the free space you have on your disk:

  1. Log in as the mastodon user
    ssh mastodon@example.com # (replace `example.com` with your host name).
    
  2. Use df -h to determine the free space on your disk:
    mastodon@mstdn:~$ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           382M  1.2M  381M   1% /run
    /dev/sda1        38G  29.4G  8.6G  77% /
    [...]
    

Here I have 8.6 GB free disk space.

Now let's check the size of your largest table:

  1. While still logged in as mastodon user, open psql:
    psql -d mastodon_production
    
  2. Type in the following and press ENTER:
    SELECT
         table_schema || '.' || table_name AS table_full_name,
         pg_size_pretty(pg_total_relation_size('"' || table_schema || '"."' || table_name || '"')) AS size
     FROM information_schema.tables
     ORDER BY
         pg_total_relation_size('"' || table_schema || '"."' || table_name || '"') DESC;
    

Have a look at the first row: It'll tell you how the size of the largest table:

table_full_name             | size
----------------------------+------------
public.statuses             | 8506 MB
public.media_attachments    | 1192 MB

In my case my largest table is about 8.5GB and my free disk space is 8.6 GB so I'm good to go.

If you do not have enough free disk space, do not continue: First clear out enough space, e.g. by clearing out log files, old backups, etc.

Install pg_repack

Assuming you have installed Mastodon following the official installation instructions, you should be able to install pg_repack using standard linux tools:

  1. Log into your server as root:
    ssh root@example.com # (replace `example.com` with your host name).
    
  2. Update the list of packages:
    apt update
    
  3. Install pg_repack:
    apt install postgresql-15-repack
    
    (If you are running postgreSQL 16, ensure you install postgresql-16-repack instead.)
  4. Enable the extension within postgres:
    1. Switch to the postgres user:
      su postgres
      
    2. Connect to your SQL database:
      psql -d mastodon_production
      
    3. Within your SQL database create the following commands:
      CREATE EXTENSION pg_repack;
      exit;
      

Run pg_repack

pg_repack must be run as the PostgreSQL super user, so firstly switch to the postgreSQL user, if you haven't already done so:

su postgres

Secondly, run pg_repack:

pg_repack -d mastodon_production --dry-run

Here, mastodon_production is your database name. We append the --dry-run flag to see whether everything is working as expected: This will ensure that pg_repack will simply print out what would've been repacked, but not perform any action.

If you are happy with the result, re-run the command without --dry-run:

pg_repack -d mastodon_production 

Run pg_repack on a schedule

Ideally you'll probably want to run pg_repack regularly to clear out enough space. Here is how you can set up a cron job (i.e. a 'scheduled task') to run pg_repack regularly:

  1. Make sure you are still logged in as postgres user. Otherwise type in su postgres to switch to the user.
  2. Open your crontab editor:
    crontab -e
    
    At this stage you may be asked which editor you want to use. Unless you know what you are doing, I recommend using nano
  3. Scroll to the bottom of the editor and type in the following:
    0 0 * * 1 pg_repack -d mastodon_production
    
    This will run pg_repack every Monday at midnight. If you want to run it monthly instead, you may use the following to run it at midnight on every 1st of the month:
    0 0 1 * * pg_repack -d mastodon_production
    
  4. On your keyboard press CTRL + X to exit the editor.
  5. Press the Y key to save your changes.

And you are done: pg_repack will now run periodically to clear out space on your disk.

Summary

This post shows how to use pg_repack to reclaim the disk space used by dead rows in your Mastodon PostgreSQL database. Obviously re-packing will consume some CPU cycles, so you may see slightly worse performance during the process. But if your disk space is limited, this may be worth doing.

If you have any questions or comments I'd love to see them in the comments below.