blog.thms.uk

Setting up Elasticsearch for Mastodon 4.2.x

With the release of Mastodon 4.2.0, full text search has finally graduated to being useful. Previously you could only search through posts you had favourited, bookmarked, or written yourself, which, let's face it, was kind of useless. Mastodon can now search through posts by anyone who has opted into this. At the moment, this opt-in still significantly hampers search, as not many people have opted in yet. Hopefully that will change as more servers roll out 4.2.0 and more users become aware of this setting.

This post walks through setting up Elasticsearch for Mastodon, with a particular focus on single-user and micro instances, because that's what I run. The same instructions apply in principle to larger instances, but once you get to a certain size you'll want to pay attention to things such as scaling and redundancy, none of which we owners of very small instances need to worry about.

Elasticsearch has a bit of a reputation for being hard to set up and prone to issues. I'm happy to report that I've been running Elasticsearch on my instance (originally via a patch that enabled proper full text search) for about 7 months now, and it's been running very smoothly, without any issues. So don't be afraid of it!

Server Requirements

Firstly, you need a server to run Elasticsearch on. I did experiment with running Elasticsearch on my Mastodon server, but ultimately decided to create a separate server for it: Mastodon just runs far more reliably if it doesn't have to contend with Elasticsearch for server resources. It also means that if Elasticsearch ever did run into problems (and in the 7 months that I've been running it, it never has), your main Mastodon instance is better isolated from them.

So, my recommendation is: Rather than scaling up your main Mastodon server to make space for Elasticsearch, set up a second server dedicated to Elasticsearch.

Which raises the question: What size of server do you need for Elasticsearch? I personally think that Elasticsearch's requirements are often overestimated. For reference, I can tell you that I'm running a single user instance. My instance has about 9 million posts, and 360,000 users in the database, of which about 1% have opted into full text indexing. (I really hope that number will grow.)

Currently, I'm running Elasticsearch on a 'CAX11' server from Hetzner. This has 2 vCPUs and 4GB RAM, and costs me €3.95 per month. This seems to be plenty for my single user instance. (And in fact I was running Elasticsearch without any issues on a server half that size with my previous provider, but this is simply the cheapest and smallest server Hetzner offers.)

My Elasticsearch server is IPv6 only (because Hetzner charges extra for IPv4), which works fine: As the server doesn't need to be accessible to any server or user on the internet (other than my Mastodon server), there is no need to mess with NAT or any other complications that usually come from IPv6 only.

Set up Elasticsearch

Once you have your server, you firstly need to install the Java Runtime Environment:

apt install openjdk-17-jre-headless

Then add the Elasticsearch repository to apt:

wget -O /usr/share/keyrings/elasticsearch.asc https://artifacts.elastic.co/GPG-KEY-elasticsearch
echo "deb [signed-by=/usr/share/keyrings/elasticsearch.asc] https://artifacts.elastic.co/packages/7.x/apt stable main" > /etc/apt/sources.list.d/elastic-7.x.list

And install Elasticsearch:

apt update
apt install elasticsearch

We are almost done already, but there are a couple of changes we need to make to the Elasticsearch configuration:

Because we're running Elasticsearch on a separate machine, we need to bind it to our private network address. (Don't bind it to your public network address, otherwise the whole world can access it! And apply a firewall rule, too, so that only your Mastodon server can reach it.)
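As a rough sketch of what such a firewall rule could look like: the snippet below assumes you use ufw (your provider's cloud firewall would work just as well) and that your Mastodon server's private address is 10.0.0.2, which is a placeholder I made up for illustration. The function only prints the commands, so you can review them before running anything as root.

```shell
# Sketch: print ufw rules that let only the Mastodon server reach
# Elasticsearch on port 9200. Assumes ufw; the IP passed in is a placeholder.
es_firewall_rules() {
  mastodon_ip="$1"
  # ufw evaluates rules in order, so the allow rule has to come first ...
  printf 'ufw allow from %s to any port 9200 proto tcp\n' "$mastodon_ip"
  # ... followed by a rule that denies everyone else on that port.
  printf 'ufw deny 9200/tcp\n'
}

# Print the rules for a (hypothetical) Mastodon server at 10.0.0.2.
es_firewall_rules 10.0.0.2
```

If the printed rules look right, you can run them one by one as root.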

Additionally, to prevent your Mastodon logs being flooded with the message 'Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone.', I suggest enabling password authentication. This isn't strictly needed if you've locked down the firewall (you have locked down the firewall, right?!), but it stops that warning.

To do both, open up /etc/elasticsearch/elasticsearch.yml and insert these three lines at the bottom:

network.host: 10.0.0.3 # replace with your own private IP address
discovery.type: single-node
xpack.security.enabled: true
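If you'd rather script this than edit the file by hand, something like the following might do. It's only a sketch: ensure_setting is a helper name I made up, and it simply appends a setting unless an uncommented line for that key already exists. Run it as root and adjust the IP address as above.

```shell
# Sketch: append "key: value" to a YAML file unless the key is already set.
# Commented-out defaults such as "#network.host: ..." are ignored.
ensure_setting() {
  file="$1"; key="$2"; value="$3"
  # index(...) == 1 matches only lines that literally start with "key:".
  if awk -v k="$key:" 'index($0, k) == 1 { found = 1 } END { exit !found }' "$file"; then
    echo "$key is already set in $file, leaving it alone" >&2
  else
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root; replace 10.0.0.3 with your own private IP address):
#   conf=/etc/elasticsearch/elasticsearch.yml
#   ensure_setting "$conf" network.host 10.0.0.3
#   ensure_setting "$conf" discovery.type single-node
#   ensure_setting "$conf" xpack.security.enabled true
```

Running it twice leaves the file unchanged the second time, which is handy if you re-run a provisioning script.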

Then start Elasticsearch:

systemctl daemon-reload
systemctl enable --now elasticsearch

And finally, create a password for connection to Elasticsearch:

cd /usr/share/elasticsearch
bin/elasticsearch-setup-passwords auto

This will generate passwords for all the users. The one you are after is the password for the elastic user:

Changed password for user elastic
PASSWORD elastic = **********

Write it down in your password manager, because you will need it.
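At this point it can be reassuring to check that Elasticsearch actually answers authenticated requests. Here's a small sketch using the cluster health endpoint; es_status and es_health are just helper names I picked, and the status is pulled out with sed rather than a proper JSON parser, so treat it as a quick check only.

```shell
# Sketch: extract the "status" field from a _cluster/health JSON response
# read on stdin. Plain text matching, not a real JSON parser.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# es_health HOST: ask the node for its health as the elastic user.
# curl prompts for the password, so it doesn't end up in your shell history.
es_health() {
  curl -s -u elastic "http://$1:9200/_cluster/health" | es_status
}

# Usage (replace 10.0.0.3 with your server's private IP address):
#   es_health 10.0.0.3
```

A status of green or yellow means the node is up and answering; if nothing is printed, check the Elasticsearch logs, the password, and your firewall rule.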

Securing Elasticsearch

Now, you could use the elastic user to connect to your Elasticsearch instance directly. However, for security reasons you should not: This user is essentially your Elasticsearch root user, so you should create your own user, with limited permissions.

To do this, firstly create an access role with limited permissions (replace password with the elastic user's password you generated above, and replace 10.0.0.3 with your server's private IP address):

curl -X POST -u elastic:password "10.0.0.3:9200/_security/role/mastodon_full_access?pretty" -H 'Content-Type: application/json' -d'
{
  "cluster": ["monitor"],
  "indices": [{
    "names": ["*"],
    "privileges": ["read", "monitor", "write", "manage"]
  }]
}
'
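The user you create next needs a secure, randomly generated password. One possible way to get one, using only standard tools (generate_password is just a name I picked):

```shell
# Sketch: generate a random 48-character hex password for the Mastodon user.
# Hex avoids characters that would need escaping in JSON or .env.production.
generate_password() {
  # 24 random bytes, printed as hex, with od's spaces and newlines removed.
  head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

mastodon_password=$(generate_password)
echo "$mastodon_password"
```

Store the result in your password manager too; you'll need it in both the Elasticsearch and the Mastodon configuration.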

Secondly, create a user with this access role, that Mastodon will use to connect to Elasticsearch (replace super-secure-password with a secure, randomly generated password, as well as replacing password and 10.0.0.3 again):

curl -X POST -u elastic:password "10.0.0.3:9200/_security/user/mastodon?pretty" -H 'Content-Type: application/json' -d'
{
  "password" : "super-secure-password",
  "roles" : ["mastodon_full_access"]
}
'
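If you want to double check that the new user works before touching Mastodon, you can ask Elasticsearch who you are via the _security/_authenticate endpoint. A minimal sketch, with helper names of my own choosing and a simple text match instead of real JSON parsing:

```shell
# Sketch: succeed only if the given role name appears (in double quotes)
# in the _authenticate JSON response read on stdin.
has_role() {
  grep -q "\"$1\""
}

# check_mastodon_user HOST: log in as the mastodon user (curl prompts for
# the password) and check that it carries the mastodon_full_access role.
check_mastodon_user() {
  curl -s -u mastodon "http://$1:9200/_security/_authenticate" \
    | has_role mastodon_full_access
}

# Usage (replace 10.0.0.3 with your server's private IP address):
#   check_mastodon_user 10.0.0.3 && echo "mastodon user looks good"
```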

And that's the Elasticsearch side done.

Configure Mastodon for Elasticsearch

On your Mastodon server, open up .env.production and add the following lines:

ES_ENABLED=true
# replace with IP of your Elasticsearch instance
ES_HOST=10.0.0.3
ES_PORT=9200
ES_USER=mastodon
# The password you set above
ES_PASS=super-secure-password
Now restart your services to apply the changes:

systemctl restart mastodon-web
systemctl restart mastodon-sidekiq

Finally, populate your Elasticsearch index. As this command can run for a few hours, I recommend running it with a lower priority, to ensure the rest of the system remains stable.

RAILS_ENV=production nice -n 19 bin/tootctl search deploy --reset-chewy

(The nice -n 19 here means that the search deploy process will cede resources to pretty much any other process on your server. That way your Mastodon experience won't suffer while the index is being deployed.)
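If you're curious what nice actually does, you can try it without touching Mastodon at all: called without a command, nice just prints the niceness it is running at. A tiny sketch:

```shell
# Niceness of the current shell (usually 0 for an ordinary login shell).
base=$(nice)
# Niceness a child process gets when started via `nice -n 19`.
lowered=$(nice -n 19 nice)
echo "shell niceness: $base, with nice -n 19: $lowered"
```

The higher the number, the more readily the process gives way to others; 19 is the maximum on Linux.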

And once that's complete (it took about 2 hours for me), you'll have full text search available on your Mastodon instance.
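To get a feel for what has been indexed, you can ask Elasticsearch for its indices and document counts via _cat/indices. The sketch below assumes the mastodon user created earlier is allowed to call that endpoint (the monitor privileges in the role should cover it); total_docs and index_summary are helper names I made up.

```shell
# Sketch: sum the document counts from `_cat/indices?h=index,docs.count`
# output on stdin (one "index-name count" pair per line).
total_docs() {
  awk '{ sum += $2 } END { print sum + 0 }'
}

# index_summary HOST: total number of indexed documents on the node.
# curl prompts for the mastodon user's password.
index_summary() {
  curl -s -u mastodon "http://$1:9200/_cat/indices?h=index,docs.count" | total_docs
}

# Usage (replace 10.0.0.3 with the IP of your Elasticsearch instance):
#   index_summary 10.0.0.3
```

A total of 0 after the deploy has finished would suggest that something went wrong along the way.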

Summary

As you can see, setting up Elasticsearch for Mastodon 4.2.x is quite straightforward. It doesn't need a beefy server either, if yours is a single-user or very small instance.

Personally, I think that full text search makes a huge difference to the Mastodon experience, and it's well worth the effort of setting it up.