blog.thms.uk

Full-text search in Mastodon

Note: Mastodon 4.2 has been released and brings improved full-text search to Mastodon, so this post is now out of date. Read Setting up Elasticsearch for Mastodon 4.2.x for details on how to enable full-text search on Mastodon 4.2.x.

To say that search is viewed with great suspicion on Mastodon is an understatement: There is a feeling that search leads to abuse, and that people will search for controversial posts or subjects they don't like and just pile on.

Whilst I can understand that view, for me personally search is crucial to my enjoyment of social media: I like to know what others may have said about a subject before weighing in, so that I don't repeat the same arguments over and over again, which is just tiring. I also like to see whether anyone has already posted a cool article, to gauge the consensus around it. Overall, I love being able to find discussions by means other than hashtags.

Unfortunately, default Mastodon search doesn't allow for any of that: even if full-text search is enabled, it only searches your own posts, mentions, favourites, and bookmarks, so it doesn't help you discover new stuff at all. But there are options, which I'd like to discuss in this post.

Using Google

Using Google to search for Mastodon posts actually works pretty well. For example, if you want to find discussions on full-text search, you can search Google for site:mastodon.social full text search and get loads of results:

Google search results for mastodon

This works pretty well, but it requires Google, and it works less well for lesser-known instances, which may not be indexed as thoroughly (e.g. searching with site:mstdn.thms.uk doesn't surface many useful results).
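If you do this often, a small bash helper can build the search URL for any instance. This is a sketch, not something from Google or Mastodon: the function names are made up, and the encoder only handles ASCII input correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build a Google search URL restricted to a single
# Mastodon instance via the site: operator.

# Percent-encode a string, keeping only RFC 3986 unreserved characters as-is.
# ASCII-only: non-ASCII characters are not encoded as UTF-8 bytes.
urlencode() {
  local input="$1" out="" c i
  for (( i = 0; i < ${#input}; i++ )); do
    c="${input:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;  # "'c" yields the character's numeric code
    esac
  done
  printf '%s\n' "$out"
}

# Usage: google_site_search_url INSTANCE QUERY...
google_site_search_url() {
  local instance="$1"
  shift
  printf 'https://www.google.com/search?q=%s\n' "$(urlencode "site:${instance} $*")"
}
```

For example, google_site_search_url mstdn.thms.uk full text search prints a URL you can open in your browser.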

Extended Search on your own instance

If you are running your own Mastodon instance, there is a great patch by VyrCossont that enables full-text search of all statuses and accounts on your instance. The patch is a pull request, which you can apply to your own code using standard Git tools.

I installed this on mstdn.thms.uk a short while ago, and absolutely love it! Have a look at the sort of search results I get on my instance:

Search results on mstdn.thms.uk

Here is how you can set it up too, if you want to:

Set up Elasticsearch

Firstly, you need a server to run Elasticsearch on. I did experiment with running Elasticsearch on my Mastodon server, but ultimately decided to create a separate server for it: Mastodon just runs far more reliably if it doesn't have to compete with Elasticsearch for server resources.

Currently, I'm running Elasticsearch on a 'CAX11' server from Hetzner. This has 2 vCPUs and 4GB RAM, and costs me €3.95 per month. (It's IPv6-only, which saves another few quid, so that's great too.) This seems to be plenty for my single-user instance, but depending on the size of your Mastodon instance, you may need a larger server.

Once you have your server, set up Elasticsearch. I simply followed Mastodon's instructions.

Here are a couple of things that I had to change after the initial setup:

  1. Because I'm running Elasticsearch on a separate machine, I needed to bind it to my private network address. (Don't bind it to your public network address, otherwise the whole world can access it! And apply a firewall rule, too, to allow access only from your Mastodon server(s).)
  2. My Mastodon logs were filling up with the warning 'Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone.', so I've enabled password authentication. This isn't strictly needed for me, because I've locked down the firewall, but it stops that warning.

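To do both, open /etc/elasticsearch/elasticsearch.yml and add these three lines at the bottom. The discovery.type: single-node line is needed because, once Elasticsearch listens on a non-loopback address, it expects a multi-node discovery configuration unless told it's running as a single node: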

network.host: 10.0.0.3 # replace with your own private IP address
discovery.type: single-node
xpack.security.enabled: true

Then restart Elasticsearch:

systemctl restart elasticsearch
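How you apply the firewall rule from the first point above depends on your setup. One minimal sketch, assuming ufw runs on the Elasticsearch server, is a loop that opens port 9200 only to your Mastodon server's private IP. The function name is hypothetical, and 10.0.0.2 below is a placeholder address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open Elasticsearch's HTTP port (9200) to specific
# Mastodon servers only. Assumes ufw is enabled with its default policy of
# denying incoming connections, so every other source stays blocked.
# Set UFW=echo for a dry run that only prints the arguments.
allow_es_from() {
  local ip
  for ip in "$@"; do
    # ${UFW:-ufw} is deliberately unquoted so a value like "sudo ufw" splits into words.
    ${UFW:-ufw} allow from "$ip" to any port 9200 proto tcp
  done
}
```

For example, run allow_es_from 10.0.0.2 as root on the Elasticsearch server, then check the result with ufw status. Make sure SSH is allowed before enabling ufw, or you may lock yourself out.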

Finally, create a password for connecting to Elasticsearch:

cd /usr/share/elasticsearch
bin/elasticsearch-setup-passwords auto

This will generate passwords for all the built-in users. The one you need is the password for the elastic user:

Changed password for user elastic
PASSWORD elastic = **********
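Before touching Mastodon, it can be worth checking from the Mastodon server that Elasticsearch is reachable and accepts the new password. A minimal sketch using curl against Elasticsearch's _cluster/health endpoint; the function name is hypothetical, and it assumes plain HTTP on an IPv4 address or hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check, from the Mastodon server, that Elasticsearch is
# reachable and accepts the "elastic" user's password.
# Usage: es_check HOST PASSWORD [PORT]
es_check() {
  local host="$1" pass="$2" port="${3:-9200}"
  # --fail turns HTTP errors (e.g. 401 for a wrong password) into a non-zero exit code.
  # Note: a password passed via --user is briefly visible in the process list.
  curl --silent --show-error --fail --max-time 10 \
    --user "elastic:${pass}" "http://${host}:${port}/_cluster/health?pretty"
}
```

Running es_check 10.0.0.3 'your-password' should print a small JSON document with a "status" field. A 401 error points at the password, while a timeout or refused connection points at the bind address or the firewall.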

Apply and configure the Extended Search patch

From your Mastodon directory, you can use the standard Git tools to apply the PR:

git remote add vyrcossont https://github.com/VyrCossont/mastodon.git
git fetch vyrcossont
# If you are running vanilla mastodon:
git merge vyrcossont/extended-search-final-vanilla
# If you are running Glitch:
git merge vyrcossont/extended-search-final-glitch
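If you script your upgrades, the merge can be wrapped in a small helper. This is a sketch under assumptions: the function names are hypothetical, the vyrcossont remote has already been added and fetched as above, and the rebuild steps (bundle install, yarn install and asset precompilation) come from Mastodon's usual upgrade routine rather than from the patch itself, so check whether your merge actually needs them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run as the mastodon user from your Mastodon directory,
# after adding and fetching the vyrcossont remote.

# Map a Mastodon flavour to the matching branch of the patch.
es_branch_for() {
  case "$1" in
    vanilla) echo "vyrcossont/extended-search-final-vanilla" ;;
    glitch)  echo "vyrcossont/extended-search-final-glitch" ;;
    *)       echo "unknown flavour: $1 (expected vanilla or glitch)" >&2; return 1 ;;
  esac
}

# Merge the patch, then run the usual Mastodon rebuild steps (an assumption:
# they are only strictly needed if the merge changes gems, packages or front-end code).
apply_extended_search() {
  local branch
  branch="$(es_branch_for "$1")" || return 1
  git merge "$branch" &&
    bundle install &&
    yarn install --frozen-lockfile &&
    RAILS_ENV=production bundle exec rails assets:precompile
}
```

Usage: apply_extended_search vanilla (or glitch). Each step only runs if the previous one succeeded.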

Once that's done, open .env.production in your Mastodon directory and add the following lines:

ES_ENABLED=true
ES_HOST=10.0.0.3 # replace with IP of your elasticsearch instance
ES_PORT=9200
STATUS_SEARCH_SCOPE=discoverable # or public, or public_or_unlisted
ACCOUNT_SEARCH_SCOPE=discoverable # or all, or classic
ES_PASS=********** # The password you got above
ES_USER=elastic
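A typo in this file, such as a missing =, is easy to overlook, so a quick sanity check before restarting can save some head-scratching. A minimal sketch; the function name is made up, and it only checks that each connection setting appears as KEY=... at the start of a line, not that the values are correct.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report Elasticsearch connection settings that are missing
# from a dotenv file. Values themselves are not validated.
# Usage: check_es_env [FILE]   (defaults to .env.production in the current directory)
check_es_env() {
  local file="${1:-.env.production}" key status=0
  for key in ES_ENABLED ES_HOST ES_PORT ES_USER ES_PASS; do
    # A missing file makes grep fail too, so every key is then reported.
    if ! grep -q "^${key}=" "$file"; then
      echo "missing: ${key}"
      status=1
    fi
  done
  return "$status"
}
```

Run from your Mastodon directory, check_es_env prints nothing and returns 0 when all five keys are present; a line like ES_PASS********** would be reported as missing: ES_PASS.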

Now restart your services to apply the changes:

systemctl restart mastodon-web
systemctl restart mastodon-sidekiq

Finally, populate your Elasticsearch indices:

RAILS_ENV=production bin/tootctl search deploy --only accounts
RAILS_ENV=production bin/tootctl search deploy --only statuses

The second command in particular can run for many hours (it took about 18 hours on my tiny, single-user instance), so I recommend running it inside something like tmux, so that it doesn't stop if you get disconnected, and during off-peak hours, if possible. You might also want to run both commands at a lower priority, to keep the rest of the system stable:

RAILS_ENV=production nice -n 19 bin/tootctl search deploy --only accounts
RAILS_ENV=production nice -n 19 bin/tootctl search deploy --only statuses
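One way to combine tmux and nice is to start both deploys, one after the other, in a detached tmux session. A sketch, assuming tmux is installed and you run it as the mastodon user from your Mastodon directory; the function name and the session name es-deploy are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start both index deploys in a detached tmux session so
# they keep running after an SSH disconnect. nice -n 19 gives them the lowest
# CPU priority; the statuses deploy only starts if the accounts deploy succeeded.
deploy_search_indices() {
  local session="${1:-es-deploy}"
  # -c "$PWD" starts the session in the current (Mastodon) directory.
  tmux new-session -d -s "$session" -c "$PWD" \
    'RAILS_ENV=production nice -n 19 bin/tootctl search deploy --only accounts &&
     RAILS_ENV=production nice -n 19 bin/tootctl search deploy --only statuses;
     echo "Finished with exit code $?. Press Enter to close."; read -r _'
}
```

After calling deploy_search_indices, reattach with tmux attach -t es-deploy to check progress; the window stays open at the end so you can see the final exit code.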

This patch also greatly extends Mastodon's existing advanced search syntax. Here are the details on the new operators. All can be inverted by prefixing them with -, except for before:, after:, scope:, and sort:.

Accounts

Posts

Summary

If you want to find Mastodon posts using full-text search, Google may be a viable option. But if you are running your own instance, you can also install a patch that enables proper full-text search on it, which has made me enjoy Mastodon even more.

As I'm running a single user instance, the risk of harm from enabling extended full-text search on my instance is small: It's only me using it, so I only need to keep an eye on myself to prevent abuse.