blog.thms.uk

Full-text search in Mastodon

To say that search is viewed with great suspicion on Mastodon is an understatement: there is a feeling that search leads to abuse, that people will search for controversial posts or subjects they don't like and just pile on.

Whilst I can understand that view, for me personally search is crucial to my enjoyment of social media: I like to know what others may have said about a subject before weighing in, so as not to repeat the same arguments over and over again, which is just tiring. I also like to see if anyone has already posted a cool article, to gauge consensus around it. Overall I love being able to find discussions other than by hashtags.

Unfortunately, default Mastodon search doesn't allow for any of that: even if full-text search is enabled, it only searches your own posts, mentions, favourites, and bookmarks, so it doesn't help you discover new stuff at all. But there are options, which I'd like to discuss in this post.

Using Google

Using Google to search for Mastodon posts actually works pretty well. For example, if you want to search for discussions on full-text search, you can use Google to search site:mastodon.social full text search and get loads of results:

Google search results for mastodon

This approach requires Google, though, and it works less well on lesser-known instances, which might not be indexed as thoroughly (e.g. searching with site:mstdn.thms.uk doesn't surface many useful results).

Extended Search on your own instance

Note: Since writing this post, VyrCossont has released a further patch that fully replaces the original and offers extended search of both accounts and posts. I've updated this part of the post throughout, to reference the new patch instead.

If you are running your own Mastodon instance, there is a great patch by VyrCossont that enables full-text search of all statuses and accounts on your instance. This pull request can be applied to your own code using standard git tools. It's specifically for glitch-soc, a fork of Mastodon that provides a host of really helpful additional features, but others have reported installing it on vanilla Mastodon with just minor adjustments.
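As a rough sketch of what "standard git tools" could look like here: the helper below adds the patch author's repository as an extra remote, fetches the patch branch, and merges it into whatever you have checked out. The function name merge_patch_branch and the remote name extended-search are just illustrative choices of mine, and the repository URL is deliberately left as a parameter that you need to fill in from the pull request page. The default branch name account-search is the one mentioned in the migration notes further down.

```shell
# Merge a patch branch from another repository into the current checkout.
# $1: URL of the repository that hosts the patch (you must supply this)
# $2: branch to merge (defaults to account-search)
merge_patch_branch() {
  remote_url="$1"
  branch="${2:-account-search}"

  if [ -z "$remote_url" ]; then
    echo "usage: merge_patch_branch <remote-url> [branch]" >&2
    return 1
  fi

  # "extended-search" is an arbitrary local name for the extra remote.
  git remote add extended-search "$remote_url" &&
    git fetch extended-search "$branch" &&
    git merge "extended-search/$branch"
}
```

Run it from inside your Mastodon checkout. If the merge reports conflicts (more likely on vanilla Mastodon than on glitch-soc), you'll need to resolve them by hand before continuing.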

I installed this on mstdn.thms.uk a short while ago, and absolutely love it! Have a look at the sort of search results I get on my instance:

Search results on my mstdn.thms.uk

Here is how you can set it up too, if you want to:

Set up Elasticsearch

Firstly, you need a server to run Elasticsearch on. I did experiment with running Elasticsearch on the same server as my Mastodon instance, but ultimately decided to create a separate server for it: Mastodon just runs far more reliably if it doesn't have to contend with Elasticsearch for server resources.

Currently, I'm running Elasticsearch on a small 1 vCPU, 2GB DigitalOcean server for $14 per month. This seems to be plenty for my single user instance, but depending on the size of your Mastodon instance, you may need a larger server.

Once you have your server, set up Elasticsearch; I simply followed Mastodon's instructions. Make sure, though, that your firewall only allows access to port 9200 from your Mastodon instance!
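To make the firewall point a bit more concrete, here is a minimal sketch that assumes the Elasticsearch server uses ufw, which is my assumption and not something the setup requires. The helper name es_firewall_rules is made up, and 203.0.113.10 in the example is a documentation placeholder for your Mastodon server's IP. The function only prints the two rules so you can review them before running them with sudo.

```shell
# Print ufw rules that restrict Elasticsearch's port 9200 to one client IP.
# $1: IP address of the Mastodon server that needs access
es_firewall_rules() {
  mastodon_ip="$1"

  if [ -z "$mastodon_ip" ]; then
    echo "usage: es_firewall_rules <mastodon-server-ip>" >&2
    return 1
  fi

  # ufw applies the first matching rule, so the allow rule must come first.
  echo "ufw allow from ${mastodon_ip} to any port 9200 proto tcp"
  echo "ufw deny 9200/tcp"
}

# Example: es_firewall_rules 203.0.113.10
```

If your default policy already denies incoming traffic, the second rule is redundant but harmless.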

Apply and configure the Extended Search patch

You can use the standard Git tools to apply the PR. Once done, navigate to the mastodon directory on your server, and open .env.production. Add the following lines:

ES_ENABLED=true
ES_HOST=localhost # or the IP of your Elasticsearch instance
ES_PORT=9200
STATUS_SEARCH_SCOPE=discoverable # or public, or public_or_unlisted
ACCOUNT_SEARCH_SCOPE=discoverable # or all, or classic
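Before restarting anything, it can be worth confirming that the Mastodon server can actually reach Elasticsearch on the host and port you just configured. The sketch below queries Elasticsearch's cluster health endpoint with curl. It assumes plain HTTP without authentication; if you have enabled TLS or security features on your Elasticsearch server, you'd need to adjust it. The function name check_es is my own.

```shell
# Check that Elasticsearch answers on the given host and port.
# $1: host (defaults to localhost), $2: port (defaults to 9200)
check_es() {
  host="${1:-localhost}"
  port="${2:-9200}"

  # --fail turns HTTP errors into a non-zero exit code;
  # --max-time stops the check from hanging on a firewalled port.
  if curl --silent --fail --max-time 5 \
      "http://${host}:${port}/_cluster/health" >/dev/null; then
    echo "Elasticsearch reachable at ${host}:${port}"
  else
    echo "Cannot reach Elasticsearch at ${host}:${port}" >&2
    return 1
  fi
}

# Example: check_es localhost 9200
```

Run this on the Mastodon server, not on the Elasticsearch server, so that it also exercises your firewall rules.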

Now restart your services to apply the changes:

systemctl restart mastodon-web
systemctl restart mastodon-sidekiq

Finally, populate your Elasticsearch index:

RAILS_ENV=production bin/tootctl search deploy --only accounts
RAILS_ENV=production bin/tootctl search deploy --only statuses

Particularly the second command can run for many hours (it ran about 18 hours on my tiny, single user instance), so I recommend using something like tmux to ensure it doesn't stop if you get disconnected, as well as running it during off-peak hours, if possible.
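In case you haven't used tmux for this before, here is a small sketch of how a long-running command could be started in a detached session. The helper name run_detached and the session name reindex are arbitrary choices of mine; tmux needs to be installed on the server.

```shell
# Start a command in a detached tmux session so it survives an SSH disconnect.
# $1: session name; all remaining arguments form the command line to run
run_detached() {
  session="$1"
  shift
  # -d: don't attach, -s: session name, -c: start in the current directory
  tmux new-session -d -s "$session" -c "$PWD" "$*"
}

# Example (run from the Mastodon directory):
#   run_detached reindex 'RAILS_ENV=production bin/tootctl search deploy --only statuses'
#   tmux attach -t reindex   # reattach later; detach again with Ctrl-b d
```

Note that the session closes as soon as the command finishes, so if you want to keep the output, consider piping it into a log file as part of the command.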

Using Extended Search

This patch also greatly extends Mastodon's existing advanced search syntax. Here are the details on the new operators. All can be inverted by prefixing them with -, except for before:, after:, scope:, and sort:.

Accounts

Posts

If you previously had the original Extended search #2 patch applied

This new patch is a superset of the original patch. All features included in #2 are also included in #5. As such, migrating from the one to the other is very straightforward:

  1. Merge the VyrCossont/account-search branch.
  2. In your .env.production, replace SEARCH_SCOPE with STATUS_SEARCH_SCOPE, and add your choice for ACCOUNT_SEARCH_SCOPE.
  3. Restart services:
sudo systemctl restart mastodon-sidekiq.service
sudo systemctl restart mastodon-web.service
  4. Reindex all accounts and posts:
RAILS_ENV=production bin/tootctl search deploy --only accounts
RAILS_ENV=production bin/tootctl search deploy --only statuses

Particularly the last command can run for many hours (it ran about 18 hours on my tiny, single user instance), so I recommend using something like tmux to ensure it doesn't stop if you get disconnected, as well as running it during off-peak hours, if possible.
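The .env.production change from the migration steps can also be scripted. The following is only a sketch: migrate_search_scope is a made-up helper name, and the default of discoverable for ACCOUNT_SEARCH_SCOPE is simply the value used earlier in this post, so pick whichever option suits your instance. It keeps a backup of the original file with a .bak suffix.

```shell
# Rename SEARCH_SCOPE to STATUS_SEARCH_SCOPE and add ACCOUNT_SEARCH_SCOPE.
# $1: path to the env file, $2: account scope (defaults to discoverable)
migrate_search_scope() {
  env_file="$1"
  account_scope="${2:-discoverable}"

  if [ ! -f "$env_file" ]; then
    echo "usage: migrate_search_scope <env-file> [account-scope]" >&2
    return 1
  fi

  # Rename the old key in place; the original is kept as <env-file>.bak.
  sed -i.bak 's/^SEARCH_SCOPE=/STATUS_SEARCH_SCOPE=/' "$env_file" || return 1

  # Only append the new key if it isn't already set.
  if ! grep -q '^ACCOUNT_SEARCH_SCOPE=' "$env_file"; then
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$env_file")" ]; then
      echo >> "$env_file"
    fi
    printf 'ACCOUNT_SEARCH_SCOPE=%s\n' "$account_scope" >> "$env_file"
  fi
}

# Example: migrate_search_scope .env.production discoverable
```

You still need to restart the services and reindex afterwards, as described in the steps above.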

Summary

If you want to find Mastodon posts using full-text search, Google may be a viable option. But if you are running your own instance, you can also install a patch to enable proper full-text search on it, which made me enjoy Mastodon even more.

As I'm running a single user instance, the risk of harm from enabling extended full-text search on my instance is small: It's only me using it, so I only need to keep an eye on myself to prevent abuse.