Full-text search in Mastodon
Note: Mastodon 4.2 has been released and brings improved Full Text Search to Mastodon, so this post is now out of date. Read 'Setting up Elasticsearch for Mastodon 4.2.x' for details on how to enable Full Text Search on Mastodon 4.2.x.
To say that search is viewed with great suspicion on Mastodon is an understatement: There is a feeling that search leads to abuse, that people will search for controversial posts or subjects they don't like and just pile on.
Whilst I can understand that view, for me personally search is crucial to my enjoyment of social media: I like to know what others may have said about a subject before weighing in, so as not to repeat the same arguments over and over again, which is just tiring. I also like to see whether anyone has already posted a cool article, and to gauge consensus around it. Overall, I love being able to find discussions by means other than hashtags.
Unfortunately, default Mastodon search doesn't allow for any of that: even if full-text search is enabled, it only searches your own posts, mentions, favourites, and bookmarks, so it doesn't help you discover new stuff at all. But there are other options, which I'd like to discuss in this post.
Using Google
Using Google to search for Mastodon posts actually works pretty well: For example, if you want to search for discussions on full-text search, you can search Google for site:mastodon.social full text search and get loads of results.
This works pretty well, but it requires Google, and it is less effective for smaller instances, which may not be indexed as thoroughly (e.g. searching with site:mstdn.thms.uk doesn't surface many useful results).
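If you do this often, a small shell function can build the search URL for you. This is a minimal sketch. google_site_search_url is a hypothetical helper. Its URL encoding is deliberately simplified: only the characters listed in the sed call are escaped, and spaces become '+'. Treat it as a convenience, not a general-purpose encoder.

```shell
#!/bin/sh
# Minimal sketch: print a Google search URL restricted to one Mastodon
# instance via the site: operator. google_site_search_url is a hypothetical
# helper. Simplified encoding: only the characters in the sed call below
# are escaped ('%' first, so later escapes are not double-encoded).

google_site_search_url() {
  instance=$1
  shift
  # Join the remaining arguments into one query, prefixed with site:
  query="site:$instance $*"
  encoded=$(printf '%s\n' "$query" | sed \
    -e 's/%/%25/g' \
    -e 's/+/%2B/g' \
    -e 's/&/%26/g' \
    -e 's/#/%23/g' \
    -e 's/:/%3A/g' \
    -e 's|/|%2F|g' \
    -e 's/?/%3F/g' \
    -e 's/=/%3D/g' \
    -e 's/@/%40/g' \
    -e 's/ /+/g')
  printf 'https://www.google.com/search?q=%s\n' "$encoded"
}

google_site_search_url mastodon.social full text search
# prints https://www.google.com/search?q=site%3Amastodon.social+full+text+search
```

Open the printed URL in your browser, or swap in a different instance to compare how well each one is indexed.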
Extended Search on your own instance
If you are running your own Mastodon instance, there is a great patch by VyrCossont that enables full-text search of all statuses and accounts on your instance. This pull request can be applied to your own code using standard git tools.
I installed this on mstdn.thms.uk a short while ago and absolutely love it! Have a look at the sort of search results I get on my instance:
Here is how you can set it up too, if you want to:
Set up Elasticsearch
Firstly, you need a server to run Elasticsearch on. I did experiment with running Elasticsearch on my Mastodon instance, but ultimately decided to create a separate server for it: Mastodon just runs far more reliably if it doesn't have to contend with Elasticsearch for server resources.
Currently, I'm running Elasticsearch on a 'CAX11' server from Hetzner. This has 2 vCPUs and 4GB RAM, and costs me €3.95 per month. (It's on IPv6 only, which saves another few quid, so that's great too.) This seems to be plenty for my single-user instance, but depending on the size of your Mastodon instance, you may need a larger server.
Once you have your server, set up Elasticsearch. I simply followed Mastodon's instructions.
Here are a couple of things that I had to change after the initial setup:
- Because I'm running Elasticsearch on a separate machine, I needed to bind it to my private network address. (Don't bind it to your public network address, otherwise the whole world can access it! And apply a firewall rule, too, to allow access only from your Mastodon server(s).)
- I've enabled password authentication to prevent my Mastodon logs from being filled with the warning 'Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone.' This isn't really needed for me, because I've locked down the firewall, but it stops that warning.
To do that, open up /etc/elasticsearch/elasticsearch.yml and insert these three lines at the bottom:
network.host: 10.0.0.3 # replace with your own private IP address
discovery.type: single-node
xpack.security.enabled: true
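If you'd rather script this edit, so that re-running it can't add duplicate lines, here is a minimal sketch. ensure_setting is a hypothetical helper, not part of Elasticsearch. It only appends a key that has no uncommented line yet, and it never overwrites an existing value. 10.0.0.3 is the same placeholder private IP as above.

```shell
#!/bin/sh
# Minimal sketch: idempotently append the three settings above to
# elasticsearch.yml. ensure_setting is a hypothetical helper; it skips any
# key that already has an uncommented line and never overwrites a value.

ensure_setting() {
  file=$1 key=$2 value=$3
  # An uncommented "key:" line means the setting already exists
  if grep -q "^[[:space:]]*$key:" "$file"; then
    return 0
  fi
  printf '%s: %s\n' "$key" "$value" >> "$file"
}

CONF=/etc/elasticsearch/elasticsearch.yml
# Only touch the file if it exists and is writable (i.e. run as root)
if [ -w "$CONF" ]; then
  ensure_setting "$CONF" network.host 10.0.0.3   # replace with your private IP
  ensure_setting "$CONF" discovery.type single-node
  ensure_setting "$CONF" xpack.security.enabled true
fi
```

Because existing keys are skipped, running it a second time changes nothing. Commented-out defaults such as #network.host are ignored rather than edited.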
Then restart elasticsearch:
systemctl restart elasticsearch
Finally, create a password for connecting to Elasticsearch:
cd /usr/share/elasticsearch
bin/elasticsearch-setup-passwords auto
This will generate passwords for all the users. The one you are after is the password for the elastic user:
Changed password for user elastic
PASSWORD elastic = **********
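The bullet list above recommends a firewall rule that only lets your Mastodon server(s) reach Elasticsearch. Here's a minimal sketch of that rule. It assumes ufw is the firewall on the Elasticsearch server and uses 9200, Elasticsearch's default HTTP port (the same port as ES_PORT later on). es_firewall_rules is a hypothetical helper, and 10.0.0.2 is a placeholder for your Mastodon server's private IP. The script prints the rules rather than running them, so you can review them first.

```shell
#!/bin/sh
# Minimal sketch, assuming ufw: print (not run) rules that allow port 9200
# only from the given Mastodon server IPs. es_firewall_rules is a
# hypothetical helper; pass one private IP per Mastodon server.

es_firewall_rules() {
  for ip in "$@"; do
    echo "ufw allow from $ip to any port 9200 proto tcp"
  done
  # ufw uses the first matching rule, so the deny must come after the allows
  echo "ufw deny 9200/tcp"
}

es_firewall_rules 10.0.0.2   # placeholder: your Mastodon server's private IP
```

If the output looks right, pipe it to sudo sh and check the result with ufw status numbered.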
Apply and configure the Extended Search patch
You can use the standard Git tools to apply the PR:
git remote add vyrcossont https://github.com/VyrCossont/mastodon.git
git fetch vyrcossont
# If you are running vanilla mastodon:
git merge vyrcossont/extended-search-final-vanilla
# If you are running Glitch:
git merge vyrcossont/extended-search-final-glitch
Once done, navigate to the mastodon directory on your server and open .env.production. Add the following lines:
ES_ENABLED=true
ES_HOST=10.0.0.3 # replace with IP of your elasticsearch instance
ES_PORT=9200
STATUS_SEARCH_SCOPE=discoverable # or public, or public_or_unlisted
ACCOUNT_SEARCH_SCOPE=discoverable # or all, or classic
ES_PASS=********** # The password you got above
ES_USER=elastic
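A single missing '=' in .env.production is easy to overlook. Here's a minimal sketch of a check you could run before restarting. check_es_env is a hypothetical helper, not a Mastodon command. It only verifies that each setting listed above starts a line as KEY=value; it doesn't validate the values themselves.

```shell
#!/bin/sh
# Minimal sketch: report Elasticsearch settings that are missing from
# .env.production or lack the "=" separator (e.g. "ES_PASS**********").
# check_es_env is a hypothetical helper; it does not validate the values.

check_es_env() {
  file=${1:-.env.production}
  status=0
  for key in ES_ENABLED ES_HOST ES_PORT ES_USER ES_PASS \
             STATUS_SEARCH_SCOPE ACCOUNT_SEARCH_SCOPE; do
    # Each setting must appear at the start of a line as KEY=value
    if ! grep -q "^$key=" "$file"; then
      echo "missing or malformed: $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

Source the script from the Mastodon directory and call check_es_env. A non-zero exit status, together with a message naming the key, means at least one line needs fixing.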
Now restart your services to apply the changes:
systemctl restart mastodon-web
systemctl restart mastodon-sidekiq
Finally, populate your Elasticsearch index:
RAILS_ENV=production bin/tootctl search deploy --only accounts
RAILS_ENV=production bin/tootctl search deploy --only statuses
The second command in particular can run for many hours (it ran for about 18 hours on my tiny, single-user instance). I recommend using something like tmux to ensure it doesn't stop if you get disconnected, and running it during off-peak hours if possible. You might want to run both commands with a lower priority, too, to ensure the rest of the system remains stable:
RAILS_ENV=production nice -n 19 bin/tootctl search deploy --only accounts
RAILS_ENV=production nice -n 19 bin/tootctl search deploy --only statuses
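To avoid babysitting the two steps, you could wrap them in a small function that runs them back to back at the lowest priority and stops if the first one fails. This is a minimal sketch. deploy_search_indexes and the TOOTCTL override are hypothetical names, and TOOTCTL defaults to bin/tootctl. The override mainly exists so the function can be tried out without touching your index.

```shell
#!/bin/sh
# Minimal sketch: build both search indexes one after the other at the
# lowest CPU priority, stopping at the first failure. Run from the Mastodon
# directory. deploy_search_indexes and TOOTCTL are hypothetical names;
# TOOTCTL defaults to bin/tootctl.

TOOTCTL=${TOOTCTL:-bin/tootctl}

deploy_search_indexes() {
  for index in accounts statuses; do
    # Timestamp each step on stderr so you can see how long it took
    echo "$(date '+%F %T') indexing $index" >&2
    RAILS_ENV=production nice -n 19 "$TOOTCTL" search deploy --only "$index" || return 1
  done
  echo "$(date '+%F %T') done" >&2
}
```

For example, start a session with tmux new -s es-index, source the script (under a hypothetical name such as ./deploy-indexes.sh), and call deploy_search_indexes. If you get disconnected, tmux attach -t es-index brings you back to it.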
Using Extended Search
This patch also greatly extends Mastodon's existing advanced search syntax. Here are the details on the new operators. All of them can be inverted by prefixing them with -, except for before:, after:, scope:, and sort:.
Accounts
- #hashtag: find only accounts that are tagged with #hashtag
- :emoji: : find only accounts with bios or display names that contain the custom emoji :emoji:
- domain:mastodon.social: find only accounts on mastodon.social
- is:
  - is:bot: account is a bot
  - is:group: account is a group
  - is:local: account is on this instance
- scope:following: restrict search to users that the searching user is following
Posts
- from:local_username or from:username@domain.tld: find posts from a specific user. This is actually a vanilla Mastodon feature, but it is not documented anywhere.
- #hashtag: find only posts that are tagged with #hashtag
- :emoji: : find only posts that contain the custom emoji :emoji:
- domain:mastodon.social: find only posts from mastodon.social
- lang:es: find posts in Spanish
- is:
  - is:bot: account that created the post is a bot
  - is:group: account that created the post is a group
  - is:local: post is on this instance
  - is:local_only: post is a local-only post. This operator only applies to Glitch and Hometown; vanilla Mastodon doesn't have local-only posts.
  - is:reply: post is a reply to another post
  - is:sensitive: post is marked as sensitive/🔞
- has:
  - has:cw, has:spoiler, has:warning: post has a content warning
  - has:link: post has at least one link that is not a mention or hashtag, as determined by the parser used to fetch link preview cards
  - has:media: post has at least one media attachment
  - has:audio, has:gifv, has:image, has:video: post has a media attachment of the specified type
  - has:poll: post has a poll
- before:, after: with a date: maps to an Elasticsearch date range search
- scope:classic: restrict search to the current user's bookmarks, favs, boosts, own posts, etc., as in vanilla Mastodon
- scope:following: restrict search to posts from users that the searching user is following
- sort:
  - sort:newest: display newest posts first (default)
  - sort:oldest: display oldest posts first
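Operators can be combined in a single query, and the same query string can be sent to Mastodon's /api/v2/search endpoint, which is what the web search box calls. Here's a minimal sketch that builds such a URL. search_api_url is a hypothetical helper, and its encoding only escapes the characters listed in the sed call. It also assumes the patch treats API queries the same way as queries typed into the search box.

```shell
#!/bin/sh
# Minimal sketch: build a /api/v2/search URL from a query that uses the
# operators above. search_api_url is a hypothetical helper; the encoding
# only escapes the characters in the sed call ('#' must become %23, or it
# would start a URL fragment).

search_api_url() {
  instance=$1
  query=$2
  encoded=$(printf '%s\n' "$query" | sed \
    -e 's/%/%25/g' \
    -e 's/+/%2B/g' \
    -e 's/&/%26/g' \
    -e 's/#/%23/g' \
    -e 's/:/%3A/g' \
    -e 's/@/%40/g' \
    -e 's/ /+/g')
  printf 'https://%s/api/v2/search?type=statuses&q=%s\n' "$instance" "$encoded"
}

search_api_url mstdn.thms.uk '#mastodon has:poll -is:reply'
# prints https://mstdn.thms.uk/api/v2/search?type=statuses&q=%23mastodon+has%3Apoll+-is%3Areply
```

To run the search, pass the URL to curl with an access token, e.g. curl -H "Authorization: Bearer $TOKEN" "$(search_api_url ...)", where $TOKEN is a placeholder for your own token. The matching posts come back in the statuses array of the JSON response.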
Summary
If you want to find mastodon posts using full-text search, Google may be a viable option. But if you are running your own instance, you can also install a patch to enable proper full-text search on it, which made me enjoy Mastodon even more.
As I'm running a single user instance, the risk of harm from enabling extended full-text search on my instance is small: It's only me using it, so I only need to keep an eye on myself to prevent abuse.