I'm currently implementing a new event store for Nheko. This blog post should explain, why this is necessary, what is required of the new event store and how this actually turns out in practice. This post will be a bit more technical in nature, but I hope I can make it easy enough to understand.
What is an event store?
The life of a client dev basically consists of the following parts (simplified):
- Reading the spec.
- Implementing handling for different event types in the client.
- Answering questions from users and looking at their bug reports.
- Cursing at Synapse, the Spec and other Clients and complaining about their brokenness.
I may have left out most of the things you actually do and have blown some parts way out of proportion, but the important take away is: Matrix is a giant distributed event database and most features require you to parse, send, store, relate or otherwise handle events.
You have a few ways to receive events. The most important ones are:
- Regular event updates by long polling
/sync
- Paginating backwards in history by calling
/messages
- Fetching missing events via
/event
All persistent events can be addressed by their event_id
and for each
pagination token stream there is basically one order the events can be in. By
calling /sync
in a loop, you will receive events usually in the order your
server receives them in. For example consider the flow:
- Your sync is up to date
- Someone sends the message A
- You call
/sync
- You send the message B
- You call
/sync
If you ignore all the effects of network latency, other rooms, etc, your first sync should contain the message A and the second message should contain the message B, both in the timeline section for that specific room.
A message may be a reply or a reaction, edit or otherwise related to some event.
In some of those cases you may decide to fetch a related event by event id via
the /event
endpoint.
If your user scrolls up, they may want to see messages you do not have available
at the moment. To load messages before a certain point, you would call
/messages
with the prev_batch
token you got from /sync
. Be careful, while
synapse allows you to use your sync token here, this is not required by the spec
and other servers may not be as lenient! Messages you fetch by paginating
backwards may not neccessarily be ordered by arrival order on the server. They
may be ordered by their logical order you get from the federation DAG, which I
don't want to go into here. It's just something to be aware of (for now).
Now that is all well and good, but you need to do something with those events. While you can render them to the screen and then throw them away, that means your memory usage will grow proportionally to the amount of events you received. You may alleviate that a bit, by throwing away older rendered messages, but in any case you will need to fetch events from the server again, if the client gets restarted.
Alternatively you may want to store (some) events on disk, so that after a
restart you can show some events until /sync
returns with the up to date
state. This is what this blog post is about. How do you store them efficiently
and how do you access those stored/cached events?
Requirements for the event store
Note: We are talking about an event store form the "Timeline", which shows the messages. This post will not go into how you can store the room state for fast and easy access. Usually a key-value store of state events works well enough for that.
- Events need to be easily accessible by their id. If you need a related event, you just want to store the id for the relation, not the full event.
- You need to store the order of events. If you naively sort them by
origin_server_ts
, they will not be ordered by arrival time. If you couple that with lazily rendering events from the event store, the events will not be appearing at the bottom of the timeline, if an old server comes online, leading to weird notifications and a timeline, that not reflects/sync
- If you want to have a 1:1 mapping of a visible event to an event in the database, you need to somehow store the order of visible events as well as the order of all events relative to each other.
- If you receive or send an event, that relates to another event, this event
will have a key to which event it relates to (usually called
m.relates_to
). The event, that is the target of this relationship, doesn't know about there stalkers (the related events) though, since you received it, before the relations were even sent (usually). On the other hand, in many cases you want to render the event, that is being related to, differently, depending on its related events (think reactions, edits, etc). For that it makes sense to store a relation mapping with all the events relating to a certainevent_id
. - To paginate further backwards, you need to persist the
prev_batch
token. While only the most recent token is neccessary, it may be useful to persist all of them, if you ever want to shrink your event store and drop old events.
Overview of Nhekos implementation
This implementation is specific to Nheko. Specifically, Nheko uses lmdb (a key value store/B-Tree database). While the ideas can probably also be mapped to relational databases, it may not be efficient and there may be a better implementation for those databases.
Database events
This database is very simple. There is one of those per room and it simply
stores the mapping event_id
-> event
, where the first one is a string and
the second one is the full json of the event as it was received.
Databases event_order
and order_event
These are actually two databases (per room). The order of an event is a simple, 64-bit integer. Since you usually don't receive the first event in a room first, you need to be able to insert events before and after an already stored event (but not in between). This means an unsigned integer should start with half of the maximum, that can be represented. (While that limits the maximum number of messages, that can be stored, it is very unlikely you will ever reach that, without receiving a limited response in between, since you would need to receive many messages per nanosecond.)
You then store a mapping of that order
integer to the event_id
, so that you
can iterate events in order and you also store the reverse mapping, so that you
can easily figure out, where an event is positioned.
Additionally you store next to the event_id
the last batch_token of the
currently handled response, so that you can paginate backwards if needed.
Databases message_order
and order_message
This database is very Nheko-specific. Basically the idea is as follows: You
don't need to render all messages at once. You can just render them, when the
message at a certain position is shown. You could calculate the index
to
event_id
mapping on demand from the event_order
store, but this may be
expensive, if you don't want to render reactions, edits and other related events
at the "time" where they arrived, but rather at the location of the event they
relate to. So you need to store an additional order key for "visible" messages
instead of all messages. This is the message_order
database and
order_message
is the inverse mapping.
Database related
Matrix has the concept of related events. These can be replies, reactions,
redactions, edits, in chat verifications, call events and probably more in the
future. Most of those have the event_id
they relate to in m.relates_to
. To
easily access related events, we will store the related event ids as a value
under the event_id
they relate to as the key.
This makes it very easy to show for example reactions. You just have to iterate over the related events of each event and if they are a reaction, show them in a little bubble under that event. Similarly you could show edits that way, verification history as a single event or even implement basic threading on demand, i.e. have a button that focuses the timeline on all events that are a reply to the current event or where the current event is a reply to (transitively until a certain depth is reached).
Handling /sync
/sync
is how you usually receive events. A client usually should call /sync
in a loop. This request will block until a new event arrives at the server. This
is usually called long polling. Once /sync
returns, you will want to persist
the new events. Usually you do that in a loop over every room and then loop over
each message. You can distinguish the following kinds of messages:
- Redactions: When a message gets redacted, a redaction is sent to every
client. Clients generally should then remove the
content
key of the redacted event, so that the event is not displayed to the user anymore. With our event store this is fairly straightforward. Just read the event with the id specified unter theredacts
field, strip out thecontent
and write it again. You also want to inform the UI to rerender that event, if it is currently displayed to the user. - "Normal" messages, that should be displayed to the user. Those will need to
be added to the
events
database. As those events should be shown to the user, you will want to add their event_id to themessage <-> order
databases as well as theevent <-> order
databases. - "Hidden" events: Some events should not be shown to the user. Those should
generally still be added to then
events
andevent <-> order
databases, but you will not want to add them to themessage <-> order
database. - Related events: These can be messages from 2. or 3.. These events may not be
displayed or they are, but they also affect the rendering of other events.
You will, in addition to handling 2. or 3., you will also want to add those
events or rather their event ids to the
related
database. That way you can render for example reactions or edits. Replies work the other way around. Threads may work both ways, depending on how you want to display them. Don't forget to notify the UI, that the related events changed, so edits can update the message text or new reactions do get added and similar.
Additionally to storing the events, you will also want to update room state,
sync token, presence and such. This is out of scope of this document though.
Make sure to not accidentally store the same sync response twice, i.e. after
reconnecting because of a network issue. You can easily dedup /sync
responses
by verifying the sync token matches what you currently are expecting.
If your sync response contains a limited
field and this is set to true
, you
will have a gap in your event store. There are multiple ways to handle that: You
can throw away all other messages or you store the gap and fill it, when the
user views the room/gap. The former is rather easy to implement, the latter is
rather complicated, but may lead to nicer results.
Paginating via /messages
Paginating is pretty much the reverse operation of long polling. This is what
happens, when a user requests older history than currently stored, or when you
need to fill a gap (which can happen in forward and backward direction). When
paginating backwards, you can handle events pretty similar to /sync
, with a
few, small differences. You will not really need to handle redactions. Servers
should not return you a redaction for an event you didn't receive yet or at
least they should never return you the full event, which was redacted, and you
will have to handle the redaction then. Rather you may receive the event with
content
redacted or you don't receive it at all. Also you want to add the
event ids to the top of the timeline of course, when you are paginating
backwards instead of to the bottom. Furthermore you may receive related events,
where the event it relates to was not received yet. Make sure to handle that
correctly.
One thing to remember: You are not allowed to use a /sync
token to paginate.
While some homeservers, like synapse, tend to allow this, others don't. Always
use the prev_batch
token from /sync
or from /messages
.
Filling in missing events
Sometimes you may want to have an additional event available to render another
event, but you have neither received it via /sync
nor via pagination. In those
cases you can use the /<room_id>/event
endpoint to fetch specific events. This
can be useful to render replies or membership events nicely and similar. Usually
you don't know, where those events are positioned, so you only want to add them
to the events
database, until you actually receive them while syncing or
paginating.
Pending messages and local echo
Usually you want to show pending messages to the user, even though they haven't
arrived over /sync
yet. There are 2 ways to do that: Either you keep them in
memory, rendering them after all other messages or you store them in the
database as a normal message under their transaction id instead of event id and
then replace it, once it arrives over /sync
(or you use the event id returned
by /send
). How to keep track of the pending messages, queue them properly,
etc, is left as an exercise to the reader.
Limitations
This is a rather simple implementation of an event store. For this reason it
does not handle gappy timelines that well. Relations are also not stored that
optimized, you may want to store the actual reactions to a message instead of
only related events and iterating. This also limits /sync
responses to the
disk or database speed. As such this may have more CPU and IO overhead, than an
event store, which is mostly kept in RAM. This can be somewhat mitigated via
caches, but not completely. You also lose some of the power SQL databases can
give you. You will need to implement those higher level features yourself.
Benefits
- You basically don't need to keep any events in RAM. You can store everything on disk and only load the necessary events from disk, maybe with some caches. This means you basically should only need RAM to keep the UI and connection logic around. Everything else should just be caches, that can be dropped at anytime. Especially with a memory mapped database, this should have a nice effect on systems with lower amounts of RAM. This requires you to be able the message indices to some lazy models for your views though, which may have some CPU overhead.
- The event store is fairly simple and should be easy to debug, optimize and extend. You can implement it on most kinds of databases, although key-value stores are recommended.
- It is fairly easy to hide events by applying a filter on what gets stored in
the
message <-> order
databases, although changes to those filters may require you to rebuild those databases.
Closing thoughts
I hope you found this small introduction to Nhekos event store interesting. If you have any questions or ideas, come discuss them in #nheko:matrix.org. Maybe you have ideas on how to improve it or you want to compare or introduce your own event store? I'm interested in how other clients handle their events and store them for rendering to users. Thank you for your attention!