Running Hedgedoc v1 with a global history and 1315 notes
We recently deployed Hedgedoc v1 as a replacement for our aging Etherpad. During this process we encountered two major problems:
- Hedgedoc does not support a global history.
- The Pad search does not work with
-
.
While solving the first two problems we also encountered a third problem:
- Having over 1000 Notes available for a user made Hedgedoc (both frontend and backend) slow.
In this post I will describe how we solved all these problems. I will also go into some of the interesting technical details of our code.
The solution described here is not the sensible solution to the problems we faced. The section "The sensible thing" contains a sketch of a more reasonable solution. Rather, we took this problem as an opportunity to learn new things. Our solution nonetheless contains some interesting technical details.
What is Hedgedoc?⌗
Hedgedoc, like Etherpad, is a web-based collaborative real-time editor. This means that you can edit documents together with others live in your browser (think Google Docs). The main difference between Hedgedoc and Etherpad is Markdown support: Etherpad only supports plain text and basic formatting, while Hedgedoc supports Markdown with all the bells and whistles. You can have a look at the features note for an exhaustive list of all features.
Implementing a global history⌗
A list of all Notes or an explore page are being discussed for the rewrite of Hedgedoc that is currently in progress. But it seems to me that the rewrite won’t be complete for some time. So we decided to take matters into our own hands.
A global history can be patched in with relatively few lines of code. When a user is logged in, the list of Notes that this user has visited previously can be retrieved at `/history`. These Notes are then displayed by the UI. We only have to change this API endpoint to return all Notes instead.
diff --git a/lib/history.js b/lib/history.js
index e0c16da5..ae47b380 100644
--- a/lib/history.js
+++ b/lib/history.js
@@ -11,6 +11,7 @@ const errors = require('./errors')
 // public
 const History = {
   historyGet,
+  historyGetAll,
   historyPost,
   historyDelete,
   updateHistory
@@ -63,6 +64,33 @@ function getHistory (userid, callback) {
   })
 }

+function getAllHistory (callback) {
+  models.Note.findAll()
+    .then(function (notes) {
+      const out = []
+
+      const getId = function (note) {
+        if (note.alias) {
+          return note.alias
+        } else {
+          return models.Note.encodeNoteId(note.id)
+        }
+      }
+
+      notes.forEach(note => out.push({
+        id: getId(note),
+        text: note.title,
+        time: note.updatedAt.getTime(),
+        tags: models.Note.parseNoteInfo(note.content).tags
+      }))
+      logger.info(`read history success: ${out}`)
+      return callback(null, { history: out })
+    }).catch(function (err) {
+      logger.error('read history failed: ' + err)
+      return callback(err, null)
+    })
+}
+
 function setHistory (userid, history, callback) {
   models.User.update({
     history: JSON.stringify(parseHistoryToArray(history))
@@ -132,6 +160,18 @@ function historyGet (req, res) {
   }
 }

+function historyGetAll (req, res) {
+  if (req.isAuthenticated()) {
+    getAllHistory(function (err, history) {
+      if (err) return errors.errorInternalError(res)
+      if (!history) return errors.errorNotFound(res)
+      res.send(history)
+    })
+  } else {
+    return errors.errorForbidden(res)
+  }
+}
+
 function historyPost (req, res) {
   if (req.isAuthenticated()) {
     const noteId = req.params.noteId
diff --git a/lib/web/historyRouter.js b/lib/web/historyRouter.js
index fa426bbb..97b0c3ef 100644
--- a/lib/web/historyRouter.js
+++ b/lib/web/historyRouter.js
@@ -7,7 +7,9 @@ const history = require('../history')
 const historyRouter = module.exports = Router()
 // get history
-historyRouter.get('/history', history.historyGet)
+historyRouter.get('/history', history.historyGetAll)
 // post history
 historyRouter.post('/history', urlencodedParser, history.historyPost)
 // post history by note i
These two small patches are enough to satisfy our functionality requirements. But as you see the performance is lacking. The backend takes about 8 seconds to load the history and the frontend takes a few more seconds to be interactive.

This means that there are now two major problems to solve:
- the long response times of the backend
- the slow frontend
The cause⌗
The Markdown of a Note has to be parsed to extract the Note's tags from the frontmatter. Hedgedoc only extracts the tags of a Note when the history is retrieved. Because our history now contains all 1315 Notes, every request to the history endpoint parses all of them. This tag extraction, and the Markdown parsing it requires, is what causes the long response times of the `/history` endpoint.
The sensible thing⌗
The sensible thing would be to cache the tags of a Note in a database field. The database field would be updated whenever the Note's content is updated. This technique is already being used for the title of a Note, which also has to be derived from the content of the Note.
The tags are a comma-separated string in the Note's frontmatter. A very simple implementation could cache that comma-separated string as it appears in the frontmatter.
- Adapt the Note model to include the additional field `tags`.
- Update the value of the `tags` field when a Note is saved. This could be done together with the title in `finishUpdateNote` in `lib/realtime.js`.
function finishUpdateNote (note, _note, callback) {
  if (!note || !note.server) return callback(null, null)
  const body = note.server.document
  const title = note.title = models.Note.parseNoteTitle(body)
  const values = {
    title,
    content: body,
    authorship: note.authorship,
    lastchangeuserId: note.lastchangeuser,
    lastchangeAt: Date.now()
  }
  _note.update(values).then(function (_note) {
    saverSleep = false
    return callback(null, _note)
  }).catch(function (err) {
    logger.error(err)
    return callback(err, null)
  })
}
- Extract the tags from the `tags` field when retrieving the history. This could be done in `getAllHistory` in `lib/history.js`.
diff --git a/lib/history.js b/lib/history.js
index ae47b380..86a7ce88 100644
--- a/lib/history.js
+++ b/lib/history.js
@@ -81,7 +81,7 @@ function getAllHistory (callback) {
         id: getId(note),
         text: note.title,
         time: note.updatedAt.getTime(),
-        tags: models.Note.parseNoteInfo(note.content).tags
+        tags: note.tags.split(",").map(tag => tag.trim())
       }))
       logger.info(`read history success: ${out}`)
       return callback(null, { history: out })
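The split-and-trim step in the patch above is easy to get subtly wrong for Notes without tags. As a minimal sketch in Python (the language of our listing backend), assuming the cached field may be empty or `NULL`, the same step could look like this. The function name `split_cached_tags` is made up for illustration and is not part of Hedgedoc.

```python
from typing import List, Optional


def split_cached_tags(raw: Optional[str]) -> List[str]:
    """Turn a cached, comma-separated tag string into a clean list of tags."""
    if not raw:
        # Notes without tags (NULL or empty field) yield an empty list.
        return []
    # Trim whitespace around each tag and drop empty entries such as "a,,b".
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


print(split_cached_tags("meeting, infra ,2021"))  # ['meeting', 'infra', '2021']
```

The JavaScript one-liner in the patch would need a similar guard if the `tags` field can be empty for some Notes.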
Our solution⌗
To solve the problems we created
- a new frontend for the listing page in Svelte
- a new backend, built with Django and django-rest-framework, that provides the required data via a REST API
We combine the original Hedgedoc, the new listing backend and the listing frontend behind a reverse proxy. The listing frontend retrieves the data from the listing backend using the REST API. The listing backend uses the same database as the Hedgedoc instance. It reads the Notes from the database and also saves the cached tags there.
To store our additional information we introduce some new models. The new models only have relations to Notes. Other Hedgedoc models have been omitted for simplicity.
- A Tag has a name and stores the Notes that have this Tag using an n:m relation. This is the model that caches the tags.
- The NoteExtension model stores the date when the tags of a Note were last updated.
With these models we cache the tags of the Notes. When the tags are read, we first check whether the Note has been modified since the tags were last calculated. If it has, we update the tags. Then we return the tags.
The code for the backend and frontend is publicly available.
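The read path described above can be sketched without Django. This is a minimal illustration, not our actual backend code: `CachedNote`, `read_tags` and the toy `parse_tags` are hypothetical names, and the parser only understands a single `tags: a, b` line instead of real frontmatter. As a simplification, the cache timestamp is set to the Note's own modification time rather than the current time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class CachedNote:
    """Hypothetical stand-in for a Note together with its NoteExtension."""
    content: str
    updated_at: datetime                          # changed on every save
    tags_last_updated: Optional[datetime] = None  # None: tags never cached
    cached_tags: List[str] = field(default_factory=list)


def parse_tags(content: str) -> List[str]:
    """Toy parser for a 'tags: a, b' line; not Hedgedoc's real parser."""
    for line in content.splitlines():
        if line.startswith("tags:"):
            raw = line[len("tags:"):]
            return [t.strip() for t in raw.split(",") if t.strip()]
    return []


def read_tags(note: CachedNote, parse: Callable[[str], List[str]]) -> List[str]:
    """Return the cached tags, re-parsing only if the note changed since."""
    stale = (
        note.tags_last_updated is None
        or note.updated_at > note.tags_last_updated
    )
    if stale:
        note.cached_tags = parse(note.content)
        note.tags_last_updated = note.updated_at
    return note.cached_tags


note = CachedNote("---\ntags: infra, wiki\n---\n# Title", datetime(2021, 5, 1))
print(read_tags(note, parse_tags))  # parsed on the first read
print(read_tags(note, parse_tags))  # served from the cache afterwards
```

The expensive parse only runs for Notes that changed since the last read, which is what keeps the listing fast once the cache is warm.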
Some technical details⌗
Using Hedgedoc’s Database in Django⌗
Remember that Hedgedoc and the listing backend access the same database. This means that we have to read the data created by Hedgedoc's ORM in Django. Adapting Django to read the data created by Hedgedoc is surprisingly easy.
- The names of the Model's fields have to match the names of the columns in the database.
- Add a `Meta` class to your Model and set `managed` to `False` and `db_table` to the name of the table in the database. `managed` indicates whether Django should modify the structure of the table. If `managed` is `False`, Django will not create, update or delete the table, e.g. during migrations. Whether data can be written to the table is unaffected by this option.
from django.db import models


class Session(models.Model):
    sid = models.CharField(primary_key=True, max_length=36, editable=False)
    expires = models.DateTimeField(editable=False)
    data = models.TextField(editable=False)
    createdAt = models.DateTimeField(editable=False)
    updatedAt = models.DateTimeField(editable=False)

    class Meta:
        db_table = "Sessions"  # name of the existing Hedgedoc table
        managed = False  # Django must not create, alter or drop this table
Assume that Hedgedoc's database is configured as `hedgedoc`. Then creating the models is already enough to interact with the data. We only have to indicate which database Django should use with the `using` method of the queryset.
# Retrieve all sessions
Session.objects.using("hedgedoc").all()
Access to the database can be made more comfortable with database routers. A database router decides four things:
- from which database should a model be read
- to which database should a model be written
- should a relation between two specific objects be allowed
- whether a migration should be applied on a database
With a database router we can automatically direct read and write operations to the correct database. We could also ensure that no migrations, or only the required ones, are applied to Hedgedoc's database. This way Django's default tables will never be created in Hedgedoc's database.
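A router is a plain Python class with up to four methods (`db_for_read`, `db_for_write`, `allow_relation`, `allow_migrate`), one per decision listed above. The following is a minimal sketch, not the router from our repository. It assumes that all models touching Hedgedoc's database live in one Django app, here given the hypothetical label `hedgedoc_models`. The demo at the end uses a stand-in object instead of a real model so that it runs without Django.

```python
from types import SimpleNamespace


class HedgedocRouter:
    """Route the (hypothetical) "hedgedoc_models" app to the "hedgedoc" database."""

    app_label = "hedgedoc_models"  # assumption: app holding the Hedgedoc models
    db_alias = "hedgedoc"          # alias used in the DATABASES setting

    def _is_ours(self, model):
        return model._meta.app_label == self.app_label

    def db_for_read(self, model, **hints):
        # None means "no opinion": Django asks the next router or uses "default".
        return self.db_alias if self._is_ours(model) else None

    def db_for_write(self, model, **hints):
        return self.db_alias if self._is_ours(model) else None

    def allow_relation(self, obj1, obj2, **hints):
        # Relations between two objects of this app stay within one database.
        if self._is_ours(obj1) and self._is_ours(obj2):
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if db == self.db_alias:
            # Only this app may migrate here, so auth, sessions etc. stay out.
            return app_label == self.app_label
        if app_label == self.app_label:
            # Never create this app's tables in any other database.
            return False
        return None


# Stand-in for a model class, just to exercise the router without Django.
note_model = SimpleNamespace(_meta=SimpleNamespace(app_label="hedgedoc_models"))
router = HedgedocRouter()
print(router.db_for_read(note_model))            # hedgedoc
print(router.allow_migrate("hedgedoc", "auth"))  # False
```

Such a class is activated by listing its import path in the `DATABASE_ROUTERS` setting. After that, explicit `.using("hedgedoc")` calls are no longer needed for the routed models.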
Keeping the Tag cache up-to-date⌗
Out of the options we considered, we decided on updating the tags on read (4). We only found out later that Hedgedoc already caches the title of a Note on write (3). Had we known that at the time of the decision, we would probably have chosen option 3.
# | Option | Advantage | Disadvantage |
---|---|---|---|
1 | Calculate tags on demand | up-to-date | too high performance impact |
2 | Update tags after some time (e.g. 5 min) | | tags may be outdated |
3 | Update tags when notes are saved | up-to-date, very efficient | requires modifications in Hedgedoc |
4 | Update tags on read if the Note has been changed after the tags have been calculated | up-to-date, efficient | |
Option 4 can be implemented with a custom getter for the tags field on the Note model, e.g. using `@property`. The getter then behaves as follows:
- Check if the Note has been modified after the tags were last updated. Update the tags if this is the case.
- Return the tags using the backing field.
@property
def tags(self):
    # Refresh the cache if the Note changed after the last tag update.
    if self.updatedAt > self.noteextension.tags_last_updated:
        self.update_tags()
    return self.tag_set
Managed attributes can be used with django-rest-framework by using a custom serializer that contains the attributes.
These queries span 3 tables (Notes, Tags and NoteExtensions). Take care to preload the related data with `prefetch_related`. Otherwise the performance will degrade very quickly.
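Why preloading matters can be shown without Django. The sketch below uses the standard library's `sqlite3` with a made-up three-table schema (not Hedgedoc's real tables) and compares one query per Note against a single batched lookup. The batched variant is roughly what `prefetch_related` does: one extra query per relation, with the rows joined in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE note (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE note_tag (note_id INTEGER, tag_id INTEGER);
    INSERT INTO note VALUES (1, 'Minutes'), (2, 'Runbook'), (3, 'Ideas');
    INSERT INTO tag VALUES (1, 'infra'), (2, 'meeting');
    INSERT INTO note_tag VALUES (1, 2), (2, 1), (2, 2);
""")

queries = []  # every SQL statement issued through run() is recorded here


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def tags_per_note_naive():
    """One query for the notes, then one more query for every single note."""
    result = {}
    for note_id, _title in run("SELECT id, title FROM note"):
        rows = run(
            "SELECT t.name FROM tag t JOIN note_tag nt ON nt.tag_id = t.id "
            "WHERE nt.note_id = ? ORDER BY t.name",
            (note_id,),
        )
        result[note_id] = [name for (name,) in rows]
    return result


def tags_per_note_batched():
    """One query for the notes, one for all their tags, grouped in Python."""
    ids = [note_id for note_id, _title in run("SELECT id, title FROM note")]
    placeholders = ",".join("?" * len(ids))
    rows = run(
        "SELECT nt.note_id, t.name FROM tag t "
        "JOIN note_tag nt ON nt.tag_id = t.id "
        f"WHERE nt.note_id IN ({placeholders}) ORDER BY t.name",
        ids,
    )
    grouped = defaultdict(list)
    for note_id, name in rows:
        grouped[note_id].append(name)
    return {note_id: grouped[note_id] for note_id in ids}


queries.clear()
naive = tags_per_note_naive()
print(len(queries))  # 4: one for the notes plus one per note

queries.clear()
batched = tags_per_note_batched()
print(len(queries))  # 2, independent of the number of notes
```

With 1315 Notes the first variant issues 1316 queries per listing request, while the second still issues two.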