Build Your Own: ProseMirror View
Let's build a view layer for ProseMirror from scratch! We won't build support for everything that ProseMirror provides with its official view library, but it will provide a taste of what is involved.
Introduction
We'll start with a simple ProseMirror sample that uses the standard schema. Our implementation won't support marks or decorations, and it won't support custom views, but it will demonstrate how to build the core components of a ProseMirror view library. Our implementation will also be made simple by not supporting all of the browsers that ProseMirror supports.
Open the Glitch project for this tutorial and click Remix in the top right corner to follow along.
Getting started
This editor is really basic. It doesn't include ProseMirror's keymap plugin, so it's not even possible to add new paragraphs. That's okay. We're going to build our own EditorView. It won't support ProseMirror view plugins, but we'll use modern browser input events and implement sensible defaults.
Comment out the line to require EditorView, create the file editor-view.js and add it to the index.html file. We'll build out this EditorView step by step until it works!
Our EditorView just stores the state and sets the contentEditable attribute of the provided DOM element. Notice how this is enough for the browser to make the content editable. The browser even lets us add paragraph breaks! But at the moment, there is no relationship between the state and the DOM. Changes to one won't affect the other. We'll build out this relationship one direction at a time, starting with updating the DOM to match the state, and then eventually updating the state in response to user interactions.
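A minimal sketch of this first version might look like the following. The constructor signature (a DOM element plus an options object with a state property) is an assumption made for illustration, loosely mirroring the official EditorView; the tutorial's own file may differ.

```javascript
// editor-view.js: a first, deliberately tiny EditorView.
// Assumption: the constructor receives a DOM element and an options object
// that holds the initial EditorState.
class EditorView {
  constructor(dom, { state }) {
    this.dom = dom;
    this.state = state;
    // Ask the browser to make the element's content editable.
    this.dom.contentEditable = "true";
  }

  destroy() {
    // Return the element to its default, inherited editability.
    this.dom.contentEditable = "inherit";
  }
}
```

Nothing here reads from the state yet, which is why typing changes the DOM without changing the document.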
Rendering
To start, let's make a function to render one of the basic units of ProseMirror state: a ProseMirror node. Our function is going to return a DOM node representation of our ProseMirror node. For simplicity, we'll start with text nodes.
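As a rough sketch, the text-only version could be as small as this. The function name renderTextNode and the optional doc parameter are assumptions for illustration; node.isText and node.text are the properties ProseMirror text nodes expose.

```javascript
// Render a ProseMirror text node as a DOM text node.
// `doc` defaults to the browser's document; passing it in explicitly
// makes the function easy to exercise outside a browser.
function renderTextNode(node, doc = globalThis.document) {
  if (!node.isText) {
    throw new Error("renderTextNode only supports text nodes");
  }
  return doc.createTextNode(node.text);
}
```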
To add support for other kinds of nodes, we need to use the NodeSpec for the node to generate what ProseMirror calls a DOMOutputSpec. In real schemas, the DOMOutputSpec can be quite complicated. The DOM representation can be made up of a number of HTML elements, but we're going to keep things simple. For now, let's assume every NodeSpec returns a very simple structure from its toDOM function.
We're not trying to practice recursion here, so we'll assume that the "hole", where ProseMirror renders the children of the node, is the content of the one and only DOM node. With this simplifying assumption, all we need to do is look at the first item of the array returned by toDOM to get the tag name.
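Under that assumption, a render function for both text and element nodes might look like the sketch below. The name renderNode and the doc parameter are illustrative; the only ProseMirror API relied on is node.type.spec.toDOM, which the standard schema defines for nodes such as paragraph (where it returns ["p", 0]).

```javascript
// Render a single ProseMirror node (without its children) as a DOM node.
function renderNode(node, doc = globalThis.document) {
  if (node.isText) return doc.createTextNode(node.text);

  // Simplifying assumption from the text: toDOM returns an array whose
  // first item is the tag name and whose content is just the hole.
  const outputSpec = node.type.spec.toDOM(node);
  const tagName = outputSpec[0];
  return doc.createElement(tagName);
}
```

The document node itself is never passed to this function in our setup, because the editor's own DOM element stands in for it.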
The EditorView is the view for the document, but we are going to need views for all of the nodes of the model now that we can render them. Let's get that structure in place.
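One possible shape for that structure is sketched here. The class names follow the text, while the constructor arguments (parent, node, dom) are an assumption.

```javascript
// Base class: every view knows its parent, its ProseMirror node,
// its DOM node, and its child views.
class View {
  constructor(parent, node, dom) {
    this.parent = parent;
    this.node = node;
    this.dom = dom;
    this.children = [];
  }

  destroy() {
    for (const child of this.children) child.destroy();
    this.children = [];
    this.parent = null;
  }
}

// A view for a text node. It never has children.
class TextView extends View {}

// A view for any non-text node.
class NodeView extends View {}

// The view for the document node, which has no parent.
class EditorView extends NodeView {
  constructor(dom, { state }) {
    super(null, state.doc, dom);
    this.state = state;
    this.dom.contentEditable = "true";
  }

  destroy() {
    super.destroy();
    this.dom.contentEditable = "inherit";
  }
}
```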
Note that the EditorView is just a special case of NodeView, where the node is the document node.
Feel familiar?
We're building something not unlike React's fiber tree. We've got a tree that's linked downwards and upwards, with each view having references to its parent and its children.
Taking out the trash
When we destroy the editor view, we have to remove the DOM event listener. But why do we set parent to null in the base class when we destroy a view? The JavaScript engine automatically frees unreachable objects in a process known as garbage collection. We can help ensure that it frees our views by removing references to them. Modern engines can collect reference cycles on their own, but breaking the parent reference removes one side of the cycle between a parent and a child, so a lingering reference to a destroyed child can't keep the rest of the tree alive. Consult the Wikipedia entry for garbage collection for more details.
These views receive their initial DOM representation as an argument to their constructor, but they don't initially have any children. These views need to render their children. The editor, which starts out empty, will render its children, and they will render their children, and so on until we render the whole document!
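The recursive rendering step could be sketched as follows. To keep the example readable on its own, it is written as a standalone function over plain view-shaped objects ({ parent, node, dom, children }) rather than as a method on the view classes; that shape and the name renderChildren are assumptions. node.forEach is the ProseMirror method for iterating over a node's direct children.

```javascript
// Create a DOM node and a child view for every child of view.node,
// then recurse so the whole subtree ends up rendered.
function renderChildren(view, doc = globalThis.document) {
  view.node.forEach((childNode) => {
    const dom = childNode.isText
      ? doc.createTextNode(childNode.text)
      : doc.createElement(childNode.type.spec.toDOM(childNode)[0]);

    const child = { parent: view, node: childNode, dom, children: [] };
    view.dom.appendChild(dom);
    view.children.push(child);

    // Text nodes have no children, so the recursion stops there.
    if (!childNode.isText) renderChildren(child, doc);
  });
}
```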
If everything is working correctly, we should now see our paragraphs and our text in the document!
Updating the view
Now that we can construct an initial empty view and render all the children, we need a way to update it when it changes. We'll leave this unimplemented in the base class, but implement it for TextView and NodeView. The method will take a new node, update the DOM to match, and return true if it is possible to do so. If it's not possible to update the view, the method will return false.
We'll also need to make sure that we account for updating in updateChildren. When it's possible to update a child, we'll do so. Otherwise, we'll destroy it and recreate it. We'll also need to remove any extra children.
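Here is one way the update logic could fit together, again as standalone functions over plain view-shaped objects so the sketch stands on its own. The helper names (createView, destroyView, updateView) and the rule that a text view is reused only when its text is unchanged are assumptions, not necessarily what the tutorial's classes do.

```javascript
function createView(parent, node, doc) {
  const dom = node.isText
    ? doc.createTextNode(node.text)
    : doc.createElement(node.type.spec.toDOM(node)[0]);
  const view = { parent, node, dom, children: [] };
  if (!node.isText) updateChildren(view, doc);
  return view;
}

function destroyView(view) {
  view.children.forEach(destroyView);
  view.children = [];
  view.parent = null;
}

// Returns true if `view` could be updated in place to show `node`.
function updateView(view, node, doc) {
  if (view.node.isText || node.isText) {
    // Text views are never patched here: reuse one only if nothing changed.
    const same = view.node.isText && node.isText && view.node.text === node.text;
    if (same) view.node = node;
    return same;
  }
  if (view.node.type !== node.type) return false;
  view.node = node;
  updateChildren(view, doc);
  return true;
}

function updateChildren(view, doc) {
  const next = [];
  view.node.forEach((childNode, _offset, index) => {
    const existing = view.children[index];
    if (existing && updateView(existing, childNode, doc)) {
      next.push(existing);
      return;
    }
    // Could not update in place: build a replacement.
    const fresh = createView(view, childNode, doc);
    if (existing) {
      view.dom.replaceChild(fresh.dom, existing.dom);
      destroyView(existing);
    } else {
      view.dom.appendChild(fresh.dom);
    }
    next.push(fresh);
  });
  // Remove any children left over from the previous render.
  for (const extra of view.children.slice(next.length)) {
    view.dom.removeChild(extra.dom);
    destroyView(extra);
  }
  view.children = next;
}
```

Children are matched purely by index, which is the simple strategy the text describes and the reason an insertion in the middle disturbs everything after it.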
Optimization possibilities
It may not be efficient to destroy and recreate children all the time when some updates might be possible to perform in place. It's not possible to update the text nodes, but if we implemented node attributes we could set the attributes of the existing DOM node to match our new ProseMirror node. You may also note that adding a child anywhere but at the end of a node will destroy and recreate all the children after it. The real implementation tries to keep any children that did not change, even when adding and removing children. We're not going to try to handle that here, but note that this is not dissimilar from what React needs to do and how you can use keys to tell React about the identities of individual children.
Editing
Wow! That was a lot, but we're here. We can construct an editor view and it can render an editor state.
Now we can start working in the opposite direction: updating our state in response to user input. Right now, our content is editable; the browser does that for us, with the contenteditable attribute. There is no standardized behavior for editable content, though. Where one browser might add a new paragraph tag in response to a user pressing enter, another might add a hard break. That's why ProseMirror will prevent you from adding a new paragraph by default! You have to add a keymap plugin that says exactly what transactions should run when you press the "return" key.
To implement more fine-grained control over our content, we'll rely on the Input Events standard, which modern browsers implement. It defines a beforeinput event and a set of input types that express common editing activities. And the best part for us is that almost all of these events are cancelable (text entered during IME composition is the main exception). That means we can start from a blank slate, and then start to build out the interactions we want.
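The blank slate can be sketched as a single listener that cancels every beforeinput event. The helper name preventNativeEditing is made up for this example; in the tutorial's structure the listener would more likely be registered in the EditorView constructor and removed in destroy.

```javascript
// Cancel the browser's default editing behavior on `dom`.
// Returns a function that removes the listener again.
function preventNativeEditing(dom) {
  const onBeforeInput = (event) => {
    event.preventDefault();
  };
  dom.addEventListener("beforeinput", onBeforeInput);
  return () => dom.removeEventListener("beforeinput", onBeforeInput);
}
```

With this in place, typing into the editor does nothing at all until we handle individual input types ourselves.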
We've taken a really important step! By preventing the default actions of the browser, we make the EditorState the source of truth for what should be in the DOM. The browser is no longer in control. We are!
Remind you of something?
React components create a virtual DOM, a description of what the DOM should contain. React reconciles any of the differences between successive renders by updating the real DOM to match. Your application state is the source of truth, not the DOM. In a similar way, EditorState is a description of what the editor should contain. The EditorView is responsible for updating the DOM.
Now we can begin handling events, and making those events update our state! Let's add methods to our EditorView to dispatch a transaction and set state. We can also handle our first input event to insert text.
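A sketch of those additions follows. The method name setState and the empty update placeholder are assumptions; state.tr, tr.insertText and state.apply are the standard prosemirror-state calls, and event.inputType and event.data come from the Input Events standard.

```javascript
class EditorView {
  constructor(dom, { state }) {
    this.dom = dom;
    this.state = state;
    this.onBeforeInput = this.onBeforeInput.bind(this);
    this.dom.addEventListener("beforeinput", this.onBeforeInput);
  }

  // Apply a transaction to the current state and adopt the result.
  dispatch(tr) {
    this.setState(this.state.apply(tr));
  }

  setState(state) {
    this.state = state;
    this.update(state.doc);
  }

  // Placeholder: the full version re-renders the DOM from the new document.
  update(doc) {}

  onBeforeInput(event) {
    // The state, not the browser, decides what ends up in the DOM.
    event.preventDefault();
    switch (event.inputType) {
      case "insertText":
        // With no explicit range, insertText replaces the current selection.
        this.dispatch(this.state.tr.insertText(event.data));
        break;
    }
  }

  destroy() {
    this.dom.removeEventListener("beforeinput", this.onBeforeInput);
  }
}
```

Every other input type is still swallowed by preventDefault, which is why only plain typing works at this stage.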
What's going on!? If you've gotten this far, you should be able to type in the editor again. But no matter where you click and type, the letters appear at the top of the editor. We're not handling selection changes! ProseMirror thinks the selection begins at the start of the document, and moves it forward each time we insert text, but we're not telling it when the user sets a new selection. We'll have to tackle that next.
Selection
ProseMirror has a notion of a "position" within a document. Every editor state has a selection that spans a range of positions. Similarly, the HTML document has a selection that spans a range of boundary points within the document, where a boundary point is a point between two DOM nodes or between the characters of a DOM text node. We'll need to be able to change a ProseMirror selection into a DOM selection and vice versa.
The DOM denotes a boundary point using a reference node and a numeric offset into the children of the node (or, for a text node, into its characters). In other words, boundary points are locations within a tree. ProseMirror uses a linear reference system. Positions are single numbers, with no reference to a node. Therefore, ProseMirror incorporates nodes into its positions by considering certain node boundaries to have a size. If the position before a paragraph is n, then the position just inside it is n+1.
Let's begin by giving every view three getters, border, pos and size. The first of these will say how many positions we cross as we move across an edge of the node. It will convey the difference between the position just before a node and the position immediately before its content or, equivalently, the difference between the position immediately after the content of a node and the position just after the node itself. The second of these will give the position of the node and the last of these will give the size of the node, including its content and its border.
For non-leaf nodes, the border will be 1, indicating that we must increment the position by 1 when entering or leaving one of these nodes.
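The three getters might be sketched like this. The view classes are trimmed down to what the position arithmetic needs, and two choices are assumptions of this sketch: the root view reports a pos of -1 so that, with its border of 1, the document's content starts at position 0 as it does in ProseMirror, and non-text leaf nodes (such as images) are ignored.

```javascript
class View {
  constructor(parent, node) {
    this.parent = parent;
    this.node = node;
    this.children = [];
  }

  // Positions crossed when entering or leaving this node.
  get border() {
    return 0;
  }

  // The position just before this node.
  get pos() {
    if (!this.parent) return -1; // assumption: see the note above
    let pos = this.parent.pos + this.parent.border;
    for (const sibling of this.parent.children) {
      if (sibling === this) break;
      pos += sibling.size;
    }
    return pos;
  }

  // Total positions spanned: both borders plus all content.
  get size() {
    let size = 2 * this.border;
    for (const child of this.children) size += child.size;
    return size;
  }
}

class TextView extends View {
  // Text has no border, so its size is just its length.
  get size() {
    return this.node.text.length;
  }
}

class NodeView extends View {
  get border() {
    return 1;
  }
}
```

For a document with the paragraphs "ab" and "c", the first paragraph sits at position 0 with size 4, its text starts at 1, and the second paragraph sits at 4.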
We'll also need a way to get a ProseMirror view for a given DOM node. We can do that by stashing references to the ProseMirror views on the nodes themselves.
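A sketch of that bookkeeping is shown below. The property name pmView and the helper names are invented for this example; the lookup walks up parentNode links so that it also works when handed a DOM node that has no view of its own.

```javascript
// Assumption: "pmView" is a made-up property name for this sketch.
const VIEW_KEY = "pmView";

// Remember which view owns a DOM node.
function attachView(dom, view) {
  dom[VIEW_KEY] = view;
}

// Forget the association, for example when the view is destroyed.
function detachView(dom) {
  delete dom[VIEW_KEY];
}

// Find the view for a DOM node, falling back to its nearest ancestor.
function nearestView(dom) {
  for (let current = dom; current; current = current.parentNode) {
    if (current[VIEW_KEY]) return current[VIEW_KEY];
  }
  return null;
}
```

Clearing the property when a view is destroyed keeps a detached DOM node from holding the view in memory.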
Now we're ready to add a method to convert a DOM selection to a ProseMirror selection.
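The conversion can be sketched as two small functions. They assume each view exposes node, pos, border, size and children as described above, that view children correspond one-to-one with DOM children, and that a viewFor callback maps a DOM node to its view; the function names are illustrative. anchorNode, anchorOffset, focusNode and focusOffset are the standard DOM Selection properties.

```javascript
// Convert a DOM boundary point (a view plus an offset) to a position.
function posFromBoundaryPoint(view, offset) {
  // Inside a text node, the offset counts characters.
  if (view.node.isText) return view.pos + offset;

  // Inside an element, the offset counts children: step over the border,
  // then over every child that comes before the boundary point.
  let pos = view.pos + view.border;
  for (const child of view.children.slice(0, offset)) pos += child.size;
  return pos;
}

// Convert a DOM selection to an { anchor, head } pair of positions.
function selectionFromDOM(domSelection, viewFor) {
  const anchorView = viewFor(domSelection.anchorNode);
  const headView = viewFor(domSelection.focusNode);
  if (!anchorView || !headView) return null;
  return {
    anchor: posFromBoundaryPoint(anchorView, domSelection.anchorOffset),
    head: posFromBoundaryPoint(headView, domSelection.focusOffset),
  };
}
```

In the editor, the resulting pair could be turned into a selection with TextSelection.create(state.doc, anchor, head) from prosemirror-state and dispatched whenever the document fires a selectionchange event.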
At this point, you should be able to click inside the editor and add text wherever you like. Things still aren't perfect, though. The cursor that the browser displays is not always where it should be. Inserting text may not always happen where we expect it to. We can tell ProseMirror where the selection should be, but when we render the editor it changes the DOM and that can cause the DOM selection to change. We'll need a way to transform a ProseMirror selection into a DOM selection and make sure to do that after we update the editor view.
This code will have one interesting subtlety. When we refer to a text position there are sometimes two equivalent DOM boundary points for the same ProseMirror position because text nodes have no border. When there are two adjacent text nodes, the boundary point after the last character of the first node and before the first character of the second node are the same ProseMirror position. We'll prefer not to make selections that include zero-length slices of text nodes, so we'll make sure that we only choose a boundary point at the very start or very end of a text node when it provides a good result. For example, we must use the very start or very end of a text node when that node is the very first or very last child, respectively. Otherwise, we'll prefer that selections start on the trailing edge of such a boundary and end on the leading edge.
TextViews are much simpler, because they don't have any children.
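One way to sketch the reverse conversion is a single recursive function covering both kinds of view. It assumes each view exposes dom, node, pos, border, size and children; the name boundaryPointFromPos and the bias parameter are inventions of this sketch, standing in for the preference described above.

```javascript
// Convert a ProseMirror position inside `view` to a DOM boundary point.
// bias > 0 prefers the later of two adjacent text nodes (use it for the
// start of a selection); bias < 0 prefers the earlier one (for the end).
function boundaryPointFromPos(view, pos, bias = 1) {
  if (view.node.isText) {
    return { node: view.dom, offset: pos - view.pos };
  }

  let start = view.pos + view.border;
  for (let i = 0; i < view.children.length; i++) {
    const child = view.children[i];
    const end = start + child.size;

    if (child.node.isText) {
      // At the seam between two text nodes, optionally defer to the next.
      const next = view.children[i + 1];
      const skipToNext = pos === end && bias > 0 && next && next.node.isText;
      if (pos >= start && pos <= end && !skipToNext) {
        return boundaryPointFromPos(child, pos, bias);
      }
    } else if (pos > start && pos < end) {
      // Strictly inside a non-text child: descend into it.
      return boundaryPointFromPos(child, pos, bias);
    } else if (pos === start) {
      // Just before a non-text child: point between this view's children.
      return { node: view.dom, offset: i };
    }
    start = end;
  }
  // After the last child.
  return { node: view.dom, offset: view.children.length };
}
```

Because a text child claims both of its edges unless the bias sends the position onward, a text node that is the first or last child is always used for positions at its very start or very end.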
And now we can update our EditorView's update method to keep the DOM selection in sync with the state.
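The final synchronization step could be sketched as below. It assumes the anchor and head have already been converted to { node, offset } boundary points; the helper name syncDOMSelection is invented, while setBaseAndExtent is the standard DOM Selection method for setting both ends at once.

```javascript
// Make the DOM selection match the given boundary points.
// Returns true if the DOM selection had to be changed.
function syncDOMSelection(domSelection, anchor, head) {
  const unchanged =
    domSelection.anchorNode === anchor.node &&
    domSelection.anchorOffset === anchor.offset &&
    domSelection.focusNode === head.node &&
    domSelection.focusOffset === head.offset;
  // Skipping redundant writes avoids triggering needless selection events.
  if (unchanged) return false;

  domSelection.setBaseAndExtent(anchor.node, anchor.offset, head.node, head.offset);
  return true;
}
```

In the browser, update would call this with document.getSelection() after re-rendering, so the caret is restored even when the DOM nodes around it were recreated.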
Now the ProseMirror selection and the DOM selection are always synchronized! From here, we could start to implement more editor commands. If you're so inclined, try to use the prosemirror-commands package to implement a case for when inputType is insertParagraph. Or you could try to implement a case for when inputType is deleteContentBackward. Happy editing!