Confluence 2.5.6 : HTML to Markup Conversion for the Rich Text Editor
This page last changed on Jul 18, 2007 by tom@atlassian.com.
Introduction

This component enables the Rich Text Editor by converting HTML (created by the renderer, then edited by the user) into Confluence Wiki Markup. It works like this:
This document explains step 2 in some more detail. Most problems with this stage stem from difficulty in determining the correct amount of whitespace to put between two pieces of markup.

Classes and Responsibilities

This section briefly describes the main classes involved and their responsibilities.

DefaultConfluenceWysiwygConverter
Converts Wiki Markup to HTML to be given to the rich text editor, and converts edited HTML back to markup. Creates RenderContexts from pages and delegates the conversion operations to a WysiwygConverter instance.

DefaultWysiwygConverter
Converts Wiki Markup to XHTML to be given to the rich text editor, and converts edited XHTML back to markup. This class contains the guts of the HTML -> Markup conversion, and delegates the Markup -> HTML conversion to a WikiStyleRenderer, with the setRenderingForWysiwyg flag set to true in the RenderContext.

WysiwygNodeConverter
Interface for any class which can convert an HTML DOM tree into Markup. Can be implemented to convert particular macros back into markup. The macro class must implement WysiwygNodeConverter and give the macro's outer DIV a 'wysiwyg' attribute with the value 'macro:<macroname>'.

Styles
Aggregates text styles as we traverse the HTML DOM tree. Immutable. Responsible for interpreting Node attributes as styles and decorating markup text with style and colour macros/markup.

ListContext
Keeps track of nested lists: the depth and the type.

WysiwygLinkHelper
Just a place to put some static methods for creating HTML attributes describing links, and for converting link HTML nodes into markup.

Overview of the HTML to Markup Conversion Process

Preprocessing the HTML
Converting the Document Fragment to Markup

This uses the convertNode method, which has the honour of being the longest method in Atlassian (although not the most complex by cyclomatic complexity measures). The signature of this method is:

    String convertNode(
        Node node,
        Node previousSibling,
        Styles styles,
        ListContext listContext,
        boolean inTable,
        boolean inListItem,
        boolean ignoreText,
        boolean escapeWikiMarkup)

That is, the method returns the markup needed to represent the HTML contained in the DOM tree, based on the current context (which styles have been applied by parent nodes, whether we are already in a table or a list, and so on).

The body of this method is a large case statement based on the type of the current node and the current state. The typical case gets the markup produced by its children, using the convertChildren method, decorates it in some way and returns the resulting string. The convertChildren method simply iterates over a node's children, calling convertNode and concatenating the markup returned.

To determine how much whitespace separates the markup produced by two sibling nodes, we often need to know the type of each node. That is why convertNode takes a previousSibling argument. The getSep method takes the two nodes to be separated and some state information. It uses a lookup table to decide what type of whitespace (or other text) to use.

Post-processing the markup
Worthwhile Style Improvements
Rendering in 'For Wysiwyg' Mode

The HTML produced by the renderer to be displayed by the Rich Text Editor is not identical to that generated for display. It contains extra attributes which are cues to the conversion process. The following list isn't exhaustive, but gives the flavour of the types of considerations involved.
How To Fix Bugs

Writing Tests

The first thing to do is to write a failing test. At the moment all the tests are in com.atlassian.renderer.wysiwyg.TestSimpleMarkup. Keeping them all together is reasonable, as they run quickly and you will want to make sure that your fixes don't break any of the other tests.

There are two types of test: markup tests and XHTML tests.

Use a markup test when you have a piece of markup which doesn't 'round trip' correctly. For instance, perhaps the markup:

    * foo

    * bar

becomes

    * foo
    * bar

when you go from wiki markup mode to rich text mode and back again. In that case, add the test

    testMarkup("* foo\n\n* bar");

which will check that the markup is the same after a round trip.

Note that it is OK for markup to change in some circumstances: two different markup strings may be equivalent, and the round trip will convert the starting markup to 'canonical markup' which renders identically to the initial markup. There are also pathological cases where a round trip may switch markup between two equivalent strings. These should be fixed even though they don't break the rendering, because they show up as changes in the version history.

If a bug is caused by the conversion of user-edited (or pasted) HTML into markup, use an XHTML test:

    testXHTML("...offending HTML...", "...desired markup...")

This test first checks that the desired markup round-trips correctly, then that the HTML converts to that markup.

Finding Problems

Once you have written your test you need to find out what the converter is doing. Running the test in debug mode and putting breakpoints in testMarkup/testXHTML is the best way of doing this. As you track down the nodes causing problems you can put breakpoints in the part of convertNode which handles the offending type of node.

You can also set 'debug' to true in DefaultWysiwygConverter.java:44. This will dump the XHTML produced by Neko, turn off the post-processing mentioned above, and print out details of the separator calculations in the generated markup string. So you might see:

    [li-li false,false]

which means that two list items, not in a table and not in a (nested) list, get separated by a newline. You can tweak the table of separators as needed.
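To picture what a separator table of this kind looks like, here is a minimal sketch in Java. It is not the code in DefaultWysiwygConverter. The class SeparatorSketch, its Node type, and the methods key and joinSiblings are hypothetical names, node types are simplified to plain strings, and the p-p entry is an assumed value for illustration. Only the li-li entry (a newline between two list items that are not in a table and not in a nested list) is taken from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a separator lookup; names and structure are assumptions.
class SeparatorSketch {

    // Simplified stand-in for a DOM node: its type and the markup it produced.
    static final class Node {
        final String type;
        final String markup;

        Node(String type, String markup) {
            this.type = type;
            this.markup = markup;
        }
    }

    // Table keyed like the debug label: "<previous>-<current> <inTable>,<inList>".
    private static final Map<String, String> SEPARATORS = new HashMap<>();
    static {
        // From the text: two list items, not in a table or nested list, get a newline.
        SEPARATORS.put(key("li", "li", false, false), "\n");
        // Assumed for illustration only: a blank line between two paragraphs.
        SEPARATORS.put(key("p", "p", false, false), "\n\n");
    }

    static String key(String previous, String current, boolean inTable, boolean inList) {
        return previous + "-" + current + " " + inTable + "," + inList;
    }

    // Looks up the separator for a pair of sibling types; empty string if none is listed.
    static String getSep(String previous, String current, boolean inTable, boolean inList) {
        return SEPARATORS.getOrDefault(key(previous, current, inTable, inList), "");
    }

    // Concatenates sibling markup, placing the looked-up separator between neighbours.
    // With debug on, the lookup key is written into the output before each separator.
    static String joinSiblings(List<Node> siblings, boolean inTable, boolean inList, boolean debug) {
        StringBuilder out = new StringBuilder();
        Node previous = null;
        for (Node node : siblings) {
            if (previous != null) {
                if (debug) {
                    out.append('[').append(key(previous.type, node.type, inTable, inList)).append(']');
                }
                out.append(getSep(previous.type, node.type, inTable, inList));
            }
            out.append(node.markup);
            previous = node;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Node> items = Arrays.asList(new Node("li", "* foo"), new Node("li", "* bar"));
        System.out.println(joinSiblings(items, false, false, false));
        System.out.println(joinSiblings(items, false, false, true));
    }
}
```

In this sketch, fixing a whitespace bug amounts to changing or adding one table entry and re-running the round-trip tests. Keying the table on both sibling types plus the table and list flags is what lets the same pair of nodes be separated differently in different contexts.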
Document generated by Confluence on Oct 10, 2007 18:36