… is the subject of the W3C working draft for future improvements to bidi support in HTML5 and CSS3. Having formally moaned three years ago about the state of bidi in HTML, I am glad to see that something looks set to actually change. A few months ago a group was formed to formalise a proposal to the W3C about bidi, and that group produced the working draft. This week the most active participants in the proposal met face to face to try to reach some consensus between the different parties, and from this will come a formal proposal to the relevant standards bodies.

One problem I have with discussing a proposal is just that - it's a proposal, not something that you can try, poke and patch up. So I have started to work on a reference implementation in Gecko to bridge that gap (more on this later).

I am starting with the most controversial issue (for me) which is the support for auto direction in HTML.

Why auto direction is so controversial..

Direction in HTML is controlled, mainly, by the dir attribute and the corresponding CSS direction property. Right now the direction is always clear from the markup. At a simple level it is either left-to-right (ltr) or right-to-left (rtl), and unspecified defaults to ltr. From any view of the markup or CSS alone you can always know what the direction of the underlying text will be.

The only complication is to do with alignment and indents. These may be on the right or left depending on the direction. But as we know the direction while reading the markup, working out the indent and alignment is relatively trivial and can be done while reading the data.
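To show just how trivial that is when the direction is known up front, here is a minimal sketch in Python. The function name physical_side is my own invention for illustration, not anything from a browser or a spec; it simply maps the logical alignment values start and end onto a physical side.

```python
def physical_side(text_align, direction):
    """Map a logical alignment ('start'/'end') to a physical side.

    Hypothetical helper for illustration only. Any other alignment
    value (for example 'center') is returned unchanged.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if text_align == "start":
        return "left" if direction == "ltr" else "right"
    if text_align == "end":
        return "right" if direction == "ltr" else "left"
    return text_align
```

With an explicit dir this lookup can happen the moment the element is seen, which is exactly what stops being true once the direction depends on the content.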

An “automatic” direction throws a small spanner in the works because now the direction cannot be known from the markup alone. One must first pass the content text through an algorithm. As a result, alignment and indents cannot be known as the HTML is parsed. CSS does not like this uncertainty either, so much so that dir=auto will not have a CSS equivalent, which means the layout cannot be worked out from the CSS alone.
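As a rough idea of what "passing the content through an algorithm" involves, the sketch below looks for the first strong directional character, in the spirit of the Unicode bidi algorithm's paragraph direction rules. It is a simplification under my own assumptions: the name first_strong_direction is made up, and the code ignores embedding and override control characters entirely. It relies on Python's standard unicodedata.bidirectional lookup.

```python
import unicodedata


def first_strong_direction(text, fallback=None):
    """Return 'ltr' or 'rtl' based on the first strong bidi character.

    Simplified sketch: only the bidi classes L, R and AL are treated as
    strong. If the text contains none of them, fallback is returned.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return fallback
```

The point to notice is that the answer depends on every character up to the first strong one, so nothing about the layout can be fixed until the text itself has been read.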

For the browser, this means a major change. Content will need to be parsed twice to set the value for direction-dependent layout. Also you cannot simply translate dir to CSS. Elements with combinations of rtl and ltr content embedded inside dir=auto would have an even harder time knowing their layout or being drawn intelligibly.

For these reasons the original specification strictly limited dir=auto to a single element and a single paragraph.

But wait there’s more..

At the face-to-face meeting it was suggested that if the auto detection algorithm were the standard Unicode bidi direction detection, there would be no problem having embedded elements that mix rtl and ltr, as long as we follow some simple rules about what can embed and how. This was resolved to the following:

The values for dir will also include uba, auto, and normal, and the values for unicode-bidi will also include uba.

1. the default dir for all elements is normal, with the exception of block elements whose parent is uba. These inherit uba.

2. elements with dir=normal have the same resolved direction (both the internal HTML “property” used for CSS purposes and the actual CSS property) as the parent element. It also sets the unicode-bidi CSS property to normal (unless ubi is explicitly on for that element). The primary purpose for explicitly stating dir=normal is to break dir=uba inheritance from the parent.

3. dir=uba sets the resolved direction (as defined above) of the element according to the UBA applied to its textual content. The textual content is the depth-first traversal of all text nodes (even if they have an explicit dir).

4. In the application of the UBA to textual content, if the text contains no characters of the bidi classes L, AL, or R, the resolved direction of the text is inherited.

5. dir=uba sets the unicode-bidi CSS property to uba.

6. The base directionality of a UBA paragraph (which is distinct from CSS direction, which it does not have) whose containing block element has unicode-bidi:uba is set according to the paragraph’s content using the UBA. A UBA paragraph’s lines’ alignment is determined by the paragraph’s base directionality when the text-align of the containing block element is start or end.

7. To clarify, when an inline element has dir=uba, its children do not inherit dir=uba, but do inherit the resolved direction of the inline element.

8. dir=uba implies ubi by default. If ubi is explicitly off on this element, the unicode-bidi value is “uba embed”. Otherwise, unicode-bidi is “uba isolate”.

9. TBD: what happens in textarea when the user sets an explicit direction via browser UI, for all dir values.

10. auto sets the CSS direction to either ltr or rtl by a mechanism TBD.
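To get a feel for how rules 1 to 4 and 7 interact, here is a small Python sketch of the resolved direction calculation over a toy element tree. It is only my reading of the rules, not the reference implementation: the Element class, its field names and the resolve function are all hypothetical, unicode-bidi and ubi (rules 5, 6 and 8) are not modelled, and dir=auto is rejected because its mechanism is still TBD.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Element:
    # Hypothetical stand-in for a DOM element. Children are either
    # Element objects or plain strings (text nodes).
    dir_attr: Optional[str] = None   # None means no dir attribute
    is_block: bool = False
    children: List[Union["Element", str]] = field(default_factory=list)
    resolved: Optional[str] = None   # filled in by resolve()


def text_content(node):
    """Rule 3: depth-first text of all text nodes, explicit dir or not."""
    if isinstance(node, str):
        return node
    return "".join(text_content(child) for child in node.children)


def first_strong(text):
    """Direction of the first L, R or AL character, or None (rule 4)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def effective_dir(element, parent_dir):
    """Rule 1: default is normal, but blocks inherit uba from a uba parent."""
    if element.dir_attr is not None:
        return element.dir_attr
    if element.is_block and parent_dir == "uba":
        return "uba"
    return "normal"


def resolve(element, parent_dir="normal", parent_resolved="ltr"):
    """Set element.resolved on the whole subtree."""
    own_dir = effective_dir(element, parent_dir)
    if own_dir in ("ltr", "rtl"):
        element.resolved = own_dir
    elif own_dir == "normal":
        element.resolved = parent_resolved          # rule 2
    elif own_dir == "uba":
        # Rules 3 and 4: use the content, else inherit.
        element.resolved = first_strong(text_content(element)) or parent_resolved
    else:
        raise ValueError("dir=%r is not covered by this sketch" % own_dir)
    for child in element.children:
        if isinstance(child, Element):
            # Rule 7 falls out of effective_dir: inline children of a uba
            # element get normal, so they take the resolved direction.
            resolve(child, own_dir, element.resolved)
```

Even in this cut-down form, a block with no strong characters ends up depending on its parent, which in turn depends on the text of all its descendants. That dependency chain is what makes me wary of the inheritance part of the proposal.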

So where do we go from here?

If the rules look complex, that is because they are. And this will be a major hurdle to getting such a proposal adopted. I am fairly sure that if I throw these 10 rules at the designers of Internet Explorer with no justification they are going to tell me to take a long walk off a short pier.

So here is the point. For the web programmer, it can mean the freedom to build the web without having to decide on the direction when marking up the content. And that is a big deal. This would be another step towards the ideal that all software is striving for: getting to the point where a user of our products will come and say “I do this and it just works”.

The problem is: will these 10 rules achieve this? I see problems. Firstly, the Unicode bidi algorithm, as detailed as it is, is far from perfect. The paragraph direction detection is very simple, in many cases too simple. Further, the HTML and CSS inheritance algorithm may have some fundamental problems. So the real question is how close to wu wei will it get us, and is it really worth the effort?

That can only be answered by experience - and this takes me back to the beginning of this post. I will make a reference implementation so we can test it, poke it and patch it.