Yet another cryptic post. I am following the W3C mailing list public-i18n-bidi, which is discussing the draft proposal for improving bidi language support in HTML5. The thread on dir=uba and providing auto direction in HTML has grown too long for any new reader to safely traverse. We are approaching the cut-off time, so to make life easier I have extracted just the key points of that thread, to track the development of the proposal without the justifications. Hope it helps.
The discussion started with Aharon stating:
I am having some very strong doubts about the per-paragraph auto-direction (i.e. dir=uba) feature proposed by the f2f meeting. [...] I think that the feature is too complicated to justify in the absence of overwhelming need. I think that we simply got carried away at the f2f, and proposing the feature to broader circles will harm the overall proposal’s chances for acceptance and implementation.
We then went on to discuss its possible simplification:
we could allow dir=uba on just two elements: <textarea> and a new element that we could call <textareadiv> or perhaps <plaintext>. The latter would be just like a <pre> except that:
- It would not allow mark-up (in the same way that <textarea> does not allow mark-up).
- It would have the same script-accessible properties as <textarea>, including a settable value property.
So, should I include the <plaintext> version of dir=uba in the next draft of the proposal?
Yet another possibility is to forgo <plaintext>, but specify that dir=uba only works in the per-paragraph mode when the element does not have any children that are elements (i.e. it contains no mark-up). Otherwise, it falls back to first-strong (i.e. gives all its UBA paragraphs the same direction). I am pretty sure that this would be quite simple to implement, but have no idea if such a definition would fly in a spec.
I’ve had sufficient negative feedback on suggesting a new element to kill that idea.
[the] proposal [is] that there is no restriction on the elements on which dir=uba can appear, but it will act as first-strong (not per-paragraph) on any element that has any child elements, and only work in the per-paragraph mode otherwise?
the check for children elements would be made at the time that dir=uba is encountered on an element, before any content has been processed
I think it would make more sense for dir=uba to do first-strong to set the nominal direction (that can be selected against, and inherits to children) but then do per-paragraph during bidi resolution for each paragraph that is directly within the element.
That way, it’s defined what happens for children and I don’t need to recurse the uba settings, but I also don’t need to swap behavior as soon as someone inserts a <div/>. I just ignore the uba setting once I’m in the <div/>.
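The two-step behaviour suggested here (first-strong for the element's nominal direction, then a separate decision per paragraph) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the content is a plain string with paragraphs separated by newlines, only strong characters (bidi classes L, R, AL) are considered, an all-neutral paragraph falls back to the nominal direction, and the function names are hypothetical.

```python
import unicodedata


def strong_direction(text):
    """Return 'ltr' or 'rtl' for the first strong character, or None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


def paragraph_directions(value, inherited="ltr"):
    """Resolve a direction for every newline-separated paragraph.

    The nominal direction is first-strong over the whole content,
    or the inherited direction if the content is all-neutral.
    Each paragraph then uses its own first strong character and
    falls back to the nominal direction (assumption of this sketch).
    """
    nominal = strong_direction(value) or inherited
    return [strong_direction(p) or nominal for p in value.split("\n")]


# "..." / Hebrew word / "hello" / "!!!"
sample = "...\n\u05e9\u05dc\u05d5\u05dd\nhello\n!!!"
print(paragraph_directions(sample))
```

Here the nominal direction comes from the Hebrew word in the second paragraph, so the two all-neutral paragraphs follow it while the "hello" paragraph resolves to ltr on its own.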
it might make sense for dir=uba to be restricted to <pre> and <textarea> elements (where it would do per-paragraph) and to inline elements (where it would be the same as first-strong).
1) dir=uba is mostly needed for <pre> and <textarea> elements.
2) All or most of the problems are related to using dir=uba with <pre>.
3) There are alternatives to using dir=uba with <pre> for multiple paragraphs, like separating the text in distinct paragraphs.
4) There is no problem related to using dir=uba with <textarea>.
5) There is no other way than dir=uba to achieve paragraph-based direction for <textarea>.
at least allow dir=uba for <textarea>
Here is a quick spec for dir=auto:
- Using dir=auto with autodirmethod=first-strong|any-rtl would:
  - Make the default value for the ubi attribute ubi (i.e. on), as described elsewhere.
  - Set the CSS direction to ltr or rtl according to the indicated algorithm.
  - Invoke the indicated algorithm on the in-order traversal of the descendant text nodes, with the following exceptions:
    - Text nodes under a descendant element with an explicit dir attribute (including dir=auto).
    - The part of the text after the first X characters (where the text in nodes excluded above is not part of the count). Do we need this? If so, what’s a good X value? 100?
    - Parts of the text between an LRE, RLE, LRO, RLO, and its matching PDF.
- The first-strong algorithm returns the direction of the first strong (L, AL, or R) character it encounters. If it does not encounter any, it returns ltr if it encounters any weak ltr characters (EN or AN). If it does not encounter any of those either, it returns the inherited direction.
- The any-rtl algorithm returns rtl if it encounters any strong RTL character, or ltr otherwise.
- Using dir=auto with autodirmethod=uba would (by default) set unicode-bidi to “uba” and direction according to first-strong. (Note that this includes leaving direction at the inherited value if the content is neutral.)
  - For elements other than <textarea>, unicode-bidi:uba is treated as unicode-bidi:isolate.
  - On <textarea>, unicode-bidi:uba means that:
    - The UBA on the textarea content is invoked specifying only a default paragraph level (in icu4j terminology, either LEVEL_DEFAULT_LTR or LEVEL_DEFAULT_RTL), based on the element’s own direction value as calculated above. (This makes the all-neutral paragraphs use the same direction as the first paragraph that is not all-neutral.)
    - Each UBA paragraph’s lines’ alignment is determined by the paragraph’s resolved base level when the element’s text-align is start or end.
Is this agreeable to everyone?
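To make the first-strong and any-rtl algorithms from the quick spec above a bit more concrete, here is a minimal Python sketch. It works on a plain string rather than on DOM text nodes, so the exclusion of descendants with their own dir attribute and the X-character cut-off are left out. The function names (visible_bidi_classes, first_strong, any_rtl) are mine, not from the thread.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}
WEAK_NUMBERS = {"EN", "AN"}
EMBEDDING_STARTS = {"LRE", "RLE", "LRO", "RLO"}


def visible_bidi_classes(text):
    """Yield the bidi class of each character that is not inside an
    LRE/RLE/LRO/RLO ... PDF span (those spans are skipped)."""
    depth = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in EMBEDDING_STARTS:
            depth += 1
        elif bidi == "PDF":
            depth = max(depth - 1, 0)
        elif depth == 0:
            yield bidi


def first_strong(text, inherited="ltr"):
    """Direction of the first strong character; ltr if only numbers
    (EN or AN) are found; otherwise the inherited direction."""
    saw_number = False
    for bidi in visible_bidi_classes(text):
        if bidi == "L":
            return "ltr"
        if bidi in STRONG_RTL:
            return "rtl"
        if bidi in WEAK_NUMBERS:
            saw_number = True
    return "ltr" if saw_number else inherited


def any_rtl(text):
    """rtl if any strong RTL character is found, ltr otherwise."""
    for bidi in visible_bidi_classes(text):
        if bidi in STRONG_RTL:
            return "rtl"
    return "ltr"


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew word, all class R
print(first_strong("hello " + hebrew))        # Latin comes first
print(any_rtl("hello " + hebrew))             # contains strong RTL
print(first_strong("123", inherited="rtl"))   # numbers only
print(first_strong("...", inherited="rtl"))   # all-neutral
```

The last two calls show the fallback chain in first-strong: digits alone give ltr, while punctuation alone leaves the inherited direction in place.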
This is too complicated. If uba cannot in fact be triggered on anything other than a <textarea>, then it should not be allowed on anything other than a <textarea>. I suggest having
I suggest “plaintext” instead of “uba” because it’s clearer what the behavior and the intended use case is. (Since we’re only allowing uba on <textarea> and using dir=auto for first-strong, we don’t need the name to be so cryptically short.)
And I think that for any-rtl having an X value is both better for performance and more likely to give good results. If the first X characters are LTR, where X is longer than most LTR phrases commonly imported into RTL text, chances are any RTL characters after that are not indicating the paragraph’s main direction.
I’m going to advocate X = 63, since I can’t think of any common strings (other than long URLs) that would hit that limit. 100 seems okay, too.
I tend to favor a larger X, let’s say 255.
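The practical effect of the cut-off X on any-rtl is easy to see with a small Python sketch. It assumes the text has already been flattened to a plain string, and the function name any_rtl_within is hypothetical; 63, 100 and 255 are simply the values floated in the thread.

```python
import unicodedata


def any_rtl_within(text, limit):
    """Return 'rtl' if a strong RTL character (class R or AL) occurs
    in the first `limit` characters of `text`, otherwise 'ltr'."""
    for ch in text[:limit]:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            return "rtl"
    return "ltr"


# 100 Latin letters followed by a space and a Hebrew word.
sample = "a" * 100 + " \u05e9\u05dc\u05d5\u05dd"

for limit in (63, 100, 255):
    print(limit, any_rtl_within(sample, limit))
```

With this sample only the 255 limit reaches the Hebrew word, so the smaller values report ltr and the larger one reports rtl. Which of the two answers is "right" for such text is exactly what the choice of X decides.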
define ‘uba’ (or whatever) in general terms as “another value for attribute ‘dir’”. In most cases, ‘uba’ behaves the same as ‘auto’. In some special cases (second paragraphs of a <textarea>) it can have special behavior.
The behavior should be deterministic and should be described in terms that do not conflict with (e.g.) CSS selectors. If it cannot, my tendency is to get rid of it and recommend better markup or “active” solutions.
So, do we have to get rid of dir=auto or uba completely?
And if not, should we resurrect uba for elements with no child elements, not just <textarea>?
since there are solid, common use cases for these features, I don’t think we should drop them.
if we do say that a string containing no strong characters, but containing AN should be considered RTL, we have to figure out what to do with a string containing no strong characters, but containing both AN and EN.
I think also that 63 is a good value for first strong directionality determination algorithms.