Mozilla festival: from 2 out of 10 to a straight 10

The day started off badly.

On Saturday I took my daughter along to the Mozilla Festival to give her an idea of some of the stuff I work with, only to get stopped at the entrance to the building because the college hosting the event had a strict ‘no children’ policy. Despite the best efforts of Mozilla, there was nothing we could do about it. Ironic really, as a big part of the event was about children and education.

But, she was allowed into the Hive London Pop-Up event at midday. So we went off, I consoled her with a chocolate milkshake at The O2, and went back later.

Then her day really took off. First she had a go at creating a dreamyard.

Then she went on to make a video and podcast of her dream yard, complete with learning how to cut and edit the report with WNYC Radio Rookies.

Then on to the Digital-Me stand, where she learnt how to conduct interviews and went around interviewing anyone she could find. Including…

The Firefoxes

and even Mitchell Baker!

She even made the festival news.

And now she is sitting next to me editing CSS after learning how to hack the web with Mozilla’s Hackasaurus project.

As she said, “the day started off 2 out of 10 and went on to 10 out of 10”.

Thank you Mozilla for a wonderful event. I think you may have changed another young person’s life for the better.

Canvas pixel data and security

I wanted to add brightness and contrast controls to my web application. However, this is not a standard web feature. CSS just gives control over opacity. While Internet Explorer has the concept of a filter, this only exists in Microsoft’s ever-decreasing world.

So I am driven to extracting the pixels from an image and getting all mathematical on them. The way to do this is draw the image in a canvas and extract the pixels. It should be easy, no? No. Not when the image is hosted on another server.

Let’s say I get an image from another domain:

var img = new Image();
img.src = "http://another.domain/myimage.jpg";

Then draw it in a canvas:

// Do this once the image has loaded, e.g. inside img.onload
var canvas = document.createElement("canvas");
var w = canvas.width = img.width;
var h = canvas.height = img.height;
var ctx = canvas.getContext("2d");
ctx.drawImage(img, 0, 0, w, h);

And get the pixels:

// rect is the region of the canvas to read; use (0, 0, w, h) for the whole image
var dataDesc = ctx.getImageData(rect.left, rect.top, rect.width, rect.height);

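Boom… security exception. Drawing an image from another domain taints the canvas, and the browser then refuses to hand back its pixels. There is a new solution that comes as a side-effect of the same security being applied to WebGL: an image that is served with CORS headers and requested with the crossorigin attribute can be read back. Now if only I can get the server to apply CORS.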
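For what it is worth, here is a minimal sketch of where this is heading, assuming the server does send an Access-Control-Allow-Origin header for the image. The function names adjustBrightnessContrast and loadAndAdjust are made up for illustration, and the brightness and contrast formula is just one simple choice (an offset plus scaling around mid-grey), not the only way to do it.

```javascript
// Hypothetical helper: adjust brightness and contrast of RGBA pixel data in place.
// `data` is laid out as [r, g, b, a, r, g, b, a, ...], the same layout as the
// ImageData.data array returned by ctx.getImageData().
//   brightness: offset added to each colour channel (e.g. -255..255)
//   contrast:   multiplier applied around mid-grey (1 leaves the image unchanged)
function adjustBrightnessContrast(data, brightness, contrast) {
  for (var i = 0; i < data.length; i += 4) {
    for (var c = 0; c < 3; c++) { // skip the alpha channel
      // A Uint8ClampedArray clamps the result to 0..255 on assignment.
      data[i + c] = (data[i + c] - 128) * contrast + 128 + brightness;
    }
  }
  return data;
}

// Browser-only usage sketch: request the image with CORS so that the canvas
// is not tainted. This only works if the server cooperates.
function loadAndAdjust(url, brightness, contrast, done) {
  var img = new Image();
  img.crossOrigin = "anonymous"; // must be set before src
  img.onload = function () {
    var canvas = document.createElement("canvas");
    canvas.width = img.naturalWidth;
    canvas.height = img.naturalHeight;
    var ctx = canvas.getContext("2d");
    ctx.drawImage(img, 0, 0);
    var pixels = ctx.getImageData(0, 0, canvas.width, canvas.height);
    adjustBrightnessContrast(pixels.data, brightness, contrast);
    ctx.putImageData(pixels, 0, 0);
    done(canvas);
  };
  img.src = url;
}
```

The pixel maths is kept separate from the canvas plumbing so that it can be tried on any Uint8ClampedArray without a browser.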

Steve Jobs and the Arabic Mac

My deep-felt condolences to all at Apple for the loss of its founder, mentor and CEO, Steve Jobs. I, and my company, Diwan, have been working with Apple since 1980 - from making the first Arabic personal computer in 1981, the Apple ][, all the way to the standard Arabic typeface for iOS.

Who knew that a calligraphy course Steve attended after dropping out of college would lead to a revolution in the whole of the printing and communications industry? It was proportional fonts on the Mac that made professional quality Arabic printing from personal computers possible, and that in turn changed forever the way newspapers, books and all information were created and published in the Arab world. Without trying to sound too clichéd, when historians look back on this time, they will see this as being as significant a milestone in human development as the invention of the printing press. Jobs was the Gutenberg of the modern era.

The company and products he started have been an inspiration throughout my professional career. To me Apple is one of those rare organisations that has become more than a company: a world institution that inspires devotion from everyone who works with it or uses its products. The ideas it nurtured have created whole industries many times more valuable than its current stock price. But this is only the beginning and there is still so much that can be done. Apple now has the basis for an amazing future and I look forward to seeing that develop as much as I feel sad for the present.

pdf.js … very cool

PDF.js is a new project from Mozilla to render PDFs using pure JavaScript and HTML. So I thought I would put it through its paces - what would happen if I gave it a PDF of one of Diwan’s most complex Arabic fonts to render? This usually stretches Adobe’s own Acrobat. Here is the result:

Yes, that is a 3000 glyph font with complex transformations rendering in a standard web page. Very cool.

The trials of Bidi IRIs

Bidi IRIs are a divisive issue - how would you display an English URL to an Arabic user (is it www.goggle.com or com.goggle.www?) and, if that URL is in Arabic, how would that be ordered? As the old joke goes, put three experts in a room to discuss this and you will get four opinions. Thanks to a new proposal by Mark Davis there has been some narrowing of the consensus on this issue. This created a discussion which can be summarized in the following mind map:

Now that makes more sense, doesn’t it?

A safe way to keep a github and svn repository in sync

My current project requires me to share code with another team who use svn while my team are on github. To keep commits up to date I have to keep both version control systems in sync so that merges are easy. After some experimentation and failures I have worked out a relatively safe way to let the two coexist without any nasty conflicts. There is varied documentation on the web on how to do this, e.g. on github help and stackoverflow.

It is generally a bad thing to allow github and svn to push changes into the same repo. Scary things can happen to the history, which I found out the hard way (luckily git rebase and reset are real lifesavers).

Git is built on the concept of a local repository which is kept in sync with one or more remote repositories. As far as the svn connection works, git treats it as another remote repository with a difference in the way it syncs with the local repo.

My local git repo is set up with two remotes - one is svn the other is github. To keep the two worlds safe I will make two branches. One tracks only the svn remote and the other tracks only the github remote. Then I treat the two branches as if they were separate forks. So I do not pull from another remote onto a branch but I use git-merge instead.

Setting Up

I start with a git repo that has a github origin and I create a branch for the work that will be tracked on github and shared with the svn branch - this is a normal git branch or it could be master.

git branch mygitbranch
git checkout mygitbranch
git push origin mygitbranch

Separately, I have an svn repo with the same version of the source files as github.

Now, I create the svn-tracking branch as a non-tracked branch in my local git repo:

git branch --no-track mysvnbranch  <id of very first commit>
git checkout mysvnbranch

Next connect to the svn repo, pull the source into this branch and force the svn repo changes to be the only changes in this branch.

git svn init https://my.svnrepo.org/svn/repository/whatever/ -s
git svn fetch
git reset --hard remotes/trunk

So my tree now looks something like this (with the tracked remote in brackets):

_.___ master (remotes/origin/master)
   |._ mygitbranch (remotes/origin/mygitbranch)
_.__._ mysvnbranch (remotes/trunk)

Merging

Down the road, there are changes in the github-tracked branch and I need to sync these to the svn repo, e.g.

git checkout mygitbranch
git pull origin mygitbranch

_.___ master (remotes/origin/master)
   |__._.__._ mygitbranch (remotes/origin/mygitbranch)
_.__._ mysvnbranch (remotes/trunk)

Switch to the svn tracked branch, merge the differences with the github tracked branch and commit the changes. But also remember to rebase on to the svn changes first:

git checkout mysvnbranch
git svn rebase                # fetch and rebase any changes that were made in svn
git merge origin/mygitbranch
git mergetool                 # only if there are conflicts that cannot be automatically resolved
git commit
git svn dcommit

So the branches will now look like this:

_.___ master (remotes/origin/master)
   |__._.__._ mygitbranch (remotes/origin/mygitbranch)
_.__.________\._ mysvnbranch (remotes/trunk)

And similarly for the reverse case:

git checkout mygitbranch
git pull origin mygitbranch
git merge remotes/trunk
git commit
git push origin mygitbranch

The point here is that the svn-tracked branch will only ever track the svn remote. It is possible to push mysvnbranch to github, but the danger is that it will receive other pushes and that leads to the problem of keeping two incompatible histories in sync.

Discussions on dir=uba and paragraph direction

Yet another cryptic post. I am following the W3C mailing list public-i18n-bidi, which is discussing the draft proposal for improving bidi language support in HTML5. The thread on dir=uba and providing auto direction in HTML has grown too long for any new reader to safely traverse. We are approaching the cut-off time sooo, to make life easier, I have extracted just the key points in that thread to track the development of the proposal without the justifications. Hope it helps.

The discussion started with Aharon stating:

I am having some very strong doubts about the per-paragraph auto-direction (i.e. dir=uba) feature proposed by the f2f meeting. [...] I think that the feature is too complicated to justify in the absence of overwhelming need. I think that we simply got carried away at the f2f, and proposing the feature to broader circles will harm the overall proposal’s chances for acceptance and implementation.

We then went on to discuss its possible simplification:

Aharon:

we could allow dir=uba on just two elements: <textarea> and a new element that we could call <textareadiv> or perhaps <plaintext>. The latter would be just like a <pre> except that:

  • It would not allow mark-up (in the same way that <textarea> does not allow mark-up).
  • It would have the same script-accessible properties as <textarea>, including a settable value property.

So, should I include the <plaintext> version of dir=uba in the next draft of the proposal?

Yet another possibility is to forego <plaintext>, but specify that dir=uba only works in the per-paragraph mode when the element does not have any children that are elements (i.e. it contains no mark-up). Otherwise, it falls back to first-strong (i.e. gives all its UBA paragraphs the same direction). I am pretty sure that this would be quite simple to implement, but have no idea if such a definition would fly in a spec.

I’ve had sufficient negative feedback on suggesting a new element to kill that idea.

---

Aharon:

[the] proposal [is] that there is no restriction on the elements on which dir=uba can appear, but it will act as first-strong (not per-paragraph) on any element that has any child elements, and only work in the per-paragraph mode otherwise?

the check for children elements would be made at the time that dir=uba is encountered on an element, before any content has been processed

fantasai:

I think it would make more sense for dir=uba to do first-strong to set the nominal direction (that can be selected against, and inherits to children) but then do per-paragraph during bidi resolution for each paragraph that is directly within the element. 

That way, it’s defined what happens for children and I don’t need to recurse the uba settings, but I also don’t need to swap behavior as soon as someone inserts a <div/>. I just ignore the uba setting once I’m in the <div/>.

it might make sense for dir=uba to be restricted to <pre> and <textarea> elements (where it would do per-paragraph) and to inline elements (where it would be the same as first-strong).

Matitiahu:

1) dir=uba is mostly needed for <pre> and <textarea> elements. 

2) All or most of the problems are related to using dir=uba with <pre>. 

3) There are alternatives to using dir=uba with <pre> for multiple paragraphs, like separating the text in distinct paragraphs. 

4) There is no problem related to using dir=uba with <textarea>. 

5) There is no other way than dir=uba to achieve paragraph-based direction for <textarea>. 

at least allow dir=uba for <textarea>

Aharon:

Here is a quick spec for dir=auto:

  • Using dir=auto with autodirmethod=first-strong|any-rtl would:
    • Make the default value for the ubi attribute ubi (i.e. on), as described elsewhere.
    • Set the CSS direction to ltr or rtl according to the indicated algorithm.
    • Invoke the indicated algorithm on the in-order traversal of the descendant text nodes, with the following exceptions:
      • Text nodes under a descendant element with an explicit dir attribute (including dir=auto).
      • The part of the text after the first X characters (where the text in nodes excluded above are not part of the count). Do we need this? If so, what’s a good X value? 100?
      • Parts of the text between an LRE, RLE, LRO, RLO, and its matching PDF.
    • The first-strong algorithm returns the direction of the first strong (L, AL, or R) character it encounters. If it does not encounter any, it returns ltr if it encounters any weak ltr characters (EN or AN). If it does not encounter any of those either, it returns the inherited direction.
    • The any-rtl algorithm returns rtl if it encounters any strong RTL character, or ltr otherwise.
  • Using dir=auto with autodirmethod=uba would (by default) set unicode-bidi to “uba” and direction according to first-strong. (Note that this includes leaving direction at the inherited value if the content is neutral.)
  • For elements other than <textarea>, unicode-bidi:uba is treated as unicode-bidi:isolate.
  • On <textarea>, unicode-bidi:uba means that:
    • The UBA on the textarea content is invoked specifying only a default paragraph level (in icu4j terminology, either LEVEL_DEFAULT_LTR or LEVEL_DEFAULT_RTL), based on the element’s own direction value as calculated above. (This makes the all-neutral paragraphs use the same direction as the first paragraph that is not all-neutral.)
    • Each UBA paragraph’s lines’ alignment is determined by the paragraph’s resolved base level when the element’s text-align is start or end.

Is this agreeable to everyone?

fantasai:

This is too complicated. If uba cannot in fact be triggered on anything other than a <textarea>, then it should not be allowed on anything other than a <textarea>. I suggest having 

  dir=ltr|rtl|auto|plaintext   autodirmethod=first-strong|any-rtl 

I suggest “plaintext” instead of “uba” because it’s clearer what the behavior and the intended use case is. (Since we’re only allowing uba on <textarea> and using dir=auto for first-strong, we don’t need the name to be so cryptically short.) 

And I think that for any-rtl having an X value is both better for performance and more likely to give good results. If the first X characters are LTR, where X is longer than most LTR phrases commonly imported into RTL text, chances are any RTL characters after that are not indicating the paragraph’s main direction.

fantasai:

I’m going to advocate X = 63, since I can’t think of any common strings (other than long URLs) that would hit that limit. 100 seems okay, too.

Mati:

I tend to favor a larger X, let’s say 255.

Addison:

define ‘uba’ (or whatever) in general terms as “another value for attribute ‘dir’”. In most cases, ‘uba’ behaves the same as ‘auto’. In some special cases (second paragraphs of a <textarea>) it can have special behavior. 

The behavior should be deterministic and should be described in terms that do not conflict with (e.g.) CSS selectors. If it cannot, my tendency is to get rid of it and recommend better markup or “active” solutions.

Aharon:

So, do we have to get rid of dir=auto or uba completely?

And if not, should we resurrect uba for elements with no child elements, not just <textarea>?

fantasai:

since there are solid, common use cases for these features, I don’t think we should drop them.

Aharon:

if we do say that a string containing no strong characters, but containing AN should be considered RTL, we have to figure out what to do with a string containing no strong characters, but containing both AN and EN.

CE Whitehead:

I think also that 63 is a good value for first strong directionality determination algorithms.
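To make the two heuristics in this thread a little more concrete, here is a rough sketch of first-strong and any-rtl over a plain string, including the "first X characters" cut-off that was being debated. It is only my own illustration, not text from the proposal: the function names are invented, the character classes are crude block ranges rather than the real Unicode bidi classes, and it ignores the LRE/RLE/LRO/RLO/PDF exclusions entirely.

```javascript
// Simplified character classes (an assumption, NOT the full Unicode bidi tables):
// Hebrew, Arabic and neighbouring blocks plus their presentation forms count as
// strong RTL, any other letter counts as strong LTR, digits count as weak numbers.
var RTL_BLOCKS = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;
var DIGITS = /[0-9\u0660-\u0669\u06F0-\u06F9]/;
var LETTER = /\p{L}/u;

// Returns "rtl" or "ltr" for a strong character, "number" for a digit, null otherwise.
function classify(ch) {
  if (DIGITS.test(ch)) return "number";
  if (RTL_BLOCKS.test(ch)) return "rtl";
  if (LETTER.test(ch)) return "ltr";
  return null;
}

// first-strong: the direction of the first strong character found within the
// first `limit` characters. If only digits were seen the result is ltr, and if
// nothing useful was seen the inherited direction is returned.
function firstStrong(text, inherited, limit) {
  var sawNumber = false;
  var count = 0;
  for (var ch of text) {
    if (count++ >= limit) break;
    var kind = classify(ch);
    if (kind === "rtl" || kind === "ltr") return kind;
    if (kind === "number") sawNumber = true;
  }
  return sawNumber ? "ltr" : inherited;
}

// any-rtl: rtl if any strong RTL character appears within the first `limit`
// characters, ltr otherwise.
function anyRtl(text, limit) {
  var count = 0;
  for (var ch of text) {
    if (count++ >= limit) break;
    if (classify(ch) === "rtl") return "rtl";
  }
  return "ltr";
}
```

With a limit of 63, a string that opens with a few Latin words followed by Hebrew comes out as ltr under first-strong and rtl under any-rtl. Shrink the limit so that it ends before the Hebrew and any-rtl falls back to ltr as well.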

Search neutrality keeps the internet dynamic

My letter was published in the Financial Times today. Here, for posterity, is the letter in full…

Search neutrality keeps the internet dynamic

Published: Financial Times July 20 2010 02:15

From Mr Adil Allawi

Sir, Google’s Marissa Mayer (“Do not neutralise the web’s endless search”, July 15) is right to say that government regulation will stifle innovation in a still rapidly developing market. Regulation seems to be the knee-jerk reaction of the state when the free market fails to counter monopolies. But in the modern world of fast technological development it will simply not work. It will protect more inefficient businesses and throttle competition to the disadvantage of the consumer and the wider economy.

However, Ms Mayer’s suggestion that users be left to choose for themselves in no way addresses the huge imbalance of power that Google wields over large sections of the economy. Even competition from Microsoft’s Bing may not create the neutrality desired, as Microsoft has its own competing services that it may prefer to promote.

A more viable response would be for governments to innovate themselves and invest in open source alternatives. Such investment is most effective not when it is given without condition but when it is targeted at making the software available to ordinary users. In 1999, the German government faced a situation where strong encryption technology was either tightly restricted or unavailable to non-technical users. Its response was to fund the development of an open source project called GPG. This opened the market for software using encryption in Germany and had wider benefits for the rest of the world. The same can be seen when a variety of investors and companies supported the Mozilla project to create a viable alternative to Microsoft’s Internet Explorer. The web would not be the dynamic market it is today without it.

Search has become as important to the world economy as the internet itself and a viable open search engine will have the benefit of ensuring market neutrality. One could argue that the market will invest in this alternative, but, as can be seen by Yahoo’s exit from the search technology business, the size of this investment may be more than private capital can manage.

Adil Allawi,
Technical Director,
Diwan Software,
London SE5, UK

Hacking dir=uba

I spent the past week hacking dir=uba support into Gecko. Here are the problems I have hit:

HTML and CSS incompatibility

HTML will have the dir=uba attribute inherited by child elements, but there is no matching CSS direction:uba. The idea is that the direction will be calculated from the content and then set in the CSS. So if you query an element for its calculated direction from JavaScript, you should get back either rtl or ltr depending on the content.

dir=uba also sets the following CSS:

[dir="uba"] {
  direction: inherit;
  unicode-bidi: plaintext isolate;
}

Gecko handles attributes that are inherited by setting CSS values and letting those get inherited. This currently works for all cases except dir=uba, as there is no CSS equivalent to uba. However, it is possible to create a simple HTML attribute inheritance by inheriting dir=uba when an element is attached to the content tree in nsGenericHTMLElement::BindToTree.

What I would like to do is set a flag in the content node, but dir=uba sets unicode-bidi, and unicode-bidi is a CSS property that is not inherited. So I am setting a full attribute on the element - I still need to figure out a way to set unicode-bidi on the basis of the inherited dir value without explicitly setting a dir attribute.

Redefinition of unicode-bidi

Gecko treats unicode-bidi as a property that takes a single value. Now it needs to take multiple values. So the internal representation needs to change from eCSSType_Value to eCSSType_ValueList, and all the relevant code should be changed to match.

Applying the CSS direction

Gecko parses the HTML and CSS code in a number of passes that run in the following order:

1/ read in the HTML content and build the basic content tree and construct the internal frames.

#6	0x1123b0788 in nsCSSFrameConstructor::ConstructFrameFromItemInternal at nsCSSFrameConstructor.cpp:3820
#7	0x1123b0c6b in nsCSSFrameConstructor::ConstructFramesFromItem at nsCSSFrameConstructor.cpp:5465
#8	0x1123b1388 in nsCSSFrameConstructor::ConstructFrame at nsCSSFrameConstructor.cpp:5012
#9	0x1123b173d in nsCSSFrameConstructor::CreateAnonymousFrames at nsCSSFrameConstructor.cpp:3921
#10	0x1123b1890 in nsCSSFrameConstructor::BeginBuildingScrollFrame at nsCSSFrameConstructor.cpp:4273
#11	0x1123b1fa4 in nsCSSFrameConstructor::SetUpDocElementContainingBlock at nsCSSFrameConstructor.cpp:2789
#12	0x1123b3c95 in nsCSSFrameConstructor::ConstructDocElementFrame at nsCSSFrameConstructor.cpp:2325
#13	0x1123b48cf in nsCSSFrameConstructor::ContentRangeInserted at nsCSSFrameConstructor.cpp:6947
#14	0x1123b57d1 in nsCSSFrameConstructor::ContentInserted at nsCSSFrameConstructor.cpp:6844
#15	0x112410b98 in PresShell::InitialReflow at nsPresShell.cpp:2616
#16	0x1126113a8 in nsContentSink::StartLayout at nsContentSink.cpp:1279

2/ process events to parse and apply attributes and CSS

#1	0x112758a3f in nsHTMLDivElement::ParseAttribute at nsHTMLDivElement.cpp:137
#2	0x112683e2a in nsGenericElement::SetAttr at nsGenericElement.cpp:4582
#3	0x11273d575 in nsGenericHTMLElement::SetAttr at nsGenericHTMLElement.cpp:1198
#4	0x1129ef4fa in nsHtml5TreeOperation::Perform at nsHtml5TreeOperation.cpp:461

3/ first part of reflow - to run through the frames and calculate their positions and bidi state

#1	0x112427f8e in nsBlockFrame::ResolveBidi at nsBlockFrame.cpp:6910
#2	0x112434cb3 in nsBlockFrame::Reflow at nsBlockFrame.cpp:966
#3	0x112441297 in nsContainerFrame::ReflowChild at nsContainerFrame.cpp:738
#4	0x11246e957 in nsCanvasFrame::Reflow at nsCanvasFrame.cpp:496
#5	0x112441297 in nsContainerFrame::ReflowChild at nsContainerFrame.cpp:738
#6	0x112465b11 in nsHTMLScrollFrame::ReflowScrolledFrame at nsGfxScrollFrame.cpp:508
#7	0x1124662ee in nsHTMLScrollFrame::ReflowContents at nsGfxScrollFrame.cpp:601
#8	0x1124685de in nsHTMLScrollFrame::Reflow at nsGfxScrollFrame.cpp:807
#9	0x112441297 in nsContainerFrame::ReflowChild at nsContainerFrame.cpp:738
#10	0x1124c5bb7 in ViewportFrame::Reflow at nsViewportFrame.cpp:285
#11	0x112407b6f in PresShell::DoReflow at nsPresShell.cpp:7427

4/ second part of the reflow to reposition the frames that need to move

#0	0x11242250f in nsBidiPresUtils::Reorder at nsBidiPresUtils.cpp:821
#1	0x112422b57 in nsBidiPresUtils::ReorderFrames at nsBidiPresUtils.cpp:814
#2	0x11243105a in nsBlockFrame::PlaceLine at nsBlockFrame.cpp:4146
#3	0x112431b02 in nsBlockFrame::DoReflowInlineFrames at nsBlockFrame.cpp:3651
#4	0x112431e70 in nsBlockFrame::ReflowInlineFrames at nsBlockFrame.cpp:3371
#5	0x1124332db in nsBlockFrame::ReflowLine at nsBlockFrame.cpp:2467
#6	0x112433aad in nsBlockFrame::ReflowDirtyLines at nsBlockFrame.cpp:1907

I handle inheritance of dir=uba at stage 2, and calculate the direction in ResolveBidi() as it is already doing the hard work of extracting the text.

However, this is not the right place. At the beginning of the reflow cycle the direction value is cached in a class of type nsHTMLReflowState. So I need a new function to walk the content tree to calculate the UBA direction at the beginning of stage 3 and set a flag in the current presentation context. This flag can then be used when resolving the CSS direction.
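The shape of that tree walk is easier to see in a few lines of script than in Gecko's C++. The following is only a sketch of the idea, not code from the patch: it works on a made-up tree of plain objects rather than Gecko content nodes, the names are invented, and the test for strong characters is a crude block-range check rather than the real bidi classes.

```javascript
// Crude character classes (an assumption, not the Unicode bidi tables).
var RTL_BLOCKS = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;
var ARABIC_DIGITS = /[\u0660-\u0669\u06F0-\u06F9]/;
var LETTER = /\p{L}/u;

// Direction of the first strong character in a string, or null if there is none.
function firstStrongInText(text) {
  for (var ch of text) {
    if (ARABIC_DIGITS.test(ch)) continue; // digits are weak, keep looking
    if (RTL_BLOCKS.test(ch)) return "rtl";
    if (LETTER.test(ch)) return "ltr";
  }
  return null;
}

// Depth-first, in-order walk. A node in this hypothetical model is either
// { text: "..." } or { dir: optional string, children: [...] }.
function resolveDirection(node, isRoot) {
  if (typeof node.text === "string") return firstStrongInText(node.text);
  // A descendant with its own dir attribute does not contribute its text.
  if (!isRoot && node.dir) return null;
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    var found = resolveDirection(children[i], false);
    if (found) return found;
  }
  return null;
}

// The value a script would expect to read back: rtl or ltr worked out from the
// content, or the inherited direction when the content has no strong characters.
function computedDirection(element, inherited) {
  return resolveDirection(element, true) || inherited;
}
```

The walk stops at the first strong character, so on most content it should touch only a small part of the tree.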

Anyway, at the end of all this I have a functional, if incomplete, implementation that can be used for a proof of concept. You can download and try the patch from bugzilla. You do know how to build Firefox, right?

Hacking Mozilla in XCode

The MDC instructions for debugging Mozilla in XCode are fine, but as soon as you try to build Mozilla from XCode you hit a few obstacles.

Add a new Target of type External Target. For the settings:

[Screenshot: External Target settings]

Note the PATH - needs to point to the MacPorts directory.

Now it will build OK but debugging will fail with the message “The active architecture i386 is not present in the executable ‘Executable’ which contains x86_64”, and you cannot set the architecture from the menu.

To get around this, edit the project settings and add a user-defined setting: ARCHS with value: x86_64.

Now if I can only figure out why incremental building is not working…

Update 1 - found it in the Build FAQ: make -C browser/app … obvious really.

Update 2 - worked out an optimal mozconfig for fast incremental linking and hacking:

. $topsrcdir/browser/config/mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-ff-debug
mk_add_options MOZ_MAKE_FLAGS="-j4"
ac_add_options --disable-optimize
ac_add_options --disable-tests
ac_add_options --disable-static --disable-libxul
ac_add_options --disable-ipc
export MOZ_DEBUG_SYMBOLS=1
export CFLAGS="-gdwarf-2"
export CXXFLAGS="-gdwarf-2"

disable-static and disable-libxul make it possible for me to build a single module without having to build the browser or XUL. Enabling debugger flags directly, without the full debug build options, speeds up the link time and does not include a number of tests.