Thursday, September 10, 2009

Pedagogic validation of HTML

For a few years I have been trying to make HTML5 better for education by participating in the HTML5 effort at the WHATWG. Recently I also joined the W3C HTML5 Working Group. One of the things that might come out of this effort is an option in the HTML5 validator for pedagogic validation. I will try to explain what such an option should check for, and how it would benefit the teaching of web development.

I have previously written about what I call the value of false XHTML. Now I have been joined by the HTML5 superfriends, who request an option for easy polyglot validation. Henri Sivonen, who is no doubt a parsing rules genius and exceptionally knowledgeable, basically replied that such a thing is very hard and involves so many minute details that it might be of no value. Did you know that a line feed immediately following the opening pre tag is forbidden when using XML parsing rules? I most certainly did not. (And I am still a bit unsure if I got it right...)

I usually also skip the tbody tags when I write tables, but they are nevertheless automatically inserted into the DOM in HTML, just like head and body are. That will not work for a true polyglot document, since in XHTML the DOM will not be the same. Henri also suggests that using XHTML syntax might lead to misunderstandings, such as believing that <script /> can be written in normal HTML.

I commented on Zeldman's blog that from my perspective these are non-issues. Let me tell you about the everyday problems I encounter as a teacher of markup languages, in addition to what normal validation would reveal:

  • Students forgetting to quote attribute values, even though they contain multiple words: <img alt=My dog>
  • Students messing up the balance of the quotation marks: <img src="foo.jpg alt=My dog">
  • Students messing up the DOM, since they do not (yet) know all the rules for when an element is implicitly closed by another element's start tag.
  • Students using document.write (and eval) in their scripts. Yes, I explicitly tell them not to, but they don't always listen, do they?

For reasons like these I tell them to use XHTML syntax today, since that will catch most of these errors.

document.write and eval are outside the scope of HTML validation, but ECMAScript 5th edition strict mode and JSLint will take care of most such problems. What I would like is for HTML validation to have similar checks, checks that enforce good habits and help students avoid rookie mistakes.

What's the problem with true polyglot documents then?

  • The minutiae. The stuff Henri Sivonen rightly reminds us of. The stuff that should be saved for a later class, since it is so highly technical and frankly will scare some students away from coding by hand.
  • The boolean attributes. Some HTML5 form elements may have a lot of those! Allowing them to exist in their shortened form would mean less markup to type (= happier students, less bandwidth required).

As you see, I do share some of the concerns about XHTML syntax. But today the benefits clearly outweigh the drawbacks from a pedagogic perspective. This naturally leads me to the conclusion that there should be some middle ground, a way to specify a pedagogic profile for validation, and voilà, Sam Ruby has started to work on such a feature!

I will now explain what features such additional checks should have, according to my experience, and how they are beneficial. I will grade my suggestions from 1 to 5, in rising order of importance.

Avoid implicit rules

But check that what is explicit complies with them. As a newbie, one should see a 1:1 correlation between the DOM and the markup.

All elements should be explicit

This would mean that:

  • Root-element (html), head and body tags should not be optional (grade 5).
  • tbody tags should not be optional (grade 1).

In order to avoid making classes boring, I usually teach HTML together with some CSS from day one. I do not teach HTML alone for a few weeks and then move on to CSS. Besides being boooring, that approach leads some students to start using presentational elements and attributes, because they really want design features from day one.

CSS rules (usually) apply to the DOM, not the markup. For example, a table#foo > tr selector will not match any rows, even if there are no tbody tags in the code, because the parser inserts a tbody between the table and its rows. A student would think that a tr is a child element of table, since implicitly added elements might not be taught until later. However, tables are not usually introduced early on, since they are not used for layout when I teach CSS in conjunction with HTML. I can therefore live with this particular check not being implemented, hence its grade of only 1.

Explicitly grouping metadata about a document in the head section, and specifically being able to put some scripts in the head and other scripts in the body, is however essential. In HTML5 we might also see scoped style elements, which would make explicit head and body tags even more important.
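A pedagogic profile would have to run these checks on the markup as written, not on the DOM. A minimal sketch, assuming Python's standard html.parser (which reports tags exactly as written and never inserts implied html, head, body or tbody elements): the class ExplicitElementsCheck, the helper check_explicit_elements and the message texts are hypothetical, not part of any real validator.

```python
from html.parser import HTMLParser

# Hypothetical pedagogic checks (not part of any real validator):
# html, head and body must be written out, and every tr must sit inside
# an explicit thead, tbody or tfoot instead of directly inside table.
REQUIRED = ("html", "head", "body")
ROW_GROUPS = {"thead", "tbody", "tfoot"}


class ExplicitElementsCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.seen = set()   # start tags that actually appear in the markup
        self.stack = []     # currently open table and row-group elements
        self.problems = []

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)
        if tag == "tr" and not (self.stack and self.stack[-1] in ROW_GROUPS):
            line = self.getpos()[0]
            self.problems.append("line %d: <tr> without explicit <tbody>" % line)
        if tag == "table" or tag in ROW_GROUPS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching element.
            while self.stack.pop() != tag:
                pass

    def report(self):
        missing = ["missing explicit <%s> tag" % t for t in REQUIRED if t not in self.seen]
        return missing + self.problems


def check_explicit_elements(markup):
    checker = ExplicitElementsCheck()
    checker.feed(markup)
    checker.close()
    return checker.report()


print(check_explicit_elements("<table><tr><td>1</td></tr></table>"))
```

A real validator would report positions more carefully, but the point is the same: the check looks at the source text, so the student sees exactly which elements exist only because the parser added them.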

All elements must be explicitly closed
  • All normal elements must have closing tags. (grade 5)
  • Void elements must have a trailing slash. (grade 3)

The use case for closing tags is really simple. Besides making things easier to understand, it also alerts students to implicitly closed elements. If they try to put a table inside a paragraph, the validator will complain when it encounters the closing p tag, because the table start tag has already implicitly closed the paragraph. For such reasons I tell my students that closing tags are mandatory, and I want a simple way to enforce that behavior.
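Both rules can be sketched as a small stack-based check. This is a hypothetical example built on Python's html.parser, with an assumed list of void elements; ClosingTagCheck and check_closing_tags are illustrative names. It only verifies that every non-void start tag has a matching end tag and that void elements are written with a trailing slash. It does not model the HTML parser's implicit-closing rules, so it complements normal validation rather than replacing it.

```python
from html.parser import HTMLParser

# HTML void elements (the usual list; adjust if your target spec differs).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class ClosingTagCheck(HTMLParser):
    """Hypothetical check: every element explicitly closed, voids as <br />."""

    def __init__(self):
        super().__init__()
        self.pending = []   # (tag, line) of elements still waiting for an end tag
        self.problems = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag in VOID:
            self.problems.append("line %d: write <%s /> with a trailing slash" % (line, tag))
        else:
            self.pending.append((tag, line))

    def handle_startendtag(self, tag, attrs):
        # Called for tags written as <tag ... />.
        if tag not in VOID:
            line = self.getpos()[0]
            self.problems.append("line %d: <%s /> is not a void element" % (line, tag))

    def handle_endtag(self, tag):
        if tag not in [t for t, _ in self.pending]:
            self.problems.append("line %d: stray </%s>" % (self.getpos()[0], tag))
            return
        # Anything opened after the matching start tag was never closed explicitly.
        while True:
            t, line = self.pending.pop()
            if t == tag:
                break
            self.problems.append("line %d: <%s> has no closing tag" % (line, t))

    def close(self):
        super().close()
        for t, line in self.pending:
            self.problems.append("line %d: <%s> has no closing tag" % (line, t))
        self.pending = []


def check_closing_tags(markup):
    checker = ClosingTagCheck()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check_closing_tags("<ul>\n<li>foo\n<li>bar\n</ul>\n<img src=x.png alt=x>"))
```

For that input the sketch reports the two unclosed li elements and the img written without a trailing slash.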

All non-shortened attributes must have their values quoted. (grade 5)

As I mentioned above, this is a very common error. In the worst case it might lead to very unexpected results. Look at this example, where the value attribute is supposed to contain the words Login name:

<input type=text value=Login name name=login>

This code snippet actually produces a DOM as if the markup had read:

<input type="text" value="Login" name="">

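The stray word name becomes an empty name attribute, and the later name=login is discarded, since the parser keeps only the first occurrence of a duplicated attribute. Arguably, enforcing quotation marks also leads to better readability.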

There are a few attributes that might be exceptions to this rule:

  • If the only possible value is an integer.
  • If the only possible value is a keyword containing only US-ASCII letters.

However, enforcing good habits takes precedence over any other concern. I always start by teaching the strictest possible rules, and then I gradually relax them. This works better than doing it the other way round.
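A quoting rule like this can be checked on the raw start tag. The sketch below is a hypothetical Python example: it uses html.parser's get_starttag_text() to see the tag as written, blanks out quoted values, and then looks for name=value pairs whose value is not quoted. It also flags duplicated attributes, which is what silently swallowed name=login in the Login name example. The regular expressions, class and function names are assumptions for illustration, and the exceptions for integers and keywords are deliberately not implemented.

```python
import re
from html.parser import HTMLParser

QUOTED = re.compile(r'"[^"]*"|\'[^\']*\'')
# name=value where the value does not start with a quotation mark
UNQUOTED = re.compile(r"""\s([^\s"'<>/=]+)\s*=\s*([^\s"'=<>`]+)""")


class QuotingCheck(HTMLParser):
    """Hypothetical check: quoted attribute values, no duplicate attributes."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Blank out quoted values first, so text like title="a b=c" is not flagged.
        raw = QUOTED.sub('""', self.get_starttag_text())
        for name, value in UNQUOTED.findall(raw):
            self.problems.append("<%s>: %s=%s is not quoted" % (tag, name, value))
        names = [name for name, _ in attrs]
        for name in sorted({n for n in names if names.count(n) > 1}):
            self.problems.append("<%s>: attribute %s appears more than once" % (tag, name))

    handle_startendtag = handle_starttag


def check_quoting(markup):
    checker = QuotingCheck()
    checker.feed(markup)
    checker.close()
    return checker.problems


for problem in check_quoting("<input type=text value=Login name name=login>"):
    print(problem)
```

The same input with every value quoted, <input type="text" value="Login name" name="login">, produces no messages.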

Attribute values that contain > or = are probably erroneous (grade 3)

<abbr title="Et cetera>etc</abbr>
<abbr title="Et cetera class="foo>etc</abbr>

These are two examples of mismatched quotation marks. Yes, it happens a lot that I look over the shoulder of a student and tell them that they have forgotten to close an attribute value, even though they have syntax highlighting turned on in their editors. (Not everyone's a genius, and some are color blind!) If I had tools that took care of the easy stuff, I would be able to spend more time explaining the real issues, and everyone in my classroom would be happier.

Since at least the second error might be confused with valid use, this behavior should probably generate a notice, not a real error, along the lines of: "Using the equals sign in an attribute value might indicate a real error. Please check that your markup is correct."

Many of these errors would probably get reported even with today's validation rules. I am gunning for those edge cases where two errors cancel each other out, so that each masks the other.
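A notice of this kind is easy to sketch. The hypothetical Python example below collects, rather than rejects, every attribute value that contains > or =. In the second example above, a parser reads the title value as Et cetera class= (the value ends at the second quotation mark), which is exactly the kind of value that would trigger the notice. The function suspicious_characters, the class SuspiciousValueNotice and the message wording are assumptions for illustration.

```python
from html.parser import HTMLParser

SUSPICIOUS = (">", "=")


def suspicious_characters(value):
    """Return which of > and = occur in an attribute value (None means no value)."""
    if value is None:
        return []
    return [c for c in SUSPICIOUS if c in value]


class SuspiciousValueNotice(HTMLParser):
    """Hypothetical notice (not an error): > or = inside an attribute value."""

    def __init__(self):
        super().__init__()
        self.notices = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            found = suspicious_characters(value)
            if found:
                self.notices.append(
                    "Notice: the %s attribute of <%s> contains %s; "
                    "check that your quotation marks match." % (name, tag, " and ".join(found)))

    handle_startendtag = handle_starttag


def notices_for(markup):
    checker = SuspiciousValueNotice()
    checker.feed(markup)
    checker.close()
    return checker.notices


print(notices_for('<abbr title="Et cetera class=">etc</abbr>'))
```

Keeping these as notices leaves legitimate values such as a URL with a query string valid, while still pointing the student at the likely culprit.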

Language must be specified (grade 5)

This one is self-explanatory. There really should be a lang attribute set on the root element. This actually should be a regular conformance criterion, but since such a rule would wreck a lot of currently valid sites, that is probably not doable.

The alt attribute must always be present on images. (grade 5)

While the jury (maybe) is still out on whether this should be a regular conformance criterion (as I think it should) or not, at least the following can be said without any hesitation or doubt: not a single argument against a mandatory alt attribute applies to the learning situation. Even if we say that sites like Flickr should be able to be conformant when users do not supply usable alt text, and, having considered every other option, it is decided to make the alt attribute optional, those use cases certainly do not apply in the classroom! If HTML5 eventually goes down that route, a pedagogic profile in the validator has earned its right to exist for this reason alone.

I am actually a bit reluctant to add this point. I fear it might reopen a can of worms and be taken as an argument against having alt as a mandatory attribute, since those who wish to would now have a way of checking for its existence. However, I hope that everyone realizes that this is not the same discussion. My only point is that if worst comes to worst, this feature is necessary.
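The two presence rules above, lang on the root element and alt on every image, amount to very little code. A minimal sketch, assuming Python's html.parser; PresenceCheck and check_presence are hypothetical names. Note that alt="" counts as present, since the rule is about the attribute existing, not about judging its text.

```python
from html.parser import HTMLParser


class PresenceCheck(HTMLParser):
    """Hypothetical checks: lang on the root element, alt on every img."""

    def __init__(self):
        super().__init__()
        self.root_seen = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if not self.root_seen:
            # The first start tag in the markup is the root element.
            self.root_seen = True
            if tag != "html" or "lang" not in names:
                self.problems.append("root element must be <html> with a lang attribute")
        # alt="" is allowed: the attribute only has to be present.
        if tag == "img" and "alt" not in names:
            self.problems.append("line %d: <img> without alt attribute" % self.getpos()[0])

    handle_startendtag = handle_starttag


def check_presence(markup):
    checker = PresenceCheck()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check_presence('<html>\n<body>\n<img src="dog.jpg">\n</body>\n</html>'))
```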

What else?

I am going to give this some thought, and give Sam Ruby a few initial test cases. After that I might revisit this subject and alter my list of things to test. Of course all feedback is welcome!

One thing that I've thought about is a check for code indentation. But first of all it is probably not possible to check for this in a reasonable manner. And would it be possible to agree on a standard? Nah! I don't think so.

P.S. If someone wonders why my blog has the word Thinkpad in its name, I do still have an ambition to document my joys and woes about using Fedora Linux on my Z61p (and on my still un-bought W700…). Patience, patience.

Friday, August 21, 2009

Web Education Rocks Indeed

I had the privilege and pleasure to attend the WE Rock Summit. To me the meeting was a perfect illustration of the power and limitation (sic!) of the Internet. Living in Sweden and working for a public school where one does not have access to an abundance of money (to say the least), I have not been able to participate in SxSW or similar conferences in the USA. In fact, my only contact with the other participants, except for Chris Mills from England, had been through the net. That fact had not stopped us from getting to know each other and organizing around the vision of bringing the best possible standard for Web education to schools, colleges and universities around the world.

Meeting in person was a lot like meeting old friends. We did not need to connect instantly; we were already connected! We did not need to get off to a flying start in our work; we had already started. The fact that a lot of discussions were tentative, and that a lot of resolutions still remain to be made, is an indication of nothing but the fact that what we are trying to do is to a large extent an adventure into uncharted territory. Launching a Web Education Organization at this level is simply an undertaking that no one had realistically attempted before OWEA.

At the same time, being able to meet in person took our productivity to a whole new level. There simply is a level of interaction that is made possible only by being in the same room. For this opportunity I am deeply grateful to our sponsors. What they made possible was perfectly realized through the skills and the personal care of the local hosts in Chattanooga. Thanks to their welcoming attitude, personal warmth, attention to detail and zeal for our common vision, our meeting never felt like an arduous task. Productive discussions remained the product of enthusiasm. Yes, it was intense, and yes, we got tired. But we had fun all the time!

When the summit was over I stayed a couple of days. I attended a service at Olivet Baptist Church, since I wanted to experience real gospel worship. I looked at city sights like Bluff View and a few parks. I also went to the Tennessee Aquarium and to Point Park on Lookout Mountain. To round things off, Aaron and Cathy Gustafson showed me great hospitality on Monday. When our faithful driver, Shaun, left me at the airport on Tuesday, it felt like I'd been away for a month, not a week.

What I bring home is fond memories of meeting wonderful people, renewed passion for Web Education and a great hope that OWEA will make a difference, yes, even make the world a slightly better one! And a longing to return for another visit to Chattanooga!

P.S. More photos are posted on my Flickr page.

Sunday, July 19, 2009

The value of false XHTML

I believe there is value in using XHTML syntax for documents sent to browsers as text/html. That seemed like the normal thing to do just a short time ago. Now it is increasingly being met with skepticism and even ridicule. I believe I've encountered every XHTML myth-busting argument there is from the good people in the WHATWG cabal, but I still see a value in using XHTML syntax. My arguments are not centered around forward compatibility, extension mechanisms or XSLT, even though those could apply server side. My reasons for using XHTML syntax are to avoid errors, misunderstandings and rookie mistakes. Since I teach web development for a living, I encounter a lot of those.

The issues

HTML 5 is clarifying what XHTML really is. A lot of web sites use an XHTML doctype, even though the code will be parsed just like ordinary HTML, i.e. they use false XHTML. But draconian error handling (including Unicode errors), altered CSS applicability, the breaking of 99 % of all JavaScript code in existence (including all major libraries) and, of course, Microsoft's not implementing true XHTML at all will continue to make true XHTML a non-option for anything but experiments and edge cases for the foreseeable future.

Put in one sentence: specifying an XHTML doctype does not make a document XHTML. As we know by now, the doctype serves only one purpose in the browser, and that is to trigger standards mode (assuming a good doctype). And if a browser treats XHTML 1.0 strict exactly the same as it treats HTML 4.01 strict, why not opt for the latter? And as HTML 5 has no mechanism other than the MIME type to specify XHTML, what one might choose to call false XHTML is no longer possible to use.

On the other hand, XHTML syntax that was previously illegal in HTML, like explicitly closing void elements (br, hr, input, meta) with a trailing slash, is now fully permitted, although described as a transitional feature. Judging from the fact that most new sites still use a transitional doctype, we can safely assume that nothing is stopping us from using an XHTML-like syntax even in HTML 5. I will proceed to argue that it is often even a very good choice.

Polyglot documents

Pages using such syntax have even got a recently popularized name: polyglot documents. Let us consider a few features a polyglot document will lack when it is sent as false XHTML, i.e. as text/html:

  • No namespace support, but HTML 5 will (probably) special case SVG and MathML, so the most sought after compound documents will still be possible to author.
  • No draconian errors. A feature most developers won't miss at all.
  • XML parsers that rely on the MIME-declaration will fail or refuse the document. There should be easy workarounds for that.

The list can be expanded. I just want to illustrate the fact that in the near future, any benefits of using XHTML syntax will only to a small degree be related to XML technical features. Indeed, when html5lib has become widely available and integrated with all server-side scripting languages, we are promised that all of today's server-side XML tools will work equally well for non-polyglot HTML documents.

The continued benefit 1: XHTML syntax works like a coding convention

Every major project that involves more than one programmer will soon run into the need to follow agreed-upon standards for things like indentation, placement of braces, usage (or non-usage) of a space between arguments in function calls and definitions, etc. A programmer who does not know or care about this will quickly see their contributions rejected and will probably be considered unemployable.

Douglas Crockford has introduced many developers to coding conventions for JavaScript, and his JSLint tool has options that will ensure that you follow them. HTML Tidy has options to clean up code, but other than that I know of no common code convention for HTML. I know that for many of my friends the beauty of XHTML has been the clean syntax, for reasons like the following:

  • Enforced lower case element and attribute names are easier on the eye than code that SHOUTS.
  • Enforced quotation marks around attribute values make errors easier to avoid or spot.
  • Explicit closing of elements like li, tr, th, td and p also makes code easier to read. Not having to guess the intention (was the implicit close intended, or just sloppiness?) makes it easier to work with other people's code, or even with code that I wrote myself a while ago.

Let me elaborate on that second point. One particularly nasty problem occurs when attribute values are generated by server-side scripts. Let's say that for a few iterations in an application's life a particular value is always a single word, like "login". Suddenly another developer (or you) decides that it is better to use two words, like "login name". And since the code that generates this value might be miles apart from the template that outputs the actual HTML, for instance in another file and module, one cannot take for granted that such a change will not break anything. In a sentence: quoting attribute values makes code more robust!
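The "login" to "login name" scenario can be reproduced in a few lines. In this hypothetical Python sketch, render_unquoted and render_quoted stand in for a server-side template, and html.parser shows how a parser reads the generated tag. html.escape with quote=True is a real standard library function that also escapes quotation marks, so a quoted value cannot end the attribute early.

```python
import html
from html.parser import HTMLParser


def render_unquoted(value):
    # Hypothetical template that forgets the quotation marks.
    return "<input name=login value=%s>" % value


def render_quoted(value):
    # html.escape(..., quote=True) also escapes " and ', so the value
    # cannot end the attribute early.
    return '<input name="login" value="%s">' % html.escape(value, quote=True)


class AttrCollector(HTMLParser):
    """Records the attributes of the last start tag, as a parser sees them."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs = attrs


def parse_attrs(markup):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attrs


for value in ("login", "login name"):
    print(parse_attrs(render_unquoted(value)))
    print(parse_attrs(render_quoted(value)))
```

With "login" both templates behave the same. With "login name" the unquoted template silently produces value="login" plus an extra, valueless name attribute, while the quoted one keeps the whole value.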

Counterargument: You can do that equally well in regular HTML

The primary counterargument usually sounds like this: But you don't need XHTML syntax. Nothing is stopping you from using lower case tag names and attributes, or the optional closing tags, in regular HTML if you wish. True. But nothing is enforcing it either! And there is no tool available for testing it, at least none that I know of.

Neither does this counterargument apply to all aspects of my second argument.

The continued benefit 2: XHTML syntax is good for beginners!

A few years ago Lachlan Hunt wrote that XHTML is too hard for beginners. There are basically two things that make me take a stance exactly opposite to his. The first is that he is talking about true XHTML, while I am talking about false XHTML. The second is that my main job for almost a decade has been to teach complete newbies about web development. I would not presume to know even half of what he knows about the minute details of markup languages. I dare say, however, that I know much more about teaching this stuff to students.

Coding conventions should be taught from day one!

Here is a rule for all teachers of all things coding: demand that students use strict coding conventions from day one. Do not think that they can be introduced at a later stage. Sloppy habits are formed from day one, and are much harder to get rid of once they have formed. Often when I look over the shoulder of a student and see ghastly looking code, the student will say that it will be fixed later. Judging from nearly a decade of experience, I know that it will not happen!

Bad habits get picked up from day one. They should therefore be punished from day one. Requiring XHTML syntax is one way to enforce such practice.

XHTML is the more pedagogic syntax

Requiring students to close void elements is a very effective way of teaching them which elements are indeed void. In the days when we used named anchors for intra-page navigation (as opposed to setting the id attribute on any element), I had students who forgot, or lazily omitted, the closing tag. Their pages worked just fine. The only downside was a more complex DOM, and that was not noticeable on their pages. In fact, some of them believed that such an anchor was a void element. It even took a while for me to grasp that it was not. XHTML helped me understand that, and I've seen it help other people come to grips with similar issues.

Explicitly closing elements helps students understand the concept of semantics. You do not insert an li just to get a bullet point; everything between the start and end tags is a list item. You do not insert a p tag to get some space between your lines; all text between the start and end tags is a paragraph. Etc. Being forced to constantly ask oneself where something should start and where it should end is good for learning.

I also would like to add, that requiring XHTML is good for the mental health of me as a teacher, since a lot of errors will be caught by the students themselves during validation, and their code will be easier to read for me.

The true downsides of false XHTML

Nothing in life is so rosy as to have no downsides. Every medicine has its side effects. The two most immediate ones for newbies both involve scripting:

Tag names are sometimes uppercased in the DOM

Such things happen when an organization badly applies the biblical principle of the left hand not knowing what the right hand is doing. However, this confusion will exist no matter which syntax you choose. Using HTML syntax with all uppercase element names is not common practice, and it won't be long until the principle has to be explained to a student anyway.

Technically redundant closing tags will cause un-intuitive text nodes to appear in the DOM

Consider this code:

<ul>
  <li>foo</li>
  <li>bar</li>
</ul>

How many child nodes to the ul-element are there in the DOM? To a newbie (and Internet Explorer) it looks like 2. To the trained eye it is 5. But once again new technology comes to the rescue. By introducing new DOM-walking APIs we can (in the future) ignore those white-space only text nodes, in a cross browser consistent manner, using native API-calls.

Note that the first whitespace-only text node would still be there even if we had omitted the closing tags. And let's say for a moment that a student had written a script that relied upon there being no closing tags. How confused would he or she not be when it suddenly stopped working because someone started using closing tags? How brittle would such code not be in real use?
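The count of five can be checked with Python's xml.dom.minidom, which parses the list as XML, as a browser would for true XHTML; an HTML5 parser also keeps these whitespace-only text nodes inside the ul. Filtering on nodeType is used here as a stand-in for element-only traversal, in the spirit of the newer DOM-walking APIs mentioned above; it is not a browser API.

```python
from xml.dom import minidom

markup = """<ul>
  <li>foo</li>
  <li>bar</li>
</ul>"""

ul = minidom.parseString(markup).documentElement

# Every child node, including the whitespace between the tags.
all_children = ul.childNodes
# Only the element children, which is what a newbie expects to see.
element_children = [n for n in ul.childNodes if n.nodeType == n.ELEMENT_NODE]
# The "redundant" text nodes that contain nothing but whitespace.
whitespace_nodes = [n for n in ul.childNodes
                    if n.nodeType == n.TEXT_NODE and not n.data.strip()]

print(len(all_children))      # 5
print(len(element_children))  # 2
print(len(whitespace_nodes))  # 3
```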

The future

Maybe there are some technical benefits of XHTML as well, but I hope to have shown that even without them, the syntax has clear benefits — enough to tell my students that they should use it, either as XHTML 1.0 strict or as HTML 5 polyglot. So where do I want to go from here? This is my wishlist for the future:

  1. The (X)HTML 5 spec should be strictly serialization and syntax neutral. XHTML syntax is not only something that should be allowed for transitional reasons.
  2. I would love to have an (X)HTML coding convention tool that could check for even more details than the current validator does. Things like indentation, or allowing shortened attributes for boolean attributes while enforcing quotation marks for all non-boolean ones, ought to be testable. Such a tool might even make me think that even better alternatives than polyglot documents can arise.

Monday, July 13, 2009

No backwards compatibility = XHTML 2 was doomed to failure

XHTML 2 is the Itanium of web technologies. You remember Intel and HP celebrating the Itanium architecture as the new super-duper technology, that should leave all RISC-based competitors in the dust. VLIW (re-branded as EPIC) was touted as a disruptive innovation. The future was Itanium.

But it was not! Forget the marketing speak from Intel and its only Itanium customer worth mentioning, HP. Itanium has flopped. Yes, the RISC architectures are struggling. MIPS no longer powers high-end servers from Silicon Graphics. SPARC is losing market share. Only IBM Power PC seems to be holding its ground in the server space. But the x86 architecture, which was supposed to die, is reigning more dominantly than ever.

Backwards compatibility is everything

Being backwards compatible is not only a nice feature. It is a prerequisite that simply seems non negotiable. Let's look at a few successful products to get an idea.

Windows 95 and DOS-based games

Before Windows 95 all high end games were run from the DOS-prompt. Windows 3.x was only a nuisance for game developers. In order to achieve the highest possible speeds they often tweaked the hardware interaction in every possible way. Getting these games to run under Windows 95 proved a challenge, to say the least. Microsoft solved this by special-casing game after game. The operating system would recognize a particular piece of software, know that it required special handling and adjust accordingly. Even in ways that broke protocol.

Punch cards

When were punch cards invented? 1725. When did they become a big success? 1890. When were they surpassed by other technologies? In the early 1960s. When did IBM drop support for punch cards from its operating systems? Not for another 30 years. I would not even be surprised if it was possible to attach a punch card reader to a brand new z-series computer today and actually have it work.

XHTML 2 was Dead Pre Arrival

It did not die as a markup language for the web. It never lived. The day the decision was made not to be backwards compatible, it was doomed. It never really mattered that it had every conceivable shiny new feature. Technical merits are simply not enough. Therefore it simply does not matter how much you shout about them, or how much you disdain the fact that HTML 5, due to its legacy, is awful and badly designed.

Yes, I use PHP too, and no matter how much one shouts about the relative technical merits of Ruby or Python, PHP does not seem to grow weaker. Ugliness just is not that big a factor. A strong user base, reuse of code and know-how, the ability to find advice and support: such things matter. Sociology always trumps technology.

The future for XHTML 2

I have friends who prefer XHTML 2 to DocBook for data storage on the server. Reading Steven Pemberton's thoughts about the future, he seems to believe that this is a viable niche and that XHTML 2 can make a comeback that way. And why not? In a controlled environment the improved semantics of XHTML 2 over legacy HTML may provide significant benefits. Pushing XHTML 2 as a progressive enhancement, server side, might work.

Encouraged by none other than Ian Hickson himself, XHTML 2 will continue to be developed in a working group outside of the W3C. Can it make a comeback? Once upon a time the W3C decided to axe HTML. Many developers, myself included, thought it was the end of HTML. I was wrong. I will therefore not say good bye XHTML 2, but au revoir.

Sunday, June 28, 2009

Do not put experimental features last in your CSS

Today's blog post will be a short one. I have lately seen code like this on more than one occasion:

#foo {
  border-radius : 10px;
  -moz-border-radius : 10px; /* Mozilla */
  -webkit-border-radius : 10px; /* Webkit */
}

What is the problem?

Right now only one of these lines will have an effect: in Gecko-based browsers like Firefox, the one starting with -moz-, and in recent WebKit-based browsers like Safari and Chrome, the one starting with -webkit-, as indicated in the comments. These are the two current experimental implementations of the coming CSS 3 border-radius property.

Hopefully, thanks to these two implementations, the specification will soon reach a level of maturity where browsers can start to implement it in a non-experimental version. When that happens, one thing is to be expected: the non-experimental implementation will most likely differ (somewhat) from the experimental one! And as a developer you will most probably want the final version to be the one that browsers actually use. When two declarations affect the same property like this (equal specificity, equally placed in the cascade), which is the case when a browser supports both the prefixed and the standard name, the last one will override the earlier ones. Therefore you should put things in this order:

#foo {
  -moz-border-radius : 10px; /* Mozilla */
  -webkit-border-radius : 10px; /* Webkit */
  border-radius : 10px;
}

Why is this better?

During a transition phase, lasting at least a few years, WebKit and Gecko can be expected to support both implementations. Some sites use only the experimental versions, and to give them a grace period, browsers cannot drop support for those as soon as the final version is implemented. Historically, Mozilla has let such grace periods last for 2-4 versions of Firefox.

So this is the bottom line: put experimental features first and standard features last. That will ensure better forward compatibility.
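The effect of the ordering can be illustrated with a toy model of the "last declaration wins" rule. This is a hypothetical Python sketch, not a CSS engine: it assumes that every name in the rule maps to the same internal border radius property, and that supported is the set of names a given browser version understands (the transitional_gecko set is an assumed example of a browser in the transition phase).

```python
# Hypothetical model, not a CSS engine: every name in `declarations`
# is assumed to map to the same internal property (border radius), and
# `supported` is the set of names a given browser understands. Among the
# supported declarations in one rule, the last one wins.


def winning_declaration(declarations, supported):
    winner = None
    for name, value in declarations:
        if name in supported:
            winner = (name, value)  # later supported declarations override earlier ones
    return winner


experimental_last = [("border-radius", "10px"),
                     ("-moz-border-radius", "10px"),
                     ("-webkit-border-radius", "10px")]
standard_last = [("-moz-border-radius", "10px"),
                 ("-webkit-border-radius", "10px"),
                 ("border-radius", "10px")]

# A Gecko version in the transition phase, supporting both names.
transitional_gecko = {"-moz-border-radius", "border-radius"}

print(winning_declaration(experimental_last, transitional_gecko))  # ('-moz-border-radius', '10px')
print(winning_declaration(standard_last, transitional_gecko))      # ('border-radius', '10px')
```

With the experimental declarations last, the transitional browser keeps using its experimental implementation even though it supports the standard one; with the standard declaration last, the final implementation wins, and older browsers that only know the prefixed name still get it.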

Saturday, June 27, 2009

Validation and doctype myths and (inconvenient) truths

People like me who support web standards often talk about validation and doctypes. Yet, even within our camp, there seems to be a lot of confusion. I will try to address a few misconceptions, especially a few new ones that have come out of the ongoing debate about HTML 5, accessibility and RDFa.

Background: How a browser processes markup

Most people tend to think in two stages: there is markup, and there is a rendered page on the screen. In reality this is a more complex process.

First comes parsing. Its purpose is to convert the markup into a representation inside the program that is, so to speak, "understandable" to the computer and usable for rendering on the screen as well as by assistive technologies, like a screen reader. This internal representation is also accessible for manipulation through the DOM API. Indeed, it is often talked about as the internal DOM representation of the document. From now on I will simply call it the DOM. Just bear in mind that in the rest of this article I am referring to this internal functionality as a whole, and not to the API.

Parsing in itself is a multi-step process. It involves not only the simple mapping of the HTML markup, but also the application of CSS, the handling of events, etc.

The rendering (painting on the screen) is in turn made from the DOM, as is the exposure of the page to assistive technologies.

Personally I've found it helpful to think about this process in three stages:

  1. The code (HTML, CSS, etc) arrives as a network stream.
  2. The DOM, constructed by the parsing.
  3. The perceivable results (on the screen, in the speakers, in the braille terminal, on paper from print, etc.)

With this knowledge we can formulate the purpose of validation:

  • Validation provides a secure mapping between the markup and the DOM. You as a developer know what you are going to get.
  • Validation provides easier development, including easier error detection and better maintainability through cleaner and consistent code.

Browsers have always had mechanisms to handle badly written HTML. Indeed, they have reverse engineered each other in this regard so much that invalid, tag-soup, piece of shit code usually renders just fine. And the HTML 5 spec goes to great lengths to explain just how such code should be handled by a browser. If one has supreme knowledge about every little detail of how browsers work internally, one can therefore get predictable results even with code that does not validate at all.

Web developers should, however, not be required to have such in-depth knowledge. Validation is a tool that helps us stay within safe boundaries. Stepping outside them might work, but it will always lead to extra work in the end.

With this in mind we can formulate a few spin-off effects of validation, such as:

  • Validation is an act of courtesy towards other people who one day might be charged with taking over your code base, or who might only be asked to take a look at it on a mailing list or forum to help you solve a problem.
  • Validation is a mark of professionalism, a sign that you care about code quality.

But the main effect is that validation is a tool that helps you as a developer get to your desired results. I always tell my students to validate early and validate often. After every major change to the code, re-run the validator!

Or, to put it differently: valid code is not the end goal; it is a very good tool for reaching the end goals of predictability, consistency, maintainability and effectiveness.

Myth: You must validate in order to be accessible

Wrong. Validation is advisable of course, but in a pure technical sense not a requirement. It is perfectly possible to write unsemantic code, without proper hooks for assistive technologies, that still validates. And it is perfectly possible to do it the other way round, writing accessible code that does not validate, although validation, especially against a strict doctype, will be one help towards accessible web sites.

Validation can check for the presence of accessibility features, such as alt-attributes, table column headings, etc. It can never ensure that the contents within those attributes and elements have been written in a usable way. Valid code is a good starting point for accessibility, not a guarantee of accessibility.

This is especially true for the non-strict versions of HTML 4.01 and XHTML 1.0. These four (2 × 2) (X)HTML versions, the transitional and frameset variants of each, contain a lot of elements and attributes that should not be used. The validator will complain that they are deprecated, but it will still give the page a green light. Anything but strict doctypes or conformant HTML 5 (see below) should have been verboten long ago for any professional web developer.

Myth: There is no penalty for not validating

There is one camp, primarily accessibility experts, who would like browsers not to render pages that contain markup errors, or at least give clear warnings that they do. Recently it has been advocated that e.g. the HTML 5 canvas element should not render anything to sighted users unless there is a fallback for the blind. They also have argued that any refusal to incorporate ARIA or RDFa into HTML 5 can simply be overridden because validation does not matter.

In this context it is true. As long as browsers and assistive technologies support ARIA and Google, Yahoo and other search providers will honour RDFa, it will work. User agent behaviour is always the bottom line, the true de facto standard in practice.

This takes us back to my main theme for this article. Things might work when using code that is not valid, but you can be much more confident that they will if it validates. In my experience, the most common errors that I catch using a validator are misspelled tag and attribute names. Such errors may wreck your page in many ways, some of them invisible, e.g. when you have misspelled an ARIA attribute. And such errors are easier to spot if they are not hidden behind hundreds of other validation errors that by themselves actually are benign.
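A validator catches a misspelling like this by comparing every attribute name against the names the language defines. The sketch below shows the idea for ARIA attributes only. The name list is a deliberately small subset chosen for illustration, not the full ARIA vocabulary, and `findUnknownAriaAttributes` is a hypothetical helper, not part of any validator.

```javascript
// Illustrative subset of ARIA attribute names (NOT the complete list).
const KNOWN_ARIA = new Set([
  "aria-label", "aria-labelledby", "aria-describedby",
  "aria-hidden", "aria-live", "aria-expanded", "aria-required",
]);

// Return every aria-* attribute name in a markup snippet that is not in KNOWN_ARIA.
function findUnknownAriaAttributes(html) {
  const unknown = [];
  // An attribute name starting with "aria-", preceded by whitespace and followed by "=".
  const pattern = /\s(aria-[a-z]+)\s*=/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const name = match[1].toLowerCase();
    if (!KNOWN_ARIA.has(name)) {
      unknown.push(name);
    }
  }
  return unknown;
}
```

For example, `<div aria-labeledby="h1" aria-hidden="true">` yields `["aria-labeledby"]`: one "l" short of `aria-labelledby`, and silently ignored by browsers and assistive technologies.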

Let me repeat that. Many validation errors are benign: An un-encoded &, an unnecessary closing tag, and, with HTML 4.01, forgetting to specify the type attribute for a script. In themselves these errors will not harm the execution of the parser, the construction of the DOM or the rendering of the page on the screen and exposure of its contents to assistive technologies.
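The first of these is also the easiest to fix mechanically. A minimal sketch, assuming the input is markup in which existing character references (like `&amp;` or `&#38;`) must be left alone: it encodes only the ampersands that do not already start a reference. The function name is made up for this example.

```javascript
// Replace "&" with "&amp;" unless it already begins a named, decimal
// or hexadecimal character reference terminated by ";".
function encodeBareAmpersands(text) {
  return text.replace(
    /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g,
    "&amp;"
  );
}
```

So `list.php?page=2&sort=name` becomes `list.php?page=2&amp;sort=name`, while `Tom &amp; Jerry` is left unchanged.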

Actually, one may find oneself in a position where it is beneficial not to be valid. This applies both to HTML and CSS. New features, often available only in their experimental first forms, can be quite useful and add to a page's usability, esthetics or accessibility.

However, the problem with benign errors is that they often obscure the malignant ones. A page that contains several hundred validation errors is much harder to debug than one that only has a few. It is therefore imperative that one chooses the best possible validator for one's purpose. E.g. validators can be configured to ignore vendor-specific CSS rules or to include ARIA.

Actually, the main reason I use an HTML 5 doctype for all my new sites, and gradually change my own sites to do the same, is so that I can use a validator that supports these new technologies.

Myth: You can use JavaScript to cheat the validator

Technically this is not a myth. Yes, you can. Most validators will work on the raw HTML and will not process any scripts. However, the purpose of validation is not validation. The purpose is predictability, consistency, maintainability and effectiveness. Inserting or altering markup through scripts, or to put it better, making changes to the DOM, should be done in such a way as not to jeopardize the very reasons we wanted our code to be valid in the first place.

This is just like school. Cheating may get you the grade, but you won't get the benefit of the knowledge. Cheating the validator means that you render the validation practically useless.

There are tools available that will let you see the generated HTML, that is, the HTML as the browser has understood it to be, post parsing and post scripts being run on the page. This is sort of like going back from step 2 above to the first step. That code should be equally valid, and not differ from the original input in any unexpected way.

Follow up question: Is it OK to use JavaScript instead of the target attribute?

One of the most common uses of JavaScript to cheat the validator is to replace this:

<a href="http://..." target="_blank">linktext</a>

With this:

<a href="http://...">linktext</a>
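<script>
// jQuery code that attaches an event to all presumed external links.
// Wrapped in $(function () { ... }) so it runs once the whole DOM is ready.
$(function () {
  $("a[href^='http']").click(function () {
    window.open(this.href);
    return false;
  });
});
</script>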

I believe this is good practice. Not because we are cheating the validator, but because we are using DOM scripting to handle behaviour. We are using the right tool for the right job.
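One caveat: the `href^='http'` selector also matches absolute links to one's own site. A hedged sketch of a stricter test, assuming "external" means "a different host than the current page". `isExternalLink` is a hypothetical helper built on the standard `URL` class.

```javascript
// True if href points to an http(s) URL on a different host than pageUrl.
// Relative hrefs are resolved against pageUrl, so "/about" counts as internal.
function isExternalLink(href, pageUrl) {
  const target = new URL(href, pageUrl);
  const page = new URL(pageUrl);
  const isWeb = target.protocol === "http:" || target.protocol === "https:";
  return isWeb && target.host !== page.host;
}
```

In the page it could be combined with jQuery's `.filter()`, e.g. `$("a").filter(function () { return isExternalLink(this.href, location.href); })`, before attaching the click handler.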

Follow up question: Is it OK to use JavaScript to defeat browser bugs?

My first answer is: is that really necessary? For example, it is quite possible to have the object element work in all browsers, including Internet Explorer 6, to serve Flash or Java applets. Most JavaScript techniques to include Flash on a page have been obtrusive and have not degraded gracefully. And they have used outdated browser sniffing, potentially making them fail as newer browser versions or alternate browsers like Chrome get released. Using unobtrusive DOM scripting to enhance the plugin experience is of course OK.

There are, however, bugs and lacking support for modern standards in some browsers (yes, we all know I primarily mean MSIE now) that can only be alleviated through scripting. Choose carefully what scripts to use, though! And remember that many users might not get your scripts, perhaps because a corporate proxy has stripped out all content from your script elements.

Does the doctype matter?

Tied in to the question about validation is the choice of doctype. It serves two purposes:

  1. It declares what vocabulary a developer intends to use.
  2. It is the main way in which all sane browsers choose their rendering mode. I defer this topic to Henri Sivonen, while asking people to note that Internet Explorer 8 is not sane in any way...
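Both effects of the doctype can be observed from script. A small sketch that reads the standard `document.doctype` and `document.compatMode` properties. Browsers report `"CSS1Compat"` for standards (and almost standards) mode and `"BackCompat"` for quirks mode. The helper name is hypothetical; paste it into a browser console and call it with `document`.

```javascript
// Describe which doctype a document has and which rendering mode it triggered.
function describeRenderingMode(doc) {
  const dt = doc.doctype; // null when the page has no doctype at all
  return {
    // The HTML parser lowercases the name: "html" for HTML 5, HTML 4.01 and XHTML 1.0 alike.
    doctypeName: dt ? dt.name : null,
    // Empty string for <!DOCTYPE html>; the long "-//W3C//DTD ..." string for older doctypes.
    publicId: dt ? dt.publicId : null,
    mode: doc.compatMode === "CSS1Compat" ? "standards or almost standards" : "quirks",
  };
}
```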

As regards the first point, the doctype is of value to other people with whom you co-operate, a social contract between all developers in the team. But its main value is helping the validator see what rules to validate against.

Except for rendering mode switching, the doctype does not in any really discernible way affect how the browser will treat the functionality of your markup. E.g. even if you declare a strict doctype, it will happily honour elements like <font>, attributes like target or even the marquee-element! To a (sane) browser, there is no such thing as different versions of HTML. XHTML 1.0 strict or HTML 4.01 frameset or HTML 5 is all the same. Any content sent with the MIME declaration text/html is treated the same.

By dropping the DTD, HTML 5 makes explicit what so far has been implicit. There are different editions of the HTML standard, but in practice (inside the UA) there is but one HTML.

Even if you have used XHTML syntax and have an XHTML DTD, the markup will still be parsed as usual. To trigger XML parsing, one must change the MIME declaration, which of course will fail miserably with Internet Explorer and thus never happens except for niche web sites. It has also been conclusively proven that there is no benefit in switching modes depending on the UA. The speed difference between HTML and true XHTML is, first of all, negligible, and it only really concerns the first step (parsing), which from a performance perspective takes only a fraction of the time compared to what's going on in step 2 (the DOM) and step 3 (rendering). (There may be good reasons to use XHTML, but they are related to workflow, tools and data exchange.)

Myth: HTML 5 re-introduces bad markup

OK, this is not exactly a validation or doctype myth, but it is related. The short answer is of course that HTML 5 does not force bad markup down anyone's throat. All good practices are still doable.

This myth started in the early days of HTML 5, when authors started to look at the spec and saw monstrosities like <font>, or even <marquee>! The dual purpose of writing a spec on how to handle bad markup, as part of the browser requirements, together with a spec about how to produce good markup quickly turned into a communications debacle. The core team behind HTML 5 are perhaps not the ones with the best people skills in the world. Communication has broken down repeatedly. (On the other hand, they are unlikely to change any bad behaviour, perceived or factual, through being bashed.)

In practice, though, HTML 5 has conformance requirements (which is basically a new term for validity) that are even more far-reaching than the ones in HTML 4.01 strict or XHTML 1.0 strict. Being conformant should ensure that you are closer to adhering to best practice and accessibility principles. Validation is thus not less important; it is more important than ever. Just remember that validation never really has been about anything else but getting a predictable DOM from your markup and encouraging best practices. The conformance criteria in HTML 5 are being written in such a way as to take your code even further towards these lofty goals.

This is one of the reasons there is no DTD for HTML 5. These new rules for validity are sometimes so precise, or require such processing logic, that they cannot be expressed through a DTD. As a side effect we get a doctype one actually can learn by heart:

<!DOCTYPE html>

That will make my students very glad!

Monday, May 4, 2009

Rotating column headers using CSS only

Note: This article now has a follow-up from July 2010, explaining how to get this working in more browsers and in a more reliable way.

For a while I have wanted a solution that would rotate table column headers on web pages. In Excel or OpenOffice Calc, this is a breeze. However, the only way to do it in today's browsers is using images, or perhaps SVG or even the Canvas element from the HTML 5 spec. But this is a pure design issue, and therefore falls into the domain of CSS. And I think that I've come up with a solution, or at least an idea for a solution.

Screenshot of Firefox 3.5b4pre and Safari 4.0 beta showing rotated column headers

The image above should give you an idea about what I mean. The benefit of this design is that one can keep all columns relatively narrow, while at the same time using long words to describe them in their headers. If you use Firefox 3.5, Safari 4.0 or another browser that supports -moz-transform or -webkit-transform, you can also look at my experimental page. View its source to get the full picture of what I have done.

The technique

One cannot simply rotate the th-elements. It will look like this (screenshot of the failed solution: only the text is rotated).

There are a number of problems:

  • The rotation is applied after the browser has allocated width for the columns, but our intention was to save horizontal space. We need to take the text content out of the normal flow, using absolute positioning.
  • Any borders and background color will not be rotated. The text will extend out on top.

My solution is to wrap the text in three spans. Yes, that's an awful lot, but they each serve a purpose. The first span is the holding area, relative to which the second span will be absolutely positioned. Its CSS is as follows:

    th > span {
      position: relative;
    }

The second span is rotated as well as skewed. Rotation is counterclockwise, hence it's set to a negative degree. The skewX is set so that the border originally to the left, now at the bottom, is completely horizontal. Mathematically the formula is abs(rotationdegree) + skewdegree = 90.
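Since the skew follows from the rotation, both transform values, plus the matching un-skew used for the third span further down, can be derived from a single number. A small sketch with a hypothetical helper name; for a rotation of -65 degrees it reproduces the values used in this article.

```javascript
// Given a (negative, counterclockwise) rotation in degrees, apply
// abs(rotation) + skew = 90 to get the skew, and return the transform
// for the rotated span and the un-skew for the inner text span.
function headerTransforms(rotationDeg) {
  const skewDeg = 90 - Math.abs(rotationDeg);
  return {
    outer: `rotate(${rotationDeg}deg) skewX(${skewDeg}deg)`,
    inner: `skewX(${-skewDeg}deg)`,
  };
}
```

`headerTransforms(-65)` gives `rotate(-65deg) skewX(25deg)` and `skewX(-25deg)`; a steeper -45 degree design would need `skewX(45deg)`.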

In order for all headers to be of equal height, we set a width. Remember that the visible height is the un-rotated width. For some yet uninvestigated reason Firefox will put the span a bit further up than Safari, so I'll add a CSS filter to fix that. Border and color are added, as well as some padding, just for the appearance.

th > span > span {
  /* Must remove span from normal flow in order to keep columns from widening */
  position: absolute;
  white-space: nowrap;
  top: 1em; /* Firefox 3.5. Safari is reset below */
  -moz-transform: rotate(-65deg) skewX(25deg);
  -webkit-transform: rotate(-65deg) skewX(25deg);
  -moz-transform-origin: 0% 0%;
  -webkit-transform-origin: 0% 0%;
  border: 1px solid;
  padding: 0.5em;
  height: 1.3em;
  width: 120px;
  /* Illustrate where it's at with color */
  background-color: yellow;
  /* If one wants centered text, this is the place to reset it */
  /* From a design point of view it might not be desirable */
  text-align: center;
}
/* CSS filter for Safari */
@media screen and (-webkit-min-device-pixel-ratio: 0){
  th > span > span {
    top: 0;
  }
}

The text will be a bit hard to read. I therefore un-skew it in a third span. This does not work in Safari 4.0 beta, however:

th > span > span > span {
  /* Skew the text back, so it will be easier to read */
  -moz-transform: skewX(-25deg);
  /* Safari 4.0 beta won't skew back, so the next line is actually redundant right now */
  -webkit-transform: skewX(-25deg);
}

For the full HTML and CSS, look at my experimental page and view source.

Problems

  1. This solution is quite fragile. Widths and heights, margins and paddings might mess things up. Columns must not be of a flexible width.
  2. For pixel perfection, there is a slight nuance: Firefox positions the spans 1 pixel further to the right than Safari.
  3. As I said, there is an awful lot of spans...
  4. And worst of all: so far I have not developed a fallback for browsers that do not support CSS transformations. In those the table will just look awful and the table headers will be unreadable.

Anyway, I think this solution has some potential. Until the CSS WG and browser vendors give us an even better solution, this is my best effort. If there is a better solution somewhere, please let me know!

Sunday, April 5, 2009

Blogger and accessibility

Starting to edit my new Blogger blog, I see some glaring mistakes made by the developers. There is a ton of inline CSS, for example. Even worse is the fact that they are hiding the skip-links from screen readers. It is a well known fact that "display: none" in your CSS will make the whole block disappear, not only from sighted users' view but also from what is being exposed to a screen reader. Thus, the following in my template is bad code:

  <span id="skiplinks" style="display: none;">
    <a href="...">skip to main </a> |
    <a href="...">skip to sidebar</a>
  </span>

I decided to improve upon that. Here is my agenda:

  • Begin removing inline CSS.
  • Make the skip-links accessible as they should be.
  • Add ARIA-landmark roles.

Fixing the skip links and the CSS

Notice that when the skip links are moved off-screen instead of being hidden with "display: none", they are still part of the page's tab order. In order to avoid confusion I want to make them reappear visually when they receive focus. And while we are at it, let's experiment with some new CSS techniques as well:

#skiplinks {
    position: absolute;
    top: -100em;
    left: -100em;
}
#skiplinks a:focus {
    position: absolute;
    top: 103em;
    left: 100em;
    background: yellow;
    width: -moz-max-content;
    width: -webkit-max-content;
    width: -o-max-content;
    width: max-content;
}

Max-content is a planned addition to the CSS 3 box model. Currently it is only supported by Firefox, and only through the preliminary -moz- prefix. I suppose Opera and WebKit will add support in a similar fashion, so I have anticipated that in my code.

ARIA Landmark roles

At the breakfast on Saturday that concluded the European Accessibility Forum, I sat next to Peter Krantz and Steve Faulkner as they discussed ARIA landmark roles. Steve has written a nice wrap-up about them. In the future they will make skip-links redundant, as this is a much better solution. Look at the source code on this blog, and you'll see my contribution: look for all "role" attributes. I have yet to add the role "navigation", but that seems to be impossible without using JavaScript. Since landmarks do not define dynamic content, I think static additions to the HTML are the preferred solution.
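For completeness, this is roughly what the script route would look like. A hedged sketch: `addLandmarkRole` is a hypothetical helper, and the id `sidebar-nav` is made up, since the actual id of the navigation block in a Blogger template may differ. It only adds the role when the element exists and does not already have one.

```javascript
// Add a landmark role to the element with the given id, unless it already has a role.
// Returns true if the attribute was added.
function addLandmarkRole(doc, id, role) {
  const el = doc.getElementById(id);
  if (el && !el.hasAttribute("role")) {
    el.setAttribute("role", role);
    return true;
  }
  return false;
}

// In the page (hypothetical id): addLandmarkRole(document, "sidebar-nav", "navigation");
```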

HTML 5 doctype

I have changed the doctype as well. Personally I am a fan of HTML 5, at least in comparison to many of my friends. The reason now, however, is that I intend to use Henri Sivonen's conformance checker, which includes ARIA and RDFa support, instead of the W3C validator. Adding subheadings to my blog posts also really makes me wish that I could use HTML 5 sections, so that they work even if the document structure changes drastically. Right now I am using h4, as that seems appropriate on the front page, but will h4 be the correct choice also on other pages or in the future?

Update: Blogger keeps messing with the doctype. Aargh! Will leave it as is for now, but it does not help me stop thinking that Blogger's CMS is really annoying.

Friday, April 3, 2009

Declaration of dependence

I am starting this blog to document the joys and woes of using my Thinkpad (currently a Z61p, dreaming of a W700ds).

This documentation might be of use for someone stumbling across similar issues and through the magic of search engines he or she might find my solutions.

I also need a place where I can regularly communicate in English. My main domain, keryx.se, should remain a Swedish language site.

Finally I wanted to experiment with blogger. What is it like, usability and accessibility wise? Soon I'll know.

Why now?

I recently upgraded my Thinkpad from the standard 100G drive to a 320G drive. I chose a Seagate Momentus 7200 rpm drive, since it offered both superb specs and a reasonable price.

The discussion I had on Lenovo's forums leading up to my decision confirmed my suspicion that there was no problem at all in choosing a drive from a non-Lenovo source.

Very first impressions of Blogger

Why can I not turn off automatic insertions of br-tags? It would have been soo easy for them to look for two line breaks and wrap things between p-tags instead. And inserting an image means a whole lot of intrusive and unnecessary JavaScript. I'll stick to manual editing of all my HTML!
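What I have in mind could look something like this. A minimal sketch, assuming the editor text is already HTML (so nothing is escaped) and that a blank line marks a paragraph break. `textToParagraphs` is a hypothetical name, not part of Blogger.

```javascript
// Split text on blank lines and wrap each non-empty block in <p>...</p>,
// instead of turning every single line break into <br>.
function textToParagraphs(text) {
  return text
    .split(/\r?\n\s*\r?\n/)          // one or more blank lines separate paragraphs
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => `<p>${block}</p>`)
    .join("\n");
}
```

Single line breaks inside a block are kept as plain newlines, which HTML renders as spaces.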