These materials are copyright Western Civilisation Pty Ltd.

www.westciv.com

They are brought to you courtesy of Style Master CSS Editor and Westciv's standards based web development courses.

Please see our website for detailed copyright information, or contact us by email.

The state of the Art in Australian web development

Abstract

Author: John Allsopp

History: Presented by John at WE05, 30th September, 2005.

Westciv's John Allsopp takes a good hard look at just exactly how major Australian sites are developed, and how well (or otherwise) they adhere to best practices.

Current practices in Web Development in Major Australian Sites - a survey

How are major companies and government departments in Australia developing their sites today? Are they adhering to best practices in development and accessibility? This presentation looks at major Australian sites, to determine whether they are using best practices, and where they are falling down. We'll see what patterns emerge, where things are going well, or otherwise. And we'll conclude with some recommendations based on this cold hard evidence.

Methodology

My aim with this survey was first to develop an objective, if a little unsophisticated, measure of a site's adherence to current best practices in web development and accessibility. This could then be used to gauge sites against one another, and over time.

Now, if we are going to investigate best practices, we first need to define what these are. The main areas which standards based developers are concerned with, and which perhaps differentiate the standards based approach to development from more "traditional" approaches, are:

  1. Validity of XHTML
  2. Use of and validity of CSS
  3. Use of semantic, structural HTML, and separation of content structure from presentation
  4. Accessibility

Each of these areas is, to varying degrees, reasonably non-contentious. Each of them, to a greater or lesser extent, is amenable to machine checking, hopefully making any survey like this at least reasonably objective. We'll look at the criteria and method of assessment in a moment, but one reasonably glaring omission from the list above is usability. It was excluded for two main reasons. Firstly, unlike the others, it's not easily amenable to machine checking, so any objective testing would be prohibitively expensive and time consuming, if possible at all. It's also an area where there is perhaps less consensus as to just what best practices are, and certainly as to how you might objectively measure them.

I only assessed the front page of each site. Front pages arguably represent the "best effort" of a developer. You might also argue that the front page is where every man and his dog (marketing, senior management, and so on) gets to put their oar in, and so in some ways it is less likely to reflect a developer's best effort. We need to start somewhere, however, so this is as good a place as any.

So the areas I used to assess best practices were

  1. HTML/XHTML
  2. CSS
  3. Semantic and Structural HTML
  4. Accessibility

All these areas are scored out of 5, giving a total of 20 points. Here is how the points are allocated.

HTML/XHTML

This assesses the extent to which a site uses valid XHTML/HTML.

The criteria
Doctype

Valid pages require a document type. The W3C recommended doctypes are HTML 4.01, XHTML 1.0 and XHTML 1.1. If no doctype is specified (or a doctype prior to HTML 4.01 is used), subtract 2 points.

Validation Errors

For each type of validation error, subtract 1 point - to a minimum of 0 points.

For example, not quoting attribute values where required, even if done 50 times, is a deduction of one point.
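The HTML/XHTML rules above amount to a small calculation. Here is a minimal JavaScript sketch of it, assuming the validator's messages have already been grouped into one short label per type of error. The function name, the doctype labels and the error labels are all hypothetical, not part of any tool used in the survey.

```javascript
// Sketch of the HTML/XHTML scoring rules: start at 5, subtract 2 for a
// missing or pre-4.01 doctype, subtract 1 per *type* of validation error,
// and never go below 0. Doctype and error labels are hypothetical.
const RECOMMENDED_DOCTYPES = ["HTML 4.01", "XHTML 1.0", "XHTML 1.1"];

function scoreHtml(doctype, errorTypes) {
  let score = 5;
  // No doctype (null), or one older than HTML 4.01: subtract 2 points.
  if (!RECOMMENDED_DOCTYPES.includes(doctype)) {
    score -= 2;
  }
  // One point per distinct type of error, however often it recurs.
  score -= new Set(errorTypes).size;
  return Math.max(0, score);
}

// 50 unquoted attribute values still cost only one point.
const errors = Array(50).fill("unquoted attribute value");
console.log(scoreHtml("XHTML 1.0", errors)); // 4
console.log(scoreHtml(null, [...errors, "missing alt attribute"])); // 1
```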

CSS

The criteria

No use of CSS - 0 points

Validation Errors

For each type of CSS error, deduct 1 point (this is rather generous). We also deduct a point for failing to follow the recommended practice of including a generic font family when specifying fonts.
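CSS 2 encourages authors to offer a generic font family (serif, sans-serif, cursive, fantasy or monospace) as the last alternative in a font-family list. A rough sketch of checking for that follows, assuming the declaration's value has already been pulled out of the style sheet. The function name is hypothetical, and the simple comma split would mishandle a quoted family name that itself contains a comma.

```javascript
// Sketch: does a font-family value end with one of the CSS 2 generic
// families? A quoted name such as "serif" is an ordinary family name,
// not the generic keyword, so it does not count.
const GENERIC_FAMILIES = new Set(["serif", "sans-serif", "cursive", "fantasy", "monospace"]);

function endsWithGenericFamily(fontFamilyValue) {
  const families = fontFamilyValue.split(",").map((f) => f.trim());
  const last = families[families.length - 1];
  // CSS keywords are case-insensitive; quotes are kept, so quoted names fail.
  return GENERIC_FAMILIES.has(last.toLowerCase());
}

console.log(endsWithGenericFamily("Verdana, Arial, Helvetica, sans-serif")); // true
console.log(endsWithGenericFamily("Verdana, Arial")); // false
```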

Semantic and Structural HTML

The criteria
  1. Use of Tables for layout - deduct 2 points
  2. Use of font elements - deduct 1 point
  3. Use of presentational attributes - deduct 1 point
  4. Each type of element used in a semantically inappropriate way - deduct 1 point
  5. Use of inline style - deduct 1 point

The first point might be a little contentious to some. Tough. While it is a reasonably hard line to take, it is clear from several angles that using tables for layout has for some time been far from a best practice. Similarly, penalizing the use of inline CSS appears harsh, given that it is a perfectly acceptable practice according to the W3C recommendations. This section looks at the issue of separating content and appearance, and inline CSS certainly is far from perfect in that regard.
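Taken together, these deductions can be sketched as a function. This is an illustration only: the shape of the findings object is hypothetical (in the survey each finding came from reading the source), and flooring the score at 0 is an assumption carried over from the HTML/XHTML criteria.

```javascript
// Sketch of the structural/semantic deductions. Field names are hypothetical.
function scoreStructure(findings) {
  let score = 5;
  if (findings.layoutTables) score -= 2;             // tables for layout
  if (findings.fontElements) score -= 1;             // font elements
  if (findings.presentationalAttributes) score -= 1; // bgcolor, align and so on
  // One point per type of element used in a semantically inappropriate way.
  score -= findings.misusedElementTypes.length;
  if (findings.inlineStyle) score -= 1;              // style attributes
  return Math.max(0, score);                         // assumed floor of 0
}

// A table-based page with inline style, but no other problems.
console.log(scoreStructure({
  layoutTables: true,
  fontElements: false,
  presentationalAttributes: false,
  misusedElementTypes: [],
  inlineStyle: true,
})); // 2
```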

Accessibility

Accessibility is notoriously difficult to assess, particularly mechanically. The aim of this survey is not to comprehensively determine the accessibility of a site, but rather to gauge its basic adherence to core accessibility recommendations. The reasoning is that failure to adhere to even simple accessibility guidelines indicates a deeper failure in this area.

The criteria

Deduct 1 point for each non trivial, non controversial, type of accessibility error reported by Cynthia Says, such as lack of alt attributes, and so on.

The Sites

The aim of the survey was to look at how major Australian sites are adhering to best practices. We know that there are many sites built by developers who adhere to standards and best practices, but how far do these best practices extend to the most significant sites on the web, in terms of their user base?

To this end, the following sites were chosen. They represent the biggest companies in Australia, with market capitalizations of between 2 and 70 billion dollars. They represent major government sites that millions of people use every day. They represent the most visited Australian sites by Australians. In short, they represent the mainstream of web development.

ASX top 50

Major banking sites

Popular travel sites

All TV channels

Main government sites for several states and federally

Other Australian sites among the Top 100 sites visited by Australians, such as Sensis and Yellow Pages.

The results

If you are a developer who pays more than a passing interest in the issues we have discussed, ask yourself how well or badly you think these major sites are doing.

To be honest, I thought the result of this survey would be "nothing to see here people, move along": that after a few sites I'd be left with the realization that it was all a bit pointless, that everyone out there had digested the lessons, used the validators, cared about (or at least had some basic understanding of) accessibility issues, and I could actually go ahead and forget the whole issue.

And it all started so well.

I began with the ASX top 50. I felt that if these sites did well, then surely other, more popular sites would be doing it right as well.

The biggest company in Australia, "The Big Australian", is BHP Billiton. And lo and behold, their website is, in terms of the criteria, a good effort.

18/20

Things were looking pretty good. From then on it was almost literally downhill. Only one other site scored higher, and one other as high. The next two weeks became a nightmare where every spare moment was spent validating mangled and broken HTML and CSS, wading through at times dire source code, and despairing at the thought of all the lawyers who were going to make a fortune out of the Disability Discrimination Act 1992.

A total of 6669 HTML validation errors on just 83 pages. Only 9 of the 83 sites validated to any doctype.

I had wanted to do 100 sites but simply ran out of time and enthusiasm, indeed stamina, at 83. I had seen and had enough.

Frankly, I never want to see the Firefox toolbar again. I never want to see the results of the W3C's validators, or Cynthia Says. I don't want to look at the source code of one more table based site. It's a shame that most developers don't seem to want to either.

So what can we learn from this carnage?

Well, we learn that we still have some way to go until even the most basic aspects of the best practices that the W3C and others have been developing for a decade are even moderately adopted.

But in a more practical sense, a number of very strong patterns emerged from all this, which I think we can draw some good lessons from.

And in part, it's a bit like driving past a car wreck. Few manage not to take a look.

HTML

doctypes

Let's start at the top. Doctypes. What doctypes (if any) are major sites using?

Well, the surprising thing is that they are using any at all. Given only 9 of the surveyed sites actually validate, and the average number of errors on each page is over 80, I find it a little surprising that 52 of the 83 sites actually have specified doctypes (and a couple of others try to).

You'll see from this graph that it's loose (transitional) all the way: there is just one each for XHTML 1.0 Strict, XHTML 1.1 and HTML 4.01 Strict (and, for what it's worth, only the XHTML 1.0 Strict one actually validates).

graph of doctypes used

Hats off too to the HTML 3.2 holdout. Now that is old school. If you are unfortunate enough to have to buy petrol, you will most likely have done so from them of late. Probably leaded.

So what kind of errors are people making?

It's interesting to look at the frequency of the errors. In fact, while there are several thousand errors among the sites surveyed, and an average of 80 per page, fewer than 25 turn up with any significant frequency (3 or more times across the sites). Only 15 turn up on 10 or more sites. People are making the same 15 to 20 mistakes over and over again. Let's take a look in some more detail at these major, oft repeated errors. On the whole they are simple to fix, but some of them are rooted in common misunderstandings that need to be cleared up.

Does it really matter?

Before we get into practicalities, does it all really matter? After all, these pages work in browsers (that is, IE), right?

Here are some of the companies and organizations surveyed.

Technology is essential, fundamental to what they do. Lives and livelihoods depend on their technology. Is it not reasonable to conclude that if they fail to adhere to even basic best practices when it comes to the most public of their technologies, we have the right to be skeptical of how well they adhere to best practices in other technological areas?

That alone is a significant reason why it matters.

I imagine too that many if not most of the organizations whose sites I surveyed are ISO 9001 compliant. So they clearly feel best practices are important. Why does this not carry over to the web?

The errors

Here, in order of their frequency, are the most common of the HTML errors, with some simple ways in which most of them can be cleared up.

Graph of Major HTML Validation errors

Missing alt attributes

Not only are alt attributes a WAI guideline (checkpoint 1.1) for many elements, they are also required for some of these elements, such as img and area, by the HTML and XHTML document types.

More than 50% of sites fail this validation test.

Fixing this problem is essentially trivial, and its prevalence suggests both a lack of knowledge and a failure to even attempt validating a page. This conclusion will be drawn, sadly, again and again.
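For a quick first pass before running the validator, a few lines of JavaScript can list img tags that lack an alt attribute. This is a sketch assuming the page source is available as a string, with a hypothetical function name. A regular expression is not an HTML parser (an attribute value containing > would confuse it), so treat the result as a hint rather than a verdict. Note that alt="" counts as present, which is correct for purely decorative images.

```javascript
// Rough sketch: list img tags with no alt attribute at all.
function imgsWithoutAlt(html) {
  const tags = html.match(/<img\b[^>]*>/gi) || [];
  // Inside a tag, an attribute name is preceded by whitespace.
  return tags.filter((tag) => !/\salt\s*=/i.test(tag));
}

const page = '<img src="logo.gif" alt="Westciv"><IMG SRC="spacer.gif"><img src="rule.gif" alt="">';
console.log(imgsWithoutAlt(page)); // only the spacer.gif tag is reported
```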

Script (and style) elements with no type

Well over half of all sites surveyed had this problem with script elements. The type attribute is required. Often script elements included only the language attribute; often they included neither attribute.

At first glance, this extraordinary rash of scripting appeared to achieve little that could not be achieved without it (if I were to redo this survey, I would look at how many of these sites used scripting). Given the significant problems scripting caused for validation (and JavaScript problems don't stop here), I'd suggest scripting be used a lot less frequently than it is.

I'm also sorry I didn't include a count of how many sites used browser sniffing via JavaScript. If I had a lot of graduate students working for me I'd do that.

Style elements without type were much less frequent, but worth noting. We'll see how script and style elements can get us in trouble in other ways a little later.

TOPMARGIN

If anything indicates the belts and braces, kitchen sink, old ways are the best ways, she'll be right attitude of many, many developers, it is the incredible frequency of the pseudo-HTML attributes TOPMARGIN, LEFTMARGIN, MARGINHEIGHT and MARGINWIDTH.

I never want to see these again as long as I live. OK!

As far as I am aware, these were never part of any published specification, and they indicate a developer who really needs some brushing up on modern web development practices. Like from the beginning.

38 out of the 83 sites used these attributes. That's 39 too many.

Unescaped ampersands

To some extent, many of the problems we see recurring are, I suspect, as much a function of CMS and authoring tools as of hand coders. I suspect a lot of old school CMS and tools are adding TOPMARGIN and attributes like it. That does not exonerate the developers so much as explain the prevalence of such poor practices.

35 of the 83 sites have unescaped ampersands. Very often these occur in URLs (much less commonly in text, which suggests that developers on the whole know about the problem, but ignore it in URLs). I suspect it is often poorly developed apps that are producing these URLs.

In HTML, even in URLs, & must be written as &amp; for a page to validate.
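Where URLs are generated by an application or CMS, the escaping can be done at the point the URL is written into the page. A minimal sketch, assuming the URL arrives as a plain string; the function name and the example URL are hypothetical. Sequences that already look like character references are left alone, so running it twice does no harm.

```javascript
// Sketch: escape bare ampersands before a URL goes into an href or src
// attribute. &amp;, &#38; and &#x26; style references are left untouched.
function escapeAmpersands(url) {
  return url.replace(/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)/g, "&amp;");
}

const href = "search.asp?state=nsw&type=bank"; // hypothetical URL
console.log(`<a href="${escapeAmpersands(href)}">Search</a>`);
// <a href="search.asp?state=nsw&amp;type=bank">Search</a>
```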

HTML Help has a good article on common validation errors which covers this issue, and a number of the others we encounter here. There is a link to this article in the resources section.

Malformed documents

Above all else, HTML, and particularly XHTML documents must be well formed. To be well formed, elements must be properly nested, and properly closed. These are the absolute foundation of good HTML.

Far too many sites fall down in this area. Sites displayed missing start or end tags, empty elements like meta given end tags, and overlapping elements. In short, this basic, fundamental aspect of HTML is a shemozzle.

In addition, many sites fall foul of the containment rules of HTML/XHTML:

  1. Inline elements may only contain other inline elements.
  2. Paragraphs are an exception to the basic rule that block elements may contain any other kind of element: they may contain only inline elements.
  3. List items must be contained within a list.
  4. Inline elements must not appear directly inside the body (in the strict doctypes), but must be contained within a block element.
  5. Forms may not be contained directly within a table, but within TD elements.

To illustrate the problem, here are some of the containment errors I found

and many more.

Together, fundamental problems like these appear in the majority of documents.

30 of the 83 sites are generally malformed documents, or break basic containment rules. In addition, 14 of the 83 have containment problems associated with tables and forms. As well, 20 documents are malformed specifically in their use of tables, in addition to any general problems mentioned above. When it comes to a valid page, nothing increases the likelihood of a problem more than using tables. As we will see shortly, 71 of the 83 sites use tables for layout to some extent. 20 of these 71 have problems with malformed tables (and in addition 14 have problems with forms contained in tables).

For heaven's sake, almost 10% of the sites (7 out of 83) even display malformed comments!

And there are other specific containment issues we'll get to shortly.

Style and link elements and containment

Style and link elements must be contained in the head of a document.

10 sites have style elements in the body of their document. Another 3 have link elements in their bodies.

Again, it may "work" but it is not valid HTML and is far from a best practice.

XHTML and HTML

XHTML causes a number of different, significant problems for developers.

Case

We'll start with XHTML and case. As XML, XHTML is case sensitive. Element and attribute names are all lower case. <HEAD> is not an XHTML element start tag. HREF is not an XHTML attribute. 19 documents use an XHTML document type of some description. 10 of these have problems with case.

XHTML also requires a slightly different syntax, particularly for self closing elements like meta and link. These must end with />

Effectively half, or 9 of the 19 XHTML documents make syntax errors along these lines. In effect, they use HTML syntax.

While it is poor practice to have invalid HTML documents, it is potentially disastrous to have invalid XHTML documents. A choice to use XHTML should be accompanied by an absolute commitment to valid documents. Why? XHTML is XML. A conforming XML parser, on encountering a well-formedness error, must stop normal processing and report the error. IE does not parse XHTML as XML, so it will continue to handle sloppy XHTML as it has done HTML. Other, newer browsers won't be so lenient. Now you might rejoin that your pages work well in Safari and Mozilla browsers, so what's the problem? At present, unless XHTML pages are served as application/xhtml+xml, they are in effect treated as HTML by these browsers. Should the pages ever be served as application/xhtml+xml (and that may be something outside your control), your pages will look something like this:

The beige screen of death.

Be very careful with XHTML. Using it is in essence a commitment to a fully valid page. Make sure you honor that commitment.

HTML

It's usually said that XHTML is backwards compatible with HTML. So using XHTML syntax with an HTML doctype will give you a valid page. But this is not necessarily the case.

XHTML syntax for link and meta elements in the head of a document, namely, <link ... /> or <meta ... /> will cause an error. This is not a problem for self closing elements in the body of a page.

This is a very little known error, and really not alluded to in the general documentation, official or otherwise, associated with the backwards compatibility of XHTML.

If using XHTML syntax with HTML doctypes, don't use the XHTML syntax for self closing elements in the <head> of the document.
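Where markup comes from a tool that always emits XHTML-style empty tags but the page uses an HTML doctype, the head could be post-processed. A sketch only, assuming the input is just the contents of the head element; the function name is hypothetical.

```javascript
// Sketch: rewrite XHTML-style empty link and meta tags ("<link ... />")
// in plain HTML syntax ("<link ...>") for pages with an HTML 4.01 doctype.
// The element name's original case is preserved.
function htmlEmptyTagsInHead(headMarkup) {
  return headMarkup.replace(/<(link|meta)\b([^>]*?)\s*\/>/gi, "<$1$2>");
}

console.log(htmlEmptyTagsInHead('<link rel="stylesheet" href="css/site.css" />'));
// <link rel="stylesheet" href="css/site.css">
```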

Unquoted attribute values

It is good practice, always valid (for HTML and XHTML), and so recommended for consistency, to always quote attribute values. In HTML, "quotes are optional if the attribute value consists solely of letters in the range A-Z and a-z, digits (0-9), hyphens ("-"), and periods (".")" - HTMLHelp

In XHTML, all attribute values must be quoted.
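A sketch of both halves of this advice, with hypothetical function names. quoteAttribute always quotes, escaping & and " so the value stays valid in HTML and XHTML alike (the name should also be lower case for XHTML). mayBeUnquotedInHtml encodes the HTMLHelp rule quoted above; the HTML 4.01 specification itself also allows underscores and colons in unquoted values, but the sketch sticks to the stricter rule.

```javascript
// Always quote: escape & first, then ", so &quot; is not double-escaped.
function quoteAttribute(name, value) {
  const escaped = String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `${name}="${escaped}"`;
}

// The HTMLHelp rule: letters, digits, hyphens and periods only.
// Never applicable to XHTML, where every value must be quoted.
function mayBeUnquotedInHtml(value) {
  return /^[A-Za-z0-9.-]+$/.test(value);
}

console.log(quoteAttribute("width", 100)); // width="100"
console.log(quoteAttribute("title", 'Say "G\'day" & go')); // title="Say &quot;G'day&quot; &amp; go"
console.log(mayBeUnquotedInHtml("100%")); // false
```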

20 of the 83 sites had errors associated with unquoted attribute values.

Unescaped Javascript

"In HTML end tags are recognized within SCRIPT elements, but other kinds of markup--such as start tags and comments--are not" - HTMLHelp

"Authors should therefore escape "</" within the content. Escape mechanisms are specific to each scripting or style sheet language" - HTML 4.01 specification

http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data

So

<SCRIPT type="text/javascript">
  document.write ("<EM>This won't work</EM>")
</SCRIPT>

is invalid, while

<SCRIPT type="text/javascript">
  document.write ("<EM>This will work<\/EM>")
</SCRIPT>

is valid
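When a server-side script or tool generates inline JavaScript, this escaping can be applied automatically. A minimal sketch, assuming the text is destined for a string literal inside an HTML script element; the function name is hypothetical. JSON.stringify produces a quoted JavaScript string literal (escaping quotes and backslashes), and "\/" inside a string literal means the same as "/", so only what the HTML parser sees changes.

```javascript
// Sketch: turn text into a string literal that is safe inside an inline
// HTML <script> element, by escaping "</" as "<\/".
function toInlineScriptString(text) {
  return JSON.stringify(text).replace(/<\//g, "<\\/");
}

const markup = "<EM>This will work</EM>";
const literal = `document.write(${toInlineScriptString(markup)})`;
console.log(literal); // document.write("<EM>This will work<\/EM>")

// The escape makes no difference once the string is read back:
console.log(JSON.parse(toInlineScriptString(markup)) === markup); // true
```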

In XHTML, unlike HTML, the contents of a script element are not CDATA, so characters like < and & in a script are treated as markup. We can solve this problem in the following way:

<script type="text/javascript">
<![CDATA[
// Javascript
]]>
</script>

But this causes problems with some older browsers, which we can get around like this:

<script type="text/javascript">
/* <![CDATA[ */
// content of your Javascript goes here
/* ]]> */
</script>

http://javascript.about.com/library/blxhtml.htm

Easier and better still is to link to external JavaScript files, just as we do with CSS files.

TR and TD elements with background and border properties

The background and border attributes are not part of any W3C specification for TD or TR elements, regardless of what certain browsers might think. Many of these errors are a function of the "it works" mentality of 1995 (2005-1995= a long time ago ok) which went out with flying cars and jet packs. Validators and not browsers tell us what is right.

20 of the sites demonstrate this problem.

If anything indicates the "she'll be right it works in [a certain browser]" attitude (and plenty does) it is this.

The dregs

The remainder of the reasonably common errors are a bit of a grab bag. We'll go through them quickly. All of them could be quickly and easily identified using the validator, and fixed with minimal fuss.

Forms with no action attribute

Forms require an action attribute. 10 of the 83 sites fail to have this required attribute for form elements.

Repeated ID attribute values

A given ID value must appear at most once in a given document. 14 of the 83 pages reuse an id value. I suspect that this is often associated with tool or CMS generated content.
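Repeated ids are easy to miss by eye in generated markup. A rough sketch of spotting them, assuming the page source is available as a string; the function name is hypothetical, and because a regular expression stands in for a real parser, prose such as " id = x" could produce a false match. Confirm anything it finds with the validator.

```javascript
// Rough sketch: report id values that occur more than once in a page.
// Quoted and unquoted values are both matched; id values stay case-sensitive.
function duplicateIds(html) {
  const counts = new Map();
  for (const match of html.matchAll(/\sid\s*=\s*(["']?)([^"'\s>]+)\1/gi)) {
    const id = match[2];
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([id]) => id);
}

const html = '<div id="nav">Menu</div><ul ID="nav"></ul><p id=intro></p>';
console.log(duplicateIds(html)); // [ 'nav' ]
```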

Illegal characters

Another is the use of illegal characters in HTML documents. 9 of the 83 documents had this problem. For documents using a small number of non-Roman characters, this is easily alleviated using character entity references, and that is the case for all of the documents surveyed. Proper internationalization for non-Roman character sets is a considerably more difficult issue, and way beyond the scope of this survey and discussion.
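A sketch of the fix for the simple case, with a hypothetical function name. Rather than named character entity references such as &eacute;, it swaps in numeric character references (&#233;), an equivalent mechanism that needs no lookup table. It handles only non-ASCII characters; & and < still need escaping separately.

```javascript
// Sketch: replace every non-ASCII character with a numeric character
// reference. for...of walks whole code points, so characters outside the
// Basic Multilingual Plane become a single reference.
function toNumericCharRefs(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 127 ? `&#${cp};` : ch;
  }
  return out;
}

console.log(toNumericCharRefs("Café – open")); // Caf&#233; &#8211; open
```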

Malformed color values

5% of documents feature malformed hex color values in their HTML, with a missing #.

We'll return to this issue with CSS (where color values actually belong), and the issue of the use of presentational attributes in HTML.

Deprecated attributes, invalid attributes and values

Many sites feature deprecated attributes, or even invalid (invented) attributes, and values for attributes. I literally gave up counting the incidence of these, so common were they. Another strong example of the "but it works" approach.

The envelope please

So, how did Australian sites fare? On our scale of 0 to 5, we have the following results.

Graph of HTML Validation Results

The full results are available as a table here

These sites validated
Honorable mention

mycareer, with a single error (the use of a name attribute with an image) which I suspect some errant tool (software, not human) threw in.

CSS

Fortunately, on the whole, sites fared quite a bit better with CSS than with HTML. In part this perhaps reflects less opportunity to make errors, and the relative simplicity (at least for basic styling and syntax) of CSS compared with HTML. But taken together with the overwhelming use of tables for layout that, as we shall see, is still prevalent in most of these sites, and the significant use of font and other presentational HTML elements and attributes, the conclusion is that CSS is being used in a reasonably simplistic way at present: for basic text styling, and only sparingly for page layout and more sophisticated purposes.

Only one of the sites failed to use CSS at all, which is encouraging to see.

As we will see a little later, 50 of them use inline CSS, mostly extensively.

The errors

21 of the sites were without any CSS error, and on the whole, the median and average score for CSS was higher than for any other area. This does of course leave 62, or 75% of all sites surveyed with CSS errors, most of which could easily be identified using a CSS validator (or good tool) and fixed in a few moments. Often the most difficult thing with CSS is spotting the errors that creep in.

The majority of CSS errors appear infrequently, and so are difficult to categorize. Two broad categories however are

Time and again, properties like font-color (for color), align (for vertical-align), and other less common non-CSS properties (often simply presentational attributes from HTML, like alink, bgcolor and so on) appear.

Don't make up properties please.

Similarly, legitimate properties and values are mixed up (so we see text-decoration: bold )

Further, property names or values are misspelled (text-tranfrom, for instance).

All of these are easily spotted using a validator (and not made in the first place using a good CSS development tool).

Some broad categories of difficulty also emerge.

Several of the sites use scroll bar properties specific to IE. Several also use the IE-specific expression and filter "properties". Only one site uses a Mozilla-specific property, -moz-opacity, which follows the convention of prefixing browser-specific properties with a dash and the browser's name.

Color, which we saw causing difficulty in HTML (where, in best practice terms, it does not belong), causes even more difficulty with CSS. seagreen, indianred, limegreen and lightgrey are not CSS color keyword values (though they are doubtless supported by some browsers; I didn't have the heart to check). As with HTML, hexadecimal colors must start with the # character, and must contain exactly 3 or 6 hexadecimal digits.

10 of the 83 sites demonstrated one or both of these problems. Easily spotted and rectified, or avoided with validators or good tools.
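A sketch of checking a color value against the two forms discussed here: a hex value (# plus exactly 3 or 6 hex digits) or one of the 16 color keywords defined in CSS 2 (CSS 2.1 adds orange). The function name is hypothetical, and rgb() values and system colors are valid CSS that it deliberately does not handle, so a false result means "check by hand" rather than "invalid".

```javascript
// The 16 color keywords of CSS 2.
const CSS2_COLOR_KEYWORDS = new Set([
  "aqua", "black", "blue", "fuchsia", "gray", "green", "lime", "maroon",
  "navy", "olive", "purple", "red", "silver", "teal", "white", "yellow",
]);

// True for "#abc" / "#aabbcc" style hex colors or a CSS 2 keyword.
function isHexOrKeywordColor(value) {
  const v = value.trim();
  return /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i.test(v) ||
    CSS2_COLOR_KEYWORDS.has(v.toLowerCase());
}

for (const value of ["#336699", "#369", "336699", "#33669", "Navy", "seagreen"]) {
  console.log(value, isHexOrKeywordColor(value));
}
// #336699 true, #369 true, 336699 false, #33669 false, Navy true, seagreen false
```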

The dregs

!important has no space. Since user !important declarations in CSS 2 take precedence over author !important declarations regardless of specificity, it's arguable that author style sheets really ought not use !important declarations, as these exist to override user style sheets, and frankly if a user has a declaration (with or without !important) there is probably a very good reason for that.

Selector groups don't have a comma after the last selector.

CSS in external style sheets must not be wrapped in a style element

CSS comments are different from HTML comments

All of these occurred at least a small number of times, and so are worth noting.

The envelope please

Graph of CSS Results

As mentioned, on the whole, sites scored much more highly for CSS than any other area. A number of sites were saved from very low scores simply because their CSS was reasonably valid.

However, as mentioned, the CSS used is on the whole reasonably unsophisticated, largely reserved for text styling. CSS for layout is still some way off being mainstream, as we will see in a moment.

Structural and Semantic HTML

I imagine the most contentious "head" of best practice which I have decided upon is this one. The other three fall squarely under W3C recommendations. This is more amorphous, bringing together recommended practices which to an extent cut across all three.

In essence, it has long been recognized (even before the WWW) that separating content from its presentation is a valuable practice. We don't have the time to go into the reasons for this here.

Similarly, it is recognized that there is great value in using semantic markup.

Both of these are recognized explicitly in the WAI WCAG 1.0 accessibility guidelines and have been guiding principles in the ongoing development of HTML and XHTML going back over several years at least.

But how do we measure whether sites embody this approach?

I must repeat here that my aim was not to develop an exhaustive methodology, but rather a threshold one: one which is relatively easily and quickly administered, with little ambiguity. This would allow its use for comparative surveys across sites and over time by independent individuals and groups.

The methodology I propose here is a score out of 5, like for the other heads, with points deducted as follows.

  1. Use of Tables for layout - deduct 2 points
  2. Use of font elements - deduct 1 point
  3. Use of presentational attributes - deduct 1 point
  4. Use of elements in a semantically inappropriate way - deduct 1 point
  5. Use of inline style - deduct 1 point

The first point might be a little contentious to some. Tough. While it is a reasonably hard line to take, it is clear from several angles that using tables for layout has for some time been far from a best practice. Similarly, penalizing the use of inline CSS appears harsh, given that it is a perfectly acceptable practice according to the W3C recommendations. This section looks in part at the issue of separating content and appearance, and inline CSS certainly is far from perfect in that regard.

Tables and Layout

Dispiritingly, 71 of the 83 sites used tables for layout to some extent. A small number of these used them sparingly, for small parts of a page. For the most part, though, all layout is still being created using tables in mainstream big league development.

Font elements

29 of the 83 sites, over a third, used font elements.

Do I need to comment at all on this?

Presentational HTML

47 of the sites used presentational HTML attributes, such as background, border, bgcolor and so on (I ignored basic table attributes like height, border and so on).

33 sites used presentational elements, most notably <b> but also <i> and <u>. While <b> and <i> are not deprecated, and indeed are part of the strict doctypes, these were invariably used in place of emphasis or headings, and so semantically inappropriately. If something is a heading, mark it up as such.

inline CSS

50 of the 83 sites, 60%, used inline CSS. As mentioned, you might argue that since it is perfectly valid, then this is nitpicking at best. However, given that this quite strongly violates a well accepted principle of separating the presentation of content from the content itself, I'd argue that it is a reasonable criticism.

The envelope please

Graph of Structural/Semantic HTML results

With a median of 1, and an average of just .66 out of 5, this was a weak area in the survey.

Of all the areas too, it would require the most effort to improve, potentially requiring, unlike the other three areas, a complete overhaul of the underlying code to remove layout tables and presentational HTML and transfer these to CSS.

Accessibility

The criteria I used for this section were simple, mechanical, and essentially uncontroversial. Priority 1 and 2 checkpoints from the WAI WCAG, as determined by Cynthia Says. Again, I'll admit that the results are not necessarily perfect, but they do, I argue, offer a reasonably objective, repeatable benchmark for how well a site adheres to the very basics of accessibility best practices.

As with other areas, a pattern of shortcomings emerged. As with HTML and CSS, these shortcomings, once understood and tested for, are on the whole quite readily fixed.

the problems

In order of frequency, the following major issues emerged.

WAI WCAG checkpoint 11.2 Avoid deprecated features of W3C technologies

76 of the 83 sites had this problem. It's easily fixed. Don't use deprecated HTML elements or attributes.

WAI WCAG checkpoint 1.1 Provide a text equivalent for every non-text element

68 of the sites fell down in some respect here. Missing alt attributes were extremely common. Validators, or Cynthia Says, will let you know when a required alt attribute is missing. As a note, although alt attributes are required on img elements, they need not have any value. In fact, where an image is entirely decorative, alt="" is recommended, as this means screen readers won't read anything aloud for the image. Also, img is not the only element that requires an alt attribute: area elements do too.

WAI WCAG checkpoint 3.4 Use relative rather than absolute units in markup language attribute values and style sheet property values

45 of the 83 sites specified font sizes in CSS using pixels or points.

Use ems or percentages for specifying font sizes (but be mindful of IE 5, 5.5 and 6 in quirks mode, where font sizes specified in ems of less than 1em appear very, very small). At present, % would be recommended for font sizes under 100% or 1em.
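Converting an existing pixel-based style sheet is mostly arithmetic. A sketch with hypothetical function names, assuming the parent element's font size is the common browser default of 16px. Relative sizes compound through nested elements, so for anything but top-level text the base must be the parent's computed size.

```javascript
// Sketch: convert a pixel font size into % or em, relative to basePx
// (assumed to be the parent's font size; 16px is a common browser default).
function pxToPercent(px, basePx = 16) {
  return (px / basePx) * 100;
}

function pxToEm(px, basePx = 16) {
  return px / basePx;
}

console.log(`${pxToPercent(12)}%`); // 75%
console.log(`${pxToEm(24)}em`); // 1.5em
```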

WAI WCAG checkpoint 12.4 Associate labels explicitly with their controls

With 51 of the 83 sites reporting this error, clearly it is a significant problem (as are forms more generally, with their common lack of an action attribute, and their common containment problems within tables, both as noted above).

The WAI guidelines suggest the following approach to both implicitly and explicitly associating a form element with its label.

<LABEL for="firstname">First name:
  <INPUT type="text" id="firstname" tabindex="1">
</LABEL>

http://www.w3.org/TR/WAI-WEBCONTENT-TECHS/#tech-associate-labels
http://www.w3.org/TR/WCAG10-HTML-TECHS/#forms-labels
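Checkpoint 12.4 asks for explicit association, via the for and id attributes. A minimal checking sketch under that reading: collect every label's for value, then flag any form control whose id is missing or unmatched. Which control types are exempt (hidden fields and buttons here) is an assumption, and purely implicit labels, where the control sits inside a label that has no for attribute, are deliberately reported. All names are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: input types that need no label (hidden fields, buttons).
UNLABELLED_OK = {"hidden", "submit", "reset", "button", "image"}


class LabelChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.label_targets = set()  # values of every label's for attribute
        self.controls = []          # (line, tag, id or None) for controls needing labels

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "label" and attr_map.get("for"):
            self.label_targets.add(attr_map["for"])
        elif tag in ("input", "select", "textarea"):
            input_type = (attr_map.get("type") or "text").lower()
            if tag != "input" or input_type not in UNLABELLED_OK:
                line, _ = self.getpos()
                self.controls.append((line, tag, attr_map.get("id")))


def unlabelled_controls(markup):
    """Hypothetical helper: return (line, tag) for controls with no explicit label."""
    checker = LabelChecker()
    checker.feed(markup)
    checker.close()
    # Checked after parsing, because a label may appear after its control.
    return [(line, tag) for line, tag, cid in checker.controls
            if cid is None or cid not in checker.label_targets]
```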

WAI WCAG checkpoint 3.2 Create documents that validate to published formal grammars

27 of the 83 sites were reported as failing this checkpoint. Well. The less said about this the better.

WAI WCAG checkpoint 7.4 Until user agents provide the ability to stop the refresh, do not create periodically auto-refreshing pages

6 of the sites fell down in this department.
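Periodic refreshes are usually done with a meta element. The rough sketch below finds them. As a heuristic assumption, a content value with no url= part is treated as a periodic reload of the same page (checkpoint 7.4), while one with url= is a timed redirect, which WCAG 1.0 deals with separately under checkpoint 7.5. The names are hypothetical.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Finds <meta http-equiv="refresh"> elements and classifies each one.

    Heuristic assumption: no url= part means a periodic reload of the same
    page; a url= part means a timed redirect.
    """

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("http-equiv") or "").lower() == "refresh":
            content = attr_map.get("content") or ""
            seconds = content.split(";")[0].strip()
            normalised = content.lower().replace(" ", "")
            kind = "redirect" if "url=" in normalised else "periodic refresh"
            self.refreshes.append((kind, seconds))


def find_refreshes(markup):
    """Hypothetical helper: return (kind, seconds) for each refresh meta element."""
    finder = RefreshFinder()
    finder.feed(markup)
    finder.close()
    return finder.refreshes
```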

WAI WCAG checkpoint 6.3 Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported

Cynthia Says reported 5 of the sites as failing this checkpoint. I suspect the actual number is somewhat higher, given how many document.write scripts I saw while looking at code validation issues.
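One crude way to gauge how heavily a page leans on script is to count document.write calls in its inline scripts and check whether it offers any noscript fallback. The sketch below does just that. It is a heuristic for illustration, not what Cynthia Says tests; it ignores external script files, and the mere presence of a noscript element says nothing about whether the fallback is adequate. The names are hypothetical.

```python
from html.parser import HTMLParser


class ScriptDependencyFinder(HTMLParser):
    """Heuristic sketch: counts document.write calls in inline scripts and
    notes whether the page contains any <noscript> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self._script_text = []
        self.document_write_calls = 0
        self.has_noscript = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag == "noscript":
            self.has_noscript = True

    def handle_data(self, data):
        # Buffer script text so a call split across data chunks is still counted.
        if self.in_script:
            self._script_text.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_script:
            # document.writeln contains "document.write", so it is counted too.
            self.document_write_calls += "".join(self._script_text).count("document.write")
            self._script_text = []
            self.in_script = False


def script_report(markup):
    """Hypothetical helper: return (document.write count, has noscript)."""
    finder = ScriptDependencyFinder()
    finder.feed(markup)
    finder.close()
    return finder.document_write_calls, finder.has_noscript
```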

The envelope please

Graph of accessibility results

Accessibility is the area in which sites did worst. A median score of 0, an average of just 0.45 out of 5, and well over half the sites scoring zero for accessibility is not a great result.

Only one site, australia.gov.au, received 5; none received 4, and only 10 received even 3.

Certainly building accessible sites is not easy, but nor is it as difficult as the results, and folklore, suggest. I would argue that the results reflect a lack of will, and of consequent effort, as much as anything else. This really has to change. Ironically, accessibility is the one area of best practice that the Disability Discrimination Act essentially mandates; all the others are voluntary. It is disappointing that we are doing so badly.

Overall

So how do things stack up overall? We've seen the individual categories. What about the scores out of 20?

Graph of overall results

Well, it should come as no great surprise that things don't look too good.

An average of just over 5, a median of 5, and only 15 of 83 sites scoring 10 or more.
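The overall figures are simple arithmetic: four category scores, each out of 5, are added to give a score out of 20, and the average, the median and the count of sites scoring 10 or more follow from those totals. The sketch below shows that calculation on five rows taken from the Site Scores table in the Results section. It is sample input only, so its output is not the survey's overall figures, and the names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class SiteScore:
    name: str
    html: int    # each category is scored from 0 to 5
    css: int
    struct: int
    access: int

    @property
    def total(self) -> int:
        # Four categories out of 5 each give a total out of 20.
        return self.html + self.css + self.struct + self.access


def summarise(sites, threshold=10):
    """Return (mean total, median total, number of sites scoring >= threshold)."""
    totals = [site.total for site in sites]
    return mean(totals), median(totals), sum(total >= threshold for total in totals)


# Sample input only: five rows copied from the Site Scores table, not all 83 sites.
sample = [
    SiteScore("australia.gov.au", 5, 4, 5, 5),
    SiteScore("NAB", 0, 5, 5, 3),
    SiteScore("Qantas", 0, 5, 1, 2),
    SiteScore("Telstra", 0, 3, 2, 2),
    SiteScore("9msn", 0, 0, 0, 0),
]
```

For these five sites the totals are 19, 13, 8, 7 and 0, so summarise(sample) gives an average of 9.4, a median of 8, and two sites scoring 10 or more.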

A detailed breakdown and count of the errors can be found in the Results section below.

The top 10 sites were:

Site | Score/20
australia.gov.au | 19
BHP Billiton | 18
ABC | 16
Bureau of Meteorology | 15
Yellow Pages | 15
Rio Tinto | 15
NAB | 13
qld.gov.au | 12
Telecom New Zealand | 12
Brambles | 12

But overall, given that the tests were not overly exacting, and were designed only to determine adherence to core, well understood and uncontroversial practices in web development, most of which can be machine tested objectively using free, readily available tools, we really ought to be doing much, much better.

What's going wrong?

I suspect that at bottom, what is going wrong is that the old, entrenched attitude of "but it works in my browser" is still central to many web developers' thinking. There is little evidence that the HTML, CSS and accessibility validators are being used at all: even the most basic errors are repeated across HTML, CSS and accessibility, and typographical errors riddle many pages and style sheets. I wonder whether the content writers used spell checkers (and suspect they did). So why aren't we doing the equivalent as developers? Arrogance? Ignorance? Apathy?

In the area of structural and semantic HTML, the widespread use of font elements, and of tables for layout, also underscores a problem of philosophy. Developers and designers simply aren't taking modern web practices to heart when designing and developing their sites. As long as "it works", then "she'll be right" seems to be the order of the day.

Conclusions

In a way, I found the results somewhat depressing. I had expected quite a bit better, to be frank.

In terms of validation, structural and semantic HTML and accessibility, there is little evidence that the significant majority of sites are doing things any differently than half a decade ago.

But on reflection, if we had done this survey five years ago, we would have found little if any CSS, few if any doctype declarations, even fewer alt attributes, and even more use of images for text.

At least we are moving in the right direction.

Let's hope when we do this survey in the future, we'll find more to be upbeat about.

References and resources

Results

HTML Errors

Most frequent HTML errors
Error | Count
missing alt attribute | 46
script element no type | 45
TOPMARGIN | 38
unescaped &s | 35
generally malformed documents | 30
unescaped script content | 29
unquoted attribute values | 20
malformed tables | 20
background/border on TR/TD | 20
form/table containment | 14
ID value reused | 14
XHTML in HTML head | 13
form with no action | 10
style in body | 10
XHTML case problems | 10
XHTML using HTML syntax | 9
illegal characters | 9
malformed comments | 7
nobr element | 5
color missing # | 4
link in body | 3
style no type | 3
image with name | 3
class class | 3

CSS Errors

Most frequent CSS errors
Error | Count
generic font family | 39
cursor hand | 22
syntax problems | 7
malformed color | 6
poorly formed comments | 4
font-color | 4
expression/filters | 4
scrollbar properties | 3
! important | 2
style element | 2
malformed groups | 2
problems with property and value names | myriad
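Two of the most common CSS problems above are easy to catch with a simple scan. Reading the "generic font family" entry as font-family declarations that lack a generic fallback (an assumption), and "cursor hand" as the non-standard cursor: hand, a minimal regular-expression sketch follows. It assumes the generic family should come last in the list (the usual convention), the function name is hypothetical, and it is no substitute for the W3C CSS validator, which actually parses style sheets.

```python
import re

# The five generic font families defined by CSS.
GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}

FONT_FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)
CURSOR_HAND_RE = re.compile(r"cursor\s*:\s*hand\b", re.IGNORECASE)


def css_warnings(css):
    """Hypothetical helper: warn about missing generic families and cursor: hand."""
    warnings = []
    for match in FONT_FAMILY_RE.finditer(css):
        families = [f.strip().strip("'\"").lower() for f in match.group(1).split(",")]
        # Assumption: the generic family is expected as the final fallback.
        if families[-1] not in GENERIC_FAMILIES:
            warnings.append(f"no generic family in: {match.group(0).strip()}")
    for _ in CURSOR_HAND_RE.finditer(css):
        warnings.append("cursor: hand is not standard CSS; use cursor: pointer")
    return warnings
```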

Structural/Semantic HTML

Most frequent Structural/Semantic HTML problems
Problem | Count
tables for layout | 71
inline CSS | 50
presentational attributes (not tables) | 47
presentational elements | 33
font element | 29

Accessibility

Most frequent accessibility problems as reported by Cynthia Says
Checkpoint | Count
11.2 Avoid deprecated features of W3C technologies | 76
1.1 Provide a text equivalent for every non-text element | 68
12.4 Associate labels explicitly with their controls | 51
13.1 Clearly identify the target of each link | 42
3.2 Create documents that validate to published formal grammars | 27
7.4 Until user agents provide the ability to stop the refresh, do not create periodically auto-refreshing pages | 6
6.3 Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported | 5

Site Scores

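Organisation | HTML | CSS | Struct | Access | Total | HTML errors | Doctype
australia.gov.au | 5 | 4 | 5 | 5 | 19 | 0 | HTML 4.01 Tr.
BHP Billiton | 5 | 5 | 5 | 3 | 18 | 0 | XHTML 1.0 Tr.
ABC | 5 | 5 | 4 | 2 | 16 | 0 | HTML 4.0 Tr.
Rio Tinto | 5 | 5 | 2 | 3 | 15 | 0 | XHTML 1.0 Tr.
Bureau of Meteorology | 5 | 5 | 2 | 3 | 15 | 0 | HTML 4.01 Tr.
Yellow Pages | 5 | 4 | 5 | 1 | 15 | 0 | XHTML 1.0 Tr.
NAB | 0 | 5 | 5 | 3 | 13 | 39 | HTML 4.01 Tr.
qld.gov.au | 0 | 5 | 4 | 3 | 12 | 29 | HTML 4.01 Tr.
Telecom NZ | 0 | 5 | 5 | 2 | 12 | 193 | HTML 4.0 Tr.
Brambles | 0 | 5 | 5 | 2 | 12 | 16 | XHTML 1.0 Tr.
nsw.gov.au | 3 | 4 | 1 | 3 | 11 | 1 | HTML 4.0 Tr.
mycareer.com.au | 4 | 5 | 2 | 0 | 11 | 1 | XHTML 1.0 Tr.
whereis.com.au | 3 | 3 | 2 | 3 | 11 | 4 | HTML 4.0 Tr.
Origin Energy | 5 | 3 | 1 | 1 | 10 | 0 | XHTML 1.0 strict
Jetstar | 0 | 5 | 5 | 0 | 10 | 66 | HTML 4.01 Tr.
AGIMO | 0 | 5 | 1 | 3 | 9 | 21 | XHTML 1.0 Tr.
Qantas | 0 | 5 | 1 | 2 | 8 | 39 | HTML 4.0 Tr.
vic.gov.au | 3 | 0 | 2 | 3 | 8 | 4 | HTML 4.01 Tr.
Centrelink | 0 | 4 | 2 | 2 | 8 | 11 | HTML 4.0 Tr.
Tabcorp | 2 | 5 | 1 | 0 | 8 | 27 | HTML 4.01 Tr.
Commonwealth Bank | 0 | 3 | 4 | 0 | 7 | 15 | XHTML 1.0 Tr.
Virgin Blue | 0 | 4 | 3 | 0 | 7 | 36 | XHTML 1.0 Tr.
Fosters Group | 2 | 2 | 0 | 3 | 7 | 23 | HTML 4.01 Tr.
Santos | 0 | 5 | 1 | 1 | 7 | 50 | HTML 4.0 Tr.
Telstra | 0 | 3 | 2 | 2 | 7 | 62 | XHTML 1.0 Tr.
Wotif | 0 | 3 | 2 | 2 | 7 | 24 | XHTML 1.0 Tr.
Woolworths | 0 | 5 | 0 | 1 | 6 | 171 | HTML 4.0 Tr.
Sensis | 0 | 4 | 1 | 1 | 6 | 44 | HTML 4.01 Tr.
smh.com.au | 0 | 4 | 0 | 2 | 6 | 264 | XHTML 1.0 Tr.
Macquarie Bank | 0 | 5 | 1 | 0 | 6 | 66 | no doctype
Coles Myer | 0 | 5 | 1 | 0 | 6 | 181 | no doctype
St George Bank | 1 | 5 | 0 | 0 | 6 | 27 | XHTML 1.0 Tr.
3 Mobile | 1 | 4 | 1 | 0 | 6 | 55 | HTML 4.01
CSL | 1 | 5 | 0 | 0 | 6 | 9 | HTML 4.01 Tr.
Alumina | 0 | 5 | 0 | 0 | 5 | 219 | no doctype
CSR | 0 | 4 | 1 | 0 | 5 | 41 | no doctype
Macquarie Infrastructure | 0 | 2 | 2 | 1 | 5 | 128 | XHTML 1.0 Tr.
Boral | 0 | 4 | 0 | 1 | 5 | 221 | XHTML 1.0 Tr.
AGL | 0 | 4 | 1 | 0 | 5 | 31 | no doctype
Foxtel | 0 | 4 | 1 | 0 | 5 | 76 | no doctype
Westfield | 0 | 2 | 1 | 2 | 5 | 74 | XHTML 1.0 Tr.
Suncorp | 0 | 4 | 1 | 0 | 5 | 52 | HTML 4.01 Strict
Bigpond | 0 | 5 | 0 | 0 | 5 | 229 | XHTML 1.0 Tr.
AMP | 0 | 4 | 1 | 0 | 5 | 46 | no doctype
ING Direct | 0 | 1 | 2 | 2 | 5 | 15 | HTML 4.01 Tr.
Optus | 0 | 3 | 1 | 0 | 4 | 52 | HTML 4.01 Tr.
SBS | 0 | 3 | 0 | 1 | 4 | 539 | HTML 4.0 Tr.
Flight Centre | 0 | 2 | 1 | 1 | 4 | 47 | XHTML 1.1
Tradingpost | 0 | 3 | 1 | 0 | 4 | 74 | no doctype
ANZ | 0 | 4 | 0 | 0 | 4 | 67 | no doctype
Channel 10 | 0 | 3 | 1 | 0 | 4 | 17 | HTML 4.0 Tr.
Coca Cola Amatil | 0 | 2 | 1 | 0 | 3 | 21 | no doctype
greengrocer.com.au | 0 | 3 | 0 | 0 | 3 | 85 | no doctype
Fletcher Building | 0 | 1 | 0 | 2 | 3 | 267 | HTML 4.0 Tr.
Promina | 0 | 2 | 0 | 1 | 3 | 31 | no doctype
Westpac | 0 | 2 | 0 | 1 | 3 | 189 | no doctype
Caltex | 0 | 1 | 1 | 1 | 3 | 49 | HTML 3.2
NRL | 0 | 2 | 1 | 0 | 3 | 191 | no doctype
ATO | 0 | 1 | 0 | 1 | 2 | 101 | no doctype
baggygreen.com.au | 0 | 2 | 0 | 0 | 2 | 151 | no doctype
AXA | 0 | 1 | 0 | 1 | 2 | 144 | no doctype
Orica | 0 | 1 | 0 | 1 | 2 | 18 | HTML 4.0
Greater Union | 0 | 2 | 0 | 0 | 2 | 339 | no doctype
shopfast.com.au | 0 | 2 | 0 | 0 | 2 | 33 | no doctype
whitepages.com.au | 0 | 1 | 0 | 1 | 2 | 155 | no doctype
Paperlinx | 0 | 0 | 2 | 0 | 2 | 77 | HTML 4.01 Tr.
Burns Philp | 0 | 0 | 0 | 2 | 2 | 24 | no doctype
AFL | 0 | 2 | 0 | 0 | 2 | 315 | HTML 4.0 Tr.
IAG | 0 | 0 | 2 | 0 | 2 | 22 | no doctype
QBE Insurance | 0 | 1 | 1 | 0 | 2 | 21 | no doctype
Lendlease | 0 | 1 | 0 | 0 | 1 | 147 | no doctype
Channel Seven | 0 | 0 | 1 | 0 | 1 | 21 | no doctype
Bluescope Steel | 0 | 1 | 0 | 0 | 1 | 67 | no doctype
PBL | 0 | 0 | 0 | 1 | 1 | 37 | no doctype
Woodside Petroleum | 0 | 1 | 0 | 0 | 1 | 103 | no doctype
General Property Trust | 0 | 1 | 0 | 0 | 1 | 142 | no doctype
Amcor | 0 | 1 | 0 | 0 | 1 | 41 | HTML 4.0 Tr.
ozemail | 0 | 1 | 0 | 0 | 1 | 110 | XHTML 1.0 Tr.
Wesfarmers | 0 | 1 | 0 | 0 | 1 | 74 | no doctype
news.com.au | 0 | 0 | 0 | 0 | 0 | 149 | HTML 4.01 Tr.
yahoo.com.au | 0 | 0 | 0 | 0 | 0 | 65 | HTML 4.01 Tr.
Ticketek | 0 | 0 | 0 | 0 | 0 | 30 | HTML 4.0 Tr.
9msn | 0 | 0 | 0 | 0 | 0 | 44 | no doctype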

Breakdown by sector

Organisation | Score/20

Financial Services

NAB | 13
Commonwealth | 7
Macquarie Bank | 6
St George | 6
Suncorp | 5
AMP | 5
ING | 5
ANZ | 4
Promina | 3
Westpac | 3
AXA | 2
Orica | 2
IAG | 2
QBE | 2

Telcos

Telstra | 7
3Mobile | 6
Optus | 4

Media

ABC | 16
smh.com.au | 6
Foxtel | 5
SBS | 4
Trading Post | 4
Channel 10 | 4
Greater Union | 2
Channel 7 | 1
News Limited | 0

Online

Yellow Pages | 15
mycareer | 11
whereis | 11
Wotif | 7
Sensis | 6
Bigpond | 5
whitepages.com.au | 2
ozemail | 1
yahoo.com.au | 0
9MSN | 0

Retail

Tabcorp | 8
Woolworths | 6
Coles Myer | 6
Westfield | 5
Greengrocer | 3
shopfast.com.au | 2
Ticketek | 0

Energy Mining

BHP Billiton | 18
Rio Tinto | 15
Origin Energy | 10
AGL | 5
Caltex | 3
Woodside | 1

Government

australia.gov.au | 19
Bureau of Meteorology | 15
qld.gov.au | 12
nsw.gov.au | 11
AGIMO | 9
vic.gov.au | 8
Centrelink | 8
ATO | 2

Travel

Jetstar | 10
Qantas | 8
Virgin | 7
Flight Centre | 4

Industrial

Brambles | 16
Fosters | 7
Santos | 7
CSL | 6
Alumina | 5
CSR | 5
Macquarie Infrastructure | 5
Boral | 5
Coca Cola Amatil | 3
Fletcher Building | 3
Paperlinx | 2
Burns Philp | 2
Lendlease | 1
Bluescope | 1
PBL | 1
General Property Trust | 1
Amcor | 1
Wesfarmers | 1

Sport

NRL | 3
baggygreen.com.au | 2
AFL | 2

Sector Results

Sector | Count | Average | Median
Financial | 14 | 4.6 | 4
Telcos | 3 | 5.6 | 6
Media | 9 | 4.6 | 4
Online | 10 | 5.8 | 5
Retail | 7 | 4.3 | 5
Energy | 6 | 8.7 | 5
Government | 8 | 9.4 | 9
Industrial | 18 | 4 | 3
Sport | 3 | 2.3 | 2

Similar Work

Miles Burke at Port 80 has done a similar study of Western Australian sites, and has published his results.

Roger Johansson at 456 Berea Street reports on two surveys of public sector sites and HTML validity: one of Swedish sites (in Swedish), and one of U.S. public sector sites.

Discuss this?

If you have any observations, notes or criticisms, please feel free to discuss them at my blog.

John Allsopp is a director at westciv and the lead developer of Style Master CSS editor. He writes widely on web standards and software development issues and maintains the blog dog or higher.
