HTML Tags

Well, after taking a long break from writing anything on this blog, I’m back and better than ever. I’ll try to post more regularly from now on, with much better content. I hope you, my loyal readers, didn’t miss me too much while I was gone, but anyway, let’s get on with the good stuff.

One thing I’ve noticed is that a lot of people believe that in HTML, everything is a tag (or at least can be called a tag). This is most certainly not the case. The most recent offender I’ve seen, and the reason I decided to write this, is the author of Firefox, ALT Tags, and Tooltips, which, as you can see from the title, incorrectly refers to attributes as tags. The article itself is quite good, and I fully agree with its message about tooltips for alt attributes; it’s just the incorrect references to attributes as tags that bug me. This author is neither the first nor the last to make the mistake, but it is about time people learned to call things by their real names.

If you read part 5, Terminology, of Joe English’s humorous document, “Not the comp.text.sgml Frequently Asked Questions List”, you will see that the common name for everything except a tag is “tag”. The common name for a tag is “command”, which, of course, makes perfect sense!

    --------------------------------------------------
    ISO/W3C terminology             Common name
    --------------------------------------------------
    attribute                       tag
    attribute value                 tag
    attribute value literal         tag
    attribute value specification   tag
    character reference             tag
    comment                         tag
    comment declaration             tag
    declaration                     tag
    document type declaration       tag
    document type definition        tag
    element                         tag
    element type                    tag
    element type name               tag
    entity                          tag
    entity reference                tag
    general entity                  tag
    generic identifier              tag
    literal                         tag
    numeric character reference     tag
    parameter entity                tag
    parameter literal               tag
    processing instruction          tag
    tag                             command
    --------------------------------------------------

So what exactly is a tag then? Well, before I get to that, I’ll just explain what some of the more common SGML and XML terminology means and what a tag is not.

Firstly, tags are not commands. People believe they are commands because of the misconception that HTML is a presentational language, or even a programming language. HTML is certainly not a programming language, and while it is true that presentational features have crept in, they have already been deprecated in and/or removed from (X)HTML, or at least will be in future versions.

It is the presentational elements and attributes that could be seen as commands or instructions to display the content in a certain way. In fact, they are suggestions, just like CSS properties. The only difference is that these presentational suggestions are mixed in with the markup and carry no real semantics: they say nothing about what the content is, only what the author wants it to look like, usually in a visual medium. Any presentational feature, whether expressed with CSS or with the presentational elements and attributes, can be overridden by a user with a user stylesheet (assuming the user agent supports that facility). They are therefore only suggestions that a user does not have to accept, not commands that a user agent or user must obey.

HTML, since it has been formally based on SGML, is intended to mark up the structure and semantics of the content by saying what it is, not what it does or how it looks (with the exception of the aforementioned presentational features). Basically, HTML is not a procedural programming language; it is a descriptive markup language, so tags are not commands.

Attribute Tags

There’s no excuse for calling attributes tags, other than complete laziness and/or ignorance, but as already shown, it is a common mistake. An attribute is a property of an element, written within the element’s start-tag, and should be referred to simply as an attribute. For example, “the alt attribute” is the simplest way of referring to one, and is only slightly longer than writing “tag”. A shorthand that I occasionally see in plain text e-mails is to write the attribute name within vertical bars or some other delimiter, e.g. |alt|.
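To make the distinction concrete, here is a minimal sketch using Python’s standard html.parser module (my choice of tool for illustration; the class name and the sample markup are made up). The parser hands over the element name and the attributes of a start-tag as two separate things, which is a fair hint that an attribute is not a tag.

```python
from html.parser import HTMLParser


class StartTagInspector(HTMLParser):
    """Record each start-tag's element name and its attributes separately."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # `tag` is the element name; `attrs` is a list of (name, value) pairs
        # taken from inside the start-tag.
        self.seen.append((tag, dict(attrs)))


inspector = StartTagInspector()
inspector.feed('<img src="cat.png" alt="A sleeping cat">')
inspector.close()

name, attributes = inspector.seen[0]
print(name)               # img
print(attributes["alt"])  # A sleeping cat
```

One start-tag was seen, and alt is an attribute that lives inside it.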

Character Tags (or Entities)

Character references are sometimes called tags, but are more often called entities. Just like attributes, they are not tags either, but what’s wrong with calling them entities?

According to section 3.2.3 of the HTML 4.01 recommendation, Character references are numeric or symbolic names for characters that may be included in an HTML document. Section 5.3 also states:

Character references in HTML may appear in two forms:

  • Numeric character references (either decimal or hexadecimal).
  • Character entity references.

The numeric character references take the form &#nnnn; (decimal) or &#xnnnn; (hex). Character entity references are the named entities for the ISO-8859-1 characters (from 160 to 255), symbols, mathematical symbols and Greek letters, and finally, markup-significant and internationalization characters.
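As a quick illustration of the forms described above, the following sketch uses html.unescape from Python’s standard library (again, just a convenient tool, not something the recommendation mentions). The decimal, hexadecimal and named references for é are three different pieces of markup that all refer to the same character.

```python
import html

# Three character references for U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
decimal = html.unescape("&#233;")      # numeric character reference, decimal
hexadecimal = html.unescape("&#xE9;")  # numeric character reference, hexadecimal
named = html.unescape("&eacute;")      # character entity reference

print(decimal == hexadecimal == named == "\u00e9")  # True
```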

Based on that, you may think that it is only the numeric references that are incorrectly referred to as entities; however, it is indeed both forms. In SGML and XML there are several types of entities, and the simplest explanation of what an entity is comes from ISO 8879 itself, the SGML specification: an entity is a collection of characters that can be referenced as a unit. The purpose of entities is easily understood, but understanding exactly what an entity is, and separating that concept from the markup, is more difficult.

An entity is a concept that is defined in a DTD using an entity declaration defining both the name, and the replacement text. The entities are referred to within a document using an entity reference in the form: &name;. The entity declaration and the entity reference are just the markup for the entity, but they are not the entity itself.
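Here is a small sketch of that separation, using XML and Python’s xml.etree.ElementTree module. The entity name author, its replacement text and the note element are all invented for the example. The entity declaration sits in the internal DTD subset, the entity reference sits in the content, and what the parser hands back is the replacement text.

```python
import xml.etree.ElementTree as ET

# The <!ENTITY ...> line is the entity declaration (name plus replacement text).
# The &author; in the content is the entity reference.
document = """<!DOCTYPE note [
  <!ENTITY author "Jane Example">
]>
<note>Written by &author;.</note>"""

root = ET.fromstring(document)
print(root.text)  # Written by Jane Example.
```

Neither the declaration nor the reference appears in the parsed text; only the characters the entity stands for do.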

Generally, when people say entities in regard to an HTML document, they are actually referring to the character entity references and/or the numeric character references, not the entities themselves. This is not always the case, as SGML and XML experts will usually get it right, but luckily the speaker’s intended meaning can generally be understood from the context.

The DOCTYPE Tag

The Document Type Declaration, or simply DOCTYPE, is often referred to as the DTD, or the DOCTYPE tag. The acronym, DTD, can be mistakenly used to refer to the Document Type Declaration, since it has the same initials as the acronym’s defined meaning: Document Type Definition.

The DOCTYPE is not a tag either; it is a declaration, so calling it the DOCTYPE tag is incorrect. More often than not, it is easier to refer to it simply as the DOCTYPE.
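For what it’s worth, parsers make the same distinction. In this sketch, Python’s html.parser (my choice of tool; the class name and sample markup are made up) reports the DOCTYPE through its declaration handler, not through the handler for start-tags.

```python
from html.parser import HTMLParser


class DeclarationLogger(HTMLParser):
    """Log declarations and start-tags as different kinds of events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        # Called for markup declarations such as the DOCTYPE.
        self.events.append(("declaration", decl))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start-tag", tag))


logger = DeclarationLogger()
logger.feed("<!DOCTYPE html><p>Example</p>")
logger.close()

print(logger.events)
# [('declaration', 'DOCTYPE html'), ('start-tag', 'p')]
```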

The <?xml?> Tag

The XML declaration, often referred to as a Processing Instruction or Prolog, is also sometimes called the <?xml?> tag. As you can probably guess, it is not a tag. It is not a processing instruction either, but that, at least, is forgivable, since it has the appearance of an XML PI even though it is defined separately as the XML Declaration. Nor is it the prolog, although it is part of the prolog.
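The following sketch shows the same separation from a parser’s point of view, using the expat bindings in Python’s standard library (the handler function names and the sample document are mine). The XML declaration is delivered to one handler, and a real processing instruction, here an xml-stylesheet PI, is delivered to another.

```python
from xml.parsers import expat

events = []


def on_xml_declaration(version, encoding, standalone):
    # Called once for the XML declaration at the very start of the document.
    events.append(("xml-declaration", version, encoding))


def on_processing_instruction(target, data):
    # Called for processing instructions, which the XML declaration is not.
    events.append(("processing-instruction", target, data))


parser = expat.ParserCreate()
parser.XmlDeclHandler = on_xml_declaration
parser.ProcessingInstructionHandler = on_processing_instruction

document = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<?xml-stylesheet href="style.css"?>'
    b"<doc/>"
)
parser.Parse(document, True)

for event in events:
    print(event)
```

Two events are recorded: the declaration first, then the stylesheet PI. The declaration never reaches the processing instruction handler.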

Elements and Tags

An element is not a tag, as noted at the end of section 3.2.1 Elements, in the HTML 4.01 recommendation:

Elements are not tags. Some people refer to elements as tags (e.g., “the P tag”). Remember that the element is one thing, and the tag (be it start or end tag) is another. For instance, the HEAD element is always present, even though both start and end HEAD tags may be missing in the markup

The word tag refers only to start-tags and end-tags. Every element has a start-tag (e.g. <p>) and, with the exception of empty elements, an end-tag (e.g. </p>). Empty elements never have an end-tag in HTML, though one is required in XML, and thus XHTML (which can use the special empty element tag syntax). As noted, in HTML the start- or end-tags may be omitted for some elements, but those elements are still present.
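A small sketch of the difference, assuming Python’s html.parser as the tool (the class name and the markup are invented). This parser is only a tokenizer: it reports the tags that physically appear in the markup and does not build elements. The fragment below contains two P elements, since the end-tag of P may be omitted in HTML, yet not a single </p> is ever reported.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collect the start-tags and end-tags that literally appear in the markup."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


counter = TagCounter()
counter.feed("<p>First <em>paragraph</em><p>Second paragraph")
counter.close()

print(counter.start_tags)  # ['p', 'em', 'p']
print(counter.end_tags)    # ['em']
```

Three start-tags and one end-tag were written, but the document still has two P elements and one EM element. Counting tags and counting elements give different answers.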

An element is more of a concept. It is defined using an element declaration and comprises an element name (which appears within the start- and end-tags), any attributes within the start-tag, a content model and, with the exception of empty elements, its content. An element is included in a document by writing its start- and end-tags, as needed, but (like entity declarations and references) the element declaration and tags are only the markup for an element; they are not the element itself. It is important that this distinction be made and understood by authors, and I just hope I’ve explained it well enough.
