Unimpressed by NodeIterator

I just posted a rundown of some of the new DOM Traversal APIs in Firefox 3.5. The first half of the post is mostly a recap of my old Element Traversal API post.

The second half of the post is all about the new NodeIterator API that was just implemented. Anyone who has used the DOM TreeWalker APIs will recognize most of it.

It’s my opinion, though, that this API is, at best, bloated, and at worst incredibly misguided and impractical for day-to-day use.

Observe the method signature of createNodeIterator:

var nodeIterator = document.createNodeIterator(
  root, // root node for the traversal
  whatToShow, // a set of constants to filter against
  filter, // an object with a function for advanced filtering
  entityReferenceExpansion // whether entity reference children should be expanded
);

This is excessive for what should be, at most, a simple way to traverse DOM nodes.

To start, you must create a NodeIterator using the createNodeIterator method. This is fine except this method only exists on the Document node – which is especially strange since the first argument is the node which should be used as the root of the traversal. The first argument shouldn’t exist and you should be able to call the method on any DOM element, document, or fragment.

Second, in order to specify which types of nodes you wish to see you need to provide a number (which is the result of combining various constants) that the results will be filtered against. This is pretty insane so let me break this down. The NodeFilter object contains a number of properties representing the different types of nodes that exist. Each property has a number associated with it (which makes sense, this way the method can uniquely identify which type of node to look for). But then the crazy comes in: in order to select multiple, different, types of nodes you must OR together the properties to create a resulting number that’ll be passed in.

For example if you wanted to find all elements, comments, and text nodes you would do:

NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT

I’m not sure if you can get a much more counter-intuitive JavaScript API than that (you can certainly expect little to no adoption among everyday developers, that’s for sure).
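For anyone who hasn’t met bit flags before, here is a minimal sketch of what that OR is doing. The constant values are the ones the DOM specification assigns to NodeFilter, redefined locally so the snippet runs outside of a browser. The `shows` helper is a hypothetical name used only for illustration.

```javascript
// Values as defined for NodeFilter in the DOM specification,
// copied here so the snippet runs without a browser.
var SHOW_ELEMENT = 0x1;
var SHOW_TEXT = 0x4;
var SHOW_COMMENT = 0x80;

// Each constant occupies its own bit, so OR-ing them builds a set.
var whatToShow = SHOW_ELEMENT | SHOW_COMMENT | SHOW_TEXT;

// Hypothetical helper: is the given flag part of the mask?
function shows(mask, flag) {
    return (mask & flag) !== 0;
}
```

It works, but it asks every caller to think in bits just to say “elements, comments, and text.”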

Next, the filter argument accepts an object that has a method (called acceptNode) which is capable of further filtering the node results before being returned from the iterator. This means that the function will be called on every applicable node (as specified by the previous whatToShow argument).

Two points to consider:

  • The filter argument must be an object with a property named ‘acceptNode’ that has a function as a value. It can’t just be a function for filtering, it must be enclosed in a wrapper object. Update: Actually, this isn’t true – at least with Mozilla’s implementation you can pass in just a function. Thanks for the tip, Neil!
  • The argument is required (even though you can pass in null, making it equivalent to accepting all nodes).

The last argument, entityReferenceExpansion, comes into play when dealing with XML entities that also contain sub-nodes (such as elements). For example, with XML entities, it’s perfectly valid to have a declaration like <!ENTITY aname "<elem>test</elem>"> and then later in your document have &aname; (which is expanded to represent the element). While this may be useful for XML documents it is way out of the scope of most web content (thus the argument will likely always be false).

So, in summary, createNodeIterator has four arguments:

  • The first of which can be removed (by making the method available on elements, fragments, and documents).
  • The second of which is obtuse and should be optional (especially in the case where all nodes are to be matched).
  • The third of which requires a superfluous object wrapping and should be optional.
  • The fourth of which should be optional.

None of this takes into account the actual iteration process. If you look at the specification you can see that all the examples are in Java – and when seeing this a lot of the API decisions start to make more sense (not that it really applies to the world of web-based development, though). In JavaScript one doesn’t really use iterators; more typically an array is used instead. (In fact a number of helpers have been added in ECMAScript 5 which make the iteration and filtering process that much simpler.)
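To make that point concrete, here is a rough sketch of a helper that drains anything with a nextNode method into a plain array, at which point the ECMAScript 5 array methods apply. The name `collectNodes` is hypothetical and not part of any specification.

```javascript
// Hypothetical helper: drain anything with a nextNode() method
// (such as a NodeIterator) into a plain array.
function collectNodes(iterator) {
    var nodes = [], node;

    // nextNode() returns null once the traversal is exhausted.
    while ( (node = iterator.nextNode()) ) {
        nodes.push( node );
    }

    return nodes;
}

// In a browser this could be used as:
//   collectNodes( document.createNodeIterator(
//       document, NodeFilter.SHOW_COMMENT, null, false ) )
//     .forEach(function(node){ /* ... */ });
```

The result is a snapshot: nodes added to the document afterwards won’t show up in the array.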

I’d like to propose the following, new, API that would exist in place of the NodeIterator API (dramatically simplifying most common interactions, especially on the web).

// Get all nodes in the document
document.getNodes();

// Get all comment nodes in the document
document.getNodes( Node.COMMENT_NODE );

// Get all element, comment, and text nodes in the document
document.getNodes( Node.ELEMENT_NODE, Node.COMMENT_NODE, Node.TEXT_NODE );

I’d also like to propose the following helper methods:

// Get all comment nodes in the document
document.getCommentNodes();

// Get all text nodes in a document
document.getTextNodes();

Beyond finding elements, finding comments and text nodes are the two most popular query types that I see requested.

Consider the code that would be required to recreate the above using NodeIterator:

// Get all nodes in the document
document.createNodeIterator(document, NodeFilter.SHOW_ALL, null, false);

// Get all comment nodes in the document
document.createNodeIterator(document, NodeFilter.SHOW_COMMENT, null, false);

// Get all element, comment, and text nodes in the document
document.createNodeIterator(document, 
    NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT,
    null, false
);

This proposed API would return an array of DOM nodes as a result (instead of a NodeIterator object). You can compare the difference in results between the two APIs:

NodeIterator API

var nodeIterator = document.createNodeIterator(
    document,
    NodeFilter.SHOW_COMMENT,
    null,
    false
);

var node;

while ( (node = nodeIterator.nextNode()) ) {
    node.parentNode.removeChild( node );
}

Proposed API

document.getCommentNodes().forEach(function(node){
    node.parentNode.removeChild( node );
});

As another example, suppose we want to find all elements with a node name of ‘A’.

NodeIterator API

var nodeIterator = document.createNodeIterator(
    document,
    NodeFilter.SHOW_ELEMENT,
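    {
        acceptNode: function(node){
          // acceptNode is specified to return a NodeFilter constant
          return node.nodeName.toUpperCase() === "A" ?
            NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
        }
    },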
    false
);

var node;

while ( (node = nodeIterator.nextNode()) ) {
    node.className = "found";
}

Proposed API

document.getNodes( Node.ELEMENT_NODE ).forEach(function(node){
    if ( node.nodeName.toUpperCase() === "A" )
        node.className = "found";
});
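Nothing about the proposed API needs browser support to try out. Below is a rough sketch of getNodes as a standalone function, assuming a simple recursive walk over childNodes in document order. Taking the root as the first argument is a stand-in for calling the method on the node itself, and the signature is purely illustrative; a native implementation would not have to look anything like this.

```javascript
// Illustrative stand-in for the proposed node.getNodes( type, ... ).
// Assumes only that nodes expose nodeType and an array-like childNodes.
function getNodes(root /*, nodeType, ... */) {
    var types = Array.prototype.slice.call(arguments, 1);
    var found = [];

    function walk(node) {
        // With no types given, every node matches.
        if ( types.length === 0 || types.indexOf(node.nodeType) !== -1 ) {
            found.push( node );
        }

        var children = node.childNodes || [];
        for ( var i = 0; i < children.length; i++ ) {
            walk( children[i] );
        }
    }

    walk( root );
    return found;
}

// In a browser this could be used as:
//   getNodes( document, Node.COMMENT_NODE ).forEach(function(node){
//       node.parentNode.removeChild( node );
//   });
```

Like NodeIterator, this sketch considers the root node itself as a candidate, so getNodes( document ) starts with the document.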

Almost always, when you run into some of the crazy intricacies of the DOM or CSS, you’ll find a legacy of XML documents and Java applications, neither of which has a strong application to the web as we know it or to the web as it’s progressing. It’s time to divorce ourselves from these decrepit APIs and build ones that are better-suited to web developers.

Update: An even better alternative (rather than using constants representing node types) would be something like the following:

 document.getNodes( Element, Comment, Text );

Just refer back to the base objects representing each of the types that you want.

Posted: June 19th, 2009

