Manipulating XML at the command line with xmlstarlet

Linux.Ars returns with a tutorial on how to mess around with XML using …

Ryan Paul – Nov 16, 2005 2:30 AM

Introduction

Welcome to another exciting edition of Linux.Ars! Some of you may remember a previous Linux.Ars which featured a Ruby/Pcap example of how to monitor network traffic. Today, Martin Colello provides us with an introduction to network monitoring with Big Brother, an excellent tool for enterprise-level monitoring solutions. For this week's Tools, Tips and Tweaks segment, I wrote an introduction to command-line XML manipulation with the xmlstarlet utility. I also wrote a brief review of SuperTux, a highly addictive, Mario-inspired screen scroller.

Linux.Ars is all about you, so don't be afraid to get involved! Want to do a section for a future edition? Have a suggestion for a topic that you want us to write about? I would love some feedback. We want your comments, complaints, suggestions, requests, free hardware, death threats, or disparaging remarks about my assorted deficiencies. Send me an e-mail or instant message, or post a comment in the discussion thread!

Developers Corner

Creating custom nmap-based tests for Big Brother

Many companies use Big Brother for network and server monitoring, in no small part because a standard Big Brother installation can alert you to problems before your customers notice them, sparing you the dreaded Monday morning meeting with management.

One of the most useful features of Big Brother is the ability to create custom tests that reflect your particular environment. This article describes how to create those tests; it does not cover the Big Brother install procedure, because the documentation that accompanies the software is more than adequate. Note that the custom nmap script presented here is also compatible with hobbit, a more advanced open-source monitoring tool based on Big Brother.

Big Brother comes with a relatively strong list of network tests that can be performed with no customization. With nmap we can add any services we like to the monitor, including things like Lotus Notes, Oracle, VOIP, and so on. The nmap-tests.pl script requires a data file, nmap-data, which looks like this:

nmap-data

email.company.com:notes:1352:lotusnotes:Contact Bob the Notes Guy.
iSeries.company.com:ora_listener:1521:oracle:Contact Fred the iSeries Guy.
voice.company.com:signal:1720:H.323:Contact Linda the VOIP Lady.

This data file is colon-delimited. The fields are as follows:

Server Name: the name of the server on which the service resides. The fully qualified domain name should be used, and it should match exactly the entry in bb-hosts for this server.
Big Brother Column Name: the column header text to be rendered on the Big Brother display for this test. This text should be as short as possible to maximize available screen space.
Port Number: the number of the port on which the service is hosted. The port number can be determined by running "nmap servername" from your Big Brother server.
NMAP Text: the text the script will expect nmap to provide when scanning the defined port. For example, if you run "nmap website.company.com" you will see some text indicating http is available on port 80. For the NMAP Text, choose a word from the nmap output that indicates the service is available.
Contact Info: this text will be displayed as is on the Big Brother display. Here you can define any information you like, such as the person responsible for the server, phone numbers, e-mail addresses, etc.
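
To make the field layout concrete, here is a minimal shell sketch that splits one line of the sample data file on colons. It is only an illustration and not part of Big Brother or of nmap-tests.pl; the variable names are my own assumptions.

```shell
#!/bin/sh
# Illustrative only: split one nmap-data line into its five fields.
# The variable names are assumptions chosen for this sketch.
line='email.company.com:notes:1352:lotusnotes:Contact Bob the Notes Guy.'

# With IFS set to a colon, read assigns one field per variable. The last
# variable receives whatever remains, so the contact text keeps its spaces.
IFS=: read -r server column port nmap_text contact <<EOF
$line
EOF

echo "server=$server column=$column port=$port text=$nmap_text"
echo "contact=$contact"
```

Because the contact text is the last field, it is the only one that can safely contain spaces. It is probably best to keep colons out of it, since a script that splits every colon would cut the text short.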

The nmap-tests.pl script follows:

`nmap-tests.pl`

#!/usr/bin/perl -w
# nmap-tests.pl

use strict;

# Initialize some variables
my $servername;
my $testname;
my $port;

my $output;
my $results;
my $color;
my $machine;
my $line;

my $stat;
my $errormsg;
my $date = `/bin/date`;

chomp($date);
my @temp;

# Read in data from config file
my $configfile = '/usr/local/bb/ext/nmap/nmap-data';# Change this to suit your config

open CONFIG, $configfile or die "Cannot open config file $configfile: $!";
my @configdata = <CONFIG>;

close CONFIG;
chomp(@configdata);

# Start main loop to run once for each line in config file
foreach(@configdata){

# Break up config line into separate variables
@temp = split /:/, $_;

$servername = $temp[0];
$testname = $temp[1];
$port = $temp[2];

$output = $temp[3];
$errormsg = $temp[4];

# Get results of nmap scan
$results = `/usr/bin/nmap $servername -p $port | /bin/grep $output`;


# Set color for results
$color = 'red';# Set to red by default, then check for green
$stat = "Service $testname not seen on $servername.  Check $servername.  $errormsg";

if ($results =~ /$output/) {
        $color = 'green';
        $stat = "Service $testname is active on $servername.";
}


# Create machine name with commas instead of dots.
# (Required by big brother)
$machine = $servername;
# The dot must be escaped; a bare . would match every character.
$machine =~ s/\./,/g;

# Create line to send to big brother server.
$line = "status $machine.$testname $color <BR><BR>$date <BR><BR>$stat";


# Use bb command to send results to big brother server.
# The list form of system passes $line as a single argument without shell quoting.
system('/usr/local/bb/bin/bb', '192.168.1.100', $line);# Change to ip address of your bb server

}# End of main loop and program

While the logic of the above script is fairly simple and easy to understand, it adds a powerful tool to your monitoring capability. Any service can be monitored for availability so long as you know the port number where the service is provided.
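
The host-name step is the easiest one to get wrong, because in a regular expression an unescaped dot matches any character. As a rough sketch, assuming the sample values from the data file above, the same conversion and the start of the status message can be tried in the shell, where tr treats the dot literally:

```shell
#!/bin/sh
# Illustrative only: build the start of a status message for one test.
# Values come from the sample nmap-data file; variable names are assumptions.
servername='voice.company.com'
testname='signal'
color='green'

# Big Brother expects commas in place of dots in the host name.
# tr replaces only literal dots, so no regex escaping is involved.
machine=$(printf '%s' "$servername" | tr '.' ',')

status="status $machine.$testname $color"
echo "$status"
```

This prints "status voice,company,com.signal green", which is the prefix the Perl script builds before appending the date and the status text.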

To run this script automatically as part of your Big Brother installation, add it to your bb-bbexttab file. Many advanced scripts for monitoring specific applications can be found at deadcat.net.

Happy monitoring!

Tools, Tips, and Tweaks

Manipulating XML at the command line with xmlstarlet

In the world of open-source software, where open data formats are a necessity, XML is poised to become the de facto standard. A number of popular open-source applications already use XML as their primary data format, and many developers utilize it extensively in specialized, personal-use applications. There is a clear need for powerful and effective tools that facilitate dynamic and interactive manipulation of XML content stored in files on the local drive or acquired from remote locations.

Xmlstarlet is a versatile command-line utility that enables users to manipulate, filter, edit, search, validate, and apply stylesheets to XML content. Unfortunately, the versatility of xmlstarlet comes at the expense of usability. It is extremely unintuitive, and many users struggle with the obfuscated command line parameters and peculiar scripting idiom. There are too many features to cover here, but I would like to introduce this powerful utility and show you a few ways that you can use it to simplify some basic, everyday tasks.

For these examples, I have constructed a simple XML file that contains information about several of the astronaut monkeys launched into space by NASA. Each monkey element contains a name attribute that specifies the name of the individual monkey, a date element that contains the date of the monkey's first flight, and a species element that describes the monkey's species.

`monkeys.xml`

<spaceapes>
  <monkey name="Gordo">
    <date>12/13/58</date>

    <species>Squirrel</species>
  </monkey>
  <monkey name="Able">
    <date>5/28/59</date>
    <species>Rhesus</species>

  </monkey>
  <monkey name="Baker">
    <date>5/28/59</date>
    <species>Squirrel</species>
  </monkey>

  <monkey name="Sam">
    <date>12/04/59</date>
    <species>Rhesus</species>
  </monkey>
</spaceapes>

The xmlstarlet command enables users to extract information from XML content with simple XPath queries. Xmlstarlet can generate plain text or filtered XML. Let's start with a simple data extraction experiment. We will use xmlstarlet to determine how many monkeys are described in the monkeys.xml file:

$ xmlstarlet sel -t -v "count(//monkey)" monkeys.xml

The sel instruction tells xmlstarlet that we plan to extract or filter data. The -t parameter indicates that the following parameters are part of the output template, and the -v parameter is used to output the value of an XPath expression. In this case, our XPath expression counts all the monkey element nodes. The XPath syntax is beyond the scope of this brief introduction, but interested readers can learn the XPath language from this helpful tutorial at the Zvon web site.
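
As a small variation on the counting query, the sketch below adds an XPath predicate so that only some monkeys are counted. It assumes xmlstarlet is installed under that name, and it recreates a compact copy of monkeys.xml in a scratch directory; the workdir, total, and rhesus variable names are my own.

```shell
#!/bin/sh
# Illustrative only: recreate a compact monkeys.xml in a scratch
# directory so the queries can be tried without touching other files.
workdir=$(mktemp -d)
cat > "$workdir/monkeys.xml" <<'EOF'
<spaceapes>
  <monkey name="Gordo"><date>12/13/58</date><species>Squirrel</species></monkey>
  <monkey name="Able"><date>5/28/59</date><species>Rhesus</species></monkey>
  <monkey name="Baker"><date>5/28/59</date><species>Squirrel</species></monkey>
  <monkey name="Sam"><date>12/04/59</date><species>Rhesus</species></monkey>
</spaceapes>
EOF

# count() accepts any node set, so a predicate in square brackets narrows
# the count to monkeys whose species element equals "Rhesus".
total=$(xmlstarlet sel -t -v "count(//monkey)" "$workdir/monkeys.xml")
rhesus=$(xmlstarlet sel -t -v "count(//monkey[species='Rhesus'])" "$workdir/monkeys.xml")
echo "total=$total rhesus=$rhesus"

rm -rf "$workdir"
```

With the four monkeys listed above, this should report a total of 4, of which 2 are rhesus monkeys.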

Now we will generate a table that lists the name of each monkey as well as its species:

$ xmlstarlet sel -t -m "//monkey" -v "species" -o " " -v "@name" -n monkeys.xml

Squirrel Gordo
Rhesus Able
Squirrel Baker
Rhesus Sam

In this example, we iterate over each monkey element in the XML file, and display the relevant data. The -m parameter tells xmlstarlet to iterate over all nodes that match the provided XPath expression, which is "//monkey" in this case. The template parameters that follow the XPath expression will be evaluated and output for each matched node. In this example, we display the species element of each monkey element, as well as the name attribute. Note that the value XPath expressions all assume that the current context is the matched node, rather than the top level of the XML document: "species" is used instead of "//monkey[x]/species". The -o parameter tells xmlstarlet to output a text string, and it is used in this example to include a space between the two values associated with each monkey. At the end of our template, we include the -n parameter, which tells xmlstarlet to include a newline character. If we omitted the -n parameter in this example, all the data would appear on one line of text.
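
The match expression can also carry a predicate, so that the template runs only for some of the nodes. The following sketch, which again assumes xmlstarlet is installed and uses a compact scratch copy of monkeys.xml, lists only the squirrel monkeys along with their flight dates. The workdir and squirrels variable names are my own.

```shell
#!/bin/sh
# Illustrative only: a compact copy of monkeys.xml in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/monkeys.xml" <<'EOF'
<spaceapes>
  <monkey name="Gordo"><date>12/13/58</date><species>Squirrel</species></monkey>
  <monkey name="Able"><date>5/28/59</date><species>Rhesus</species></monkey>
  <monkey name="Baker"><date>5/28/59</date><species>Squirrel</species></monkey>
  <monkey name="Sam"><date>12/04/59</date><species>Rhesus</species></monkey>
</spaceapes>
EOF

# -m matches only monkeys whose species is "Squirrel". Inside the template,
# "@name" and "date" are evaluated relative to each matched monkey element.
squirrels=$(xmlstarlet sel -t -m "//monkey[species='Squirrel']" \
  -v "@name" -o " flew on " -v "date" -n "$workdir/monkeys.xml")
echo "$squirrels"

rm -rf "$workdir"
```

For the sample data this should print one line each for Gordo and Baker, since they are the only two squirrel monkeys in the file.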

Xmlstarlet can also operate on remote XML content. Let's abandon our monkey example, and try to extract some content from the Ars Technica RSS feed:

$ xmlstarlet sel --net -t -m "//item" -o "Title: " -v "title" -n \
   -o "Author: " -v "author" -n  http://arstechnica.com/index.ars/rss

Title: Microsoft server software to go 64-bit only
Author: jeremy@arstechnica.com (Jeremy Reimer)
Title: Firefox 1.5 release expected soon
Author: segphault@sbcglobal.net (Ryan Paul)
Title: Online DVD rentals have bright future
Author: eric@arstechnica.com (Eric Bangeman)
...

In this example, we include the --net parameter to tell xmlstarlet to download the XML content from a remote location. The example iterates over every item element in the XML document, and displays the title and author elements for each item.

Xmlstarlet can also process remote HTML content. If you use the --html parameter in addition to the --net parameter, you can extract data from web sites. To generate a list of image files used in a web page, simply iterate over each img element and display the src attribute:

$ xmlstarlet sel --net --html -t -m "//img" -v "@src" -n http://xmlstar.sourceforge.net

img/xmlstarlet.png
/img/libxml2-logo.png
http://sourceforge.net/sflogo.php?group_id=66612&type=1
http://sourceforge.net/dbimage.php?id=3426
http://images.sourceforge.net/images/xml.png
http://www.zvon.org/site/graphic/zvon.gif

Now let's try a more sophisticated example. As many of you know, the Open Document Format, which is utilized by OpenOffice 2 and other open source office applications, is based on XML. With a little bit of clever trickery, you can use xmlstarlet to extract content from your OpenOffice documents right at the command line. Open Document files are essentially compressed zip archives that contain all the relevant files associated with a document. The actual document text is stored in a file called content.xml within the archive. In order to use xmlstarlet to extract data from content.xml, you have to use the unzip command to pipe the contents of content.xml into the xmlstarlet utility.

In our next example, we will list all the headings in the document and the associated heading level values, a technique that could be used to automatically generate outlines of open documents. The Open Document format uses many different XML namespaces for different kinds of content. Various text elements use the "urn:oasis:names:tc:opendocument:xmlns:text:1.0" namespace, so we will need to use that one to get the headings. Xmlstarlet allows you to establish namespace keywords with the -N parameter. In our example, we will assign the Open Document text namespace with the keyword text:

$ unzip -p test.odt content.xml | xmlstarlet sel \
  -N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
  -t -m "//text:*[@text:outline-level]" -v "@text:outline-level" -o " " -v . -n

Our example iterates over every text element that has an outline-level attribute, and it displays the associated level value and the text of the node itself. Note that we do not tell xmlstarlet which file it should use for this operation, because we pipe in the relevant content.
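
If you do not have an Open Document file handy, the namespace handling can be tried on a hand-written stand-in. The sketch below assumes xmlstarlet is installed; the tiny content.xml it creates is my own simplification and is far smaller than what an office suite would actually produce. The workdir and headings variable names are also my own.

```shell
#!/bin/sh
# Illustrative only: a hand-written stand-in for the content.xml file
# found inside an Open Document archive.
workdir=$(mktemp -d)
cat > "$workdir/content.xml" <<'EOF'
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:h text:outline-level="1">Introduction</text:h>
      <text:p>Body text without an outline level.</text:p>
      <text:h text:outline-level="2">Background</text:h>
    </office:text>
  </office:body>
</office:document-content>
EOF

# The document arrives on standard input, as it would from "unzip -p".
# -N binds the prefix "text" to the Open Document text namespace so that
# the XPath expression can refer to text:* elements and attributes.
headings=$(xmlstarlet sel \
  -N text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" \
  -t -m "//text:*[@text:outline-level]" \
  -v "@text:outline-level" -o " " -v . -n < "$workdir/content.xml")
echo "$headings"

rm -rf "$workdir"
```

Only the two heading elements carry an outline-level attribute, so the paragraph is skipped and the output should be the two headings, each preceded by its level.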

As you can see, xmlstarlet is an extremely useful tool for command line XML operations. There are many other features that I have not presented here, and interested users should take a look at the documentation for additional examples.

Cool App of the Week

SuperTux

I don't know about you folks, but I'm a hardcore Mario fanatic. My obsession with Super Mario World for the SNES borders on religion, I have played Mario 3 so many times that I can probably beat the first world with my eyes closed, and I have unraveled virtually every hidden feature in Mario RPG. My dreams are filled with plumbers, mushrooms, and funky flying turtle things that inexplicably pursue my destruction. For all those reasons, I have become hopelessly addicted to SuperTux, an outstanding, open source, Mario-inspired screen scroller for Linux.

SuperTux is essentially a Mario clone with unique art, original audio content, and well-designed levels. The main character is, of course, an adorable penguin that hops his way through tricky levels filled with walking bombs and evil snowballs. The current release contains all the content associated with Milestone 1, which includes 9 different enemies and 26 playable levels that feature the obligatory winter theme. Milestone 2, which is currently under active development, will add new enemies, up to 30 new levels with a forest theme, support for penguin "flapping" (doesn't that sound cute?), and internationalization support.

SuperTux in action

I have now beaten every level in the first world, and about a third of the levels on the bonus island. Despite a few subtle bugs and the amateur-quality art, this game is highly entertaining and woefully addictive. The developers are very creative, and some of the concept art illuminates other features planned for future releases. If you are a Mario fan, or you are looking for a fun way to waste some time on your Linux system, you might want to check out SuperTux. Warning: it will decimate your productivity, so play at your own risk.

/dev/random

Gaim-vv developers claim that Google has too much control over Gaim development.
Oooh shiny! OSDir has a screenshot tour of KDE 3.5 RC 1.
Microsoft's Charles Fitzgerald thinks that open source users are "dorks."
MIT turns down free copies of OS X for its US$100 laptop project because Apple isn't willing to distribute its operating system under an open-source license.
OSTG announced a patent pledge web site that makes it easy for open source developers to find out which patents companies like IBM have made available for royalty-free usage.
Linux.com has a tutorial that introduces netcat, the hacker swiss army knife.

Ryan Paul Ars Editor Emeritus

Ryan is an Ars editor emeritus in the field of open source, and still contributes regularly. He manages developer relations at Montage Studio.