SimpleXML – Less than Simple?

27 03 2011

I like SimpleXML and it is usually my first choice when it comes to having PHP read XML. I was perfectly content with this PHP extension, until the day I heard someone complain that it was challenging to use it with XML namespaces.

The author of SimpleXML, Sterling Hughes, recognized that XML namespaces may be problematic. In an article from 2004 (see http://devzone.zend.com/article/688), he suggests using the SimpleXMLElement’s children() method. The following code illustrates how to do so:

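// The element names env:letter and env:body are placeholders;
// the original markup was lost and only the namespace URI survives.
$str = '<env:letter xmlns:env="http://www.example.com/envelope">';
$str .= '<env:body>Dear John ...</env:body></env:letter>';
$sxe = simplexml_load_string($str);
// Iterate over the children that belong to the envelope namespace.
foreach ($sxe->children("http://www.example.com/envelope") as $s) {
    echo $s->getName() . ' - ' . $s;
}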

The above seems like a simple and reasonable solution. Could anything be found wanting? Then I discovered a post that Paul Reinheimer made back in 2005 (see http://blog.preinheimer.com/index.php?/archives/172-SimpleXML,-Namespaces-Hair-loss.html). In that post, Reinheimer attempts to extract an eBay timestamp from some XML in which one namespace is nested inside another. I’ve excerpted the XML he presents as follows:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<GeteBayOfficialTimeResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Timestamp>2005-10-28T01:01:04.668Z</Timestamp>
<Ack>Success</Ack>
<Version>429</Version>
<Build>e429_intl_Bundled_1949355_R1</Build>
</GeteBayOfficialTimeResponse>
</soapenv:Body>
</soapenv:Envelope>

When you review the above XML, the Timestamp element’s text may seem strange; it is in a special format that is explained at http://pds.nasa.gov/documents/sr/stdref3.7/Chapter07.pdf. Essentially you’re looking at a date, plus a time in UTC (formerly known as GMT), which is what the trailing Z indicates. The time portion consists of the hours, minutes, seconds and fractions of a second.
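To make the pieces of that timestamp concrete, here is a minimal sketch that hands the string to PHP’s DateTimeImmutable class and prints each part separately. The variable names are my own, and the value is simply copied from the XML above.

```php
<?php
// The timestamp string copied from the eBay response above.
$stamp = '2005-10-28T01:01:04.668Z';

// The trailing "Z" tells the parser that the time is in UTC.
$dt = new DateTimeImmutable($stamp);

echo $dt->format('Y-m-d'), "\n";  // the date portion
echo $dt->format('H:i:s'), "\n";  // hours, minutes, seconds
echo $dt->format('u'), "\n";      // fraction of a second, in microseconds
echo $dt->getOffset(), "\n";      // offset from UTC, in seconds
```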

The following is another way of getting at the timestamp which is really only slightly different from Reinheimer’s solution:

$sxe = simplexml_load_string($str);
$child = $sxe->children('soapenv',TRUE);
$offspring = $child->children("urn:ebay:apis:eBLBaseComponents");
echo $offspring->children()->Timestamp;

The above code works, although it requires invoking the children() method on three SimpleXML elements. It also hard-codes the prefix ‘soapenv’: what if that alias were to change, or someone made a typo? Let’s see if we can do better. How about the following:

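// Collect the URIs of the namespaces actually used in the document.
$ns = $sxe->getNamespaces(true);
$uri = array_values($ns);

// Descend one level per namespace: the SOAP envelope first, then eBay.
$children = $sxe->children($uri[0]);
$c = $children->children($uri[1]);
$elems = $c->children();

// Gather the element names so we can check for Timestamp.
$names = array();
foreach ($elems as $e) {
    $names[] = $e->getName();
}
if (in_array('Timestamp', $names)) {
    echo $elems->Timestamp;
}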

The code starts by getting all the namespaces actually used in the XML. The method getNamespaces() has been a feature since PHP 5.1.2. When true is passed in, the method searches recursively for all namespaces in use. I then pass the namespace URIs from the array $uri to the different children() methods. Notice that the third invocation of children() gives me a SimpleXMLElement, $elems, which I subsequently iterate through to get the name of each of its elements using the method getName(), a PHP 5.1.3 feature. If Timestamp is among the element names, then I display its text node.
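Because the approach above depends on the order in which the namespaces come back, it can help to look at what SimpleXML reports before relying on it. The following is a rough sketch rather than a recommendation: it prints the result of getNamespaces(true) next to that of getDocNamespaces(true), which lists every declared namespace whether used or not, and then picks out the two URIs by value instead of by position. The names $used, $declared, $soapUri, $ebayUri and $response are mine.

```php
<?php
$xml = <<<'XML'
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<GeteBayOfficialTimeResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Timestamp>2005-10-28T01:01:04.668Z</Timestamp>
<Ack>Success</Ack>
</GeteBayOfficialTimeResponse>
</soapenv:Body>
</soapenv:Envelope>
XML;

$sxe = simplexml_load_string($xml);

// Namespaces actually used by nodes in the document, searched recursively.
$used = $sxe->getNamespaces(true);
// Namespaces declared anywhere in the document, whether used or not.
$declared = $sxe->getDocNamespaces(true);

print_r($used);
print_r($declared);

// Look the URIs up by value rather than trusting their position.
$soapUri = 'http://schemas.xmlsoap.org/soap/envelope/';
$ebayUri = 'urn:ebay:apis:eBLBaseComponents';
if (in_array($soapUri, $used, true) && in_array($ebayUri, $used, true)) {
    $response = $sxe->children($soapUri)->Body
                    ->children($ebayUri)->GeteBayOfficialTimeResponse;
    echo $response->Timestamp, "\n";
}
```

Naming the URIs explicitly costs two extra lines, but the code no longer cares which prefix the sender chose or in what order the namespaces appear.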

One nifty way of getting the result with SimpleXML is to use its xpath() method. XPath is a powerful way to perform queries on XML data, and support for it has been built into the SimpleXMLElement. If you are unfamiliar with XPath see http://www.w3schools.com/XPath/xpath_syntax.asp.

Here’s the code:

$sxe->registerXPathNamespace('d', 'urn:ebay:apis:eBLBaseComponents');
$r = $sxe->xpath('//d:Timestamp');
foreach ($r as $result) {
   echo $result;
}

According to the manual, you need to have PHP 5.2.0 or higher to use the registerXPathNamespace() method. What makes this method so convenient is that we can supply it with the relevant namespace URI and use the first parameter as a shorthand reference to it. So the expression ‘//d:Timestamp’ means find all the Timestamp elements in the namespace referenced by ‘d’, which we know refers to urn:ebay:apis:eBLBaseComponents. This solution is short, to the point and easy to use.
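Two details are worth seeing in a complete script. First, the prefix ‘d’ is arbitrary: the document itself declares the eBay namespace as a default namespace with no prefix at all, so the prefix only has to be registered on the PHP side. Second, xpath() returns an array of SimpleXMLElement objects, so each result needs to be cast to get a plain string. The sketch below also tries the standard XPath local-name() function as an alternative that needs no registration; the variable names are my own.

```php
<?php
$xml = <<<'XML'
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<GeteBayOfficialTimeResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Timestamp>2005-10-28T01:01:04.668Z</Timestamp>
<Ack>Success</Ack>
</GeteBayOfficialTimeResponse>
</soapenv:Body>
</soapenv:Envelope>
XML;

$sxe = simplexml_load_string($xml);

// Bind an arbitrary prefix to the eBay namespace URI for use in XPath.
$sxe->registerXPathNamespace('d', 'urn:ebay:apis:eBLBaseComponents');
$withPrefix = $sxe->xpath('//d:Timestamp');

// Alternative: match on the local name only, ignoring the namespace.
$byLocalName = $sxe->xpath('//*[local-name()="Timestamp"]');

// Each result is a SimpleXMLElement; cast it to get the text.
echo (string) $withPrefix[0], "\n";
echo (string) $byLocalName[0], "\n";
```

The local-name() form is handy for a quick look, but it would also match a Timestamp element from any other namespace, so the registered prefix is the more precise of the two.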

We could also achieve the same result by combining the PHP 5 DOM extension with SimpleXML. According to Rob Richards, DOM and SimpleXML are able to work together on the same XML, even at the same time (see his book Pro PHP XML and Web Services, Chapter 11, p. 434). The reason is that both extensions are built on the same library, libxml2 (also see the answer by Josh Davis at http://stackoverflow.com/questions/4803063/what-the-difference-between-phps-dom-and-simplexml-extensions).

The following code is essentially what I previously posted at http://us.php.net/manual/en/intro.simplexml.php with a minor tweak:

$dom = new DOMDocument();
$dom->loadXML( $str );

$ns='urn:ebay:apis:eBLBaseComponents';
$domlist=$dom->getElementsByTagNameNS($ns,'*');
$sxe=simplexml_import_dom($domlist->item(0));
echo $sxe->Timestamp;

After creating a new DOMDocument and assigning it to $dom, I have that object load the XML from a string. Next I assign the target namespace to $ns. I use the $dom object’s getElementsByTagNameNS() method, passing in $ns. The second parameter is an asterisk so that all local names in namespace $ns are matched. This is really just a convenience so that one can avoid typing in the specific element name GeteBayOfficialTimeResponse. The results are contained in $domlist, which holds five elements, the first of which is the element GeteBayOfficialTimeResponse. I import that DOM element into SimpleXML, which converts it into a SimpleXMLElement. I then use that element to access its child element Timestamp.
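To see what the asterisk actually matches, the following sketch lists the contents of the node list and then, for comparison, asks DOM for the Timestamp element by name without involving SimpleXML at all. It is only an illustration; the variable names $all and $stamps are mine.

```php
<?php
$xml = <<<'XML'
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<GeteBayOfficialTimeResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Timestamp>2005-10-28T01:01:04.668Z</Timestamp>
<Ack>Success</Ack>
<Version>429</Version>
<Build>e429_intl_Bundled_1949355_R1</Build>
</GeteBayOfficialTimeResponse>
</soapenv:Body>
</soapenv:Envelope>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$ns = 'urn:ebay:apis:eBLBaseComponents';

// The wildcard matches every element in the eBay namespace, in document order.
$all = $dom->getElementsByTagNameNS($ns, '*');
echo $all->length, "\n";
foreach ($all as $node) {
    echo $node->localName, "\n";
}

// Naming the element directly skips the SimpleXML import step.
$stamps = $dom->getElementsByTagNameNS($ns, 'Timestamp');
echo $stamps->item(0)->textContent, "\n";
```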

As for the question in the title, whether SimpleXML is less than simple when XML namespaces are involved, probably the best answer is that it all depends: on the XML, on what you need to do with it, on which SimpleXML method(s) you select, and on whether you use PHP’s DOM extension.

This work is licensed under a Creative Commons License
