The two widely used methods for parsing an XML document are SAX and
DOM. A SAX (Simple API for XML) parser is event-driven. It reads
the XML document incrementally and calls a delegate method whenever it
recognizes a token. Events are generated at the beginning and end of the
document, and the beginning and end of each element. A DOM (Document
Object Model) parser reads the entire document and builds a corresponding
tree-like structure in memory. You can then use the XPath query
language to select individual nodes of the XML document using a variety of
criteria.
Most programmers find the DOM method more familiar and easier to
use; however, SAX-based applications are generally more efficient: they run
faster and use less memory. So, unless you are constrained by system
requirements, the choice between a SAX and a DOM parser
comes down to preference.
1. Parsing XML with libxml2
The wrappers offer two functions. The only difference between the two is that one expects an
HTML document and is therefore less strict about what constitutes a
“proper” document than the other, which expects a valid XML
document:
NSArray *PerformHTMLXPathQuery(NSData *document, NSString *query);
NSArray *PerformXMLXPathQuery(NSData *document, NSString *query);
If you want to return the entire document as a single data
structure, the following will do that. Be warned that except for the
simplest of XML documents, this will normally generate a heavily nested
structure of array and dictionary elements, which isn’t particularly
useful:
NSString *xpathQueryString;
NSArray *nodes;
xpathQueryString = @"/*";
nodes = PerformXMLXPathQuery(responseData, xpathQueryString);
NSLog(@"nodes = %@", nodes );
The XML document had a structure that looked like the following snippet:
<forecast_conditions>
...
<icon data="/ig/images/weather/chance_of_rain.gif"/>
</forecast_conditions>
<forecast_conditions>
...
<icon data="/ig/images/weather/chance_of_rain.gif"/>
</forecast_conditions>
<forecast_conditions>
...
<icon data="/ig/images/weather/chance_of_rain.gif"/>
</forecast_conditions>
<forecast_conditions>
...
<icon data="/ig/images/weather/chance_of_rain.gif"/>
</forecast_conditions>
To extract the URL of the icons, we carried out an XPath
query:
xpathQueryString = @"//forecast_conditions/icon/@data";
nodes = PerformXMLXPathQuery(responseData, xpathQueryString);
The nodes array returned by the PerformXMLXPathQuery function looked
like this:
( {
nodeContent = "/ig/images/weather/mostly_sunny.gif";
nodeName = data;
},
{
nodeContent = "/ig/images/weather/chance_of_rain.gif";
nodeName = data;
},
{
nodeContent = "/ig/images/weather/mostly_sunny.gif";
nodeName = data;
},
{
nodeContent = "/ig/images/weather/mostly_sunny.gif";
nodeName = data;
}
)
This structure is an NSArray of NSDictionary objects, and we parsed
it by iterating through each array entry and extracting the dictionary
value for the key nodeContent, adding each occurrence to the icons
array:
for ( NSDictionary *node in nodes ) {
    for ( id key in node ) {
        if( [key isEqualToString:@"nodeContent"] ) {
            [icons addObject:
                [NSString stringWithFormat:@"http://www.google.com%@",
                    [node objectForKey:key]]];
        }
    }
}
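Because each node is a dictionary, the value can also be looked up directly instead of scanning every key. The following is a minimal sketch of that idea rather than the listing used in the Weather application: IconURLsFromNodes is a hypothetical helper, and the sample array only stands in for the result of PerformXMLXPathQuery.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the XPath wrappers): builds absolute icon
// URLs from the array-of-dictionaries shape shown above.
static NSArray *IconURLsFromNodes(NSArray *nodes) {
    NSMutableArray *urls = [NSMutableArray array];
    for (NSDictionary *node in nodes) {
        // Look the key up directly; objectForKey: returns nil if it is absent.
        NSString *path = [node objectForKey:@"nodeContent"];
        if (path != nil) {
            [urls addObject:
                [NSString stringWithFormat:@"http://www.google.com%@", path]];
        }
    }
    return urls;
}

int main(void) {
    @autoreleasepool {
        // Sample data standing in for the result of PerformXMLXPathQuery().
        NSArray *nodes = [NSArray arrayWithObjects:
            [NSDictionary dictionaryWithObjectsAndKeys:
                @"/ig/images/weather/mostly_sunny.gif", @"nodeContent",
                @"data", @"nodeName", nil],
            [NSDictionary dictionaryWithObjectsAndKeys:
                @"/ig/images/weather/chance_of_rain.gif", @"nodeContent",
                @"data", @"nodeName", nil],
            nil];
        NSLog(@"icons = %@", IconURLsFromNodes(nodes));
    }
    return 0;
}
```

Nodes without a nodeContent key are skipped, so a query that matches unexpected nodes does not add a malformed URL to the array.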
2. Parsing XML with NSXMLParser
The official way to parse XML on the iPhone is to use the
SAX-based NSXMLParser class. However,
the parser is strict and cannot take HTML documents:
NSString *url = @"http://feeds.feedburner.com/oreilly/news";
// URLWithString: returns an autoreleased object, so no extra retain is needed
NSURL *theURL = [NSURL URLWithString:url];
NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:theURL];
[parser setDelegate:self];
[parser setShouldResolveExternalEntities:YES];
BOOL success = [parser parse];
NSLog(@"Success = %d", success);
// Balance the alloc above once parsing has finished
[parser release];
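Because the parser is strict, it is worth checking what went wrong when parse returns NO. The sketch below is an illustration under stated assumptions: it is compiled with ARC (unlike the manual retain/release listings in this chapter), and it parses a deliberately malformed in-memory document rather than a real feed.

```objective-c
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Deliberately malformed sample: the <title> element is never closed.
        NSData *data = [@"<feed><title>News</feed>"
                           dataUsingEncoding:NSUTF8StringEncoding];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
        BOOL success = [parser parse];
        if (!success) {
            // parserError describes the failure; lineNumber and columnNumber
            // report where the parser was in the document.
            NSError *error = [parser parserError];
            NSLog(@"Parse failed at line %ld, column %ld: %@",
                  (long)[parser lineNumber], (long)[parser columnNumber],
                  [error localizedDescription]);
        }
    }
    return 0;
}
```

The same information is also delivered to the delegate through parser:parseErrorOccurred:, if one is set.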
We use the parser by passing it an XML document and then
implementing its delegate methods. The NSXMLParser class offers the following
delegate methods:
parserDidStartDocument:
parserDidEndDocument:
parser:didStartElement:namespaceURI:qualifiedName:attributes:
parser:didEndElement:namespaceURI:qualifiedName:
parser:didStartMappingPrefix:toURI:
parser:didEndMappingPrefix:
parser:resolveExternalEntityName:systemID:
parser:parseErrorOccurred:
parser:validationErrorOccurred:
parser:foundCharacters:
parser:foundIgnorableWhitespace:
parser:foundProcessingInstructionWithTarget:data:
parser:foundComment:
parser:foundCDATA:
The most heavily used of these delegate methods are
parser:didStartElement:namespaceURI:qualifiedName:attributes: and parser:didEndElement:namespaceURI:qualifiedName:.
These two methods, along with the parser:foundCharacters: method, will allow you
to detect the start and end of a selected element and retrieve its
contents. When the NSXMLParser object
traverses an element that contains text, it sends messages to its delegate
in the following order (parser:foundCharacters: may arrive more than once
for a single element, each time with part of the text):
parser:didStartElement:namespaceURI:qualifiedName:attributes:
parser:foundCharacters:
parser:didEndElement:namespaceURI:qualifiedName:
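The three callbacks can be combined to capture the text of a selected element. What follows is a minimal sketch, assuming ARC; the TitleCollector class and the feed and title element names are hypothetical and chosen only for illustration. Text is appended in parser:foundCharacters: because the parser may deliver one element's text in several pieces.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that captures the text of every <title> element.
@interface TitleCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *titles;
@property (nonatomic, strong) NSMutableString *currentText;
@end

@implementation TitleCollector

- (instancetype)init {
    if ((self = [super init])) {
        _titles = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"title"]) {
        // A non-nil buffer means "currently inside a <title> element".
        self.currentText = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Append rather than assign; outside <title> the buffer is nil and
    // this message is a no-op.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser
 didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"title"] && self.currentText != nil) {
        [self.titles addObject:[self.currentText copy]];
        self.currentText = nil;
    }
}

@end

int main(void) {
    @autoreleasepool {
        NSData *data = [@"<feed><title>First</title><link>x</link>"
                        @"<title>Second</title></feed>"
                           dataUsingEncoding:NSUTF8StringEncoding];
        TitleCollector *collector = [[TitleCollector alloc] init];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
        // The parser does not retain its delegate, so keep it in scope.
        [parser setDelegate:collector];
        [parser parse];
        NSLog(@"titles = %@", collector.titles);
    }
    return 0;
}
```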
Returning to the Weather application: to replace our XPath- and
DOM-based solution with an NSXMLParser-based solution, we would
substitute the following code for the existing queryService:withParent: method:
- (void)queryService:(NSString *)city
          withParent:(UIViewController *)controller {
    viewController = (MainViewController *)controller;
    responseData = [[NSMutableData data] retain];
    NSString *url =
        [NSString stringWithFormat:@"http://www.google.com/ig/api?weather=%@",
                                   city];
    theURL = [[NSURL URLWithString:url] retain];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithContentsOfURL:theURL];
    [parser setDelegate:self];
    [parser setShouldResolveExternalEntities:YES];
    BOOL success = [parser parse];
}
We would then need to delete all of the NSURLConnection delegate methods, substituting
the following NSXMLParser delegate
method to handle populating our arrays:
- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // Parsing code to retrieve icon path
    if([elementName isEqualToString:@"icon"]) {
        NSString *imagePath = [attributeDict objectForKey:@"data"];
        [icons addObject:
            [NSString stringWithFormat:@"http://www.google.com%@", imagePath]];
    }
    // ... add remaining parsing code for other elements here
    [viewController updateView];
}
Warning:
This example parses only the icon element; if you wanted to use NSXMLParser here, you’d need to look at
connectionDidFinishLoading: in the original Weather app, and add parsing code for each of the
elements handled there before you call [viewController updateView] in this method (otherwise, it will throw an
exception and crash the app because none of the data structures are
populated).
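One possible way to make sure the data structures are populated before the view is refreshed is to defer the update until parserDidEndDocument:, which the parser sends once after the last element of a successfully parsed document. The sketch below illustrates that arrangement and is not the Weather application's code: it assumes ARC, the IconCollector class and its finished flag are hypothetical, and the weather root element is an assumed wrapper around the forecast_conditions snippet shown earlier.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the Weather app's model class.
@interface IconCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray *icons;
@property (nonatomic, assign) BOOL finished;
@end

@implementation IconCollector

- (instancetype)init {
    if ((self = [super init])) {
        _icons = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"icon"]) {
        NSString *imagePath = [attributeDict objectForKey:@"data"];
        if (imagePath != nil) {
            [self.icons addObject:
                [NSString stringWithFormat:@"http://www.google.com%@",
                                           imagePath]];
        }
    }
}

- (void)parserDidEndDocument:(NSXMLParser *)parser {
    // Sent once, after every element has been reported. In the app, this
    // is where a call such as [viewController updateView] could go.
    self.finished = YES;
}

@end

int main(void) {
    @autoreleasepool {
        // <weather> is an assumed root element wrapping the snippet.
        NSString *xml =
            @"<weather>"
            @"<forecast_conditions>"
            @"<icon data=\"/ig/images/weather/chance_of_rain.gif\"/>"
            @"</forecast_conditions>"
            @"<forecast_conditions>"
            @"<icon data=\"/ig/images/weather/mostly_sunny.gif\"/>"
            @"</forecast_conditions>"
            @"</weather>";
        IconCollector *collector = [[IconCollector alloc] init];
        NSXMLParser *parser = [[NSXMLParser alloc]
            initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
        [parser setDelegate:collector];
        [parser parse];
        NSLog(@"finished = %d, icons = %@", collector.finished, collector.icons);
    }
    return 0;
}
```

With this arrangement the update happens once per document rather than once per element start.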
Unless you’re familiar with SAX-based parsers, I suggest that
XPath and DOM are conceptually easier to deal with than the event-driven
model of SAX. This is especially true if you’re dealing with HTML, as an
HTML document would have to be cleaned up before being passed to the
NSXMLParser class.