Saturday, October 20, 2007

XML APIs

The two most common XML APIs are DOM and SAX. A third 'major' API is now emerging: StAX.


Concepts
Push - SAX
The API 'pushes' data to the application, which processes it sequentially.
Pull - DOM
The application 'pulls' data out of the API. Because the whole document is held in memory as a tree, full random access to the data is available.
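
To make the two concepts concrete, here is a minimal sketch that extracts the same data once with SAX and once with DOM, using the standard JAXP classes. The class name PushVsTree, the sample document and its element names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PushVsTree {
    // Hypothetical sample document, used only for illustration.
    static final String XML =
        "<library><book><title>A</title></book><book><title>B</title></book></library>";

    // Push (SAX): the parser calls our handler; we keep track of the position ourselves.
    static List<String> titlesWithSax(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle = false;               // "where am I?" state
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    inTitle = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    text.append(ch, start, length);        // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    inTitle = false;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    // Tree (DOM): the whole document is loaded first, then queried at will.
    static List<String> titlesWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesWithSax(XML));
        System.out.println(titlesWithDom(XML));
    }
}
```

The SAX version needs a flag and a buffer just to remember that it is inside a title element. The DOM version is shorter, but only because the complete tree has already been built in memory.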


SAX uses a hierarchical event handler and DOM uses nested iterators. StAX, finally, uses a hierarchical iterator. SAX is very fast and efficient, but its API is somewhat difficult to use: you have to keep track of where you are in the document yourself. An iterator, on the other hand, gives you a common control abstraction. In most cases you don't need access to the complete XML data tree, only to a distinct subset, and this is exactly where StAX fits in.
I think this illustrates pretty nicely that it's important to look not only at the 'speed' of your XML processing application but also at its maintainability and extensibility. If implementing your solution with SAX is really hard and performance is not a problem, StAX is a nice alternative.
This was the first time I realized what a huge impact the design of an API has on code. Or, to be more precise, on the actual result of what you intended to code.
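
As a rough sketch of what the iterator style looks like, the following uses the cursor API from javax.xml.stream to collect the text of every title element. The class name StaxTitles and the element name title are assumptions made for this example, not anything prescribed by StAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {
    // Collects the text of every <title> element; "title" is a hypothetical element name.
    static List<String> titles(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            // The application drives the loop and asks for the next event when it is ready.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and advances to its end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles("<library><book><title>A</title></book></library>"));
    }
}
```

Because the loop belongs to the application, the current position is simply the current point in the code. No handler fields are needed to remember it, and everything outside the subset of interest is skipped without being stored.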