Week of Sun, 2007-03-04 07:00 to Sun, 2007-03-11 06:59

Pull Based APIs with Iterators and Streams

Using iterators and streams in the design of an API/interface can be a powerful way to improve system performance while keeping the code that uses the API simple. They can be used to implement a design methodology called "Pull Based" or "Pull" APIs. The basic idea behind a pull methodology is that, rather than presenting the application code with an object model or a collection of state to be navigated or processed, the API lets the application code ask for the data a piece at a time. Some variants of this design methodology allow the application to specify what kind of data it is interested in, which can be used to push filtering into lower layers to improve performance.
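As a rough illustration of the filtering variant, here is a minimal sketch of a pull style source in Java. The class name PullSource and its constructor parameters are made up for this example and do not come from any particular product; the "lower layer" is simply another iterator standing in for a parser, socket, or file reader.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical pull-style source: the caller asks for one item at a time,
// and the filter it supplied is applied inside the source, not by the caller.
class PullSource<T> implements Iterator<T> {
    private final Iterator<T> lower;    // stands in for the lower layer
    private final Predicate<T> wanted;  // what the application is interested in
    private T pending;                  // next matching item, if already found
    private boolean ready;              // true when 'pending' holds a valid item

    PullSource(Iterator<T> lower, Predicate<T> wanted) {
        this.lower = lower;
        this.wanted = wanted;
    }

    @Override
    public boolean hasNext() {
        // Pull from the lower layer only until one matching item is found.
        while (!ready && lower.hasNext()) {
            T candidate = lower.next();
            if (wanted.test(candidate)) {
                pending = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        ready = false;
        return result;
    }
}
```

The application loops with hasNext() and next() and never sees the items it did not ask for. In a real implementation the filter could be handed further down, for example to a parser or a query, so the unwanted data is never materialized at all.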

StAX (the Streaming API for XML) is an example of a pull based API that realizes significant performance gains. In the Telecom Web Services Server (TWSS) product, a pull based API was used to implement SOAP attachment extensions to the WebSphere ESB product. This approach was modeled after the pull based SAAJ SOAP attachment APIs.
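To show what the pull style looks like with StAX itself, here is a small sketch using the standard javax.xml.stream classes. The class name StaxPullExample, the method name, and the "title" element are assumptions chosen only for illustration; this is not code from TWSS.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullExample {
    // Collects the text of every <title> element. The element name is an
    // assumption for this example.
    static List<String> titles(String xml) {
        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The application drives the parser: each next() pulls one event.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Reads the element's text up to its matching end tag.
                        found.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked exception.
            throw new IllegalStateException(e);
        }
        return found;
    }
}
```

No document tree is built here. The code only ever holds the current event, and it skips past everything that is not a title element.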

The performance gain comes from hiding the I/O (network or disk) behind the abstraction of the iterator and stream objects, so that application code can begin processing while data is still being read from the network. This overlap is provided by the implementation of the API or by the underlying platform. Each call the application code makes to the iterator blocks until data becomes available. As soon as a chunk of data is available, the application can process it while the next chunk is read. If the application code also writes output, this mode of processing forms a pipeline: output can be written as soon as input has been processed. For network based applications, this can be a useful means of processing large amounts of data with limited memory requirements, because the data is pushed into the network as a stream rather than accumulated first.
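The pipeline described above can be sketched with plain java.io streams. The class and method names are hypothetical, and the upper-casing step is only a stand-in for whatever processing the application really does. Any overlap between reading and processing comes from buffering in the platform underneath the stream, not from this code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamPipeline {
    // Reads a chunk, processes it, writes it, and repeats. Memory use is
    // bounded by chunkSize no matter how much data flows through.
    static long upperCaseCopy(InputStream in, OutputStream out, int chunkSize) {
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        try {
            int n;
            // read() blocks until some data is available, or returns -1 at end of stream.
            while ((n = in.read(chunk)) != -1) {
                for (int i = 0; i < n; i++) {
                    // Stand-in processing step: upper-case ASCII letters only.
                    if (chunk[i] >= 'a' && chunk[i] <= 'z') {
                        chunk[i] = (byte) (chunk[i] - ('a' - 'A'));
                    }
                }
                // Output for this chunk goes out before the next chunk is read.
                out.write(chunk, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            // Wrapped so callers of this sketch need no checked exception.
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

With a socket's input stream on one side and a socket's output stream on the other, the same loop would relay an arbitrarily large payload while holding only one chunk in memory at a time.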