Pull Based APIs with Iterators and Streams

Using iterators and streams in the design of an API/interface can be a powerful way to structure APIs for improved system performance while keeping the code that uses the API simple. They can be used to implement a design methodology called "Pull Based" or "Pull" APIs. The basic idea behind a pull methodology is that, rather than presenting the application code with an object model or a collection of state to be navigated or processed, the application code asks for the data a piece at a time. Some variants of this design methodology allow the application to specify what kind of data it is interested in, which can be used to push filtering into lower layers to improve performance. StAX (the Streaming API for XML) is an example of a pull based API that realizes significant performance gains. In the Telecom Web Services Server (TWSS) product, a pull based API was used to implement SOAP attachment extensions to the WebSphere ESB product. This approach was modeled after the pull based SAAJ SOAP attachment APIs.

The basic idea behind improving performance is to hide the I/O (network or disk) behind the abstraction of the iterator and stream objects. Application code can begin processing in parallel with reading from the network; the implementation of the API or the underlying platform takes care of this. Each call from the application code to the iterator blocks until the data becomes available. As soon as a chunk of data is available, the application can process it while the next chunk is read. If the application code is also writing, this mode of processing forms a pipeline: output can be written as soon as input has been processed. For network based applications, this can be a useful means of processing large amounts of data with limited memory requirements, by pushing the stream of data into the network.
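To make the idea of hiding I/O behind an iterator concrete, here is a minimal sketch in Java. It is not the TWSS or SAAJ implementation; the class name `ChunkIterator` and its `chunkSize` and `readAhead` parameters are hypothetical and chosen only for illustration. A background thread reads ahead into a bounded queue, while the caller sees nothing but a blocking `Iterator<byte[]>`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical pull-style reader: a background thread reads ahead while the caller processes. */
class ChunkIterator implements Iterator<byte[]> {
    private static final byte[] END = new byte[0]; // sentinel object marking end of stream
    private final BlockingQueue<byte[]> queue;
    private volatile IOException failure;
    private byte[] pending; // chunk fetched by hasNext() but not yet returned

    ChunkIterator(InputStream in, int chunkSize, int readAhead) {
        queue = new ArrayBlockingQueue<>(readAhead); // bounds how much data sits in memory
        Thread reader = new Thread(() -> {
            try {
                try {
                    byte[] buf = new byte[chunkSize];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        if (n > 0) {
                            queue.put(Arrays.copyOf(buf, n)); // blocks when the queue is full
                        }
                    }
                } catch (IOException e) {
                    failure = e; // reported to the caller once it reaches the end marker
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.setDaemon(true);
        reader.start();
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = queue.take(); // blocks until a chunk (or the end marker) arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for data", e);
            }
            if (pending == END && failure != null) {
                throw new UncheckedIOException(failure);
            }
        }
        return pending != END;
    }

    @Override
    public byte[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        byte[] chunk = pending;
        pending = null;
        return chunk;
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for a network or disk source.
        InputStream in = new ByteArrayInputStream(new byte[10_000]);
        Iterator<byte[]> chunks = new ChunkIterator(in, 4096, 2);
        while (chunks.hasNext()) {
            System.out.println("processing chunk of " + chunks.next().length + " bytes");
        }
    }
}
```

The application loop is plain synchronous code. Whether the iterator reads ahead on another thread, or simply reads on demand, is an implementation detail that could change without touching the caller.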
In addition to hiding I/O, iterators can also be used to generate data as needed. Rather than corresponding to an actual collection of data, each call to the iterator's next method computes the data to be returned. This data may be based on the state of the iterator itself (for example, a key stream), or on other server state. I/O may also be viewed as generational in nature, where each advancement of the iterator pulls more input. The idea is that only the input that is needed is kept in memory (although read-ahead would be needed for performance). This is the approach taken in the JDBC APIs for processing result sets. This concept is closely tied to the parallelization of I/O.

The abstraction of the iterator and streams approach allows for staged development. The first version of the API need not implement parallelization or generation. It can be added later, transparently to the application. It may also turn out that the particular use case for the API does not warrant such implementation complexity. Still, a pull based design can serve as a point of future extensibility should the performance gains become worth it.

A decision point may be the size of the chunks of data being processed. For example, the SOAP attachment support extensions that were included with the TWSS product assume a certain range of SOAP attachment sizes that correspond to typical service provider usage around Multimedia Messaging Service (MMS). Such attachments are typically on the order of 150-300 KB, as they are destined for mobile devices with limited bandwidth, memory, and processing power. Thus, the implementation complexity of parallelization wasn't worth it for the first round of the API. However, as service provider use cases evolve and attachment sizes grow, such as in support of full-blown MP3s, parallelization can have significant benefits for attachment processing.
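The generational use of an iterator can be sketched as follows. This is an illustration only: `KeyStream` is a hypothetical class, and the arithmetic is a plain 64-bit linear congruential generator, which is not suitable for real cryptographic key material. The point is that there is no backing collection; each call to `next()` computes its value from the iterator's own state.

```java
import java.util.Iterator;

/** Hypothetical generating iterator: no backing collection, each next() computes a value. */
class KeyStream implements Iterator<Long> {
    // Multiplier and increment of a well-known 64-bit linear congruential generator
    // (the constants Knuth uses for MMIX). Illustration only, not cryptographically secure.
    private static final long A = 6364136223846793005L;
    private static final long C = 1442695040888963407L;

    private long state;     // the only data held in memory
    private long generated; // how many values have been computed so far

    KeyStream(long seed) {
        this.state = seed;
    }

    @Override
    public boolean hasNext() {
        return true; // the stream is unbounded; the caller decides when to stop pulling
    }

    @Override
    public Long next() {
        state = state * A + C; // long overflow wraps, i.e. arithmetic modulo 2^64
        generated++;
        return state;
    }

    long generatedCount() {
        return generated;
    }

    public static void main(String[] args) {
        KeyStream keys = new KeyStream(42L);
        for (int i = 0; i < 3; i++) {
            System.out.println(Long.toHexString(keys.next()));
        }
        System.out.println("values computed: " + keys.generatedCount());
    }
}
```

Because nothing is computed until the caller pulls, a first version of such an API could return values from a precomputed list and later switch to on-demand generation without any change to the calling code.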
One caveat for this design approach is that it is not a substitute for the power of asynchronous APIs. However, all of this parallelization can occur without the application being aware of it, since the application uses straightforward synchronous processing logic. To scale throughput, you can then scale up the number of processing threads, with each thread handling a different data stream simultaneously. This approach scales nicely on multi-processor and multi-core systems for heavily multi-threaded apps.

A disadvantage of this approach used to be that type safety was lost. The object returned by the iterator had to be cast to the appropriate type, and that type could only be specified by contract in the API. For some, this could be deemed a liability. Support for generics in Java 1.5 solves this problem by allowing typing to be incorporated into the interface, a feature that makes the use of iterators in an API much more attractive.

By Michael Gilfix at 2007-03-08 05:46 | Design