Michael Gilfix Online |
Going Beyond Relational Databases

I just read two interesting papers (sharing the same core set of authors) arguing that the database industry is on the cusp of evolving beyond the "one-size-fits-all" approach that has defined the relational database industry for quite some time. Here is a link to the two papers:
The crux of the argument is that the market has begun shifting towards applications where the performance benefits of domain-specific database engines cannot be ignored. Together, the papers cover a wide variety of ground: (a) column-centric storage rather than row-centric storage, (b) stream processing, (c) text searching, (d) XML databases, and (e) array-centric processing of large data sets.

Approach (a) has benefits for applications like large data warehousing, which are optimized for ad-hoc, read-centric querying. Data is scraped off operational systems on an offline basis, mitigating the need for write performance. In addition, data sets with a large number of columns (increasingly common) benefit performance-wise from column-centric storage. Queries with several predicates need only read the columns addressed by the query, filtering out large amounts of data before returning the candidate row set. This contrasts with the row-centric storage used by most RDBMSs, where each row's data must be processed during the query, which means sifting through large amounts of irrelevant data.

Approach (d) is clearly problematic for a straightforward relational system, as it requires introspective abilities within text data types containing XML, which should ideally have been stored or indexed differently in the first place. However, support for two of these domains has already infiltrated the existing database market, in the form of built-in data warehousing support and support for XML storage and query.

Approach (e) is clearly of value to scientific processing. It's debatable whether pure array-like data should ever have been stored within an RDBMS in the first place. There's definitely a case to be made for supporting such data sets, however, given the operational and high-availability benefits and the consistency of architecture provided by a DB infrastructure.
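To make the column-centric point in (a) concrete, here is a minimal sketch in Python. It is not taken from the papers: the table, the column names, and the "cells read" counter are all made up for illustration, and a real column store works on disk pages and compressed columns rather than Python lists. It only shows why a query with a couple of predicates touches far less data when the table is laid out by column.

```python
# Toy comparison of row-centric and column-centric scans.
# All data and names below are hypothetical, chosen only for illustration.

ROWS = [
    {"id": 1, "region": "east", "year": 2006, "sales": 120, "notes": "a"},
    {"id": 2, "region": "west", "year": 2006, "sales": 80, "notes": "b"},
    {"id": 3, "region": "east", "year": 2005, "sales": 200, "notes": "c"},
    {"id": 4, "region": "east", "year": 2006, "sales": 50, "notes": "d"},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def query_rows(rows, predicates):
    """Row-centric scan: every field of every row is fetched."""
    cells_read = 0
    matches = []
    for index, row in enumerate(rows):
        cells_read += len(row)  # the whole row is read, used or not
        if all(test(row[col]) for col, test in predicates.items()):
            matches.append(index)
    return matches, cells_read


def query_columns(columns, predicates, n_rows):
    """Column-centric scan: only predicate columns are read, and each
    predicate narrows the candidate row set before the next one runs."""
    candidates = list(range(n_rows))
    cells_read = 0
    for col, test in predicates.items():
        values = columns[col]
        cells_read += len(candidates)  # one cell per surviving candidate
        candidates = [i for i in candidates if test(values[i])]
    return candidates, cells_read


PREDICATES = {
    "region": lambda region: region == "east",
    "year": lambda year: year == 2006,
}

if __name__ == "__main__":
    print(query_rows(ROWS, PREDICATES))
    print(query_columns(to_columns(ROWS), PREDICATES, len(ROWS)))
```

Both functions return the same matching row positions; the difference is in the second value, the number of cells touched. The gap widens as the table gets wider, which is the situation described above for data sets with a large number of columns.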
The authors have made some progress developing such an array-centric system, although the performance comparison to the RDBMS is of debatable value, since it's clearly an apples-to-oranges comparison.

Approaches (b) and (c) are rapidly becoming of interest to business users. Text processing provides a means for a business to mine its own internal intelligence from the reams of knowledge (data) stored within its networks. The enterprise search market is blooming as these benefits are realized. Already a fair number of players are positioning themselves to capitalize on this market, although the emerging delivery model and architecture are still being worked out.

The area that seems most untapped, yet of such huge promise (and here I agree with the authors), is stream processing, and the existing RDBMS market seems ill-equipped to handle this new application. Data continues to become more and more real-time, and is distributed over many different channels (content syndication in Web 2.0, presence in the telecom industry, business dashboards, etc.). As the rate of new data increases, the workload starts to look more like the financial feed example than the store-and-query model. Indeed, businesses will focus more on the current state of affairs for their real-time business integration and decision processes. The multi-tier architecture cited, combining an application server with a bus and an RDBMS backing, is commonly used for implementing real-time business data today. This architecture is far from optimal and is indeed heavyweight, as the authors conclude.

Interesting times indeed for the database industry...

By Michael Gilfix at 2007-01-14 21:05 | Databases | Michael Gilfix's blog