Apr 20

The new attack on the RDBMS


Data in the cloud

Remember when the Object database was going to kill the Relational database?

OOP was the sexy programming model, and relational set theory seemed so quaint. Once you are using Objects, why wouldn’t you just want to persist them instead of having to drop down to this crazy SQL? Inner joins instead of just person.name.first? Fools.
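
To make the contrast concrete, here is a minimal sketch that reads the same first name both ways. The table names (`person`, `person_name`), the classes, and the sample data are all made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass

# Relational side: the name lives in its own table, so reading it takes a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_name (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name_id INTEGER REFERENCES person_name(id));
    INSERT INTO person_name VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO person VALUES (1, 1);
""")
row = conn.execute(
    "SELECT n.first FROM person p "
    "INNER JOIN person_name n ON n.id = p.name_id "
    "WHERE p.id = ?",
    (1,),
).fetchone()
first_via_sql = row[0]


# Object side: the same data as nested objects, reached by plain attribute access.
@dataclass
class Name:
    first: str
    last: str


@dataclass
class Person:
    name: Name


person = Person(name=Name(first="Ada", last="Lovelace"))
first_via_object = person.name.first
```

Both paths end up with the same value; the difference is how much ceremony sits between the code and the data.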

Well, it didn’t quite work out that way, of course. Instead we got halfway measures such as object-relational systems, and the huge quagmire (as Ted Neward would put it) of the ORM years, which continue to do well.

Why did Object databases fail? If I remember correctly, it feels like there were a couple of problems:

  • They were slow at first
  • People had a crap load of tools around the relational world.

It was fine to do some simple work, but what about reporting? Where was the Business Objects for the Object database? I remember working with a huge bank that used Versant AND Oracle, and they had a nightmare involving syncing between the two.

Ok, so the Object database failed, so what is the new attack?

The Cloud-y Web

SQL is an enterprise victory that managed to make its way into the consumer Web and application space. A lot of people knew SQL, and it seemed obvious to have a LAMP stack or a Java / .NET stack backed by an RDBMS.

Is this really the right choice for Web applications? Why was Rails so successful? It was due to the productivity gain. How much of that is due to ActiveRecord vs. the other Action* pieces that make up Rails? I would argue a large percentage. Working with the database was actually a big pain in the tuches. ActiveRecord together with migrations helped a lot. It gave us a nice middle man between a full ORM and the SQL that we know and …. know.

What if the database piece didn’t need to be that painful? The source of the pain can be the paradigm shift between the various worlds, but also a huge part of it is scalability. When you have to scale your website, it can be fairly easy to make your application stateless, and then the bottleneck becomes the poor database. This is when you break out the master / slave relationships, think about partitioning of the application, and caching layers (Tangosol Coherence, memcached). Now you have to really think about an architecture ;)
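
As a rough illustration of the caching-layer idea, here is a minimal cache-aside sketch. A plain dict stands in for something like memcached, and `CacheAside`, `fake_db`, and the key names are hypothetical; a real deployment would also need expiry and a strategy for keeping several cache nodes consistent.

```python
class CacheAside:
    """Read through a cache first, and fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable standing in for a real query
        self._cache = {}                   # stand-in for memcached or similar
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: go to the database, then remember the answer.
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so the next read goes back to the database.
        self._cache.pop(key, None)


fake_db = {"user:1": "Dion"}  # hypothetical data
store = CacheAside(fake_db.__getitem__)

first_read = store.get("user:1")   # miss: hits the database
second_read = store.get("user:1")  # hit: served from the cache
reads_after_two_gets = store.db_reads
```

The database sees one read instead of two here, which is the whole point: the stateless application tier can grow, and the cache absorbs the repeated reads.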

Google had to do this thinking a long time ago, as they obviously have to scale their applications to a huge degree. Scaling the fairly read-only search operation is one thing, but as soon as you get to read-write operations you have a lot more of a headache. Scaling an MMORPG astounds me: it has to be that real-time while the world is constantly changing. Wow. At least there are the separations of locations (world X can be this cluster of machines).

Now we get to Bigtable, the engine that Google built to scale in the cloud. Amazon has their new SimpleDB, and there are others.

What these guys are all doing is revisiting the database story. Maybe it is time to think about whether an RDBMS is the no-brainer choice.

When Google App Engine launched, I thought there would be a lot of people saying “oh man, I just want MySQL instead of this new thing”. I barely heard that, and instead heard more thoughts along the lines of “It is great to be able to use the scalable database that Google uses internally.” In fact, when you start using it and see that it is schema-less, you get a bit of a relief. You can build your model, and even use an Expando to be highly dynamic on the data in the backend. You go along your way, iterating on your code and model and you don’t have to spend time working on up and down migration methods. Doesn’t that remind you a little of the OODBMS dreams? But this time it is fast and scalable!
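
To show what "schema-less" buys you while iterating, here is a small pure-Python sketch. It is not the App Engine API: `Entity`, `datastore`, and `query` are hypothetical stand-ins that only mimic the idea of records of the same kind carrying different properties, with no migration step in between.

```python
class Entity:
    """Hypothetical schema-less record: any keyword becomes a property."""

    def __init__(self, kind, **properties):
        self.kind = kind
        self.__dict__.update(properties)

    def properties(self):
        # Everything except the kind is user data.
        return {k: v for k, v in vars(self).items() if k != "kind"}


datastore = []  # in-memory stand-in for the real backend


def query(kind, **filters):
    """Return entities of a kind whose properties match every filter."""
    return [
        entity
        for entity in datastore
        if entity.kind == kind
        and all(entity.properties().get(k) == v for k, v in filters.items())
    ]


# First iteration of the model: a person only has a name.
datastore.append(Entity("Person", name="Ada"))

# A later iteration adds properties; older records are left untouched.
grace = Entity("Person", name="Grace", twitter="@grace")
grace.favorite_language = "COBOL"  # properties can also be added after creation
datastore.append(grace)

everyone = query("Person")
with_twitter = query("Person", twitter="@grace")
```

The old record simply lacks the new properties, so code reading it has to cope with missing values. That is the trade you make for skipping the up and down migrations.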

Resting on the Couch

With the interest in Bigtable via App Engine pushing thought, we also have CouchDB pushing from the other end. The end that says, what would a RESTful approach to a database be?

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.

JSON built in. JavaScript right there. A database built for the Web?
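
Here is a minimal sketch of what talking to such a database looks like from code: a document is just JSON, and storing or fetching it is just an HTTP request to a URL. The host and port (a local server on 5984), the `people` database, and the `ada` document id are assumptions for illustration. The requests are only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: a local CouchDB instance
DATABASE = "people"                 # hypothetical database name


def build_put_document(doc_id, document):
    """Build the HTTP request that would store a JSON document under doc_id."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{DATABASE}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_get_document(doc_id):
    """Build the HTTP request that would read the document back."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{DATABASE}/{doc_id}",
        method="GET",
    )


put_request = build_put_document(
    "ada", {"name": {"first": "Ada", "last": "Lovelace"}}
)
get_request = build_get_document("ada")

# With a server running, urllib.request.urlopen(put_request) would send it.
```

No driver, no connection pool, no SQL dialect: anything that can speak HTTP and JSON, including a browser, is already a client.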

It is great to see new ideas and thought about the storage of data. The RDBMS isn’t going anywhere of course. There are still a ton of tools out there for it and legacy code, and we all know that:

Data stays where it lies.

It is much easier to implement a new application talking to the old datastore, than migrate the datastore itself. It is like taking out the foundation. Also, SQL is getting new life in places too.

SQLite

I recently saw an application that used GWT on the client, and JavaScript on the server, which reminded me of my comic above. I wonder if we may end up with another flip, having SQL being used in the client, and other systems like CouchDB, Bigtable, etc being used in the enterprise / on the server.

It is happening on the client. SQLite seems to be everywhere. Your operating system, phone, browser, applications, everywhere. I bet I have around 20 SQLite engines on my system right now, and growing. Why is this happening? Well, instead of coming up with your own data format, parser, and search engine, why not just use SQLite and be done? It is very fast and perfect for single-user mode, so everyone is a winner.
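
A minimal sketch of that "just use SQLite" pattern, using the `sqlite3` module from Python's standard library. The bookmarks table and the helper names are made up for illustration; passing a file path instead of `:memory:` would give the application a single-file data store.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the application's data file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT NOT NULL)"
    )
    return conn


def add_bookmark(conn, title, url):
    # The connection as a context manager commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO bookmarks (title, url) VALUES (?, ?)", (title, url)
        )


def search(conn, term):
    """Substring search on titles: the 'search engine' you did not have to write."""
    rows = conn.execute(
        "SELECT title, url FROM bookmarks WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return rows.fetchall()


conn = open_store()
add_bookmark(conn, "Apache CouchDB", "http://couchdb.apache.org")
add_bookmark(conn, "SQLite home page", "http://www.sqlite.org")
matches = search(conn, "SQLite")
```

The storage format, the parsing, and the querying all come for free, which goes a long way toward explaining why so many client applications embed it.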

So, SQL has a looooong future ahead of it, but it will be interesting to see how the RDBMS weathers the latest storm.

What do you think?

22 Responses to “The new attack on the RDBMS”

  1. Geoff Says:

    Don’t forget http://www.nextdb.net :-) Check out our AJAX API.

  2. Edwin Khodabakchian Says:

    Awesome article!

    The jury is still out. It would be great to have a server side JS framework on the AppEngine to be able to build easily RESTful JSON services! Please!

  3. Ludovic Dubost Says:

    Indeed great article.. There is also HBase based on Hadoop.

    Indeed a REST API on an open source, fault-tolerant distributed database is what is needed for startups these days.

  4. Markus Says:

    What about migration of data? With SQL it is fairly easy to move from, say, Oracle to MySQL. If I build my app on AppEngine and later want to move to another provider, what can I do? Vendor lock-in?

  5. Rickard Says:

    FWIW the persistence API in Qi4j(.org) is entirely based on the schema-less notions of SimpleDB et al. I’ve never had to use an ORM to this date, and I haven’t missed them one bit whereas my friends who use them seem to not be so happy. We’ll see how it goes.

  6. Martijn Faassen Says:

    I agree that object databases haven’t become very popular. As someone who’s been using a Python-based object database (the ZODB) for about 10 years now, I wouldn’t say they’ve been a complete failure. Plone is a rather popular system that uses an object database. So, they are still alive and well in various niches. That’s not to say relational databases don’t have advantages compared to an object database, and the other way around as well. I also agree SQL isn’t about to go away, though it’s good to see various alternative approaches to data storage being explored.

    Here is a recent article about the ZODB: http://www.ibm.com/developerworks/aix/library/au-zodb

  7. Rob Says:

    That’s a pretty good summary.

  8. ryan Says:

    > What about migration of data?

    True enough, migration is already painful moving from MySQL to PostgreSQL. But think of it this way: with a painless ‘schema’ and the JSON-or-similar methods of getting data out, you can rather simply just _write an application_ to migrate the data. Pull it out of CouchDB (or whatever) and insert it into an RDBMS (or other ODB) however you can (ORM, more JSON, etc).

    Great article!

  9. Fred Says:

    The only logical organization would use the same language in the browser as the server. I’d love to force the browser makers to implement Ruby, but I think they have too much power in place. So the only logical organization uses JavaScript everywhere.

    Fortunately, the world of computers is run by people, and not by logic…

  10. Spocke Says:

    Great stuff. I remember one big problem with the ODB systems in their early days was the lack of a standard query language like SQL. Sure, there were some standards, but they were far from the SQL standard.

    I think it’s neat to see more of these flexible storage platforms. One I love when it comes to just indexing contents for searches is Lucene (http://lucene.apache.org); I’m very impressed with its performance.

  11. j Says:

    …………but what about reporting?

  12. Caleb Cushing Says:

    Why do so many people have problems with SQL? It makes more sense to me than OOP.

  13. Dave Says:

    SQL is just incredibly powerful as a query language. Nothing has ever come close. OO querying (as in hibernate) is just unreadable and unwritable and not nearly as powerful as SQL.

    The main problems with RDBMSs have to do not with SQL, but with the schema. Creating a schema, foreign keys, etc. is a pain. Changing it is a pain. Add to that the corporate policy that goes with changing such schemas and forget about it.

    Elimination of the schema, or simplification of its creation, should be the goal.

  14. JeanHuguesRobert Says:

    Object Oriented Databases were doomed.

    Object Oriented Programming was invented to control the increasing complexity that resulted from ever growing programs in a world where the quantity of memory doubled every 18 months or so. Intellectual capacity of the programmer was the limit, not size/speed of memory.

    Things were totally different in the database world. The limit there is not complexity; another limit comes first: the slow speed of access to the huge set of required data.

    Did OODB provide a response to that problem of speed of access? Not at all.

    Quite the contrary actually, OODB just made the problem bigger by increasing the size/complexity of the database entities, thanks to its ability to hide complexity with encapsulation!

    SQL is the machine language of databases. Databases are not fast enough to support a higher level concept.

    Things are changing. The amount of ram is rapidly increasing to the point where lots of databases can reside in memory, not on disk. OTOH the focus shifted from single user programs to massive multi users programs that required distributed systems.

    I foresee a future for OODB, but first we need to solve the multi-core issue in OOP, with some new means of handling the complexity of parallelization. That’s a tricky issue too, maybe more so than the database one.

  15. Keios Says:

    You should also check out, http://getschevo.org for a great Python ODBMS.

  16. Jonathan Ellis Says:

    “this time it is fast and scalable!”

    well, scalable, anyway. From what I have seen neither BigTable nor SimpleDB is fast by modern RDBMS standards. And that’s a big problem at the low end, which Google at least seems to care about.

  17. E Utrilla Says:

    Another option is to implement in a RDBMS a generic structure able to store any type of data object (maybe with some restrictions) and the relations between the different instances. Add a Java DAO layer to access it and you have the best of both worlds: An object oriented abstraction for flexibility and ease of development when working in a graph-oriented approach, and a SQL backend to perform more complex, not sequential searches.

    Believe it or not, I’ve done it. And it was a pain to develop, but performance, while not amazing, wasn’t that bad once we wrote the PL/SQL code to access an object. Any object. And it is really easy to maintain: we can add attributes and even new types of entities without changing the table definitions, just the associated (custom) metadata. That includes relationships between objects, without any need to set new foreign keys. Since the structure is standard for all objects, we can use the same PL/SQL (and a bit of dynamic SQL) to perform custom searches.

    Of course migration to another database is extremely easy. The hard point is the migration of the PL/SQL procedures. Everything can be done in plain SQL, though (our first version was), but then performance is reduced.

    And I have to admit that it gets funny when I have to look at the data directly using a SQL client instead of our DAO layer.

  18. Richard Kimber Says:

    My approach to SQL, and RDBMS specifically, has been on my mind quite a lot recently. I am an ASP.NET developer and technologies such as LINQ have made me rethink how I interact with storage. I actually like working with SQL and feel safe within structured stored procedures, but I can’t help thinking “there’s a better way.”

    This was a good post and it has given me food for thought. I’m not sure I’m ready to ditch SQL Server just yet though.

  19. Nati Shalom Says:

    You should check another approach, which I refer to as Persistence as a Service (PaaS), which combines In-Memory-Data-Grid and existing databases. I think that it addresses many of the issues you outlined above in a very different way and with little compromise on performance or consistency.
    http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html

  20. Tuber Says:

    @ Jonathan Ellis

    Yup, fast too. Do a Google search for anything you want. In the top right corner of the page, I usually see my query done in a tenth of a second. Not bad for a database that stores a good portion of the Internet. No RDBMS has done that. Not saying Bigtable is perfect (relational DBs will always have their place) but if you have problems that map well to this space then you have quite the tool at your disposal.

  21. Corne Oosthuizen Says:

    What about a post-relational database like Cache? It provides an OO interface and you can still use SQL or even dig down into direct storage for lots of speed.
    The database does have a built-in Ajax-type solution to get your data in/out.

    http://www.intersystems.com/cache/index.html

    Doesn’t have a client side storage :( so will still have to use something like Gears for that.

  22. rektide Says:

    I tend to think that OODBMS just got eaten by the web. It wasn’t just OODBMS that got eaten alive, it was most any heavyweight db including RDBMS that got eaten as developers flocked to extremely lightweight tools like MySQL. OODBMS in particular had issues but I tend to think that they would have been addressed and OODBMS would have seen a slow ascent, had it had a market to continue growth.
