Apr 20

The new attack on the RDBMS

Tech with tags: , , , 34 Comments »

Data in the cloud

Remember when the Object database was going to kill the Relational database?

OOP was the sexy programming model, and relational set theory seemed so quaint. Once you are using Objects, why wouldn’t you just want to persist them instead of having to drop down to this crazy SQL? Inner joins instead of just person.name.first? Fools.

Well, it didn’t quite work out that way of course. Instead we got half way measures such as object-relational systems, and the huge quagmire (as Ted Neward would put it) of the ORM years, which continue to do well.

Why did Object databases fail? If I remember correctly, it feels like there were a couple of problems:

  • They were slow at first
  • People had a crap load of tools around the relational world.

It was fine to do some simple work, but what about reporting? Where was the Business Objects for the Object database? I remember working with a huge bank that used Versant AND Oracle, and they had a nightmare involving syncing between the two.

Ok, so the Object database failed, so what is the new attack?

The Cloud-y Web

SQL is an enterprise victory that managed to make its way into the consumer Web and application space. A lot of people knew SQL, and it seemed obvious to have a LAMP stack or a Java / .NET stack backed by a RDBMS.

Is this really the right choice for Web applications? Why was Rails so successful? It was due to the productivity gain. How much of that is due to ActiveRecord vs. the other Action* pieces that make up Rails? I would argue a large percentage. Working with the database was actually a big pain in the tuches. ActiveRecord together with migrations helped a lot. It gave us a nice middle man between a full ORM and the SQL that we know and …. know.

What if the database piece didn’t need to be that painful? The source of the pain can be the paradigm shift between the various worlds, but also a huge part of it is scalability. When you have to scale your website, it can be fairly easy to make your application stateless, and then the bottleneck becomes the poor database. This is when you break out the master / slave relationships, think about partitioning of the application, and caching layers (Tangosol Coherence, memcached). Now you have to really think about an architecture ;)

Google had to do this thinking a long time ago, as they obviously have to scale their applications to a huge degree. Scaling the fairly read-only search operation is one thing, but as soon as you get to read-write operations you have a lot more of a head-ache. Scaling a MMORG astounds me. To be that real-time, and having the world constantly changing. Wow. At least there are the separations of locations (world X can be this cluster of machines).

Now we get to Bigtable, the engine that Google built to scale in the cloud. Amazon has their new SimpleDB, and there are others.

What these guys are all doing, is revisiting the database story. Maybe it is time to think about if a RDBMS is the no-brainer choice.

When Google App Engine launched, I thought there would be a lot of people saying “oh man, I just want MySQL instead of this new thing”. I barely heard that, and instead heard more thoughts along the lines of “It is great to be able to use the scalable database that Google uses internally.” In fact, when you start using it and see that it is schema-less, you get a bit of a relief. You can build your model, and even use an Expando to be highly dynamic on the data in the backend. You go along your way, iterating on your code and model and you don’t have to spend time working on up and down migration methods. Doesn’t that remind you a little of the OODBMS dreams? But this time it is fast and scalable!

Resting on the Couch

With the interest in Bigtable via App Engine pushing thought, we also have CouchDB pushing from the other end. The end that says, what would a RESTful approach to a database be?

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.

JSON built in. JavaScript right there. A database built for the Web?

It is great to see new ideas and thought about the storage of data. The RDBMS isn’t going anywhere of course. There are still a ton of tools out there for it and legacy code, and we all know that:

Data stays where it lies.

It is much easier to implement a new application talking to the old datastore, than migrate the datastore itself. It is like taking out the foundation. Also, SQL is getting new life in places too.

SQLite

I recently saw an application that used GWT on the client, and JavaScript on the server, which reminded me of my comic above. I wonder if we may end up with another flip, having SQL being used in the client, and other systems like CouchDB, Bigtable, etc being used in the enterprise / on the server.

It is happening on the client. SQLite seems to be everywhere. Your operating system, phone, browser, applications, everywhere. I bet I have around 20 SQLite engines on my system right now, and growing. Why is this happening? Well, instead of coming up with your own data format, parser, and search engine, why not just use SQLite and be done. It is very faster, perfect for single user mode, so everyone is a winner.

So, SQL has a looooong future ahead of it, but it will be interesting to see how the RDBMS weathers the latest storm.

What do you think?

Apr 16

Gears Future APIs: Database 2.0 API meshes with HTML5 Storage API

Gears, Tech with tags: , , 4 Comments »

Aaron Boodman wrote a fantastic post on Gears and Standards which I am very passionate about myself.

In it he talks about the HTML5 Storage API and how we are all working together to unify the database access semantics.

You can see the Database 2 API which aims too:

  • Enable Javascript developers to easily write code that works with both Gears and browser database APIs
  • Reduce developer “mind-print” by implementing the same API that is available in browsers
  • Support the proposed HTML5 database standard with an implementation available for all browsers that Gears supports
  • Implement an asynchronous API that can be called from the UI thread without freezing the UI
  • Implement a synchronous API to simplify usage inside workers
  • Implement a thread pool abstraction that can be used in other modules for asynchronous operations (bonus)
  • build a new module from scratch using the new Dispatcher model (bonus).

It would allow you to write code such as:

var dbman = goolge.gears.factory.create('beta.databasemanager');
var db = dbman.open('pages', '0.0.1.0',
  'Collection of crawled pages', 3000000);
 
function renderPageRow(row) {
  // insert page row into a table
}
 
function reportError(source, message) {
  // report an error
}
 
function renderPages() {
  db.transaction(function(tx) {
    tx.executeSql('CREATE TABLE IF NOT EXISTS Pages(title TEXT, lastUpdated INTEGER)', 
      []);
    tx.executeSql('SELECT * FROM Pages', [], function(rs) {
      for(var i = 0; i < rs.rows.length; i++) {
        renderPageRow(rs.rows.item(i));
      }
    })
  })  
}
 
function insertPage(text, lastUpdated) {
  db.transaction(function(tx) {
    tx.executeSql('INSERT INTO Pages VALUES(?, ?)', [ text, lastUpdated ],
      function() {
        // no result returned, stub success callback
      },
      function(tx, error) {
        reportError('sql', error.message);
      });
  });
}

There is a full technical design, so you can get involved and take part in the open source / open community process that we have going on in Gears land.

I will again end with my visualization of the zipper :)

Feb 05

Google Gears Database API on the Server

Gears, Google, JavaScript, Tech with tags: , , 2 Comments »

As soon as I started to play with Aptana Jaxer, I saw an interesting opportunity to port the Google Gears Database API (note the Gears in the logo!)

If I could use the same API for both client and server side database access, then I can be enabled to do things like:

  • Use one API, and have the system do a sync from local to remote databases
  • If the user has JavaScript, use a local database, else do the work remotely
  • Share higher level database libraries and ORMs such as Gears DBLib for use on server side data too

I quickly built a prototype to see if this would all work.

The Jaxer shim of the Gears API was born, and to test it out I took the database example from Gears itself and made it work.

To do so, I only had to make a few changes:

Run code on the server

I changed the main library to run on the server via:

<script type="text/javascript" src="gears_init.js" runat="server"></script>

I wrapped database access in proxy objects, such as:

function addPhrase(phrase, currTime) {
  getDB().execute('insert into Demo values (?, ?)', [phrase, currTime]);
}
addPhrase.proxy = true;

This now allows me to run the addPhrase code from the browser, and it will be proxied up to the server to actually execute that INSERT statement.

This forced me to separate the server side code from the client side code, which is a better practice anyway, but it does make you think about what goes where. In a pure Gears solution I can put everything in one place since it all runs on the client.

Create the new gears_init.js

A new gears_init.js acts as the shim itself. Instead of doing the typical Gears logic, it implements the Gears database contract. This wasn’t that tough, although there are differences between the Gears way, and the Jaxer.DB way. The main difference is to do with the ResultSet implementation, where Gears goes for a rs.next()/rs.field(1) type model versus the Jaxer.DB rs.rows[x] model.

I actually much prefer Gears DBLib as it hides all of that, and just gives the programmer what he wants… the rows to work on.

oncallback magic

In the current Jaxer beta, I ran into an issue where I wanted the Gears library to just “be there” for any proxy requests.

You have to think about the lifecycle of a Jaxer application, and the documentation tells you what you need to know to work around the issue.

In this case, I wrapped the code in:

function oncallback() {
  // create the wrapper here
}

This is less than idea, and Aptana is playing with nice scoping which would enable you to just say “hey, load this library once and keep it around for the lifetime of the server | application | session | page”. That will be very nice indeed.

You can do a little bit of this by opening up your jaxer_prefs file and adding the resource for your file:

// This option sets up an html document that will be loaded
// everytime a callback is processed.  This has to be a local file.
// If not specified, an empty document will be loaded.
// pref("Jaxer.dev.LoadDocForCallback", "resource:///framework/callback.html");

Future…

This is just the beginning. As I mentioned at the beginning, I am interested to see where you can take this to handle clients who do not support JavaScript, and also to deal with synchronization with minimal code (sync from local to remote with exactly the same SQL API).