Browser storage: Do we need SQL? Or would a JSON approach be better?
Ian Hickson: “I expect I’ll be reverse-engineering SQLite and speccing that, if nothing better is picked first. As it is, people are starting to use the database feature in actual Web apps (e.g. mobile GMail, iirc).”
When I read that comment to Vlad’s post on HTML 5 Web Storage I gulped. This would basically make SQLite the HTML 5 for storage in the browser. You would have to be a little crazy to re-write the exact semantics (including bugs) of SQLite and its dialect. What if you couldn’t use the public domain code?
Gears lead out strong with making a relational database part of the toolbox for developers. It embedded its own SQLite, in fact one that was customized to have the very cool full text search ability. However, this brings up the point of “which SQLite do you standardize on?”
The beauty of using SQL and SQLite is that many developers already know it. RDBMS has been mainstream for donkey’s years; we have tools to manage SQL, to view the model, and to tweak for performance. It has gone through the test of time.
However, SQL has always been at odds with many developers. Ted Neward brought up ORM as the vietnam of computer science (which is going a touch far ;). I was just lamenting with a friend at Microsoft on how developers spend 90% of their time munging data. Our life is one of transformations, and that is why I am interested in a world of JavaScript on client and server AND database. We aren’t there yet, but hopefully we can make progress.
One of Vlad’s main questions is “Is SQL the right API for Web developers?” and it is a valid one. I quickly found that for most of my tasks with the DB I just wanted to deal with JSON and hence created a wrapper GearsDB to let me insert/update/select/delete the database with a JSON view of the world. You probably wouldn’t want to do this on large production applications for performance reasons, but it works well for me.
Now a days, we have interesting APIs such as JSONQuery which Persevere (and other databases) use. I would love to see Firefox and other browsers support something like this and let us live in JSON throughout the stack. It feels so much more Webby, and also, some of the reasons that made us stay with SQL don’t matter as much in the client side world. For example, when OODBMS took off in some Enterprises, I remember having all of these Versant to Oracle exports just so people could report on the darn data. On the client the database is used for a very different reason (local storage) so lets use JSON!
That being said, at this point there are applications such as Gmail, MySpace search, Zoho, and many iPhone Web applications that use the SQL storage in browsers. In fact, if we had the API in Firefox I would have Bespin using it right now! We had a version of this that abstracted on top of stores, but it was a pain. I would love to just use HTML 5 storage and be done.
So, I think that Firefox should actually support this for practical reasons (and we have SQLite right there!) but should push JSON APIs and let developers decide. I hope that JSON wins, you? I also hope that Hixie doesn’t have to spec SQLite :/
Related
It was also interesting to just read this post Abusing Web Storage via Sam Ruby:
Alberto Trivero: The aim of this white paper is to analyze security implications of the new HTML 5 client-side storage technology, showing how different attacks can be conduct in order to steal storage data in the client’s machine.
April 10th, 2009 at 12:01 pm
Well, have you seen CouchDB?
It’s exactly the kind of storage you want to see inside a browser, it’s javascript-based and it’s tiny!
See http://couchdb.apache.org/ , the JSON document-oriented storage.
April 10th, 2009 at 12:39 pm
I would suggest that before the non-relational approach is taken, someone actually architect a real application, like an offline GMail with 10,000 messages stored locally to make sure this stuff performs adequately.
I agree that the requirements are different, and you might be able to get away with a BigTable-eqse or btree store with indexing on the columns, but the only way to really hash out these specs is to build real applications, and not simply do things because they ‘feel webby’. Sometimes stuff that feels good can still suck.
April 10th, 2009 at 1:34 pm
I think RDBs are probably not the best choice for browsers. Rather than one huge JSON document, something similar to CouchDB might be nice. (Document oriented, single key per document, pre-calculated views, etc.) It is a very different model from RBDs. It might be too different, but the fact that people are not use to having RDBs in their browser it might make acceptable. It has the advantage of having a small interface surface area compared to SQL.
April 10th, 2009 at 1:36 pm
There are two layers to this issue. Layer one is the developer interface/language (i.e. SQL or JSONQuery). Layer two is the storage implementation (i.e. SQLite, object databases, etc.) that the browser vendors adopt to provide the interface. Choosing JSONQuery as the interface would not rule out an internal SQLite storage implementation (not that I would architect it that way). Shouldn’t languages and their implementations be considered separately, with the language taking priority so that the outcome is better for developers that ultimately use the end product?
April 10th, 2009 at 2:14 pm
Yes. JSON would be great. Ideally JSON + a fast JSON query language. A small big table.
April 10th, 2009 at 10:48 pm
SQL is good for other reasons as well. As far as I know, the relational schema and the querying techniques are by far the best abstraction for any form of textual data. Since storage on the browser will essentially target (at least for another two years) this form of data, its better to let SQLite be the default on HTML5. We could may be write libraries to extend a JSON interface to it.
April 11th, 2009 at 1:45 pm
I heartily agree, and am (somewhat) actively working on an ORM style tool to wrap SQLite with in the chrome://URI space: https://bugzilla.mozilla.org/show_bug.cgi?id=394732
Of course it would also be nice to have a library like this in content space.
I think having SQL “under the hood” is a good thing – especially if you are trying to complement an online application offline and need to use the same schema/relations.
April 11th, 2009 at 11:11 pm
Also, couchdb have a tiny spec, compared to the monstrous task of replicating sqlite behavior, including bugs in a standard.
April 12th, 2009 at 2:16 am
I would really like to have js-only (as in http-only) cookies with well defined storage availabilities and a sane not-O(n) getter/settter-API.
I don’t think that the argument holds that RDBMs are good because people already know how to work with them. The HTML5 database API really is a new programming paradigm for many programmers because all operations are fully asyncronous. This makes buildings things like ORMs even harder because now you also have a method call impedance mismatch.
What has worked really well for me too within the Joose world was to simply serialize objects to JSON and and simply deserialize them to JS objects on the next page view (with a jsonpickle-like protocol)
http://joose-js.blogspot.com/2008/08/automatic-serialization-and.html
This works quite well for non-so-large datasets that do not need concurrent access from different browser windows (OK, thats a small subset of applications)
With respect to bespin: I have a near 100% compatible HTML5 on Gears API ready. If we need it we could finally add a Joose dependency to bespin and use it or I could spend the 5 minutes on porting it to native JS :) Together with our WorkerFacade it would probably really easy to make this actually async under Gears. This would enable us to do DB operations on the main page without having to worry about UI block under Gears.
April 12th, 2009 at 4:18 pm
The thing that matters about CouchDB in this deployment is the offline replication. With Sql tools you have to manage sync in your application, with CouchDB you get it for free.
April 13th, 2009 at 4:01 pm
SQL has done nothing to make it easier to sync unless you are willing to pay $$ to license complex sync software. Why can’t we make simpler sync software for Web? Well the answer to that question lies in our ability to simplify data access abstractions in Web applications. Sync/offline can be made easy, cheap, and automatic if the following are acceptable:
1. Identify data through URL
2. Operate on data with HTTP methods
3. Provide JavaScript interception on XHR (just like Apple provides Objective-C interception on NSURLRequest and URLMON provides Asynchronous Pluggable Protocol).
Read more about this and related work at Oracle on my Web site http://o-micron.blogspot.com
April 14th, 2009 at 6:59 am
With JSON in the browser and JSON on the wire, JSON in the database is an obvious requirement! Up to now, my REST APIs mostly converting DTOs from Java/Python/etc. to JSON and vice-versa, after the translation between SQL/GQL/etc. and programming languages. So much time and CPU wasted :(
Will JSON be the final format? I hope not because I would like to start carrying enhanced messages, a-la RDF (http://en.wikipedia.org/wiki/Resource_Description_Framework) with triples for example…
April 14th, 2009 at 7:14 am
I think that SQLite is the way to go.
SQLite has a rock-solid storage engine under the covers. They have integrity checks and rollback journals. It is very hard to corrupt the SQLite data file. Any other solution would have to do a lot of work to get the same ACID characteristics that SQLite has.
Back in the dark ages before the internet, but after flushing toilets, simple databases with the same philosophy as CouchDB, SimpleDB, BigTable, Berkeley etc.were the norm. Over time, people discovered that they were writing reams of code because the databases did not have the building blocks to avoid various kinds of data duplication, and the duplicated data had to be kept in sync, and kept searchable. Eventually people like Codd and Chen characterized the styles of data duplication and figured out how to avoid it through the various normal forms for program data. They also developed SQL as the building blocks for describing and manipulating that data. SQL allowed applications to become much more powerful and feature-rich without having to write proportionally more code. I have seen various data query languages come and go. Most came to reduce the “impedance mismatch” between the object world and the relational world. They go because eventually the owners realize that it would be too much work to give the customers the SQL that they really want.
As a side note on query languages, I have been watching the Ruby On Rails/ActiveRecord language. ActiveRecord was originally designed as a object-relational-mapper so that the designer could avoid the complexity of SQL. However, with each subsequent release of Rails, they have put in more and more features into the ActiveRecord query language that are rip-offs from SQL. I can’t wait until Rails 6 when they admit reality and support inline SQL. :) I have to give them credit though; they do provide an escape hatch to SQL if their query language is not powerful enough.
On performance, most PCs have enough horse-power to cough up a few cycles to the SQLite VM. I know that when our old company (Adscape Media) built the in-game advertising engine we put SQLite into the client-side code (linked into the game software), and there were no performance issues that caused our acquirer (Google) to yank SQLite in favour of JSON, XML, or whatever. If SQLite can be the client-side backing store for a video game, I would say it is fine for a “thick” web app.
On the “public domain” issue, if you want a license for SQLite, Hwaci software (co-owned by Dr Hipp) will be happy to provide you a license for a small fee.
April 14th, 2009 at 10:16 am
You can write a good generic JSON/XML ORM layer on top of SQL. I’ve helped develop quite a nice one and there are a hundred good reasons to do this in preference to using a heirachical db on the server side.
…but can someone tell me why you wouldn’t use strictly hierachical storage on a *client side* database? After all the client is going to be dealing with the data purely in terms of its own object hierachies. There aren’t going to be any of the sort of curve balls thrown at it that you get in a “bigger” SOA type environment.
April 15th, 2009 at 8:31 pm
I used JSON and implement query syntax like SQL:
create data: http://vnjs.net/?id=1000000069
insert: http://vnjs.net/?id=1000000077
update: http://vnjs.net/?id=1000000079
delete: http://vnjs.net/?id=1000000075
select: http://vnjs.net/?id=1000000078
……
and select HTML element from DOM:
http://vnjs.net/?id=1000000082
end link to download my framework: http://vnjs.net/www/src/kombai.rar
June 2nd, 2009 at 10:40 am
Hi, Dion,
Would you share this talk infor by Vlad with your folks:
http://silicon-valley.siggraph.org/next.html
“Graphics on the Web: Going Beyond Images and Rectangles”
It’s on June 11(next Thursday) at Apple.
Thanks
-Sharon
February 10th, 2010 at 12:33 am
This would be really nice. Combined with the Gears Desktop API, you could open a shortcut to an online word processor with login happening behind the scenes and perceive very little difference with opening a shortcut to a Word document on your desktop….
May 3rd, 2010 at 8:34 pm
What the different MySQL and SQLite?