Mike kindly started the presentation with a consumer warning, letting us know in advance that he was going to be pimping JIRA (because this was going to be case study-esque).
These days JIRA uses Lucene for “Generic Data Indexing”: fast retrieval of complex data objects. This isn’t about text searching for “dog” sorted by relevance. The statistics pages all come back from a Lucene index, not from the DB.
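To make the “generic data indexing” idea concrete, here is a toy sketch in plain Python, not Lucene's API. It assumes hypothetical issue records with made-up fields (`project`, `status`, `priority`). Every field=value pair becomes an indexed term that points at a set of document ids, so a statistics-style breakdown is just a few set intersections over the index instead of a GROUP BY against the database.

```python
from collections import defaultdict

# Hypothetical issue records; field names are illustrative, not JIRA's schema.
ISSUES = [
    {"id": 1, "project": "WEB", "status": "Open", "priority": "Major"},
    {"id": 2, "project": "WEB", "status": "Closed", "priority": "Minor"},
    {"id": 3, "project": "API", "status": "Open", "priority": "Critical"},
    {"id": 4, "project": "WEB", "status": "Open", "priority": "Critical"},
]


def build_index(docs):
    """Map each (field, value) term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if field != "id":
                index[(field, value)].add(doc["id"])
    return index


def matching(index, **criteria):
    """Ids of documents matching every field=value criterion (posting-list intersection)."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    postings = [index.get((f, v), set()) for f, v in criteria.items()]
    return set.intersection(*postings)


def count_by(index, field, **criteria):
    """Statistics-page style breakdown: document counts per value of `field` within a filter."""
    base = matching(index, **criteria)
    counts = {}
    for (f, value), ids in index.items():
        if f == field:
            n = len(ids & base)
            if n:
                counts[value] = n
    return counts


index = build_index(ISSUES)
print(count_by(index, "status", project="WEB"))  # {'Open': 2, 'Closed': 1}
```

Lucene adds the parts that matter at scale (on-disk segments, compact posting lists, caching), but the shape of the query is the same: look up terms, intersect, count.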
Lucene has a way for you to write your own Sort routines via
I have seen the “viral Lucene” pattern apply in a variety of projects. You start out using it for /search, and then you see that you can use it for other things. Slowly your DB is doing less, and your Lucene indexes are growing. This is a killer open source project, even if the API is a little weird.
Hadoop: Open Source MapReduce
A couple of people asked me why Google hasn’t open sourced our MapReduce. They didn’t know about Hadoop:
Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
The intent is to scale Hadoop up to handling thousands of computers. Hadoop has been tested on clusters of 600 nodes.
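The map/reduce paradigm described above is easiest to see with the classic word count. Below is a single-process sketch in Python, not Hadoop's Java API: each input fragment is mapped independently, intermediate pairs are grouped by key (the shuffle), and each key is reduced once. The node names and the `failed` set are hypothetical; a failed node is simply skipped and the fragment runs on the next node, which is a rough stand-in for the “executed or reexecuted on any node” property.

```python
from collections import defaultdict
from itertools import chain


def map_fn(fragment):
    """Map: emit (word, 1) for every word in one input fragment."""
    return [(word, 1) for word in fragment.lower().split()]


def reduce_fn(word, counts):
    """Reduce: sum all partial counts emitted for one word."""
    return word, sum(counts)


def run_task(task, arg, start, nodes, failed):
    """Run `task` on the first live node, starting round-robin at `start`.

    Skipping a node in `failed` and moving on mimics the framework re-executing
    a fragment elsewhere; the task itself is deterministic, so the result is the same.
    """
    for offset in range(len(nodes)):
        node = nodes[(start + offset) % len(nodes)]
        if node not in failed:
            return task(arg)
    raise RuntimeError("no live node available")


def map_reduce(fragments, nodes, failed=frozenset()):
    # Map phase: every fragment is an independent unit of work.
    mapped = [run_task(map_fn, frag, i, nodes, failed) for i, frag in enumerate(fragments)]
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for word, one in chain.from_iterable(mapped):
        groups[word].append(one)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(w, c) for w, c in sorted(groups.items()))


fragments = ["the quick brown fox", "the lazy dog", "The fox"]
nodes = ["node1", "node2", "node3"]
print(map_reduce(fragments, nodes, failed={"node2"}))
```

Because map and reduce are pure functions of their input, losing a node costs only a retry, not a wrong answer, which is why the framework can handle failures without help from the application.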
Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch. This includes the Hadoop Distributed Filesystem (HDFS) and an implementation of map/reduce.
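The other half, a file system that keeps data on the compute nodes and survives node failures, comes down to splitting files into blocks and storing each block on more than one node. Here is a toy Python model of that idea, not HDFS's code or its real placement policy: the block size, replication factor, node names and round-robin placement are all illustrative assumptions.

```python
def store(data, nodes, block_size, replication):
    """Split `data` into blocks and put each block on `replication` distinct nodes.

    Round-robin placement is a simplification chosen for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    storage = {node: {} for node in nodes}
    for b, block in enumerate(blocks):
        for r in range(replication):
            storage[nodes[(b + r) % len(nodes)]][b] = block
    return storage, len(blocks)


def read(storage, num_blocks, failed=frozenset()):
    """Reassemble the file, reading each block from any replica on a live node."""
    out = bytearray()
    for b in range(num_blocks):
        for node, node_blocks in storage.items():
            if node not in failed and b in node_blocks:
                out += node_blocks[b]
                break
        else:
            raise OSError(f"block {b} lost: every replica is on a failed node")
    return bytes(out)


data = b"hello hadoop distributed filesystem"
storage, n = store(data, ["n1", "n2", "n3", "n4"], block_size=8, replication=2)
print(read(storage, n, failed={"n2"}) == data)  # True: each block still has a live replica
```

With two copies per block, any single node can disappear without losing data; losing two neighbouring nodes here does lose a block, which is why real deployments replicate more and spread copies more carefully.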
For more information about Hadoop, please see the Hadoop wiki.
Christophe Bisciglia of the open source group has put great effort into UW classes that use Hadoop in the curriculum.