Jan 05

Skynet: MapReduce in Ruby


Adam Pisoni of Geni.com has released Skynet (have to love the name!) as a Ruby gem. It is a pure Ruby implementation of MapReduce, not a wrapper around Hadoop or anything like that:

With Skynet, one can easily convert a time-consuming serial task, such as a computationally expensive Rails migration, into a distributed program running on many computers.
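
To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Ruby. It does not use Skynet's API; the method names `map_phase` and `reduce_phase` and the word-count task are made up for illustration. In a system like Skynet, each chunk would presumably be handed to a different worker, possibly on a different machine.

```ruby
# Hypothetical word count in map/reduce style; plain Ruby, not Skynet's API.

# Map: turn one chunk of input into partial counts.
def map_phase(chunk)
  chunk.split.each_with_object(Hash.new(0)) do |word, counts|
    counts[word.downcase] += 1
  end
end

# Reduce: merge the partial counts from every chunk into one total.
def reduce_phase(partials)
  partials.each_with_object(Hash.new(0)) do |partial, totals|
    partial.each { |word, n| totals[word] += n }
  end
end

chunks = ["the quick brown fox", "the lazy dog", "The end"]

# Each chunk is independent, so the map calls could run on separate workers.
partials = chunks.map { |chunk| map_phase(chunk) }
totals   = reduce_phase(partials)

puts totals["the"]  # prints 3
```

The point of the split is that the map step has no shared state, so only the reduce step needs to see everything.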

Skynet is an adaptive, self-upgrading, fault-tolerant, and fully distributed system with no single point of failure. It uses a “peer recovery” system where workers watch out for each other. If a worker dies or fails for any reason, another worker will notice and pick up that task. Skynet also has no special ‘master’ servers, only workers which can act as a master for any task at any time. Even these master tasks can fail and will be picked up by other workers.
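
The "peer recovery" idea can be sketched roughly as follows. This is not Skynet's actual implementation; `Task`, `TaskBoard`, and the `TIMEOUT` value are hypothetical names chosen for illustration, and time is passed in explicitly to keep the example deterministic. The assumption is simply that a claim on a task goes stale if its owner stops making progress, at which point any other worker may take the task over.

```ruby
# Hypothetical sketch of peer recovery; not Skynet's actual implementation.
Task = Struct.new(:id, :owner, :claimed_at, :done)

class TaskBoard
  TIMEOUT = 5 # seconds a claim is trusted before peers treat the owner as dead

  def initialize(tasks)
    @tasks = tasks
  end

  # A worker claims the first task that is unclaimed or whose claim is stale.
  def claim(worker, now)
    task = @tasks.find do |t|
      !t.done && (t.owner.nil? || now - t.claimed_at > TIMEOUT)
    end
    return nil unless task

    task.owner = worker
    task.claimed_at = now
    task
  end

  def finish(task)
    task.done = true
  end
end

board = TaskBoard.new([Task.new(1, nil, nil, false)])

board.claim("worker-a", 0)             # worker-a takes the task, then "dies"
early   = board.claim("worker-b", 3)   # claim is still fresh: nothing to take
rescued = board.claim("worker-b", 10)  # claim is stale: worker-b takes over
board.finish(rescued)

puts rescued.owner  # prints worker-b
```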

In general:

Skynet works by putting “tasks” on a message queue which are picked up by skynet workers, who execute the tasks, then put their results back on the message queue. Skynet works best when it runs with your code. For example, you might have a Rails app and want some code you’ve already written to run asynchronously or in a distributed way. Skynet can run within your code by installing a skynet launcher into your app. Running this skynet launcher within your app guarantees all skynet workers will have access to your code. This will be covered later.
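
The task-queue flow described above can be imitated inside a single process with Ruby's built-in `Queue` and a few threads. This is only an analogy, not Skynet's API: the two queues, the worker count, and the "square a number" task are all made up for illustration.

```ruby
# Hypothetical in-process stand-in for a message queue; not Skynet's API.
tasks   = Queue.new
results = Queue.new

# Put "tasks" on the queue: here, each task is a number to square.
(1..6).each { |n| tasks << n }
tasks.close # no more tasks; pop returns nil once the queue is drained

# Workers pick up tasks, execute them, and put results back on a queue.
workers = 3.times.map do |worker_id|
  Thread.new do
    while (n = tasks.pop)
      results << [worker_id, n, n * n]
    end
  end
end
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
puts squares.map { |_, _, sq| sq }.sort.inspect  # prints [1, 4, 9, 16, 25, 36]
```

Which worker handles which task is not predictable, which is why the results are sorted before printing.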

Skynet currently supports two message queue systems, TupleSpace and MySQL. By default, the TupleSpace queue is used as it is the easiest to set up, though it is less powerful and less scalable for large installations.
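
For readers who have not met tuple spaces before, here is a small sketch using Ruby's Rinda library, on the assumption that this is the kind of TupleSpace meant here. The tuple layouts (`[:task, id, payload]` and `[:result, id, value]`) are invented for this example and are not Skynet's message format. Everything runs in one process; a real deployment would share the space between machines.

```ruby
require 'rinda/tuplespace'

# Local tuple space; the tuple layouts below are made up for illustration.
space = Rinda::TupleSpace.new

# Producer: write task tuples of the form [:task, id, payload].
space.write([:task, 1, "hello"])
space.write([:task, 2, "world"])

# Worker: take any task tuple (nil is a wildcard), process it, write a result.
2.times do
  _, id, payload = space.take([:task, nil, nil])
  space.write([:result, id, payload.upcase])
end

# Collector: take each result by its id.
results = [1, 2].map { |id| space.take([:result, id, nil]).last }
puts results.inspect  # prints ["HELLO", "WORLD"]
```

`take` removes the matching tuple from the space, so two workers never process the same task.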

If you are in Rails-land, you get some nice additions to ActiveRecord such as a distributed find:


and send_later:

model_object.send_later(:method, options, :save)

I can’t wait to see people implementing Terminators ;)