Microsoft is Thinking in Flex; I like to read my feeds though Speed Up! with Wordpress and Gears
May 26

Twitter is getting a Mom

Tech with tags: , Add comments

Twitter MOM

@al3x told us about the Twitter architecture, and what says it all is:

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency’s sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we’ve tried to make our system behave like a messaging system as much as possible, but that’s introduced a great deal of complexity and unpredictability. When we’re in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown.

Amid the huge number of “oh no twitter is down. make it faster!” posts, we have some good ones.

The answer isn’t “Use PHP” ;)

If I was Twitter I wouldn’t be looking for Erlang as the answer, but I would be interested in talking to Joe Armstrong. I wouldn’t jump to Java as the answer, but I would be reaching to talk to Cameron Purdy of Tangosol Coherence (now Oracle). These people have seen systems that make Twitter look like a toy in comparison, and it is the knowledge that is more valuable than any technology.

If you think about contorting a typical LAMP stack to run Twitter you quickly shudder. Having a database layer, even with master/slaves, is scary.

Twitter needs a Mom, and it looks like it is finally getting one. With true message oriented middleware, and money to get the systems they need, they should be fine. As Cedric says this isn’t an original problem.

The system of messages shouldn’t be living in bottleneck databases. Instead, they can be flowing through a river of distributed cache systems. The Jabber side of the house shouldn’t be able to “bring down” the entire website. The beauty of publish subscribe and messaging is that you can throttle things nicely. You shouldn’t be “running out of database connections.” You can tune the number of listeners for the “track” command for example, and if it is getting abused you limit its resources. Sure, this may mean that you get messages to people a little later, but who cares. If messages got a little slower would people even realise? Compare that to the birds lifting the animal to safety message.

In fact, if you think about systems such as the stock exchange. You will realise that you rarely get truly real-time access. Most of the time you are on a delay of some kind, and that is fine. Through the distributed caching architecture you can push out messages to other systems to do their work. One of these systems will be the website itself. is just another client to the messaging backbone. Even if the backbone is in trouble, the website can still show the current view of the world, and could even still batch up work to be done.

I was talking to another startup that is migrating away from a database backed system, and soon the entire real-time world will be in a huge distributed cache. I am sure that Twitter will be moving there too.

Currently, I still feed bad for the engineers. I have been there; The point where you are the limits of your current architecture and you know it can tank at any time. You are firefighting all day and night, and thus don’t even have much time to fix anything at all. It is hard work. It is tiring work. It is demoralizing work.

However, I know that Alex and the rest of the crew will pull through their current situation, which after all came about thanks to the amount of love that its users have for the service, and one day the new architecture will be there in a way where we will look back and remember the early days, where downtime was such an issue.

Thanks for all the hard work guys. I can’t wait to be tweeting on a fully loosely coupled architecture, talking to one of your Moms!

8 Responses to “Twitter is getting a Mom”

  1. Ray Cromwell Says:

    There, someone finally said it, amid the tons of opinion blog pieces about how difficult it is to scale or decentralize Twitter. :)

    It’s not really about programming languages, yet at the same time it is. Twitter is broke because it had the wrong design. However, I do believe the choice of tool encourages the design.

    People use Ruby on Rails because it makes writing database backed sites a joy. RoR, as a platform, therefore, tends to produce database centric designs.

    Erlang, for example, is fine tuned for event driven messaging, and therefore, if someone builds something with Erlang, it’s highly likely to leverage Erlang’s strengths.

    RoR brings the allure of quick-and-cheap development, but at the same time, it encourages hacking from the bottom up, rather than design from the top-down. Granted, there are many problems with top-down design as well, but there’s a middle ground. I

    Theoretically, Twitter could keep RoR around for a lot of the Web oriented form stuff, and replace the underlying messaging and track stuff with a pub/sub system with distribution and push updates.

    The only advantage to using Java is that there are tons of high performance, feature filled MoMs. Mature fail over, transience, persistence, distribution/federation, management, etc. Plus, mature APIs for designing and deploying MoM components, you’ve got JMS, Message Driven Beans, and ESB.

    For that reason, if I were CTO of Twitter, and looking to fix things in a hurry, I might buy either a top JMS MoM, or Coherence, and combine it with Java or Erlang. There is likely to be a push for Memcached usage, but Memcached doesn’t come close to Coherence in capability. Hell, IIRC, Coherence allows you to run complex cache queries and filters across the cache servers without pulling down the data.

    What I find confusing is how many people think Twitter is doing something unprecedented, as an excuse for the downtime. Anyone who has worked with real time stock feeds knows the message rate dwarfs Twitter easily, and the queries/processing that people run against the data are also more complex than Twitter. As you point out, the way it’s handled is by replication.

    One of the most charming things about email, SMS, and IM is the asynchronous nature of it. You usually don’t know how long it took for the message to get to you (unless you look at header time stamps), and you don’t care, unless it arrives spectacularly late. Asynchronous expectations are Twitter’s friend for scaling here.

  2. ahah Says:

    Are you sad that Java already has readily available solutions for twitter like scalability and performance disasters (AKA ruby)?

  3. Ray Cromwell Says:

    Java is unhip in the Web 2.0 market. :) s/Scala/Java and some of the cool people will like it. :)

  4. Nati Shalom Says:

    Twitter is, fundamentally, a messaging system. Twitter was not designed as a messaging system, however. For expediency’s sake, Twitter was built with technologies and practices that are more appropriate to a content management system.

    This is really the main point i.e. event driven architecture lend itself for scaling which means that with the proper design we wouldn’t be talking about twitter scaling in the first place as i already wrote in recent post about twitter scaling:

    Now saying that is too simplistic IMO as what we have here a combination of events (messaging) and complex matching (data) – if you choose to use message driven design you’ll probably end up with complexity of managing state i.e. messaging doesn’t provide means for dealing with concurrent update, consistency, affinity etc. If your going to build your system with database centric design your going to end up writing lots of twiking to deal with event correlation, subscription model etc.

    I believe that the separation between messaging and data in the first place is one of the bigger fallacies in distributed architecture i.e. i believe that messaging and data are the same and are always tightly coupled. In many cases you trigger a message to update a state change and when you update something you need to trigger various actions as a result of that update. Dealing with it by writing messaging and database at the application level often lead to complex architecture and non scalable design . Just imagine what it takes to do a simple task like sending a message and updating two or more entities of that change – we need to send an event to messaging queue, read it update the database, to avoid paritial failure we need to use transaction between the messaging systems and databse, to ensure high availability we need to maintain two separate clusters, to disciminate that update to multiple subscribers we need to perform reverse matching (based on the subscription) and send event to the relevant subsribers, normally this involves another broker service which is yet another tier in our already complex environment. Nither messaging or database centric design address well this type of associated networking.

    I faced this problem in the past when i was involved in building of a b2b exchange. It was that realization that led me to TupleSpaces and to build a solution around JavaSpaces – JavaSpaces provides a mean to deal with distributed state (data-grid/caching) and messaging (events..) in one simple technology using only 4 verbs (write,read,take and notify).
    JavaSpaces enables you to share state, subscribe to changes in that state, query the data etc.

    A Space Based Architecture is a design pattern that was built to provide a solution for scaling out of stateful event driven applications using a combination of pattern from Messaging, data-grid/caching and parallel processing worlds.

    Roy i believe that data-grid/distributed-caching technologies are better alternative then many of the pure messaging or database solution as it already deals with distributed data management however it is still lacking the messaging end and therefore is not enough.

    A good starting point on Space Based Architecture can be found on wikipedia:

  5. Nati Shalom Says:

    Roy i believe that data-grid/distributed-caching technologies are better alternative then many of the pure messaging or database solution as it already deals with distributed data management however it is still lacking the messaging end and therefore is not enough.

    Sorry clicked the submit button too soon – i meant Ray not Roy.

  6. Ray Cromwell Says:

    I think it depends on whether or not you need a FIFO, or something more sophisticated. What you get with many MoMs is the ability to choose either transient (messages can be lost if server goes down, higher performance, but riskier), or persistent (messages written to persistent storage using a journaling technique) combined with transactions. Many data-grid techniques amount to transient caches with superb reliability. I suppose if you factor in the ability to replay/repush updates from a master, using transient caches would be ok, but this makes the master a potential bottleneck.

    I agree that you could probably make data-grids work, and they have the advantage of being more flexible than MoMs, but recognize, you’ll have to build MoM functionality on top of the grid. I mean, this seems like a slam dunk excuse to use Java to me, simply because tried and true, market proven solutions exist. Otherwise, you have to build everything yourself, which you can do if you’re Google (GFS, BigTable, Map/Reduce, etc), but not everyone has the resources or time to do what Google does.

  7. Tug Says:

    I have to say, since I have worked with some of the Tangosol/Coherence guys, back in my Oracle life, that I am 100% with you on the fact that “experience” IS the key point. We have been discuss about systems and apps that I could not even dream about… In addition, talking about technology, I have been quite excited about the Coherence technology too: “simple and infinite power”

    What about moving Twitter and Google AppEngine ? ;)


  8. bob Says:

    old news

Leave a Reply

Spam is a pain, I am sorry to have to do this to you, but can you answer the question below?

Q: Type in the word 'ajax'