Tofu (thetofu) wrote,

BOSH Cloud

We all know that ejabberd is cool and Erlang is extremely scalable. Facebook even decided to use Erlang for their new chat system! Even with all of that, I want to write about an alternative. :) I am not gonna compare numbers or benchmarks; this post has no data or statistics. I am just gonna describe a system that's implemented, in production, and works well. It is the story of the BOSH Cloud.

First, XMPP is a standard protocol for presence-based applications, namely IM and chat. It is growing in use, even as I type this blog entry. Web-based chat can be done using XMPP with some help. The help comes from an extension to the XMPP standard called BOSH (Bidirectional-streams Over Synchronous HTTP, XEP-0124). BOSH solves the problem of implementing something that requires state over a stateless protocol like HTTP. (Ejabberd has a BOSH implementation too.) Anyway, the question is, 'what is a BOSH Cloud?'
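To make the "state over stateless HTTP" part concrete: a BOSH client wraps everything it sends in a <body/> element that carries a request ID (rid), which goes up by one on every request, plus the session ID (sid) the connection manager hands back. Here is a rough Python sketch of the bookkeeping a client does, going by my reading of XEP-0124. The names bosh_body and BoshSession are made up for this post, and the attribute values are only examples.

```python
from xml.sax.saxutils import quoteattr

BOSH_NS = "http://jabber.org/protocol/httpbind"


def bosh_body(attrs, payload=""):
    """Wrap an XMPP payload in a BOSH <body/> element."""
    pairs = [("xmlns", BOSH_NS)] + list(attrs)
    rendered = " ".join("%s=%s" % (name, quoteattr(str(value)))
                        for name, value in pairs)
    if payload:
        return "<body %s>%s</body>" % (rendered, payload)
    return "<body %s/>" % rendered


class BoshSession(object):
    """Hypothetical helper: the state a client keeps between HTTP requests."""

    def __init__(self, domain, first_rid):
        self.domain = domain
        self.rid = first_rid
        self.sid = None  # filled in from the connection manager's first reply

    def next_request(self, payload=""):
        if self.sid is None:
            # Session creation: no sid yet, so say who we want to talk to.
            attrs = [("to", self.domain), ("rid", self.rid),
                     ("wait", 60), ("hold", 1), ("ver", "1.6")]
        else:
            attrs = [("sid", self.sid), ("rid", self.rid)]
        self.rid += 1  # each request must use the previous rid plus one
        return bosh_body(attrs, payload)
```

The sid is the important bit for what follows: it only means something to the connection manager that issued it, which is why the load balancing further down has to be sticky.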

The answer is that it is a set of Amazon EC2 instances used to provide a scalable BOSH connection manager. You have an HTTP load balancer up front, and on the back end you can create as many BOSH instances as your scaling needs require.

To create one of these you will need the following:

The ability to create Amazon Elastic Compute Cloud (EC2) images and instances.

What we have here is a recipe to implement a BOSH connection manager and do simple round robin scaling. Nginx provides the load balancing and Punjab provides the BOSH implementation. To scale, you run them on Amazon's Elastic Compute Cloud.

To start, you will need an Amazon Web Services developer account, and you will need to know a bit about EC2.

This implementation starts with a basic Debian image. Installed on this image are Python, Twisted, Nginx, and Punjab. There is plenty of good documentation on how to build an EC2 image on Amazon's web site. Once we have the applications and their dependencies built, we will need to configure them before we build a new image.

Let's start with what is up front, Nginx. This will be our simple round robin load balancer. The configuration will look something like the following (values will vary based on your needs):
user www-data;
worker_processes  4;

error_log  /var/log/nginx-error.log;
pid        /var/run/;

events {
    worker_connections  4096;
}

http {
    include       /usr/local/nginx/conf/mime.types;
    default_type  application/octet-stream;

    access_log  /var/log/access.log;

    sendfile        on;
    proxy_read_timeout 300;
    keepalive_timeout  65;
    tcp_nodelay        on;

    gzip  on;
    # ... some other stuff

    upstream punjab {
        ip_hash; # needed to make sure ips stay on the punjab server used by the connection
        # .... and so on
    }

    server {
        listen       80;
        server_name  localhost;

        # ... other config options 

        location /bosh {
               proxy_set_header Host $http_host;
               proxy_set_header X-Real-IP $remote_addr;
               proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
               proxy_pass http://punjab;
        }
    }
}

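Nginx could also serve static files or your chat client. Nginx has a very small footprint and is fast. The main point of this configuration example is the load balancing. NOTE: you NEED the ip_hash directive so that requests from the same IP always go to the same Punjab instance. A BOSH session lives in the memory of the Punjab instance that created it, so a request routed to a different instance will not find the session and the client will be disconnected.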
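If you want a feel for why ip_hash keeps a client pinned to one back end, here is a toy version in Python. It is NOT nginx's actual hash function; I am using crc32 as a stand-in. The one thing it copies from nginx is keying on the first three octets of the IPv4 address. The back end names are just the punjab hostnames from this post on port 5280.

```python
import zlib


def pick_backend(client_ip, backends):
    """Map a client IPv4 address to one back end, the same one every time."""
    # Like nginx's ip_hash, key on the first three octets only, so a
    # whole /24 lands on the same server.
    key = ".".join(client_ip.split(".")[:3])
    # crc32 is a stand-in hash for illustration, not what nginx uses.
    index = zlib.crc32(key.encode("ascii")) & 0xffffffff
    return backends[index % len(backends)]


backends = ["punjab0:5280", "punjab1:5280", "punjab2:5280", "punjab3:5280"]

# Every request from this client goes to the same Punjab instance,
# so the session it created there is still around.
chosen = pick_backend("203.0.113.7", backends)
```

Note the trade-off: because the choice depends on the number of back ends, adding or removing an instance can move existing clients to a different Punjab, and those sessions will have to reconnect.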

Next we configure punjab with the basic .tac file that comes with punjab.

# punjab tac file
from twisted.web import server, resource, static
from twisted.application import service, internet

from punjab.httpb  import Httpb, HttpbService

root = static.File("./html")

#b = resource.IResource(HttpbService(1, use_raw=True))
b = resource.IResource(HttpbService(1))
# putChild takes a single path segment, without the leading slash
root.putChild('bosh', b)

site  = server.Site(root)

application = service.Application("punjab")
internet.TCPServer(5280, site).setServiceParent(application)

You may also want to use dynamic DNS and set your punjab domains when instances start up. The following script will do that.

USER_DATA=`/usr/bin/curl -s`
# hostname is the first available value
HOSTNAME=`echo $USER_DATA | cut -f 1 -d , | cut -f 2 -d =`
hostname $HOSTNAME

MYNAME=`/usr/bin/curl -s`
cat<<EOF | /usr/bin/nsupdate -y ddnskey:yourddnskey
update delete $ CNAME
update add $ 60 CNAME $MYNAME
EOF
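The two cut commands in that script are just pulling the value out of the first key=value pair in the user data. If you would rather do that part in Python, something like this behaves the same way. The function name is made up, and the role=bosh field in the example is only there to show that later fields are ignored.

```python
def hostname_from_user_data(user_data):
    """Return the value of the first comma-separated key=value field."""
    # Same as: cut -f 1 -d ,
    first_field = user_data.strip().split(",")[0]
    # Same as: cut -f 2 -d =  (cut passes the line through untouched
    # when the delimiter is missing, so do the same here)
    parts = first_field.split("=")
    return parts[1] if len(parts) > 1 else first_field


name = hostname_from_user_data("hostname=punjab2,role=bosh")
```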

Now that those are configured, you can build your image. Once you have the images, the magic can happen. :)

So first we start an instance and run nginx.

ec2-run-instances -d "hostname=www"

This will start up one instance. If the script above runs at boot, it will take the user data given and create a DNS record for that hostname, pointing at the instance that will load balance to the Punjab instances.

You can have nginx start up on boot with the above configuration and you are almost there.

Start up the Punjab instances.
for i in 0 1 2 3; do
   ec2-run-instances  -d "hostname=punjab$i"
done

This will start up 4 Punjab Amazon instances, and if your server start up scripts run Punjab on boot, you will be up and running. You can scale as you like by starting up new instances or taking down old ones. You can have a script reconfigure nginx and then HUP it to add the new instances to the load balancing pool.
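The reconfigure-and-HUP script could look roughly like this. It is a sketch, not what I run: it assumes you keep the upstream block in its own file that nginx.conf pulls in with an include, and the function names are made up. You pass in wherever your nginx pid file actually lives.

```python
import os
import signal


def render_upstream(hosts, port=5280):
    """Render the 'upstream punjab' block for the given Punjab hosts."""
    lines = ["upstream punjab {", "    ip_hash;"]
    for host in hosts:
        lines.append("    server %s:%d;" % (host, port))
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_upstream(path, hosts):
    """Write the rendered block to the file nginx.conf includes."""
    with open(path, "w") as handle:
        handle.write(render_upstream(hosts))


def reload_nginx(pid_file):
    """Send SIGHUP to the nginx master so it rereads its configuration."""
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGHUP)
```

So after starting punjab4 you would call write_upstream with the new host list and then reload_nginx. On HUP nginx starts new workers with the new config and lets the old ones finish their requests, so nobody gets cut off by the reload itself.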

I will note that there are things left out, namely the JavaScript BOSH client. You are welcome to leave questions in the comments, or take the missing pieces as exercises for your enjoyment. :) That is it for a basic BOSH setup using Amazon EC2 and Python, or a "BOSH Cloud". You can also do this with ejabberd and Erlang.
Tags: bosh, erlang, jabber, punjab, python, scaling, twisted, xmpp