Thursday, December 15, 2011

Flask and Mongodb on PyPy (Benchmark results Twisted vs Apache mod_wsgi)

If you are reading this on the Pygrunn website, I advise you to head over to our blog for better formatting of this post.

We (at Paylogic) had been working on a big new project using Flask and Mongodb for a couple of months when we ran into problems running the Flask application under Apache's mod_wsgi. At some point we started getting strange exceptions, which were caused by 'Multiple Python Sub Interpreters and C extension modules' as described here. The problems did not occur when we used pymongo without its C extensions, but that of course came with a performance loss (or so we were told).

When I hear "run a Python library without C extensions", PyPy immediately comes to mind. Fortunately, the only C extension in our project was the one required by pymongo, which we were no longer using. So I decided to give PyPy a try. This post describes how we got everything working on PyPy, along with the outcome of a couple of simple benchmarks.

Using PyPy


We use Virtualenv and you should too :P All kidding aside, I did not have any trouble getting my project to work with PyPy using Virtualenv, so I just kept using it. The following describes how to use PyPy together with Virtualenv.
  1. Download PyPy from the website or build it from source.

  2. Download and install the enhanced Virtualenv from: https://github.com/pypa/virtualenv

  3. Create a new virtual environment and activate it:

    $> virtualenv -p /path/to/pypy my-venv
    $> source my-venv/bin/activate

  4. If you are using Mongoengine, you need my tiny patch here; install it first:

    (my-venv)> pip install \
    https://github.com/metzlar/mongoengine#egg=mongoengine

  5. Now install your project. We use a lot of Flask extensions and they all seem to work fine with PyPy:

    (my-venv)> pip install \
    Flask-mongoengine Flask-Celery Flask-Babel .....

  6. And now run your tests, they should all pass ;-)
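Before running the tests it can help to confirm that the virtual environment really picked up PyPy and not a system CPython. A minimal sketch using only the standard library (the helper name `running_on_pypy` is just for illustration):

```python
import platform
import sys


def running_on_pypy():
    """Return True when the current interpreter is PyPy."""
    return platform.python_implementation() == "PyPy"


# Print the interpreter name and its executable path so it is obvious
# which interpreter the activated virtual environment is using.
print(platform.python_implementation(), sys.executable)
```

Run it with the `python` from the activated environment; if the first word printed is not `PyPy`, the `-p` option in step 3 probably pointed at the wrong binary.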

Webserver


Apache and mod_wsgi cannot use PyPy since mod_wsgi has its own (CPython) interpreter built in (or linked?). Time to go looking for another web server. We already used Nginx in front of Apache, so we just needed to find a pure Python webserver to run instead of Apache. This post gave a nice overview of what was available.

I decided to go with Twisted Web first, since it seemed easy to install and it supports PyPy according to the PyPy compatibility wiki. Working with Twisted also taught me that it is very easy to switch between webservers, so this choice is no big deal for now. To install Twisted:

(my-venv)> pip install twisted

Now you can test your wsgi application by invoking:

(my-venv)> twistd web \
--wsgi=fully.qualified.path.name.to.your.wsgi.application
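In case it is unclear what that dotted path should point to: it is the importable name of a WSGI callable. With Flask, the application object itself is such a callable. The sketch below is not our application, just the smallest plain WSGI callable I can think of; the module name `myapp/wsgi.py` and the name `application` are assumptions for the example.

```python
# Hypothetical module, e.g. saved as myapp/wsgi.py (the name is an assumption).


def application(environ, start_response):
    """Smallest possible WSGI callable: answer every request with 200 OK."""
    body = b"Hello from a WSGI application\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI callable returns an iterable of byte strings.
    return [body]
```

Under those naming assumptions the command above would become `twistd web --wsgi=myapp.wsgi.application`.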

Performance


So with all this set up, it was time to do some performance measurements. For this I wrote a very small script that called our web application using urllib2 and did the following:
  1. Create a new user (POST /user)
  2. Login with the new user (POST /user/login)
  3. Change the user's profile (POST /profile)
  4. Retrieve the profile (GET /profile)
  5. Retrieve a status page (GET /status/mongo)

As you can guess, every step involved at least one read from the database, and most also changed Mongo documents.
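The five steps above could be sketched roughly as follows. This is not the original script: it uses Python 3's `urllib.request` instead of urllib2, the form fields are placeholders (the real field names are not shown in this post), and it assumes the login is tracked with a session cookie.

```python
import http.cookiejar
import time
import urllib.parse
import urllib.request

# Placeholder form data; the real field names are not given in the post.
STEPS = [
    ("POST", "/user", {"username": "bench", "password": "secret"}),
    ("POST", "/user/login", {"username": "bench", "password": "secret"}),
    ("POST", "/profile", {"display_name": "Benchmark User"}),
    ("GET", "/profile", None),
    ("GET", "/status/mongo", None),
]


def build_opener():
    """Opener that keeps cookies, so the login carries over to later steps."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def run_once(opener, base_url):
    """Execute the five steps in order and return the elapsed seconds."""
    start = time.perf_counter()
    for method, path, fields in STEPS:
        data = urllib.parse.urlencode(fields).encode("utf-8") if fields else None
        request = urllib.request.Request(base_url + path, data=data, method=method)
        with opener.open(request) as response:
            response.read()  # drain the body so the full response is timed
    return time.perf_counter() - start
```

One run would then be `run_once(build_opener(), base_url)`, with `base_url` pointing at the instance under test.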

Setup


I set up two small instances in Amazon EC2, each running a vanilla Mongodb 2.0. One instance got Apache and CPython while the other got Twisted and PyPy. To be precise:

C-instance
  • Apache (mpm-worker, threaded) with mod_wsgi
  • CPython
  • Mongodb
Py-instance
  • Twisted web
  • PyPy
  • Mongodb

The machines started out as identical clones with Debian 6 and minimal required packages installed.

Outcome


We ran the script in a loop and forked it to be executed in parallel using bash:

$> x=1; while [ $x -le 10 ]; do (python script.py &); \
x=$(( $x + 1 )); done


The above runs 10 concurrent processes in the background.
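If you prefer to stay in Python, the same fan-out could be sketched with `subprocess`. This is an alternative to the bash loop, not what we ran; the helper name `run_concurrently` is made up for the example, and unlike the loop it waits for all processes to finish.

```python
import subprocess


def run_concurrently(command, copies):
    """Start `copies` processes of `command` at once and wait for all of them.

    Returns the list of exit codes, in start order.
    """
    # Popen returns immediately, so all copies are running before we wait.
    processes = [subprocess.Popen(command) for _ in range(copies)]
    return [process.wait() for process in processes]
```

For example, `run_concurrently(["python", "script.py"], 10)` would start the 10 concurrent benchmark scripts.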

The following tables show how long it took the script to make one run (create + login + edit + etc.). They show the average duration in seconds over the full benchmark, as well as the average duration in seconds of the fastest 50% of runs, for each number of concurrent benchmark scripts. The median and standard deviation of the whole benchmark are included to give the reader an idea of the spread.
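For reference, the four columns could be computed along these lines. This is a sketch, not the original code. The choice of `pstdev` (population standard deviation) and `median_high` (upper middle value for even counts) is an inference: the two-script rows of the tables in this post are consistent with both, but that is not stated anywhere. How the "fastest 50%" is cut for odd counts is also an assumption.

```python
import statistics


def summarize(durations):
    """Return (mean, mean of fastest half, median, std. dev.) in seconds."""
    ordered = sorted(durations)
    # Assumption: the fastest half is the first n // 2 values, at least one.
    fastest_half = ordered[:max(1, len(ordered) // 2)]
    return (
        statistics.mean(ordered),
        statistics.mean(fastest_half),
        statistics.median_high(ordered),  # upper middle value for even counts
        statistics.pstdev(ordered),       # population standard deviation
    )
```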

We tested different benchmark lengths by changing the number of runs, but there was no significant change in the results for lengths between 10 and 1000 runs. All results displayed below were gathered with 20 runs per benchmark.

C-instance
No. of scripts   AVG duration (s)   AVG of smallest 50% (s)   Median (s)       Std. dev. (s)
2                2.1961285472       2.16111454964             2.23114254475    0.035013997555
5                4.24873378039      4.07180131078             4.28083276749    0.199116645764
10               8.19694006443      7.75776187658             8.38324973583    0.561580396311


We observed a sharply increasing load (up to 20) on the instance as we increased the number of concurrently running scripts, so we immediately concluded that a small AWS EC2 instance cannot deal with more than 2 simultaneous requests, making this setup unsuitable for production ;-) This also explains the increasing durations and deviations in the table above. No exceptions were raised and no errors could be found during the benchmark, though.

Py-instance
No. of scripts   AVG duration (s)   AVG of smallest 50% (s)   Median (s)       Std. dev. (s)
2                1.19314089417      1.19220632315             1.1940754652     0.000934571025
5                2.44566946268      2.40679194927             2.46148204803    0.034066457154
10               4.86752203584      4.83535173416             4.88977667093    0.0368080682422


The Py-instance seems quite a bit faster, heh ;-)

Identical increases in load were observed, so we concluded again that this setup was not suitable for production. Still, Twisted seemed to perform better under higher loads: the deviation stayed quite small, and the averages and medians stayed close together, as we increased the number of concurrent requests (which equals the number of concurrent benchmark scripts).

Conclusion


We decided to keep using PyPy for now. From our tests we concluded that PyPy and Twisted perform roughly 70 to 80% faster than CPython and mod_wsgi in this benchmark. Of course we will have to re-benchmark when we start setting up the real production environment, using separate machines for the mongo replicasets, celery workers and webservers, and measure CPU, IO and memory usage, but for now it seems convenient to keep using PyPy. If necessary, it feels easier to switch from PyPy to CPython than the other way around ;-)
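To make the speed claim checkable, here is the arithmetic on the average durations from the two tables above. The numbers are copied from the tables; the helper name `compare` is just for illustration.

```python
# Average durations in seconds, copied from the result tables in this post.
C_INSTANCE = {2: 2.1961285472, 5: 4.24873378039, 10: 8.19694006443}
PY_INSTANCE = {2: 1.19314089417, 5: 2.44566946268, 10: 4.86752203584}


def compare(slow, fast):
    """Return (speed ratio, fraction of time saved) for two durations."""
    return slow / fast, 1 - fast / slow


for scripts in sorted(C_INSTANCE):
    ratio, saved = compare(C_INSTANCE[scripts], PY_INSTANCE[scripts])
    print("%2d scripts: %.2fx as fast, %.0f%% less time per run"
          % (scripts, ratio, saved * 100))
```

The ratio comes out between roughly 1.7 and 1.8 for all three rows, which corresponds to a bit over 40% less time per run. Keep in mind that the two instances differ in both webserver and interpreter, so this says nothing about which of the two is responsible.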

Then there is always the option to change webserver implementations. I already ran the identical benchmark against Gunicorn using various settings, but durations were a bit higher than with Twisted. Did you benchmark pure Python webservers on PyPy? Let us know!

6 comments:

  1. Standard deviations: a rare sight in benchmarks like this! Good on you!

  2. You might want to consider trying a similar benchmark in cyclone using its bottle implementation:
    https://github.com/fiorix/cyclone/blob/master/demos/bottle/bottle.py
    with the txmongo database adaptor
    https://github.com/fiorix/mongo-async-python-driver
    example:
    https://github.com/fiorix/mongo-async-python-driver/blob/master/examples/webapps/cyclone_server.tac

    This would be a complete twisted async implementation and *should* therefore get better throughput.

    Also, are you running mongo on EC2? I think mongo loves IO and some EC2 configs can be pretty slow

  3. @James Casbon, at the moment we use EC2 for development only, but we are seriously thinking of using it for production too. We have to investigate mongod performance on EC2 though....

    The "cyclone using its bottle implementation" seems interesting, we will check it out soon. Thanks!

  4. Your tables are very hard to read with so many digits. A general rule would be to round the uncertainty or standard deviation to two significant digits, and then round results to the same number of decimal places. More digits is just noise, both numerical and visual.

    ReplyDelete
  5. The pymongo documentation is actually wrong. It is NOT 'due to the way mod_wsgi handles module reloading with multiple sub interpreters'. It is due to how the pymongo C extension is implemented. The problem which they link to on the mod_wsgi site is caused when a C extension doesn't properly separate data created at C level from different Python sub interpreters and mixes the data across sub interpreters. This is a violation of how you are meant to use the Python C API. They could fix this in the pymongo C extension if they chose to, but it seems they still want to say it is a mod_wsgi issue even though it has been pointed out in the past that it isn't. The workaround for avoiding the problem in the pymongo C extension is easy and you don't have to stop using the C extension. Not sure why you are taking the option of not using the C extension rather than just configuring mod_wsgi differently, if not using the C extension is what you are doing.

  6. Why not replace Apache with Twisted as well and do a real apples-to-apples CPython vs Pypy benchmark?

    The results here are fairly useless since we have no idea what is the bottleneck, the web frontend, or some component of the python stack you're using...

    One commenter praised you for publishing std deviations (to 12 places no less), but didn't lament on the whole "only change one thing" nature of doing comparison testing -- shame shame.
