Thursday, December 15, 2011

Flask and Mongodb on PyPy (Benchmark results Twisted vs Apache mod_wsgi)

If you are reading this on the Pygrunn website, I advise you to head over to our blog for better formatting of this post.

We (at Paylogic) had been working on a big new project using Flask and Mongodb for a couple of months when we ran into problems running the Flask application under Apache's mod_wsgi. At some point we started getting strange exceptions, which were caused by 'Multiple Python Sub Interpreters and C extension modules' as described here. The problems did not occur when we used pymongo without its C extensions enabled, but that of course came with a performance loss (so we were told).

When I hear 'run a Python library without C extensions', my mind immediately brings up PyPy. Fortunately, the only C extension in the project was the one required by pymongo, which we no longer used. At this point I decided to give PyPy a try. This post describes how we got everything working on PyPy, along with the outcome of a couple of simple benchmarks.

Using PyPy


We use Virtualenv and you should too :P All kidding aside, I did not have any trouble getting my project to work with PyPy using Virtualenv, so I just kept using it. The following describes how to use PyPy together with Virtualenv.
  1. Download PyPy from the website or build the source.

  2. Download and install the enhanced Virtualenv from: https://github.com/pypa/virtualenv

  3. Create a new virtual environment and activate it:

    $> virtualenv -p /path/to/pypy my-venv
    $> source my-venv/bin/activate

  4. If you are using Mongoengine, you need my tiny patch here, install it first:

    (my-venv)> pip install \
    https://github.com/metzlar/mongoengine#egg=mongoengine

  5. Now install your project. We use a lot of Flask extensions and they all seem to work fine with PyPy:

    (my-venv)> pip install \
    Flask-mongoengine Flask-Celery Flask-Babel .....

  6. And now run your tests, they should all pass ;-)

Webserver


Apache and mod_wsgi cannot use PyPy, since mod_wsgi has its own (CPython) interpreter built in (or linked?). Time to go search for another web server. We already used Nginx in front of Apache, so we just needed to find a pure Python web server to run instead of Apache. This post gave a nice overview of what was available.

I decided to go with Twisted Web first, since it seemed easy to install and it supported PyPy according to the PyPy compatibility wiki. I learned from using Twisted that it is very easy to switch between web servers, so the choice is no big deal for now. To install Twisted, do:

(my-venv)> pip install twisted

Now you can test your wsgi application by invoking:

(my-venv)> twistd web \
--wsgi=fully.qualified.path.name.to.your.wsgi.application
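For reference, the dotted name passed to --wsgi has to resolve to a WSGI callable. The following is a minimal sketch of such a callable, assuming a hypothetical module myproject/wsgi.py that is not part of our project. With Flask, the Flask application object itself can play this role.

```python
# myproject/wsgi.py: hypothetical module name, for illustration only.

def application(environ, start_response):
    """Smallest possible WSGI callable: answers every request with 200 OK."""
    body = b"Hello from PyPy\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings as the response body.
    return [body]
```

Under these assumptions the server would be started with --wsgi=myproject.wsgi.application.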

Performance


So with this all set up, it was time to do some performance measuring. For this I wrote a very small script that called our web application using urllib2 and did the following:
  1. Create a new user (POST /user)
  2. Login with the new user (POST /user/login)
  3. Change the user's profile (POST /profile)
  4. Retrieve the profile (GET /profile)
  5. Retrieve a status page (GET /status/mongo)

As you can guess, all steps involved at least one read from the database, and most also changed Mongo documents.
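The five steps can be sketched roughly as follows. This is not the original script: it uses Python 3's urllib.request (the successor of urllib2), the form fields are placeholders because the real payloads are not shown here, and the cookie-based session handling is an assumption about how the login is kept between steps.

```python
import time
import urllib.parse
import urllib.request

# The five steps of one benchmark run, as (HTTP method, path) pairs.
STEPS = [
    ("POST", "/user"),
    ("POST", "/user/login"),
    ("POST", "/profile"),
    ("GET", "/profile"),
    ("GET", "/status/mongo"),
]


def make_sender(base_url):
    """Return a send(method, path) function bound to base_url.

    Assumption: the login session lives in a cookie, so a cookie-aware
    opener is shared by all steps of a run.
    """
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())

    def send(method, path):
        data = None
        if method == "POST":
            # Placeholder form data; the real fields are not part of this post.
            data = urllib.parse.urlencode({"placeholder": "value"}).encode()
        request = urllib.request.Request(base_url + path, data=data, method=method)
        with opener.open(request) as response:
            return response.read()

    return send


def time_run(send, steps=STEPS):
    """Execute every step once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for method, path in steps:
        send(method, path)
    return time.perf_counter() - start
```

A benchmark of 20 runs would then be something like durations = [time_run(make_sender(url)) for _ in range(20)], where url is the address of the instance under test.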

Setup


I set up two small instances in Amazon EC2, each running a vanilla Mongodb 2.0. One instance would get Apache and CPython while the other would get Twisted and PyPy. To be precise:

C-instance
  • Apache (mpm-worker, threaded) with mod_wsgi
  • CPython
  • Mongodb
Py-instance
  • Twisted web
  • PyPy
  • Mongodb

The machines started out as identical clones with Debian 6 and minimal required packages installed.

Outcome


We ran the script in a loop and forked it to be executed in parallel using bash:

$> x=1; while [ $x -le 10 ]; do (python script.py &); \
x=$(( $x + 1 )); done


The above runs 10 concurrent processes in the background.
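The same fan-out can be done from Python when the exit codes are of interest. This is a sketch and not what we used: the function name run_concurrently is made up for illustration, and unlike the shell loop it waits for all processes to finish.

```python
import subprocess
import sys


def run_concurrently(count, command):
    """Start `count` copies of `command` at once and wait for all of them.

    Returns the list of exit codes, one per process.
    """
    # Popen returns immediately, so all processes run side by side.
    processes = [subprocess.Popen(command) for _ in range(count)]
    return [process.wait() for process in processes]
```

For ten concurrent benchmark scripts the call would be run_concurrently(10, [sys.executable, "script.py"]).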

The following tables describe the time it took for the script to make one run (create + login + edit + etc.). For each number of concurrent benchmark scripts they show the average duration in seconds over the full benchmark, and also the average duration in seconds of the fastest 50% of the runs. The median and standard deviation of the whole benchmark are included to give the reader an idea of the spread.

We tested different benchmark lengths by changing the number of runs, but there was no significant change in the results for lengths ranging between 10 and 1000 runs. All results displayed below were gathered with 20 runs per benchmark.
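The four columns reported in the tables can be computed from a list of run durations along these lines. This is a sketch using Python 3's statistics module, not the code that produced the numbers. In particular, the use of the population standard deviation is an assumption.

```python
import statistics


def summarize(durations):
    """Return the four table columns for a list of run durations in seconds."""
    ordered = sorted(durations)
    # The fastest 50% are the first half of the sorted durations.
    fastest_half = ordered[: max(1, len(ordered) // 2)]
    return {
        "avg": statistics.mean(ordered),
        "avg_fastest_half": statistics.mean(fastest_half),
        "median": statistics.median(ordered),
        # Assumption: population standard deviation; the post does not say
        # whether the population or the sample formula was used.
        "std_dev": statistics.pstdev(ordered),
    }
```

For example, summarize([1.0, 2.0, 3.0, 4.0]) gives an average of 2.5, an average of the fastest half of 1.5 and a median of 2.5.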

C-instance
No. of scripts | AVG duration (s) | AVG of fastest 50% (s) | Median (s)    | Std. dev. (s)
2              | 2.1961285472     | 2.16111454964          | 2.23114254475 | 0.035013997555
5              | 4.24873378039    | 4.07180131078          | 4.28083276749 | 0.199116645764
10             | 8.19694006443    | 7.75776187658          | 8.38324973583 | 0.561580396311


We observed a greatly increasing load (up to 20) on the instance when we increased the number of concurrently running scripts, so we immediately concluded that a small AWS EC2 instance could not deal with more than 2 simultaneous requests, making this setup unsuitable for production ;-) This also explains the increasing durations and deviations in the table above. No exceptions were raised and no errors could be found during the benchmark, though.

Py-instance
No. of scripts | AVG duration (s) | AVG of fastest 50% (s) | Median (s)    | Std. dev. (s)
2              | 1.19314089417    | 1.19220632315          | 1.1940754652  | 0.000934571025
5              | 2.44566946268    | 2.40679194927          | 2.46148204803 | 0.034066457154
10             | 4.86752203584    | 4.83535173416          | 4.88977667093 | 0.0368080682422


The Py-instance seems quite a bit faster, heh ;-)

Identical increases in load were observed, so we concluded again that this setup was not suitable for production. Still, Twisted seemed to perform better under higher loads: the deviation stayed quite small, and the averages and medians stayed close together, as we increased the number of concurrent requests (which equals the number of concurrent benchmark scripts).

Conclusion


We decided to keep on using PyPy for now. From our tests we concluded that PyPy and Twisted perform roughly 70% to 80% faster than CPython and mod_wsgi (the average run on the C-instance took about 1.7 to 1.8 times as long). Of course we will have to re-benchmark when we start setting up the real production environment, using separate machines for the Mongo replica sets, Celery workers and web servers, and measure CPU, IO and memory usage, but for now it seems convenient to keep using PyPy. If necessary, it feels easier to switch from PyPy to CPython than the other way around ;-)

Then there is always the option to change web server implementations. I already ran the identical benchmark against Gunicorn using various settings, but durations were a bit higher than with Twisted. Did you benchmark pure Python web servers on PyPy? Let us know!
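The speed-up can be recomputed from the average durations in the two tables above. The snippet below is only a worked check of that arithmetic. Here "faster" is taken to mean how much longer the C-instance needs for the same run, which is one of several possible definitions.

```python
# Average run duration in seconds, copied from the two tables above,
# keyed by the number of concurrent scripts.
cpython_apache = {2: 2.1961285472, 5: 4.24873378039, 10: 8.19694006443}
pypy_twisted = {2: 1.19314089417, 5: 2.44566946268, 10: 4.86752203584}


def speedup(slow, fast):
    """How many times as long the slow setup takes compared to the fast one."""
    return slow / fast


ratios = {n: speedup(cpython_apache[n], pypy_twisted[n]) for n in cpython_apache}
for n, ratio in sorted(ratios.items()):
    # (ratio - 1) * 100 is the extra time the C-instance needs, in percent.
    print("%2d scripts: %.2fx, about %.0f%% faster" % (n, ratio, (ratio - 1) * 100))
```

The ratios come out at roughly 1.84, 1.74 and 1.68 for 2, 5 and 10 concurrent scripts.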

Friday, July 8, 2011

Talk summaries

Reinout van Rees, one of the esteemed speakers at PyGrunn 2011, turned out to be the fastest blogger alive. He made real-time notes of all the talks he visited. We really appreciate his effort.

Head over to Reinout's website to read the summaries.

Slides uploaded

A number of presentation slides have been uploaded and are available from the talk descriptions on the schedule page. Some speakers had no slides; they did it all from memory. One of the many reasons you should have been there!

Tuesday, May 31, 2011

PyGrunn 2011 a huge success!

PyGrunn 2011 is over, and judging by the smiling faces and positive feedback we can conclude that it was a big success! The talks were inspiring and there was a lot of positive buzz in the hallways in between talks. It is good to know there are so many like-minded Python & friends enthusiasts around Groningen. We definitely enjoyed ourselves and we hope you did too.

Keep coming back here for updates and news. For instance about PyGrunn 2012, which will be held on the 11th of May 2012.

We would like to hear your feedback on the conference so we can improve matters for the next episode. Just send it to info@pygrunn.nl

Here's the group shot of a large part of the audience!

Tuesday, May 17, 2011

T-3

Only three days to go until PyGrunn 2011! The interest is really heating up, talks are being prepared, stages are being set up, T-shirts are shipped and lunch is ordered. It promises to be a wonderful day! Be sure to keep spreading the word to every software enthusiast you know so we can fill up the last seats and have a full house this coming Friday!

Monday, May 9, 2011

Talk added: Luit van Drongelen - Lightweight Python deployment servers

This is one of the last talks we are adding before the schedule fills up completely. And it's another interesting one! It's about time you bought your tickets, since we're selling out fast.

Thursday, May 5, 2011

Talk added: Alexander Solovyov - HG and complex development processes

Alexander Solovyov is both a valued Paylogic coworker and an important contributor to Mercurial (HG). During this talk he will tell us all about HG and complex development processes (both with and within HG). Brace for impact!

Talk added: Òscar Vilaplana - ØMQ

ØMQ (or zeromq) is one of those new fancy friends of Python and Òscar Vilaplana will tell us all about why you should pay close attention to both of them.

Wednesday, May 4, 2011

Schedule ready

The schedule is ready! Head over to the tab 'schedule' to see who is speaking when and where and plan your day.

Tuesday, May 3, 2011

Talk added: Codigy - Architectural choices in software development for mobile devices.

Tom de Vries and Gideon de Kok from Codigy (.nl) will give an interesting talk on 'Architectural choices in software development for mobile devices.' This is not directly Python related, but as the slogan states, this conference is about Python's friends as well. And mobile software development will be one of Python's new friends for sure! Tom and Gideon will show us why this is an exciting development.

Friday, April 29, 2011

Keynote speaker: Armin Ronacher

We are very proud to announce that Armin Ronacher will be the keynote presenter at PyGrunn! Armin is a well-known and respected Python developer with an impressive track record. His open source projects include:

Pygments — syntax highlighter written in Python. 2005-
Jinja — template engine for Python. 2006-
Sphinx — a Python documentation tool. 2007-
Werkzeug — WSGI utility library for Python. 2007- *
Babel — collection of tools for internationalizing Python applications. 2008- *
Flask — a microframework for Python based on good intentions. 2010- *
Zine — a blog software written in Python. 2007- *
MoinMoin — wiki engine written in Python. 2008 †

See Armin's website for a complete list. We are honoured that Armin is going to share some of his valuable insights into Python and web development with us!

Wednesday, April 20, 2011

Tickets on sale!

Ticket sales have started! Please go to the 'ticket tab' and order while they're still available.

Tuesday, April 19, 2011

Accepted talk: Duco Dokter - NLTK: natural language processing with Python

Another trick up Python's sleeve is natural language processing. Duco Dokter is going to show us why this is exciting (it is!). Summary of this interesting presentation:

"The Natural Language ToolKit provides a collection of modules for processing natural languages. This enables you, for instance, to split text into sentences, tag words in texts according to their syntactic role in the sentence or phrase, derive word stems, etc. In the real world, as opposed to the scientific world, the toolkit can be used as well, for example to implement a system for automagical links in online documents. This is the example case in the presentation at hand, but the overall aim is to provide some insight into practical use of the NLTK."

Accepted talk: Rix Groenboom - MijnOverheid: performance testing in practice

I'm excited about a new talk added to our already promising schedule for PyGrunn 2011. Here is the summary of Rix Groenboom's talk 'MijnOverheid: performance testing in practice':

"MijnOverheid is the web portal for managing personal data, messages, and all kinds of other business people have with the Dutch government. Characteristic of the system are the many external connections with various public services (such as GBA, Kadaster, RDW and UWV). Testing of the integrity of MijnOverheid will be discussed. Emphasis will be on performance analysis and profiling."

Monday, April 18, 2011

Track PyGrunn on Lanyrd

Help us increase the buzz around PyGrunn by tracking us on Lanyrd.

Call for speakers

Although the schedule is almost complete, we still have a few open slots for speakers left! So if you want to share your passion with friends, you're more than welcome at PyGrunn. Please contact info@pygrunn.nl for more information.

Flyer


Our flyer for PyGrunn 2011 is ready! You can find the PDF version here. Please spread the word about our beloved conference using this great flyer.

We would like to give credit to Klabam for the wonderful design.

Added speaker: Kim Chee Leong - Buildout

We are glad to announce that Kim Chee Leong will give a presentation on Buildout. Summary of his talk:

"Buildout is a build tool for Python projects. It's a self-contained environment which manages dependencies for a project, installs and configures software components and creates repeatable DTAP environments. Buildout manages Python eggs from PyPI, it has dependency checks and allows version pinning. During the development process buildout integrates with version control systems. Non-Python software components like a load balancer or caching proxy can be installed with buildout. Once the buildout configuration is written, deploying the environment is a simple task. This shortens the time a developer needs to set up a project. The same goes for production environments: setting up a new environment is easy."

Added speaker: Emil Loer - Embedding Python interpreter in Ruby and vice versa

Another interesting talk has been added to the PyGrunn schedule. Emil Loer will give a talk about embedding the Python interpreter in Ruby and vice versa.

Thursday, January 13, 2011

PyGrunn 2011 will be held on May 20th, 2011 in Het Paleis in Groningen

Het Paleis is a cultural living and working environment. A place where artists, creative persons, entrepreneurs and enthusiasts can meet. A perfect spot for PyGrunn 2011!

Het Paleis
Boterdiep 111
9712 LM Groningen
T. 050-3118849

http://www.hetpaleisgroningen.nl/