jump to navigation

Bjoern September 26, 2016

Posted by PythonGuy in Uncategorized.
add a comment

A recent blog post where the speeds of various WSGI servers was compared piqued my interest. Among the most surprising results a new WSGI server named Bjoern. It is written in C and compatible with Python 2.7. Taking advantage of the famous libev, it provides unparalleled performance. It is difficult to imagine how you can make python servers run any faster.

If you don’t know about libevent and libev, you really should be making comments on performance. Nowadays, the best way to get performance out of your hardware with coroutines and microthreads. Multithreading is now a dinosaur of a foregone era, making the GIL completely irrelevant.

Thinking about the problem I am trying to solve, I think a WSGI server is exactly what I need. I can write my own framework for my servers rather easily. So I’m investigating Bjoern and others at the moment.

Writing PyPy Compatible Python September 24, 2016

Posted by PythonGuy in Uncategorized.
add a comment

Increasing, I see projects boasting support for PyPy. It’s time for a refresher on what PyPy is and why you should be writing PyPy-compliant code.

PyPy, as opposed to PyPI (the Python Package Index), is a project aiming to compile Python with Python. This sounds absurd and make a practical joke of some sort, but it’s important. When you consider the massive success that Google’s V8 Engine for Javascript has been, you wonder why Python can’t do the same thing, and then you realize that if you could just compile Python to native machine code (or any other kind of code) with Python code, then you would be well on your way to achieving V8’s performance, and maybe beating it because the compiler is written in Python, not C, and is thus easier to understand and iterate on.

PyPy has been around for a long time and has been very visible. It has struggled to achieve the level of performance of Python itself (called CPython since it is the Python engine written in C) but lately, it is increasingly showing that not only can it meet CPython, it can beat it.

Now, Python, the language, has a problem. It was written to make the job of the programmer super-easy, but in doing so, has made it incredibly difficult to turn it into machine code. The Python VM makes it all possible, but we don’t want to make a faster VM, we want to take Python code and turn it into the lowest-level machine code, highly optimized for the CPU it is running on. In order to make that happen, you have to modify the definition of Python slightly, or rather, embrace some differences to the CPython standard implementation.

This page documents the changes you need to make.It used to be that you couldn’t do things like assign different types to a variable, but those days seem to be long gone. Now, the only major difference is that things are not garbage collected like they are in CPython. So you need to explicitly close files and generators when you are done using them. Thankfully, the “with” statement makes this trivial. It is a good pattern you should always be using.

There are some other low-level details you probably won’t run into listed here. If you go through the list, you can see how far PyPy has come.

If you want to take advantage of PyPy’s speed, you’ll need to write your code a certain way. This page lists some of the ways you can code your program to make it run a lot faster in PyPy. Basically, it’s all about being aware of what’s happening at the silicon level and working with that. Notably, the first kind of optimization you should do is choosing the right algorithm. You might make your O(N**2) algorithm run as fast as you like, it will still lose to even poorly optimized O(N log N) algorithms when you have large datasets.

PyPy is reported to give about a 7x performance boost. It is production ready, today.

Two other projects to keep your eye on:

  • Nuitka, which complies Python to C++.
  • Pyston, which uses LLVM.

Finally, a word on the GIL. People talk about the GIL as if it’s a bad thing. It’s not. It’s shifting the cost of multi-threading from every object access to the entire process. If you were to naively remove the GIL and add lock-checks on every access, Python would run a bajillion times slower in single-threaded mode. Perhaps someone will figure out a way to get the best of both worlds, but I highly doubt it.

If the GIL is your bottleneck, use multiple processes, invest in building a SOA architecture, and remind yourself that eventually, you’ll have to start running your jobs on more than one server. In other words, with other languages that support multi-threading well, your evolution is single-threaded -> multi-threaded -> multi-process. With Python, we just cut out the middle man, skip multi-threading, and start investing in multi-process development sooner rather than later. In the end, we get to ignore a whole class of errors that are notoriously difficult to detect, diagnose, and repair.

Guido van Rossum has basically said, “Remove the GIL over my dead body (or by proving me wrong.)” Folks, if you can’t show Guido that you can remove the GIL and make Python better, you have no business saying that the GIL is the problem.

 

Picking the Best Python Web Framework September 24, 2016

Posted by PythonGuy in Uncategorized.
1 comment so far

I’m at the point in my job where I get to pick an entirely new web framework for Python. There are so many out there, it’s really hard to choose.

The first choice I need to look at is whether I need a “full” web framework, or a “minimal” web framework. But first, what do I mean by “web framework”?

A Web Framework is a library that provides the following features:

  • A way of mapping URLs to methods
  • A way of maintaining state across web requests (IE, database connections.)
  • A way of rendering HTML templates.
  • Several other goodies that you typically use in a web server.

A full web framework provides all of the above and lots more. This would be frameworks like Pyramid, Django, and Turbogears.

A minimal web framework provides a lot less. These are things like CherryPy and Flask.

By comparison, things like gevent, twisted, and tornado are not web frameworks. They are simply web servers. You’ll have to build the framework bits yourself.

Since I’m not building a user-facing website but a backend REST server, I don’t need a full web framework. This means Django is out of the question, and Pyramid and Turbogears are less desirable because they are so big.

The next question to consider is what version of Python do I intend to use, and whether I want to support things like Cython and PyPy. Since I am interested in performance, I will likely want to experiment with PyPy, anything that doesn’t run on PyPy is out of the question. I also want to support Python 2 AND 3. My team is transitioning to Python 3 so I don’t want to hold them back with my choice.

I then consider whether I need to interface with a database. If so, then I always choose SQLAlchemy. For those who are not familiar with SQLAlchemy, you have no idea what you are missing. Once you experience SQLAlchemy, you will never, ever want to interface with a database in any other way ever again. SQLAlchemy provides features that are all but impossible in other languages, and it does it seamlessly and effortlessly.

Thankfully, SQLAlchemy is a very well-maintained and mature product, so it supports Python 2 and 3 and PyPy.

Now that I’ve narrowed down the field quite a bit, I need to consider the last requirement. Since I’ll be competing with Node.js and other languages that provide coroutines, I want to be able to use gevent. Gevent is one of those hidden gems in Python that no one seems to know about. They say, “Python doesn’t support coroutines” but with gevent, it really does and it is awesome. Gevent makes Python competitive with many languages. PyPy seals the deal and makes Python the best language ever.

And now, let’s look at my options.

  • CherryPy, which has been around a long time and been a favorite of mine. I like the logo and the name, but it is engineered very well and supports all of the features I need. CherryPy also supports SSL natively.
  • Pylons is old, stable, and incredibly powerful. I have spent a lot of time in Pylons and I loved every minute of it.
  • Pyramid is new and I’ve tried to use it a few times but I’ve always chosen Pylons instead. Maybe I should give it another shot.
  • web2py is not Python 3 compatible.
  • Wheezy.web’s claim to fame was being the fastest back in 2012. It hasn’t been updated since 2015.
  • Bottle seems intriguing and simple. I wonder whether it supports PyPy though.
  • Flask is very popular and deserves inspection. It doesn’t seem to support Python 3 well, though. Nor does it seem to support PyPy.
  • Hug is also intriguing.
  • Falcon is what Hug is built on. So I need to take a look.

In terms of a web server, I’m going to use something off the shelf. Here are my options:

  • Nginx. I have a long history with Nginx and I really don’t like it.
  • Apache. People don’t like Apache for some reason. It is not as fast as Nginx but, in my book, much easier to configure and use. Also, they don’t hide useful features behind a paywall. Apache also has mod_wsgi.
  • Gunicorn is almost synonymous with Python web development. I’ll have to consider it.
  • Spawning seems interesting. It is worthy of more investigation.
  • Pylon’s Waitress also appears intriguing. It requires more investigation.

I am going to continue to investigate and I’ll try to keep my blog updated with my latest findings.

I should add: The reason why Python has so many web frameworks is because Python is awesome. It’s not hard for people to try out new ideas and get them production ready, and so there are always going to be tons of options out there, and they are going to be quite different from each other. This is overwhelming to some, but I prefer choice and I love experimenting with new things.

Begin with the Ending July 26, 2016

Posted by PythonGuy in Uncategorized.
add a comment

I don’t know what it is, but I’m working on a team now where some people like to write a bunch of code, test it, and then integrate. To me, that feels backwards.

My preferred order is: write a tiny bit of code, integrate that, and then write a lot of code, testing the integration along the way. Unit tests and such often come last, and then only when I’m not able to easily test with integration tests.

The way you do this is through scaffolding. Say you have a client and a server. I would write a minimal server. Then I would write a minimal client. Then I would have the client call the server. After that, members of my team can start working on different features, which means modifying the client and the server to provide that feature.

Writing code this way isn’t a license to be stupid. You still have to think hard about how you want things to work. It does, however, free you from a lot of tedious and unnecessary details when planning. IE, you should be focusing on what messages are passed back and forth, and the general content of those messages, not specific parameters and fields.

Notes on Pip July 26, 2016

Posted by PythonGuy in Uncategorized.
add a comment

Why pip?

Pip manages your python code rather well. It downloads and installs dependencies and makes your python experience almost seamless.

Pip combined with virtualenv is a powerful tool. In fact, I never use the pip installed at /usr/bin/pip/. I always set up a virtualenv and use the pip installed there.

Installing 3rd Party Packages

Pip can take URLs but it usually takes package names of packages on PyPi. You can even specify versions.

Here’s how to do that:

pip install <package name>

If you want a specific version:

pip install <package name>==<version number>

You can install multiple packages at once, too.

But note, you really shouldn’t be installing more than one package. If you’re developing your own package, you shouldn’t be installing anything but your own package.

Installing Your Own Packages

Create a setup.py and layout your code properly. I won’t document that here; you can look elsewhere for instructions. Then run pip to install your package. It will install all your dependencies as well.

pip install -e <directory with setup.py>

If you want to install the test or dev dependencies:

pip install -e <directory with setup.py>[test]
pip install -e <directory with setup.py>[dev]

NOTE: If you are using zsh, you have to quote things properly.

The -e flag puts the files in as symlinks to the original code files. You can modify the files in your project and they will be modified in the installed location.

Bad Practices

  • Not writing a setup.py because you think your code is not a package. All the code you write should be a package.
  • Doing pip freeze > requirements.txt. Just use setup.py and list your dependencies explicitly.
  • Not using virtualenv or similar for each project.

 

Things in ViM I Really Miss June 22, 2016

Posted by PythonGuy in Uncategorized.
1 comment so far

When I am using a different text editor or (blegh! a word processor), there are certain things I miss. Namely:

  • quickly searching for things with /<regex>
  • quickly replacing with :%s/…/…/
  • marking points and jumping between them.
  • Jumping to the end of a sentence or the end of a paragraph or the end of a word, or the beginnings of the above.
  • Repeating the last command I just did.
  • Recording a macro and running it multiple times.

I really wonder how you heathens who never learned an advanced text editor like ViM get along with your life. It’s like watching you trying to create a fire with a stick and a string while I have a microwave.

GPL Thoughts May 11, 2016

Posted by PythonGuy in Licensing, Uncategorized.
Tags: , , , , , ,
add a comment

It’s 2016, and the GPL is once again on my mind. Broadly speaking, there are three types of licenses out there.

  • Free Software, which is software licensed such that it will always remain freely available and open to modification. As a side-effect, it will “infect” other software that uses it such that it becomes free as well.
  • Open-source Software, which is software licensed such that it can be freely shared or modified, but it doesn’t “infect” other software.
  • Proprietary Software, which is software licensed such that it cannot be freely shared or modified.

There is a fourth type, but it barely deserves mentioning. “Public domain” software, which is software that is practically dead.

When Stallman came up with the GPL, his interest was to fundamentally change the way we write, use, modify and share software. In order to accomplish this, he set into motion the following plan:

  1. Write some good software and freely share it, but make sure it cannot be part of any proprietary software.
  2. Write more good software that competes against and replaces proprietary software.
  3. Eventually, it won’t make sense to write proprietary software anymore, because all the good software is free and nobody expects to pay for software.

According to Stallman’s plan, we’re very deep into 2 and well on the way to 3.

The question that inevitably arises, “How do developers get paid?” is forefront on my mind. After all, I write code so that I can get paid. (I would probably write code otherwise, but not nearly so much.)

So how can I get paid for my work?

Truth be told, I make more money from supporting software than anything else. It’s very rare that I get to write entirely new software, even when I’m working on a software project. Most of the time, I am fixing already existing code, or adapting it to some purpose, or more likely, testing it to see if it actually does what we think it does, or just to figure out why it doesn’t do what I want it to do.

The tools that I use are almost all GPL. Or rather, if all of my tools were GPL it wouldn’t hurt me in the slightest. In fact, it would make my life a lot better.

But then how do I make money?

Simply put, people still want software to be written, and it isn’t unheard of to have people pay me to work on open source projects. It isn’t unthinkable that I could be hired to work on free software as well, if there was a great need for that. There are not a few developers who are already being paid by private companies to do that.

The big question then isn’t, “How do individual developers get paid”, but “How do we convince people to pay for software development work?” The answer is when there is a need, people will pay. Sometimes very large sums of money.

The proprietary model doesn’t make any sense anymore. Proprietary software is an agreement that looks like this. “Give me your money, I’ll take it and I’ll give you software that you can’t look at or modify. If you like the software, keep giving me money. If you want to make the software better, beg me to do it for you.” That just doesn’t fly.

I think we’ll see companies that arise that hand out their software for free. In exchange, they will get paid to modify the software or to teach people how to use it. They may also get paid to adapt the software for a particular environment. Thus, the value in a software company will not be the software, but the developers and the ability of the company to apply the software to solve problems.

This, to me, is tremendously encouraging. If this vision comes to pass, I won’t be hired to write software and fired when I finish. I’ll be hired to staff companies so they can say, “Whatever you need done, we can do it, provided you can afford our developers.”

The GPL really is the way forward, and free software is the only good solution out there.

 

 

 

Shared Memory and Python May 10, 2016

Posted by PythonGuy in Advanced Python, GIL.
add a comment

I’m researching shared memory and Python right now, because it seems like the only hope for a particular situation we are seeing.

Basically, we have a very high-performance web server that is designed to handle a large number of requests per second. At the same time, we want to be able to update the code that is used to process these requests, real time.

Our webserver is using microthreads (greenlets) and so it can only really do one thing at a time, even though it looks like it is doing many things. When we go to update the code, everything else must stop until the update is complete.

Obviously, we can use rolling deployments, but there are some particular issues we see with that. Namely, memory and resource management.

If we did “hard” rolling updates, we would have to turn off a node, taking it out of service. Then we’d perform the operation, and then add the node back to the service. This would take a non-trivial amount of time.

If we did “soft” rolling updates, we would spin up a new web server on a node, flip from the old to the new server, and retire the old server. This requires twice as much memory as we typically need.

Another option would involve shared memory. The code would be served from shared memory. When we’d like to do an update, we’d launch another process to create a new chunk of code in shared memory, then we’d flip from the old to the new code. This seems like the ideal option.

The problem is that Python only supports the most basic data structures in shared memory.

Perhaps there is a way to trick Python into treating shared memory as actual Python code, maybe by storing the code as an array. When we go to run the code, we load the code from the array directly into a function and then call it. I don’t know if that would work, or that any method would work, but I’m looking into it.

Suggestions welcome!

The Pastiche Test May 2, 2016

Posted by PythonGuy in Uncategorized.
1 comment so far

I found this article fascinating: The pastiche test

The idea is sound, and I wish I knew what it was called before he named it, if it had a name.

The idea is this: Can you implement core language features in the language itself? That is, can you write something that would do the same thing as “if” using functions or whatnot?

There have been more than one occasion where I would have like to have had access to, or be able to replicated, some of the core features of Python. If Python completely passes the pastiche test (and PyPy is proof that it can), then that would not have been a hurdle.

Lazy Evaluation by Default May 2, 2016

Posted by PythonGuy in Uncategorized.
add a comment

I’ve been thinking a lot about a programming language idea I had where function calls do not execute the parameters before hand. It sounds interesting enough but I think it’s not anything new. It’s a style of programming called “Lazy evaluation”.

It works like this.

Normally, you’d have code that looks like this:

a = 1 + 1

b = a + 1

and you’d expect the following operations to occur:

  1. Add 1 and 1 (2)
  2. Assign 2 to a
  3. Add 2 and 1 (3)
  4. Assign 3 to b

With lazy evaluation, none of the above happens. Instead, this is what happens.

  1. Assign a to “1 + 1”
  2. Assign b to “a + 1”

Note that none of the addition operations are performed. Instead, if you wanted to get the value for b, you’d have to tell it, “Calculate your value.” In which case, the following sequence of events would occur:

  1. Calculate a.
  2. Calculate 1 + 1
  3. Calculate 2 + 1

Note that this demonstrates how you can nest lazy evaluation.

A beautiful aspect of lazy evaluation is that values that are never used are never evaluated. This can be a problem when you consider functions that have side effects. If there is a value that depends on the results of a function with side effects, that function will never get called and those side effects never manifest unless the value that depends on the function is evaluated.

But this can actually be a benefit. Remember how when you do “a or b” on Python it doesn’t evaluate b if a is True? This makes the “or” and “and” operators behave something like “if”. That is, don’t execute an entire branch of code if you’re not supposed to.

Just a thought at this point. I might try writing some pseudo-code to see how it all works out.