
A Sane REST October 3, 2016

Posted by PythonGuy in Uncategorized.

I’ve complained about REST numerous times, but I think I have a way of making it sane and useful.

My first design goal was compatibility with the huge variety of REST clients and servers out there. By “compatible”, I mean that it should work more or less, or be super-imposable on the system. That is, the system I describe here should be at least a subset of the features the clients or servers provide. I want to take advantage of all the tools and resources out there, but I don’t want to use peculiar features of one or the other.

The second design goal was simplicity. I want people to “get” it at a fundamental level, and see how to make their REST server compatible without having to think too hard.

Here’s the overview. I’m going to use the customer object as an example.

  1. There are three types of resources: Group Resources, Item Resources, and Method Resources.
  2. Group Resources live at /customers/. You can GET or POST. You cannot DELETE.
    1. GET will grab a subset of the customers. I call this “find”. You can specify various filter parameters based on the attributes of the object (you can get inventive here), but you must support the following parameters.
      • page: The page number, 1-based. This can be specified.
      • items_per_page: The number of items per page. This can be specified.
    2. POST will add an item.
  3. Item Resources live at /customers/:id. You can GET, PUT, or DELETE. Optionally, you can PATCH, but you really shouldn’t have objects with so many attributes that it’s necessary.
    1. GET will fetch the item.
    2. PUT will update the item’s attributes.
    3. DELETE will remove the item.
  4. Method Resources live at /email/. Method names are typically verbs. You can GET or POST. Either way, it’s the same.
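The overview above can be sketched as a small lookup table. This is only an illustration, not a router: the GROUPS and METHODS sets and the is_allowed helper are hypothetical names, and the optional PATCH is left out.

```python
# Allowed HTTP methods per resource type, as listed in the overview.
ALLOWED = {
    "group": {"GET", "POST"},           # /customers/
    "item": {"GET", "PUT", "DELETE"},   # /customers/:id (optional PATCH omitted)
    "method": {"GET", "POST"},          # /email/
}

# Hypothetical registry of known resources, using the names from the post.
GROUPS = {"customers"}
METHODS = {"email"}


def classify(path):
    """Return 'group', 'item', 'method', or None for an unknown path."""
    parts = [part for part in path.split("/") if part]
    if len(parts) == 1 and parts[0] in GROUPS:
        return "group"
    if len(parts) == 2 and parts[0] in GROUPS:
        return "item"
    if len(parts) == 1 and parts[0] in METHODS:
        return "method"
    return None


def is_allowed(http_method, path):
    """True if the HTTP method is permitted on the resource at this path."""
    kind = classify(path)
    return kind is not None and http_method.upper() in ALLOWED[kind]
```

With this table, is_allowed("DELETE", "/customers/") is False while is_allowed("DELETE", "/customers/42") is True.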

Parameters are specified either in the URL GET parameters or the POST/PUT bodies. For POST and PUT, you can also specify parameters via the URL.

It is encouraged to use JSON as the POST/PUT body. However, if you use form encoded parameters, the following convention applies:

  • If a parameter is specified once, it is considered a single value.
  • If a parameter is specified multiple times, it is considered part of a list.

Don’t make it any more complicated than that.
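As a minimal sketch of that convention, assuming Python 3 and the standard library's parse_qs, a form-encoded body can be collapsed like this. The parse_form name is made up for the example.

```python
from urllib.parse import parse_qs


def parse_form(body):
    """Decode a form-encoded body: one occurrence -> value, several -> list."""
    parsed = parse_qs(body, keep_blank_values=True)
    return {
        key: values[0] if len(values) == 1 else values
        for key, values in parsed.items()
    }


params = parse_form("name=Alice&tag=vip&tag=beta")
# params == {"name": "Alice", "tag": ["vip", "beta"]}
```

One consequence of the convention is that a list with a single element cannot be told apart from a single value, which is one more reason to prefer JSON bodies.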

I haven’t really decided what conventions I’ll use for documentation and such. For now, I’ll assume that Swagger is good enough and use its conventions.

Only the following HTTP status codes are allowed:

  • 200 means it all went OK. The response is JSON-encoded.
    • GET to a Group Resource returns an object with the following parameters:
      • page: The page number, 1-based.
      • items_per_page: The number of items per page.
      • first_item: The index of the first item, 0-based. This is (page-1)*items_per_page.
      • last_item: The index of the last item+1, 0-based. This is (page)*items_per_page, or total_items, whichever is less.
      • total_items: The total number of items (if available).
      • next_page: If there is a next page, the next page number goes here.
      • prev_page: If this is not page 1, the previous page number goes here.
      • results: The results, a list of items.
    • POST to a Group Resource returns the new object, as for the GET to the Item Resource.
    • GET to an Item Resource returns the object.
    • PUT to an Item Resource returns the object.
    • DELETE to an Item Resource returns “” or {}
    • GET or POST to a Method Resource returns the response.
  • 404 means the resource doesn’t exist or you don’t have authorization to access it.
  • 401 means you are not authenticated and you need to be to access those resources. This is a cue to the client that it needs to log in again.
  • 500 is for all other errors.
  • For all the errors, a JSON-encoded object with the following parameters is returned:
    • code: The error code or name. In Python, this would be the exception name.
    • description: A human-readable description of the error, hopefully with suggestions on how to fix it.
    • stacktrace: In non-production environments, this would have the stacktrace.
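The two response shapes described above can be sketched in a few lines. This assumes Python 3 and that the full list of items is already in memory; the paginate and error_body names are invented for the example, and a real server would probably push the slicing down into the database query.

```python
import traceback


def paginate(items, page=1, items_per_page=10):
    """Build the body for a 200 response to GET on a Group Resource."""
    total_items = len(items)
    first_item = (page - 1) * items_per_page
    last_item = min(page * items_per_page, total_items)
    body = {
        "page": page,
        "items_per_page": items_per_page,
        "first_item": first_item,
        "last_item": last_item,
        "total_items": total_items,
        "results": items[first_item:last_item],
    }
    if last_item < total_items:
        body["next_page"] = page + 1   # only present when there is a next page
    if page > 1:
        body["prev_page"] = page - 1   # only present after page 1
    return body


def error_body(exc, include_stacktrace=False):
    """Build the body for a 401, 404, or 500 response from an exception."""
    body = {"code": type(exc).__name__, "description": str(exc)}
    if include_stacktrace:  # meant for non-production environments only
        body["stacktrace"] = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return body
```

Either dictionary can then be handed to json.dumps to produce the response body.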

That’s all there is to it. Pretty simple and straightforward, and it should be compatible with most clients and servers out there.

URI vs. URL vs. URN October 3, 2016

Posted by PythonGuy in Uncategorized.

Sometimes people confuse URLs with URIs.

Here’s how I keep track of what the difference is.

A URL is a string of text that points you to something on the internet you can download. The “L” means “Location”.

A URN is a string of text that points you to something that exists in the real world but not on the internet. The “N” means “Name”. These are things like the ISBN of a book.

A URI is a string of text that can be a URL or URN.

I know this isn’t very precise, but it should be helpful.
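The same rule of thumb can be written down as a rough heuristic, assuming Python 3 and the standard library's urlparse. The rough_kind function is made up for the example and is no more precise than the definitions above: anything with the urn scheme counts as a URN, anything with a scheme and a host counts as a URL, and everything else is just called a URI.

```python
from urllib.parse import urlparse


def rough_kind(identifier):
    """Classify an identifier as 'URN', 'URL', or plain 'URI' (heuristic only)."""
    parts = urlparse(identifier)
    if parts.scheme == "urn":
        return "URN"
    if parts.scheme and parts.netloc:
        return "URL"
    return "URI"


kinds = [
    rough_kind("https://example.com/customers/42"),  # has a scheme and a host
    rough_kind("urn:isbn:0451450523"),               # names a book
    rough_kind("mailto:someone@example.com"),        # neither of the above
]
# kinds == ["URL", "URN", "URI"]
```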

Please don’t use URI when you mean URL. If you are describing something on your server, then it’s a URL. If you’re describing an ephemeral concept that can’t exist on a server, then use URI.

Bjoern September 26, 2016

Posted by PythonGuy in Uncategorized.

A recent blog post where the speeds of various WSGI servers were compared piqued my interest. Among the most surprising results was a new WSGI server named Bjoern. It is written in C and compatible with Python 2.7. Taking advantage of the famous libev, it provides unparalleled performance. It is difficult to imagine how you can make Python servers run any faster.

If you don’t know about libevent and libev, you really shouldn’t be making comments on performance. Nowadays, the best way to get performance out of your hardware is with coroutines and microthreads. Multithreading is now a dinosaur of a bygone era, making the GIL completely irrelevant.

Thinking about the problem I am trying to solve, I think a WSGI server is exactly what I need. I can write my own framework for my servers rather easily. So I’m investigating Bjoern and others at the moment.
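To show how little a WSGI server asks of a framework, here is a minimal sketch of a bare WSGI application. It assumes Python 3 and uses only the standard library; the app and call_app names and the routing rule are invented for the example. Any WSGI server, Bjoern included, should be able to host a callable of this shape.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI application: "/" answers 200, everything else 404."""
    if environ.get("PATH_INFO", "/") == "/":
        status, body = "200 OK", b"hello from a bare WSGI app\n"
    else:
        status, body = "404 Not Found", b"not found\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(application, path="/"):
    """Call a WSGI app in-process, without a server, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a plausible fake request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

For a quick local try, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve the same callable.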

Writing PyPy Compatible Python September 24, 2016

Posted by PythonGuy in Uncategorized.

Increasingly, I see projects boasting support for PyPy. It’s time for a refresher on what PyPy is and why you should be writing PyPy-compliant code.

PyPy, as opposed to PyPI (the Python Package Index), is a project aiming to compile Python with Python. This sounds absurd, like a practical joke of some sort, but it’s important. When you consider the massive success that Google’s V8 engine for JavaScript has been, you wonder why Python can’t do the same thing. Then you realize that if you could just compile Python to native machine code (or any other kind of code) with Python code, you would be well on your way to achieving V8’s performance, and maybe beating it, because the compiler is written in Python, not C, and is thus easier to understand and iterate on.

PyPy has been around for a long time and has been very visible. It has struggled to achieve the level of performance of Python itself (called CPython since it is the Python engine written in C) but lately, it is increasingly showing that not only can it meet CPython, it can beat it.

Now, Python, the language, has a problem. It was written to make the job of the programmer super-easy, but in doing so, has made it incredibly difficult to turn it into machine code. The Python VM makes it all possible, but we don’t want to make a faster VM, we want to take Python code and turn it into the lowest-level machine code, highly optimized for the CPU it is running on. In order to make that happen, you have to modify the definition of Python slightly, or rather, embrace some differences to the CPython standard implementation.

This page documents the changes you need to make. It used to be that you couldn’t do things like assign different types to a variable, but those days seem to be long gone. Now, the only major difference is that things are not garbage collected like they are in CPython. So you need to explicitly close files and generators when you are done using them. Thankfully, the “with” statement makes this trivial. It is a good pattern you should always be using.
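As a minimal sketch of that pattern, assuming Python 3: the file is closed by its own “with” block, and the generator is closed explicitly with contextlib.closing instead of being left for the garbage collector. The read_lines helper and the temporary file are only there to make the example self-contained.

```python
import contextlib
import os
import tempfile


def read_lines(path):
    """Yield the lines of a file; the file closes when the generator closes."""
    with open(path) as handle:
        for line in handle:
            yield line.rstrip("\n")


# Create a small throwaway file to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as handle:
    handle.write("a\nb\nc\n")

# closing() calls lines.close() on exit, which unwinds the "with" inside
# read_lines and closes the file right away, even though we stopped early.
with contextlib.closing(read_lines(path)) as lines:
    first = next(lines)

os.remove(path)
```

This code behaves the same on CPython and PyPy because nothing depends on when the garbage collector runs.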

There are some other low-level details you probably won’t run into listed here. If you go through the list, you can see how far PyPy has come.

If you want to take advantage of PyPy’s speed, you’ll need to write your code a certain way. This page lists some of the ways you can code your program to make it run a lot faster in PyPy. Basically, it’s all about being aware of what’s happening at the silicon level and working with that. Notably, the first kind of optimization you should do is choosing the right algorithm. You might make your O(N**2) algorithm run as fast as you like, but it will still lose to even a poorly optimized O(N log N) algorithm when you have large datasets.
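To make the algorithm point concrete, here is one task solved both ways: checking a list for duplicates. Both functions are illustrative names, not taken from any library. The first compares every pair, which is O(N**2); the second sorts first and compares neighbours, which is O(N log N).

```python
def has_duplicates_quadratic(values):
    """Compare every pair of elements: O(N**2) comparisons."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_sorted(values):
    """Sort, then compare neighbours: O(N log N) overall."""
    ordered = sorted(values)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```

No interpreter or JIT changes how the number of comparisons in the first version grows with the input size.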

PyPy is reported to give about a 7x performance boost. It is production ready, today.

Two other projects to keep your eye on:

  • Nuitka, which compiles Python to C++.
  • Pyston, which uses LLVM.

Finally, a word on the GIL. People talk about the GIL as if it’s a bad thing. It’s not. It’s shifting the cost of multi-threading from every object access to the entire process. If you were to naively remove the GIL and add lock-checks on every access, Python would run a bajillion times slower in single-threaded mode. Perhaps someone will figure out a way to get the best of both worlds, but I highly doubt it.

If the GIL is your bottleneck, use multiple processes, invest in building a SOA architecture, and remind yourself that eventually, you’ll have to start running your jobs on more than one server. In other words, with other languages that support multi-threading well, your evolution is single-threaded -> multi-threaded -> multi-process. With Python, we just cut out the middle man, skip multi-threading, and start investing in multi-process development sooner rather than later. In the end, we get to ignore a whole class of errors that are notoriously difficult to detect, diagnose, and repair.

Guido van Rossum has basically said, “Remove the GIL over my dead body (or by proving me wrong.)” Folks, if you can’t show Guido that you can remove the GIL and make Python better, you have no business saying that the GIL is the problem.


Picking the Best Python Web Framework September 24, 2016

Posted by PythonGuy in Uncategorized.

I’m at the point in my job where I get to pick an entirely new web framework for Python. There are so many out there, it’s really hard to choose.

The first choice I need to look at is whether I need a “full” web framework, or a “minimal” web framework. But first, what do I mean by “web framework”?

A Web Framework is a library that provides the following features:

  • A way of mapping URLs to methods
  • A way of maintaining state across web requests (e.g., database connections).
  • A way of rendering HTML templates.
  • Several other goodies that you typically use in a web server.

A full web framework provides all of the above and lots more. This would be frameworks like Pyramid, Django, and Turbogears.

A minimal web framework provides a lot less. These are things like CherryPy and Flask.

By comparison, things like gevent, twisted, and tornado are not web frameworks. They are simply web servers. You’ll have to build the framework bits yourself.

Since I’m not building a user-facing website but a backend REST server, I don’t need a full web framework. This means Django is out of the question, and Pyramid and Turbogears are less desirable because they are so big.

The next question to consider is what version of Python I intend to use, and whether I want to support things like Cython and PyPy. Since I am interested in performance, I will likely want to experiment with PyPy, so anything that doesn’t run on PyPy is out of the question. I also want to support Python 2 AND 3. My team is transitioning to Python 3, so I don’t want to hold them back with my choice.

I then consider whether I need to interface with a database. If so, then I always choose SQLAlchemy. For those who are not familiar with SQLAlchemy, you have no idea what you are missing. Once you experience SQLAlchemy, you will never, ever want to interface with a database in any other way ever again. SQLAlchemy provides features that are all but impossible in other languages, and it does it seamlessly and effortlessly.

Thankfully, SQLAlchemy is a very well-maintained and mature product, so it supports Python 2 and 3 and PyPy.

Now that I’ve narrowed down the field quite a bit, I need to consider the last requirement. Since I’ll be competing with Node.js and other languages that provide coroutines, I want to be able to use gevent. Gevent is one of those hidden gems in Python that no one seems to know about. They say, “Python doesn’t support coroutines” but with gevent, it really does and it is awesome. Gevent makes Python competitive with many languages. PyPy seals the deal and makes Python the best language ever.

And now, let’s look at my options.

  • CherryPy, which has been around a long time and has been a favorite of mine. I like the logo and the name, but more importantly, it is engineered very well and supports all of the features I need. CherryPy also supports SSL natively.
  • Pylons is old, stable, and incredibly powerful. I have spent a lot of time in Pylons and I loved every minute of it.
  • Pyramid is new and I’ve tried to use it a few times but I’ve always chosen Pylons instead. Maybe I should give it another shot.
  • web2py is not Python 3 compatible.
  • Wheezy.web’s claim to fame was being the fastest back in 2012. It hasn’t been updated since 2015.
  • Bottle seems intriguing and simple. I wonder whether it supports PyPy though.
  • Flask is very popular and deserves inspection. It doesn’t seem to support Python 3 well, though. Nor does it seem to support PyPy.
  • Hug is also intriguing.
  • Falcon is what Hug is built on. So I need to take a look.

In terms of a web server, I’m going to use something off the shelf. Here are my options:

  • Nginx. I have a long history with Nginx and I really don’t like it.
  • Apache. People don’t like Apache for some reason. It is not as fast as Nginx but, in my book, much easier to configure and use. Also, they don’t hide useful features behind a paywall. Apache also has mod_wsgi.
  • Gunicorn is almost synonymous with Python web development. I’ll have to consider it.
  • Spawning seems interesting. It is worthy of more investigation.
  • Pylons’ Waitress also appears intriguing. It requires more investigation.

I am going to continue to investigate and I’ll try to keep my blog updated with my latest findings.

I should add: The reason why Python has so many web frameworks is because Python is awesome. It’s not hard for people to try out new ideas and get them production ready, and so there are always going to be tons of options out there, and they are going to be quite different from each other. This is overwhelming to some, but I prefer choice and I love experimenting with new things.

Begin with the Ending July 26, 2016

Posted by PythonGuy in Uncategorized.

I don’t know what it is, but I’m working on a team now where some people like to write a bunch of code, test it, and then integrate. To me, that feels backwards.

My preferred order is: write a tiny bit of code, integrate that, and then write a lot of code, testing the integration along the way. Unit tests and such often come last, and then only when I’m not able to easily test with integration tests.

The way you do this is through scaffolding. Say you have a client and a server. I would write a minimal server. Then I would write a minimal client. Then I would have the client call the server. After that, members of my team can start working on different features, which means modifying the client and the server to provide that feature.

Writing code this way isn’t a license to be stupid. You still have to think hard about how you want things to work. It does, however, free you from a lot of tedious and unnecessary details when planning. That is, you should be focusing on what messages are passed back and forth, and the general content of those messages, not specific parameters and fields.

Notes on Pip July 26, 2016

Posted by PythonGuy in Uncategorized.

Why pip?

Pip manages your python code rather well. It downloads and installs dependencies and makes your python experience almost seamless.

Pip combined with virtualenv is a powerful tool. In fact, I never use the pip installed at /usr/bin/pip. I always set up a virtualenv and use the pip installed there.

Installing 3rd Party Packages

Pip can take URLs, but it usually takes the names of packages on PyPI. You can even specify versions.

Here’s how to do that:

pip install <package name>

If you want a specific version:

pip install <package name>==<version number>

You can install multiple packages at once, too.

But note, you really shouldn’t be installing more than one package. If you’re developing your own package, you shouldn’t be installing anything but your own package.

Installing Your Own Packages

Create a setup.py and lay out your code properly. I won’t document that here; you can look elsewhere for instructions. Then run pip to install your package. It will install all your dependencies as well.

pip install -e <directory with setup.py>

If you want to install the test or dev dependencies:

pip install -e <directory with setup.py>[test]
pip install -e <directory with setup.py>[dev]

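NOTE: If you are using zsh, you have to quote the argument, because zsh treats the square brackets as glob characters. For example: pip install -e '<directory with setup.py>[test]'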

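The -e flag installs the package in editable mode: instead of copying the files, pip links the installed location back to your original code files. You can modify the files in your project and the changes take effect in the installed package without reinstalling.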
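For reference, a setup.py that supports the [test] and [dev] extras used above might look like the sketch below. The project name, version, and every dependency listed are placeholders chosen for illustration. The install_requires and extras_require arguments are standard setuptools options.

```python
# setup.py: a minimal sketch. The name, version, and dependencies
# are placeholders, not recommendations.
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="0.1.0",
    packages=find_packages(),
    # Installed by a plain: pip install -e <directory with setup.py>
    install_requires=[
        "requests",
    ],
    # Installed only when the matching extra is requested,
    # e.g. <directory with setup.py>[test]
    extras_require={
        "test": ["pytest"],
        "dev": ["flake8"],
    },
)
```

Listing dependencies here, rather than in a frozen requirements.txt, keeps the explicit list in one place.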

Bad Practices

  • Not writing a setup.py because you think your code is not a package. All the code you write should be a package.
  • Doing pip freeze > requirements.txt. Just use setup.py and list your dependencies explicitly.
  • Not using virtualenv or similar for each project.


Things in ViM I Really Miss June 22, 2016

Posted by PythonGuy in Uncategorized.

When I am using a different text editor or (blegh!) a word processor, there are certain things I miss. Namely:

  • quickly searching for things with /<regex>
  • quickly replacing with :%s/…/…/
  • marking points and jumping between them.
  • Jumping to the end of a sentence or the end of a paragraph or the end of a word, or the beginnings of the above.
  • Repeating the last command I just did.
  • Recording a macro and running it multiple times.

I really wonder how you heathens who never learned an advanced text editor like ViM get along with your life. It’s like watching you trying to create a fire with a stick and a string while I have a microwave.

GPL Thoughts May 11, 2016

Posted by PythonGuy in Licensing, Uncategorized.

It’s 2016, and the GPL is once again on my mind. Broadly speaking, there are three types of licenses out there.

  • Free Software, which is software licensed such that it will always remain freely available and open to modification. As a side-effect, it will “infect” other software that uses it such that it becomes free as well.
  • Open-source Software, which is software licensed such that it can be freely shared or modified, but it doesn’t “infect” other software.
  • Proprietary Software, which is software licensed such that it cannot be freely shared or modified.

There is a fourth type, but it barely deserves mentioning. “Public domain” software, which is software that is practically dead.

When Stallman came up with the GPL, his interest was to fundamentally change the way we write, use, modify and share software. In order to accomplish this, he set into motion the following plan:

  1. Write some good software and freely share it, but make sure it cannot be part of any proprietary software.
  2. Write more good software that competes against and replaces proprietary software.
  3. Eventually, it won’t make sense to write proprietary software anymore, because all the good software is free and nobody expects to pay for software.

According to Stallman’s plan, we’re very deep into 2 and well on the way to 3.

The question that inevitably arises, “How do developers get paid?”, is at the forefront of my mind. After all, I write code so that I can get paid. (I would probably write code otherwise, but not nearly so much.)

So how can I get paid for my work?

Truth be told, I make more money from supporting software than anything else. It’s very rare that I get to write entirely new software, even when I’m working on a software project. Most of the time, I am fixing already existing code, or adapting it to some purpose, or more likely, testing it to see if it actually does what we think it does, or just to figure out why it doesn’t do what I want it to do.

The tools that I use are almost all GPL. Or rather, if all of my tools were GPL it wouldn’t hurt me in the slightest. In fact, it would make my life a lot better.

But then how do I make money?

Simply put, people still want software to be written, and it isn’t unheard of to have people pay me to work on open source projects. It isn’t unthinkable that I could be hired to work on free software as well, if there was a great need for that. There are not a few developers who are already being paid by private companies to do that.

The big question then isn’t, “How do individual developers get paid”, but “How do we convince people to pay for software development work?” The answer is when there is a need, people will pay. Sometimes very large sums of money.

The proprietary model doesn’t make any sense anymore. Proprietary software is an agreement that looks like this. “Give me your money, I’ll take it and I’ll give you software that you can’t look at or modify. If you like the software, keep giving me money. If you want to make the software better, beg me to do it for you.” That just doesn’t fly.

I think we’ll see companies arise that hand out their software for free. In exchange, they will get paid to modify the software or to teach people how to use it. They may also get paid to adapt the software for a particular environment. Thus, the value in a software company will not be the software, but the developers and the ability of the company to apply the software to solve problems.

This, to me, is tremendously encouraging. If this vision comes to pass, I won’t be hired to write software and fired when I finish. I’ll be hired to staff companies so they can say, “Whatever you need done, we can do it, provided you can afford our developers.”

The GPL really is the way forward, and free software is the only good solution out there.




Shared Memory and Python May 10, 2016

Posted by PythonGuy in Advanced Python, GIL.

I’m researching shared memory and Python right now, because it seems like the only hope for a particular situation we are seeing.

Basically, we have a very high-performance web server that is designed to handle a large number of requests per second. At the same time, we want to be able to update the code that is used to process these requests, in real time.

Our webserver is using microthreads (greenlets) and so it can only really do one thing at a time, even though it looks like it is doing many things. When we go to update the code, everything else must stop until the update is complete.

Obviously, we can use rolling deployments, but there are some particular issues we see with that. Namely, memory and resource management.

If we did “hard” rolling updates, we would have to turn off a node, taking it out of service. Then we’d perform the operation, and then add the node back to the service. This would take a non-trivial amount of time.

If we did “soft” rolling updates, we would spin up a new web server on a node, flip from the old to the new server, and retire the old server. This requires twice as much memory as we typically need.

Another option would involve shared memory. The code would be served from shared memory. When we’d like to do an update, we’d launch another process to create a new chunk of code in shared memory, then we’d flip from the old to the new code. This seems like the ideal option.

The problem is that Python only supports the most basic data structures in shared memory.

Perhaps there is a way to trick Python into treating shared memory as actual Python code, maybe by storing the code as an array. When we go to run the code, we load the code from the array directly into a function and then call it. I don’t know if that would work, or that any method would work, but I’m looking into it.
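One way that idea might look is sketched below. It is only an experiment, not a proven design: the compiled code is serialized with marshal into a byte buffer, then loaded back out and turned into a callable. An anonymous mmap stands in for a real shared memory segment, the publish and load helpers and the handle function name are invented for the example, and there is no locking, so a reader could see a half-written update. The marshal format is also specific to the Python version, so writer and readers would have to run the same interpreter.

```python
import marshal
import mmap
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def publish(buf, source):
    """Compile source code and store the marshalled code object in buf."""
    payload = marshal.dumps(compile(source, "<shared>", "exec"))
    buf.seek(0)
    buf.write(HEADER.pack(len(payload)))
    buf.write(payload)


def load(buf, name):
    """Rebuild the code from buf and return the function called name."""
    buf.seek(0)
    (size,) = HEADER.unpack(buf.read(HEADER.size))
    code = marshal.loads(buf.read(size))
    namespace = {}
    exec(code, namespace)   # defines the functions contained in the payload
    return namespace[name]


# Anonymous mapping used as a stand-in for a shared memory segment.
buf = mmap.mmap(-1, 64 * 1024)

publish(buf, "def handle(x):\n    return x + 1\n")
handle_v1 = load(buf, "handle")

# "Deploy" new code by overwriting the buffer, then load it again.
publish(buf, "def handle(x):\n    return x * 10\n")
handle_v2 = load(buf, "handle")

print(handle_v1(1), handle_v2(1))  # prints: 2 10
```

Note that each load still builds ordinary Python objects in the process that calls it, so the buffer only shares the serialized form of the code, not the live functions.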

Suggestions welcome!