
An Analogy for Types January 14, 2017

Posted by PythonGuy in Uncategorized.

Sometimes the best way to teach a principle is to share an analogy.

Let’s come up with an analogy of types. Let’s say your program is a set of instructions you give to Amelia Bedelia.

Now, suppose Amelia Bedelia needed you to spell out the type of everything you refer to. “Please, Amelia Bedelia, bake a cake” becomes “bake (a function which takes a recipe for something bakeable) a cake (an instance of the cake class, which is a bakeable item).”

Versus, “Bake a cake”.

See the point?

Now, if you told Amelia Bedelia to “bake a shoe”, in a dynamic, strongly typed system she would look for the recipe for baking a shoe and, not finding one, would say, “I can’t do that. I can’t find the recipe for a shoe.” In a static, explicitly typed system she would refuse the instruction the moment you gave it, because a shoe was never declared bakeable.

Either way, the end result is the same. The only question is when Amelia Bedelia tells you she can’t do it: the moment you give the instruction, or the moment she actually tries to carry it out.
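
A minimal sketch of the analogy in Python terms: the “recipe” lookup is plain duck typing, and the refusal arrives at run time. All of the names here are made up for illustration.

    class Cake(object):
        def bake(self):
            return "a delicious cake"

    class Shoe(object):
        pass                     # no recipe for baking a shoe

    def bake(item):
        return item.bake()       # works for anything bakeable

    print(bake(Cake()))          # a delicious cake
    bake(Shoe())                 # AttributeError: 'Shoe' object has no attribute 'bake'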

But remember how hard it was to tell the strict/weakly typed Amelia Bedelia to bake a cake?

End.

The Code Development Lifecycle January 13, 2017

Posted by PythonGuy in Uncategorized.
  1. Clearly identify the problem.
  2. Document the problem.
  3. Identify multiple solutions to the problem.
  4. Document the solutions.
  5. Choose the best solution.
  6. Document the reasons why you think the solution you chose is best.
  7. Write unit tests to demonstrate and reproduce the problem.
  8. Document the unit tests.
  9. Write integration tests to demonstrate and reproduce the problem.
  10. Document the integration tests.
  11. Correct the code to make the unit tests pass.
  12. Document the code.
  13. Code review for style and consistency.
  14. Deploy to integration system.
  15. Document the deployment.
  16. Run the code in the larger system against integration tests.
  17. When all tests pass, deploy to production.
  18. Document the deployment.

Notes:

  • Identifying the problem requires the art of science. Considering your observations, propose a theory. Try to disprove that theory with tests. The theory that survives all critical tests may be correct, but please don’t limit your imagination. As you gain more experience as a developer, you’re going to see more kinds of problems so you don’t have to be so imaginative.
  • Document everything. Why? Because it helps you move on with your life and it helps the poor schmuck who has to keep up with you or follow you.
  • Identify multiple solutions. If you only have one solution in mind, you have a bad imagination.
  • Choose the best solution. What is “best”? That depends on you and your team values. You should have a discussion with your team on what is truly important.
  • Unit tests test only one function, and not even the code that the function calls. Mock aggressively. (This is where dynamic, strong type systems shine best.) There’s a sketch after these notes.
  • Integration tests test that two systems interface properly. When you have two systems that interface properly, you have a new system that includes them both that needs to be tested with other systems. Integration tests take a long time to run and are usually quite complicated.
  • When you write a test, make sure it fails. If it doesn’t fail, it is a bad test.
  • You write just enough code to make the tests pass, and no more. If you want to add more, you need to go back to step 1.
  • Code review will never catch bugs. Don’t try to catch bugs in code review. Instead, check that the developer has been keeping up best practices and ensure that this is code you want to maintain in the long run.
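
Here is a minimal sketch of steps 7 and 11 together with the “mock aggressively” note above, using Python 3’s unittest.mock. The function and its collaborator are hypothetical; the point is that the unit test exercises calculate_total() alone, with its dependency mocked out.

    import unittest
    from unittest import mock

    def calculate_total(order, tax_service):
        return order["subtotal"] + tax_service.tax_for(order)

    class CalculateTotalTest(unittest.TestCase):
        def test_adds_tax_to_subtotal(self):
            tax_service = mock.Mock()
            tax_service.tax_for.return_value = 2.50      # mock the collaborator
            total = calculate_total({"subtotal": 10.00}, tax_service)
            self.assertEqual(total, 12.50)
            tax_service.tax_for.assert_called_once_with({"subtotal": 10.00})

    if __name__ == "__main__":
        unittest.main()

Write the test first and watch it fail (step 7 and the “make sure it fails” note), then write just enough code to make it pass (step 11).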

 

Static Typing January 13, 2017

Posted by PythonGuy in Uncategorized.

One of the hottest debates in programming, even today, is typing. By that, of course, I mean variable types, not the sort of typing you do on the keyboard, although the editor wars are still raging. (ViM is the best by the way.)

I’d like to try to approach this discussion with some logic. Before I engage the logic muscles, though, let me note that I have written thousands and thousands, maybe millions, of lines of code. I don’t know exactly. On the keyboard I’ve been using for the past four years, the letter “s” is completely worn off and “a” and a few others are on their way out. I am paid to write code, I am paid to make other people’s code work, and I am paid to tell other engineers how to write their code. I’m a senior engineer.

We had, about six months ago, a very lively debate about typing. And we decided to go with Python. I think that was the right decision. I have an opinion based on lots of experience, and I think I am right, even without any logic.

But let’s set that aside.

Let’s do this logically. Here is a list of logical statements.

  1. A “variable” is any named entity in your program that stores a “value”. The value can be anything: an integer, a float, a string, a complex data structure like an array or a list, even a function, a class, a module, or a stack trace.
  2. The “type” of a value tells the programmer and program alike how the value behaves. Certain behavior varies based on the type. For instance, you don’t add integers the same way as you add floats.
  3. “Strong typing” means you can tell what type a value is with no other information than the value itself. Python is an example of a “strongly typed” language, as every value is stored in memory as a PyObject, and the Python language can tell you the type of any value (there’s a small sketch after this list).
  4. “Weak typing” means you cannot tell what type a value is without additional information. C/C++ are good examples of this, as you could be looking at an int or a float or anything else. Without type declarations in the language itself, it would be impossible to keep things straight.
  5. In many languages, variables hold information on the type of the value they store. However, this is not true for all languages. For instance, in Python, the variable is simply a name-value pair, stored in a dict.
  6. “Static typing” means that you cannot change the type of the value in a variable. Some languages allow you to assign a derived class of the type of the variable, others are more strict.
  7. “Dynamic typing” means any variable can hold any value. Python is fully dynamic; many languages are only partially dynamic, because some variables cannot hold every kind of value or do carry type information.
  8. “Explicit typing” means the programmer must tell the computer what type each variable is.
  9. “Implicit typing” means the programmer does not tell the computer what type each variable is. The computer can infer the types by simple analysis.
  10. “Type system” is the way types are treated by a particular language, and includes the language used to describe its types.
  11. “Simple” and “complex” refer to the number of components and the number of sub-components in those components. For example, the function “foo(bar, baz)” is more complex than “foo(bar)” because it takes two parameters. (Parameters are sub-components of a function.)
  12. “Correct code” means that the code accomplishes the purpose it was intended to accomplish. “Incorrect code” means it is not correct. Note that simply because a program compiles does not mean it is correct. There must be some human element to judge the correctness of the program.
  13. Simple is better than complex, but the code must be correct for it to matter at all.
  14. Explicit is better than implicit, since it helps people unfamiliar with a system understand how it works.
  15. Explicit typing requires a type system that is explicit. That is, the programmer must spell out what the types are using the language the type system uses.
  16. Implicit typing merely hides the type system from the programmer. However, there is still a type system underneath that the programmer needs to be aware of when he violates the constraints of the system.
  17. Dynamically typed languages typically have a simpler type system than statically typed languages. Any variable can hold any type of value, so there are no constraints like there are in static systems.
  18. Statically typed languages must have a more complicated type system, because that system imposes at least one constraint: variables cannot hold a different type of value.
  19. Weakly typed languages require static typing. This is because it is impossible to manage the values without knowing what types they are, and the values themselves do not contain that information.
  20. Strongly typed languages do not require static typing. The values already carry their type information; static declarations merely add information that then has to be kept consistent with it.
  21. Complex systems are more difficult to understand and manipulate than simple systems.
  22. There is a class of error called “type mismatch errors”. They are introduced when the programmer creates incorrect code that improperly handles the values in question.
  23. Many static type systems eliminate or at least reduce type mismatch errors at compile time.
  24. Strong, dynamic type systems do not eliminate or reduce type mismatch errors at compile time.
  25. Whether or not errors are caught at compile time or run time doesn’t matter as long as the errors are caught.
  26. In order to prove your software correct, you must demonstrate that it behaves as expected. Only the simplest of programs can be analyzed by reading the code.
  27. Point 26 requires writing what we call “tests”. The code is run against the test, and if it passes, then it is assumed the code is correct. If the test is not able to detect errors in the code, then it is not a sufficient test.
  28. 23 & 24 imply that the compiler is doing some of the tests that would be handled in the testing phase.
  29. Since explicit is better than implicit, implicitly testing the code with the compiler is worse than explicitly testing the code with tests.
  30. Therefore, dynamic, strong typing is best.
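
A minimal sketch of points 3, 7, and 22 in Python: every value knows its own type, a name can be rebound to a value of a different type, and a mismatch surfaces as a TypeError when the offending line runs (Python 3 output shown in the comments).

    x = 3
    print(type(x))            # <class 'int'>
    x = "three"               # dynamic: the same name now holds a str
    print(type(x))            # <class 'str'>

    try:
        "three" + 3           # type mismatch error, caught at run time
    except TypeError as exc:
        print(exc)            # can only concatenate str (not "int") to str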

Addendum

This is really the argument “strong, dynamic type systems are much simpler than any other type system; the benefit of a more complicated type system is that only one kind of error is detected during the compilation phase rather than the test phase, but this is a very small benefit compared to the cost of the complexity of having such a type system at all. Therefore, in all cases, strong, dynamic type systems are best.” I’ve just spelled out the assumptions and the logic behind it all.

I should mention that people who write code but do not write sufficient tests are cheating. Until you write tests, you cannot understand whether your code is right or wrong. The compiler can’t tell you anything. It exists to convert your code into machine instructions, that’s it. You still have to test those machine instructions for correctness.

With strong, dynamic systems you do not write tests that have already been written. For instance, I don’t need to write a test for what happens when you add a string and an integer in Python. Those tests already exist, and cover every possible combination of types. When a new type is introduced, it should include tests that plug it into the ecosystem. For example, if you want it to support addition, then you need to write the tests that show it adds in some cases but not in others.
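
A minimal sketch of what that looks like, using a made-up Money type: the type defines addition, and the tests pin down when it adds and when it refuses.

    import unittest

    class Money(object):
        def __init__(self, cents):
            self.cents = cents

        def __add__(self, other):
            if not isinstance(other, Money):
                return NotImplemented        # let Python raise the TypeError
            return Money(self.cents + other.cents)

    class MoneyAdditionTest(unittest.TestCase):
        def test_adds_money_to_money(self):
            self.assertEqual((Money(100) + Money(50)).cents, 150)

        def test_refuses_to_add_an_integer(self):
            with self.assertRaises(TypeError):
                Money(100) + 50

    if __name__ == "__main__":
        unittest.main()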

Finally, I want to mention what I think is the most obvious evidence against elaborate typing systems. You know how we used to have a big zoo of fundamental particles until physicists were able to figure out quarks? If a system is composed of smaller systems, learn those smaller systems and you’ll understand the bigger system. Every sufficiently complicated type system has, inside of it, a dynamic, strong type system. The dynamic, strong type system is like the quarks, and the more complicated type system is like that zoo of fundamental particles. If you really want to understand particle physics, you study quarks, not protons and neutrons and all the other composite particles. In the same way, the strong, dynamic system is the only type system you ever need to learn. Once you’ve tamed that, your job is done.

Which is why I hardly ever see a type error in Python. The one case that does seem to arise is when I have more than a few arguments to a function and I forget the order. The solution is simple: don’t use positional (ordered) arguments!
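
A minimal sketch of that fix, using Python 3’s keyword-only arguments (the function is made up): the bare * forces callers to name every argument, so there is no order to forget.

    def transfer(*, source, destination, amount):
        return {"from": source, "to": destination, "amount": amount}

    transfer(amount=100, source="savings", destination="checking")   # fine, any order
    # transfer("savings", "checking", 100)   # TypeError: takes 0 positional arguments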

 

Why You Should Never Do Multithreaded Programming November 3, 2016

Posted by PythonGuy in Uncategorized.

TL;DR: As your software gets popular, you’ll want to scale your program. The temptation is to use multi-threaded programming techniques, but this is a very complicated paradigm that has bugs that are vicious and very difficult to find and remove. As your software grows more, you’ll have to adopt the multi-process paradigm. So just skip multi-threaded programming and go to multi-process programming. If you need multiple threads, use greenlets instead.

I was first introduced to multi-threaded programming sometime in the 90s. I recall reading about it in C/C++ programming books. It was wonderfully complicated, just the sort of thing to excite my teenage mind. As I started using it, however, I soon realized that what the authors of the book said about it wasn’t a joke. Deadlock and other issues tarnished the image of multi-threaded programming.

But what was I to do? I had a single computer running a single core, and if I wanted to do work really fast, I had to use multi-threaded programming techniques.

Over time, I learned about the asynchronous programming model. In this model, you don’t have real threads. Instead, you have a central function that dispatches to various other functions, each of which does a little bit of work on a task and returns control to the central function. If the central function is built around select() or poll(), and each dispatched function processes a file handle, you can handle a very large number of simultaneous connections with a single thread. Provided there isn’t a lot of data flowing and there isn’t a lot of work to be done per connection, you get tremendous throughput compared to multi-threaded programming.
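
Here is a minimal sketch of that dispatch loop: a single thread multiplexing many client sockets with select(). It’s a toy echo server, not production code, and the address is a placeholder.

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 8000))
    server.listen(128)
    server.setblocking(False)

    sockets = [server]
    while True:
        readable, _, _ = select.select(sockets, [], [])   # the central function
        for sock in readable:
            if sock is server:
                client, _ = sock.accept()                 # new connection
                client.setblocking(False)
                sockets.append(client)
            else:
                data = sock.recv(4096)                    # a little bit of work
                if data:
                    sock.sendall(data)                    # echo it back
                else:
                    sockets.remove(sock)                  # client hung up
                    sock.close()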

I read about how Google was using Python a few years after that. They would write simple, single-threaded applications. Then, when they needed the apps to do more, they would just start up more machines, run the app on those machines, and use load-balancing techniques to direct the workload to each machine. Programming in this style is called multi-process programming, which is what we’ve been doing all along since Windows 95 introduced us to modern operating systems. The difference was that you couldn’t share memory. You had to open pipes over the network to communicate. That was your only option.

Of course, if the two processes are running on the same machine, and you match the number of processes to the number of cores, it’s like having several little virtual machines inside of a single machine. As long as you only communicate through pipes, you can see how to take those processes and move them to other machines. Network latency is higher than local UNIX socket latency, but that can be dealt with.
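
A minimal sketch of that discipline with the standard library (the worker and its “units” are hypothetical): the two processes share nothing and talk only over a pipe, so either end could later move behind a network socket.

    from multiprocessing import Process, Pipe

    def worker(conn):
        while True:
            unit = conn.recv()          # receive a unit of work
            if unit is None:            # sentinel: no more work
                break
            conn.send(unit * 2)         # do the work, send the result back

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        proc = Process(target=worker, args=(child_end,))
        proc.start()
        for n in range(5):
            parent_end.send(n)
            print(parent_end.recv())    # 0, 2, 4, 6, 8
        parent_end.send(None)
        proc.join()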

In short, multi-threaded programming is simply unnecessary. There is no reason to use it. If you feel tempted to do it, just use multi-process programming knowing that your additional processes can be moved to another machine.

A lot of work has been done figuring out how to make one process do more work. Before I continue, let me explain why this work doesn’t really matter.

When demand for your program is growing exponentially, doubling every 6 months for instance, then you’ll need to scale your operations as well. If it takes 10 machines to handle 10,000 units, how many machines does it take to handle 20,000 units? The answer is simple: about 20. And 40,000 units? Now you need 40 machines. Each time demand doubles, you double the number of machines.

Let’s say I could fix the code so it ran twice as fast, or rather, needed half as many compute resources. All I’ve done is bought some time. See, when you’re handling 20,000 units with 20 machines, I’m handling the same with 10 machines. 6 months from now, I’ll be at 20, and you’ll be at 40. Another 6 months and we’ll both double again. All I’ve done is said I’m going to hold off on purchasing more resources for 6 months.

When you look at things this way, the amount of time you buy by making your code more efficient grows only with each doubling of efficiency, with sharply diminishing returns. Make my program 2x as efficient and I buy 6 months. 4x gives me only 6 more months. 8x gives me only 6 more months. And 16x buys me just 6 more months on top of that, two years in total.
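
The arithmetic above as a formula, assuming demand doubles every 6 months: a k-fold efficiency gain only delays hardware purchases by 6 * log2(k) months.

    import math

    def months_bought(speedup, doubling_period_months=6):
        return doubling_period_months * math.log2(speedup)

    print(months_bought(2))     # 6.0
    print(months_bought(16))    # 24.0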

If it takes me 6 months to make my code 2x as good, I’m wasting my time. I’d be better off writing a new program and buying more machines.

That said, some people aren’t running cash cows and do care about how much their services cost because they didn’t do a good job estimating how much profit there would be. They say things to their workers like, “We can’t afford to buy twice as many machines”. What changed is not the code quality, but the profit margin. They may have been making $1 per unit at 10,000 units, but at 20,000 units, they are only making $0.50. And so what they’re really saying is, “We have to stop growing as a company.” If your company leaders are saying that, it’s time to find a new job.

Of course, sometimes they say things like, “We can’t afford to buy those machines today (because our credit limit won’t allow it or we don’t have investors), but we will have that money tomorrow (because we know it is worth it.)” When they say things like this, then and only then should you waste time making your program run faster.

All the above said, we have learned some seriously awesome tricks to making programs work faster. It’s called asynchronous programming, and it allows a single process to behave like 10,000 processes. That’s 2x applied to itself 13 times. So if you’re growing at a rate of 2x every 6 months, that will buy you 6 or 7 years, which is longer than I’ve lasted at pretty much every job I’ve ever worked at. That’s like several lifecycles of technologies on the internet. In Python, the easiest way to manage asynchronous programming is with greenlets. (Avoid twisted. Seriously.)

So do yourself a favor, learn about greenlets, and learn how you can program in a synchronous style asynchronously.
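
A minimal sketch of what that looks like with gevent (the tasks are made up): the code reads synchronously, but the greenlets run concurrently because the monkey-patched sleep yields to the hub.

    import gevent
    from gevent import monkey
    monkey.patch_all()           # make blocking stdlib calls cooperative

    import time

    def task(name, seconds):
        time.sleep(seconds)      # now yields to other greenlets instead of blocking
        print("%s done" % name)

    gevent.joinall([
        gevent.spawn(task, "a", 1),
        gevent.spawn(task, "b", 1),
        gevent.spawn(task, "c", 1),
    ])                           # finishes in about 1 second, not 3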

But don’t bother learning about multi-threaded programming. Just know that you never, ever want to go down that road.

Makefile November 3, 2016

Posted by PythonGuy in Uncategorized.

Now that I am older, I see things that the younger programmers don’t see. Among those things are people avoiding software for the wrong reasons. Just because something is old and complicated doesn’t mean it isn’t the right tool for you.

The ancient and venerated Makefile is, in my mind, the most wonderful tool you can use to make your software development smoother and simpler. Ignore it at your peril. Avoid it at your detriment.

You know how you tell your co-workers about how to do stuff? Rather than tell them, just put it into a Makefile. That way, they don’t have to remember anything but “make <the thing you want to do>”. And if they forget what they can do, they can just pop open the Makefile and see what the various targets are and how they work.

The Makefile syntax is cryptic and is very uninviting. Any sufficiently useful Makefile will have commands that are difficult to decipher. However, with a little practice, and a few minutes with the documentation on GNU make, you too can be an expert. The best part is you know that the Makefile syntax isn’t going to change anytime soon. What you learn about it today will probably be useful 30 years from now.

I won’t bother explaining how make works. There are several similar projects out there, but none of them compare to the power of the Makefile. Why do these other programs exist when Makefile is so much older? I believe it is because of a few reasons.

  1. People didn’t take the time to learn make. I can’t blame them, but really, there is a reason why it is so complicated. As you learn to use make, you will grow to appreciate that complexity.
  2. Vendor lock-in. If the vendor can get you to depend on their make clone, then you’ll never go back to using the free and open source make system. Once they’ve got you in their web, you’re not going anywhere except to the land of regret and sorrows.
  3. People don’t believe make should be so complicated. They come up with their own solutions that end up being even more complicated.

The philosophy behind make is rather simple.

  • You have targets, usually files you want to build. However, targets can be “virtual”, meaning that once the target is complete, there is no file leftover.
  • Targets depend on requirements. These requirements are also targets. Ultimately, the requirements boil down to a set of files or virtual targets.
  • You have recipes that list the commands needed to build the target from the requirements (there’s a small example after this list).
  • You have a ton of configuration parameters that will allow you to create a flexible Makefile that is truly cross-compatible against a wide variety of platforms. These can be a rather difficult chore to get right, so typically there is the infamous “configure” script that will figure out where you keep everything on your system and make sure you have the right things installed and they are the right versions.
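
Here is a minimal, hypothetical example of those ideas: two real file targets, one “virtual” (phony) target, and a recipe under each. Recipe lines must start with a tab.

    .PHONY: test clean

    build/app.tar.gz: build/app
    	tar -czf $@ build/app        # build the target from its requirement

    build/app: src/main.py
    	mkdir -p build
    	cp src/main.py build/app

    test: build/app
    	python -m pytest tests/

    clean:
    	rm -rf build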

When you use make, you can depend on things like Ruby’s gems or Python’s pip system. You can literally put make on top of anything, since all it does is call programs with parameters. For instance, I’ve seen Makefiles in Java projects with Ant for a configuration system.

Make is a lingua franca. Or, in more modern terms, it is the English Language of building computer software. It is so universally adopted that it becomes the common way to express how to build things.

Do yourself and your co-workers and co-collaborators a favor. Write a Makefile today. Start using it. Start learning how they work. Study other people’s Makefiles. This is not wasted effort.

A Sane REST October 3, 2016

Posted by PythonGuy in Uncategorized.

I’ve complained about REST numerous times, but I think I have a way of making it sane and useful.

My first design goal was compatibility with the huge variety of REST clients and servers out there. By “compatible”, I mean that it should more or less just work, or at least be superimposable on an existing system. That is, the system I describe here should be a subset of the features the clients or servers provide. I want to take advantage of all the tools and resources out there, but I don’t want to rely on peculiar features of any one of them.

The second design goal was simplicity. I want people to “get” it at a fundamental level, and see how to make their REST server compatible without having to think too hard.

Here’s the overview. I’m going to use the customer object as an example.

  1. There are three types of resources: Group Resources, Item Resources, and Method Resources.
  2. Group Resources live at /customers/. You can GET or POST. You cannot DELETE.
    1. GET will grab a subset of the customers. I call this “find”. You can specify various filter parameters based on the attributes of the object (you can get inventive here), but you must support the following parameters.
      • page: The page number, 1-based. This can be specified.
      • items_per_page: The number of items per page. This can be specified.
    2. POST will add an item.
  3. Item Resources live at /customers/:id. You can GET, PUT, or DELETE. Optionally, you can PATCH, but you really shouldn’t have objects with so many attributes that it’s necessary.
    1. GET will fetch the item.
    2. PUT will update the item’s attributes.
    3. DELETE will remove the item.
  4. Method Resources live at verb-named paths like /email/. You can GET or POST. Either way, the behavior is the same.

Parameters are specified either in the URL GET parameters or the POST/PUT bodies. For POST and PUT, you can also specify parameters via the URL.

It is encouraged to use JSON as the POST/PUT body. However, if you use form encoded parameters, the following convention applies:

  • If a parameter is specified once, it is considered a single value.
  • If a parameter is specified multiple times, it is considered part of a list.

Don’t make it any more complicated than that.
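
To make the scheme concrete, here is a minimal sketch using Flask (my choice purely for illustration, not part of the spec), with a throwaway in-memory dict standing in for a real data store:

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    CUSTOMERS = {}   # stand-in data store

    @app.route("/customers/", methods=["GET", "POST"])            # Group Resource
    def customers_group():
        if request.method == "POST":
            customer = request.get_json()
            CUSTOMERS[customer["id"]] = customer
            return jsonify(customer)
        page = int(request.args.get("page", 1))
        items_per_page = int(request.args.get("items_per_page", 20))
        items = list(CUSTOMERS.values())
        start = (page - 1) * items_per_page
        return jsonify({"page": page, "items_per_page": items_per_page,
                        "total_items": len(items),
                        "results": items[start:start + items_per_page]})

    @app.route("/customers/<customer_id>", methods=["GET", "PUT", "DELETE"])   # Item Resource
    def customer_item(customer_id):
        if request.method == "DELETE":
            CUSTOMERS.pop(customer_id, None)
            return jsonify({})
        if request.method == "PUT":
            CUSTOMERS[customer_id] = request.get_json()
        return jsonify(CUSTOMERS[customer_id])

    @app.route("/email/", methods=["GET", "POST"])                 # Method Resource
    def email_method():
        return jsonify({"sent": True})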

I haven’t really decided what conventions I’ll use for documentation and such. I assume that Swagger is good enough and will use those conventions.

Only the following HTTP status codes are allowed:

  • 200 means it all went OK. The response is JSON-encoded.
    • GET to a Group Resource has the following parameters (see the example response after this list):
      • page: The page number, 1-based.
      • items_per_page: The number of items per page.
      • first_item: The index of the first item, 0-based. This is (page-1)*items_per_page.
      • last_item: The index of the last item+1, 0-based. This is (page)*items_per_page, or total_items, whichever is less.
      • total_items: The total number of items (if available).
      • next_page: If there is a next page, the next page number goes here.
      • prev_page: If this is not page 1, the previous page number goes here.
      • results: The results, a list of items.
    • POST to a Group Resource returns the new object, as for the GET to the Item Resource.
    • GET to an Item Resource returns the object.
    • PUT to an Item Resource returns the object.
    • DELETE to an Item Resource returns “” or {}
    • GET or POST to a Method Resource returns the response.
  • 404 means the resource doesn’t exist or you don’t have authorization to access it.
  • 401 means you are not authenticated and you need to be to access those resources. This is a cue to the client that it needs to log in again.
  • 500 is for all other errors.
  • For all of the error codes, a JSON-encoded object with the following parameters is returned:
    • code: The error code or name. In Python, this would be the exception name.
    • description: A human-readable description of the error, hopefully with suggestions on how to fix it.
    • stacktrace: In non-production environments, this would have the stacktrace.

That’s all there is to it. Pretty simple and straightforward, and it should be compatible with most clients and servers out there.

URI vs. URL vs. URN October 3, 2016

Posted by PythonGuy in Uncategorized.

Sometimes people confuse URLs with URIs.

Here’s how I keep track of what the difference is.

A URL is a string of text that points you to something on the internet you can download. The “L” means “Location”.

A URN is a string of text that points you to something that exists in the real world but not on the internet. The “N” means “Name”. These are things like the ISBN of a book.

A URI is a string of text that can be a URL or URN.

I know this isn’t very precise, but it should be helpful.

Please don’t use URI when you mean URL. If you are describing something on your server, then it’s a URL. If you’re describing an ephemeral concept that can’t exist on a server, then use URI.

Bjoern September 26, 2016

Posted by PythonGuy in Uncategorized.

A recent blog post comparing the speeds of various WSGI servers piqued my interest. Among the most surprising results was a new WSGI server named Bjoern. It is written in C and compatible with Python 2.7. Taking advantage of the famous libev, it provides unparalleled performance. It is difficult to imagine how you could make Python servers run any faster.

If you don’t know about libevent and libev, you really shouldn’t be making comments on performance. Nowadays, the best way to get performance out of your hardware is with coroutines and microthreads. Multithreading is now a dinosaur of a foregone era, making the GIL completely irrelevant.

Thinking about the problem I am trying to solve, I think a WSGI server is exactly what I need. I can write my own framework for my servers rather easily. So I’m investigating Bjoern and others at the moment.
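
Here is a minimal sketch of the direction I’m leaning: a bare WSGI callable served directly by Bjoern (the host and port are placeholders).

    import bjoern

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from a hand-rolled WSGI app\n"]

    bjoern.run(app, "127.0.0.1", 8000)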

Writing PyPy Compatible Python September 24, 2016

Posted by PythonGuy in Uncategorized.

Increasingly, I see projects boasting support for PyPy. It’s time for a refresher on what PyPy is and why you should be writing PyPy-compliant code.

PyPy, as opposed to PyPI (the Python Package Index), is a project aiming to compile Python with Python. This sounds absurd, like a practical joke of some sort, but it’s important. When you consider the massive success that Google’s V8 engine for Javascript has been, you wonder why Python can’t do the same thing, and then you realize that if you could just compile Python to native machine code (or any other kind of code) with Python code, then you would be well on your way to achieving V8’s performance, and maybe beating it, because the compiler is written in Python, not C, and is thus easier to understand and iterate on.

PyPy has been around for a long time and has been very visible. It has struggled to achieve the level of performance of Python itself (called CPython since it is the Python engine written in C) but lately, it is increasingly showing that not only can it meet CPython, it can beat it.

Now, Python, the language, has a problem. It was written to make the job of the programmer super-easy, but in doing so, it has become incredibly difficult to turn into machine code. The Python VM makes it all possible, but we don’t want to make a faster VM; we want to take Python code and turn it into the lowest-level machine code, highly optimized for the CPU it is running on. In order to make that happen, you have to modify the definition of Python slightly, or rather, embrace some differences from the standard CPython implementation.

This page documents the changes you need to make. It used to be that you couldn’t do things like assign different types to a variable, but those days seem to be long gone. Now, the only major difference is that things are not garbage collected as promptly as they are in CPython. So you need to explicitly close files and generators when you are done using them. Thankfully, the “with” statement makes this trivial. It is a good pattern you should always be using.
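
A minimal sketch of that pattern: release resources explicitly instead of leaning on CPython’s prompt reference counting. contextlib.closing works for generators and anything else with a close() method; the file name is a placeholder.

    from contextlib import closing

    with open("data.txt") as handle:      # file is closed when the block exits
        first_line = handle.readline()

    def numbers():
        yield 1
        yield 2

    with closing(numbers()) as gen:       # generator is closed explicitly too
        print(next(gen))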

There are some other low-level details you probably won’t run into listed here. If you go through the list, you can see how far PyPy has come.

If you want to take advantage of PyPy’s speed, you’ll need to write your code a certain way. This page lists some of the ways you can code your program to make it run a lot faster in PyPy. Basically, it’s all about being aware of what’s happening at the silicon level and working with that. Notably, the first kind of optimization you should do is choosing the right algorithm. You can make your O(N**2) algorithm run as fast as you like; it will still lose to even a poorly optimized O(N log N) algorithm on large datasets.

PyPy is reported to give about a 7x performance boost. It is production ready, today.

Two other projects to keep your eye on:

  • Nuitka, which compiles Python to C++.
  • Pyston, which uses LLVM.

Finally, a word on the GIL. People talk about the GIL as if it’s a bad thing. It’s not. It’s shifting the cost of multi-threading from every object access to the entire process. If you were to naively remove the GIL and add lock-checks on every access, Python would run a bajillion times slower in single-threaded mode. Perhaps someone will figure out a way to get the best of both worlds, but I highly doubt it.

If the GIL is your bottleneck, use multiple processes, invest in building a service-oriented architecture, and remind yourself that eventually you’ll have to start running your jobs on more than one server anyway. In other words, with other languages that support multi-threading well, your evolution is single-threaded -> multi-threaded -> multi-process. With Python, we just cut out the middle man, skip multi-threading, and start investing in multi-process development sooner rather than later. In the end, we get to ignore a whole class of errors that are notoriously difficult to detect, diagnose, and repair.

Guido van Rossum has basically said, “Remove the GIL over my dead body (or by proving me wrong.)” Folks, if you can’t show Guido that you can remove the GIL and make Python better, you have no business saying that the GIL is the problem.

 

Picking the Best Python Web Framework September 24, 2016

Posted by PythonGuy in Uncategorized.

I’m at the point in my job where I get to pick an entirely new web framework for Python. There are so many out there, it’s really hard to choose.

The first choice I need to look at is whether I need a “full” web framework, or a “minimal” web framework. But first, what do I mean by “web framework”?

A Web Framework is a library that provides the following features (a minimal sketch follows the list):

  • A way of mapping URLs to methods
  • A way of maintaining state across web requests (e.g., database connections).
  • A way of rendering HTML templates.
  • Several other goodies that you typically use in a web server.
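
For instance, here is roughly what those features look like in one of the candidates below (Bottle, chosen only because it fits in a few lines; the route and template are made up):

    from bottle import Bottle, template

    app = Bottle()
    app.config["greeting"] = "Hello"          # state shared across requests

    @app.route("/hello/<name>")               # URL mapped to a handler
    def hello(name):
        return template("<b>{{greeting}}, {{name}}!</b>",
                        greeting=app.config["greeting"], name=name)

    if __name__ == "__main__":
        app.run(host="localhost", port=8080)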

A full web framework provides all of the above and lots more. This would be frameworks like Pyramid, Django, and Turbogears.

A minimal web framework provides a lot less. These are things like CherryPy and Flask.

By comparison, things like gevent, twisted, and tornado are not web frameworks. They are simply web servers. You’ll have to build the framework bits yourself.

Since I’m not building a user-facing website but a backend REST server, I don’t need a full web framework. This means Django is out of the question, and Pyramid and Turbogears are less desirable because they are so big.

The next question to consider is which version of Python I intend to use, and whether I want to support things like Cython and PyPy. Since I am interested in performance, I will likely want to experiment with PyPy, so anything that doesn’t run on PyPy is out of the question. I also want to support Python 2 AND 3. My team is transitioning to Python 3, so I don’t want to hold them back with my choice.

I then consider whether I need to interface with a database. If so, then I always choose SQLAlchemy. For those who are not familiar with SQLAlchemy, you have no idea what you are missing. Once you experience SQLAlchemy, you will never, ever want to interface with a database in any other way ever again. SQLAlchemy provides features that are all but impossible in other languages, and it does it seamlessly and effortlessly.

Thankfully, SQLAlchemy is a very well-maintained and mature product, so it supports Python 2 and 3 and PyPy.

Now that I’ve narrowed down the field quite a bit, I need to consider the last requirement. Since I’ll be competing with Node.js and other languages that provide coroutines, I want to be able to use gevent. Gevent is one of those hidden gems in Python that no one seems to know about. They say, “Python doesn’t support coroutines” but with gevent, it really does and it is awesome. Gevent makes Python competitive with many languages. PyPy seals the deal and makes Python the best language ever.

And now, let’s look at my options.

  • CherryPy, which has been around a long time and been a favorite of mine. I like the logo and the name, but it is engineered very well and supports all of the features I need. CherryPy also supports SSL natively.
  • Pylons is old, stable, and incredibly powerful. I have spent a lot of time in Pylons and I loved every minute of it.
  • Pyramid is new and I’ve tried to use it a few times but I’ve always chosen Pylons instead. Maybe I should give it another shot.
  • web2py is not Python 3 compatible.
  • Wheezy.web’s claim to fame was being the fastest back in 2012. It hasn’t been updated since 2015.
  • Bottle seems intriguing and simple. I wonder whether it supports PyPy though.
  • Flask is very popular and deserves inspection. It doesn’t seem to support Python 3 well, though. Nor does it seem to support PyPy.
  • Hug is also intriguing.
  • Falcon is what Hug is built on. So I need to take a look.

In terms of a web server, I’m going to use something off the shelf. Here are my options:

  • Nginx. I have a long history with Nginx and I really don’t like it.
  • Apache. People don’t like Apache for some reason. It is not as fast as Nginx but, in my book, much easier to configure and use. Also, they don’t hide useful features behind a paywall. Apache also has mod_wsgi.
  • Gunicorn is almost synonymous with Python web development. I’ll have to consider it.
  • Spawning seems interesting. It is worthy of more investigation.
  • Pylons’ Waitress also appears intriguing. It requires more investigation.

I am going to continue to investigate and I’ll try to keep my blog updated with my latest findings.

I should add: The reason why Python has so many web frameworks is because Python is awesome. It’s not hard for people to try out new ideas and get them production ready, and so there are always going to be tons of options out there, and they are going to be quite different from each other. This is overwhelming to some, but I prefer choice and I love experimenting with new things.