jump to navigation

Why You Should Never Do Multithreaded Programming November 3, 2016

Posted by PythonGuy in Uncategorized.
trackback

TL;DR: As your software gets popular, you’ll want to scale your program. The temptation is to use multi-threaded programming techniques, but this is a very complicated paradigm that has bugs that are vicious and very difficult to find and remove. As your software grows more, you’ll have to adopt the multi-process paradigm. So just skip multi-threaded programming and go to multi-process programming. If you need multiple threads, use greenlets instead.

I was first introduced to multi-threaded programming sometime in the 90s. I recall reading about it in C/C++ programming books. It was wonderfully complicated, just the sort of thing to excite my teenage mind. As I started using it, however, I soon realized that what the authors of the book said about it wasn’t a joke. Deadlock and other issues tarnished the image of multi-threaded programming.

But what was I to do? I had a single computer running a single core, and if I wanted to do work really fast, I had to use multi-threaded programming techniques.

Over time, I learned about the asynchronous programming model. In this model, you don’t have real threads. Instead, you have a central function that dispatches to various other functions which do a little bit of work on a task and return control to the central function. If the central function is built around a select() or poll() function, and they each processed a file handle, you could handle a very large number of simultaneous connections with a single thread. Provided that there wasn’t a lot of data flowing and there wasn’t a lot of work to be done, you could get tremendous throughput compared to multi-threaded programming.

I read about how Google was using Python a few years after that. They would write simple, single-threaded applications. Then when they needed the apps to do more, they would just start up more machines and run the app on those machines, and then use load-balancing techniques to direct the workload to each machine. Programming in this style is called multi-process programming, what we’ve been doing all along since Windows 95 introduced us modern operating systems. The difference was that you couldn’t share memory. You had to open pipes over the network to communicate. That was your only option.

Of course, if the two processes are running on the same machine, and you match the number of processes to the number of cores, it’s like having several little virtual machines inside of a single machine. As long as you only communicated with pipes, you could see how to take these processes and move them to other services. Network latency was higher than local UNIX socket latency, but that could be dealt with.

In short, multi-threaded programming is simply unnecessary. There is no reason to use it. If you feel tempted to do it, just use multi-process programming knowing that your additional processes can be moved to another machine.

A lot of work has been done figuring out how to make one process do more work. Before I continue, let me explain why this work doesn’t really matter.

When you have an exponentially growing demand for your program, like, for instance, it is doubling every 6 months or something like that, then you’ll need to scale your operations as well. If it takes 10 machines to handle 10,000 units, how many machines to handle 20,000 units? The answer is simple: About 20. And 40,000 units? Now you need 40 machines. Each time you double, you double the number of machines.

Let’s say I could fix the code so it ran twice as fast, or rather, needed half as many compute resources. All I’ve done is bought some time. See, when you’re handling 20,000 units with 20 machines, I’m handling the same with 10 machines. 6 months from now, I’ll be at 20, and you’ll be at 40. Another 6 months and we’ll both double again. All I’ve done is said I’m going to hold off on purchasing more resources for 6 months.

When you look at things this way, then the amount of time you buy by making your code more efficient works out to be a linear unit of time with diminishing returns. Make my program 2x as good, and I buy 6 months. 4x as good gives me only 6 more months. 8x as good gives me only 6 more months. And 16x as good buys me a year.

If it takes me 6 months to make my code 2x as good, I’m wasting my time. I’d be better off writing a new program and buying more machines.

That said, some people aren’t running cash cows and do care about how much their services cost because they didn’t do a good job estimating how much profit there would be. They say things to their workers like, “We can’t afford to buy twice as many machines”. What changed is not the code quality, but the profit margin. They may have been making $1 per unit at 10,000 units, but at 20,000 units, they are only making $0.50. And so what they’re really saying is, “We have to stop growing as a company.” If your company leaders are saying that, it’s time to find a new job.

Of course, sometimes they say things like, “We can’t afford to buy those machines today (because our credit limit won’t allow it or we don’t have investors), but we will have that money tomorrow (because we know it is worth it.)” When they say things like this, then and only then should you waste time making your program run faster.

All the above said, we have learned some seriously awesome tricks to making programs work faster. It’s called asynchronous programming, and it allows a single process to behave like 10,000 processes. That’s 2x applied to itself 13 times. So if you’re growing at a rate of 2x every 6 months, that will buy you 6 or 7 years, which is longer than I’ve lasted at pretty much every job I’ve ever worked at. That’s like several lifecycles of technologies on the internet. In Python, the easiest way to manage asynchronous programming is with greenlets. (Avoid twisted. Seriously.)

So do yourself a favor, learn about greenlets, and learn how you can program in a synchronous style asynchronously.

But don’t bother learning about multi-threaded programming. Just know that you never, ever want to go down that road.

Advertisements

Comments»

No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: