
Modern MUD May 22, 2018

Posted by PythonGuy in Uncategorized.

Thinking about how I would write a MUD nowadays, I realized that they are a useful exercise in designing stateful applications.

Regardless, one of the interesting things is that you have to be able to store the state of your MUD and restore it from a previously stored state. Using greenlets and such to track the states of the various ongoing events makes that incredibly difficult. Instead, you’ll want a data structure that represents the state of the world.
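
To make that concrete, here is a toy sketch of what I mean; every name and field below is invented:

import json

# A toy sketch (mine, not from the post): the world kept as plain data,
# independent of any live greenlets or open file handles, so it can be
# dumped to disk and restored later.
world = {
    "rooms": {"tavern": {"exits": ["street"], "players": ["ada"]}},
    "players": {"ada": {"hp": 10, "location": "tavern"}},
    "pending_events": [{"at": 1700000000, "action": "respawn", "target": "rat"}],
}

snapshot = json.dumps(world)      # store the state
restored = json.loads(snapshot)   # later: pick up where we left off
assert restored == world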

Really, what we want is a language that can be stored to disk, and when recovered, go back to its previous state, despite whatever changes have happened around it. Obviously, this language can’t talk to open file handles or anything to do with the OS at all. It would need a way of communicating with the outside world, and something like a pubsub service would be ideal.

Writing in this language, the code for the world would be easier to describe, but you’d have to keep in mind that you have thousands if not millions of “threads” running at once.

Anyway, food for thought.

Python’s mock.call and floats June 27, 2017

Posted by PythonGuy in Uncategorized.

You may have seen an error like this:

AssertionError: [call(0.2),
 call(0.30000000000000004),
 call(0.5),
 call(0.9),
 call(1.7000000000000002),
 call(3.3000000000000003)] != [call(0.2), call(0.3), call(0.5), call(0.9), call(1.7), call(3.3)]

This is one of the problems with comparing floats in Python (or any language, for that matter).

You need to do some rounding if you want to compare floats in a meaningful way.

Try this instead:

# Each item in call_args_list is a call object: [0] is the tuple of
# positional arguments, so [0][0] is the first argument passed to sleep().
for (actual, expected) in zip(
        self.sleep.call_args_list,
        (0.2, 0.3, 0.5, 0.9, 1.7, 3.3)):
    self.assertAlmostEqual(actual[0][0], expected)

It’s a burden. If you need to do this a lot, then you can write your own comparator function.
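
One possible shape for such a comparator, as a sketch (the helper name and signature are my own invention):

def assert_calls_almost_equal(test, mock_obj, expected_args, places=7):
    # Pull the first positional argument out of each recorded call and
    # compare against the expected value with rounding.
    actual_args = [c[0][0] for c in mock_obj.call_args_list]
    test.assertEqual(len(actual_args), len(expected_args))
    for actual, expected in zip(actual_args, expected_args):
        test.assertAlmostEqual(actual, expected, places=places)

# Inside a TestCase:
#   assert_calls_almost_equal(self, self.sleep, (0.2, 0.3, 0.5, 0.9, 1.7, 3.3))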

Python Isn’t Why Your Code is Slow June 13, 2017

Posted by PythonGuy in Uncategorized.

Oftentimes, we blame our programming language for the slowness of our programs.

A recent thread in /r/programming shows that it’s not the language — it’s the program.

With a little work, you can make a Python “yes” that is almost as fast as GNU’s “yes”.
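
The trick is mostly about batching writes. Here is a rough sketch along those lines (not the exact code from the thread):

import sys

def fast_yes(text="y"):
    # Write a large buffer of repeated lines per syscall instead of
    # two bytes at a time.
    line = (text + "\n").encode()
    buf = line * (1024 * 1024 // len(line))
    out = sys.stdout.buffer
    while True:
        out.write(buf)

if __name__ == "__main__":
    fast_yes(sys.argv[1] if len(sys.argv) > 1 else "y")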

(Embedded Reddit comment by u/kjensenxz from the /r/programming thread.)

The lesson, kiddos, is that when your program is slow, it’s your program, not the tools. Now, the tools might make things harder, but your job is to write the best program, so you have to work around that.

Types, types, types! February 27, 2017

Posted by PythonGuy in Uncategorized.

Of all the arguments against Python, the one that seems to stick nowadays is typing. Its critics think Python’s type system is all wrong, and that we should instead embrace an archaic type system designed back when computers had very limited CPU cycles and memory.

I think I have an argument that should completely obliterate their point of view.

They like their type system because it gives them:

  • Speed
  • Reliability and predictability
  • Control

OK, I’ll give you that. When the compiler knows in advance what type a variable is, you can write simple checks that ensure you don’t have any type mismatch (really, you just push that error out of the program itself), and you feel like you are somehow managing complexity. You win that part of the debate.

However, what do you do with generic types?

If they don’t understand why generic types are important, bring up templates: generic functions or classes that can be applied to any type that provides a certain basic level of functionality.

See, when you bring up generics in these type systems, it is a whole ball of wax that they know better than to tangle with. They roll their eyes, they say things like, “When would I ever use generics? They’re so error-prone and a source of so many problems.”

The “Aha!” moment is when they realize that every type system must support the generic type, and their type system does it poorly.

Something like Python, with its dynamic-strong paradigm, handles them effortlessly.
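
A small illustration of my own: a “generic” function in Python is just a function.

def total(items):
    # No template declarations needed: this works for any sequence of
    # values that support +, whether ints, floats, strings, or lists.
    result = items[0]
    for item in items[1:]:
        result = result + item
    return result

print(total([1, 2, 3]))        # 6
print(total([1.5, 2.5]))       # 4.0
print(total(["a", "b", "c"]))  # abc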

No, you can’t predict how your program will behave, but could you ever, really? All you can do is break it down into little bits that you can test, test those bits (AKA, unit tests) and then see how the parts fit together (AKA integration tests). I mean, you’re going to write your tests anyway, right? What fool would think just because their program compiles it must be correct?

In short: You’re going to have to deal with generic types eventually. It’s better to use a language that can handle them very well, because once you’ve handled generic types, you’ve handled ALL types.

Someone asked me about programming languages February 27, 2017

Posted by PythonGuy in Uncategorized.

I won’t post the question or the context. I don’t think it’s too important. The response I wrote follows.

It’s hard to describe what a programming language is supposed to be.

Imagine you were an engineer in charge of designing a bridge. You want to build a bridge that goes from point A to point B and can carry a certain amount of weight and such.

Now, of the umpteen billion possible bridges you can build, you’re going to settle on one design. When you choose what makes that particular design the “best”, you’re going to factor in things like the cost of materials, the kinds of technology the local construction companies have available to them, what kind of maintenance will be performed, etc…

Programming languages are like bridges. They go from point A (ideas in your head) to point B (actual running code.) When you consider what programming language to use, you are not so interested in whether the programming language can do the job because they all can, as long as they are Turing complete. You’re going to be interested, instead, in things like how much work it takes to make the program, who is available to maintain or extend the code base, and what you will have to do to find and fix bugs that may or may not arise in the future.

Of all the attributes of a programming language, the most important to me, after programming my whole life and spending 17 years being paid to program, is complexity. If your programming language is complex, then I know it’s going to be really hard to write a correct program, very hard to maintain, and it will cost a lot of money to find people smart enough to keep it working.

But a programming language can be too “simple” as well, meaning that there aren’t enough features that I would like to have available. For instance, Lua is a great, simple language, but it is too simple and doesn’t handle some stuff that other languages provide, and I find that very annoying. And the simplest language of all? Machine code. It’s so simple even CPUs can understand it.

It’s all about balance.

When I talk about “Vanilla C” vs. “C/C++”, I am describing a problem I saw arise in the C/C++ community. C itself is a rather simple language, pretty well-defined, with only a few weird cases that arise rarely. The subset of C I call “Vanilla C” is plenty to get any job done right the first time. It takes quite a bit of work, and you have to be a bit verbose, but it gets the job done. C/C++ introduced a number of new concepts designed to make my life better, but they just ended up making my job harder. I saw teams write down rules like “Please don’t use feature X” and such. It seemed every new thing brought in by C++ was disfavored, except namespaces, and even then, it could be confusing.

The reason people use C at all nowadays is speed and simplicity. Well, now that CPUs are not the bottleneck for the vast majority of problems we are challenged with (even video games!), C and its cousins are not the solution we are looking for. Besides, we learn in college that oftentimes the real issue is that we’re using an O(n^2) algorithm when a perfectly good O(n) algorithm exists, and no amount of C running the O(n^2) version will beat Python running the O(n) version once the input is large enough.
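
To make that concrete, here is a toy illustration of my own: the list version of this loop is O(n^2) overall, the set version is O(n), and the set wins by orders of magnitude once n gets large.

import timeit

# The same membership test, done with an O(n)-per-lookup list versus an
# O(1)-per-lookup set.
n = 10_000
haystack_list = list(range(n))
haystack_set = set(haystack_list)

print(timeit.timeit(lambda: [i in haystack_list for i in range(n)], number=1))
print(timeit.timeit(lambda: [i in haystack_set for i in range(n)], number=1))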

And one more thing: when you write Python, you can have tools compile it down to something close to C and run at a comparable speed (Cython, for instance), and with a JIT such as PyPy, some workloads run even faster.

The world has changed a lot since the 90s. It’s time to embrace the new paradigm shift where the programming language can do a lot more work for you, all without creating a very complex system.

An Analogy for Types January 14, 2017

Posted by PythonGuy in Uncategorized.

Sometimes the best way to teach a principle is to share an analogy.

Let’s come up with an analogy for types. Let’s say your program is a set of instructions you give to Amelia Bedelia.

Now, suppose Amelia Bedelia needed you to spell out the type of everything you refer to. “Please, Amelia Bedelia, bake a cake” becomes “bake (a function which takes a recipe for something bakeable) a cake (an instance of the cake class, which is a bakeable item).”

Versus, “Bake a cake”.

See the point?

Now, if you told Amelia Bedelia “bake a shoe”, in a dynamic, strongly typed system she would look for the recipe for baking a shoe and, not finding one, would say, “I can’t do that. I can’t find the recipe for a shoe.” In a static, explicitly typed system she would refuse on the spot, because a shoe was never declared to be bakeable.

Either way, the end result is the same. The question is when Amelia Bedelia tells you she can’t do it: right after you tell her, or when she gets far enough along to realize she can’t do it.
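
In code, the dynamic half of that looks something like this (a sketch of my own, not from the original post):

class Cake:
    def bake(self):
        print("Baking a cake")

class Shoe:
    pass

def bake(item):
    # Dynamic, strong typing: just ask the item to bake itself. A Shoe
    # fails at the moment we try, with a clear error, not before.
    item.bake()

bake(Cake())   # Baking a cake
bake(Shoe())   # AttributeError: 'Shoe' object has no attribute 'bake'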

But remember how hard it was to tell the explicitly typed Amelia Bedelia to bake a cake?

End.

The Code Development Lifecycle January 13, 2017

Posted by PythonGuy in Uncategorized.
  1. Clearly identify the problem.
  2. Document the problem.
  3. Identify multiple solutions to the problem.
  4. Document the solutions.
  5. Choose the best solution.
  6. Document the reasons why you think the solution you chose is best.
  7. Write unit tests to demonstrate and reproduce the problem.
  8. Document the unit tests.
  9. Write integration tests to demonstrate and reproduce the problem.
  10. Document the integration tests.
  11. Correct the code to make the unit tests pass.
  12. Document the code.
  13. Code review for style and consistency.
  14. Deploy to integration system.
  15. Document the deployment.
  16. Run the code in the larger system against integration tests.
  17. When all tests pass, deploy to production.
  18. Document the deployment.

Notes:

  • Identifying the problem requires the art of science. Considering your observations, propose a theory. Try to disprove that theory with tests. The theory that survives all critical tests may be correct, but please don’t limit your imagination. As you gain more experience as a developer, you’re going to see more kinds of problems so you don’t have to be so imaginative.
  • Document everything. Why? Because it helps you move on with your life and it helps the poor schmuck who has to keep up with you or follow you.
  • Identify multiple solutions. If you only have one solution in mind, you have a bad imagination.
  • Choose the best solution. What is “best”? That depends on you and your team’s values. You should have a discussion with your team about what is truly important.
  • Unit tests test only one function, and not even the code that the function calls. Mock aggressively. (This is where dynamic, strong type systems shine best; see the sketch after this list.)
  • Integration tests test that two systems interface properly. When you have two systems that interface properly, you have a new system that includes them both that needs to be tested with other systems. Integration tests take a long time to run and are usually quite complicated.
  • When you write a test, make sure it fails before you fix the code. If it can’t fail, it is a bad test.
  • Write just enough code to make the tests pass, and no more. If you want to add more, you need to go back to step 1.
  • Code review will never catch bugs. Don’t try to catch bugs in code review. Instead, check that the developer has been keeping up best practices and ensure that this is code you want to maintain in the long run.
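
As promised above, here is a minimal sketch of what “mock aggressively” looks like with Python’s unittest.mock; the function and names are made up for illustration:

import unittest
from unittest import mock

def notify_owner(client, owner, message):
    # The unit under test (a made-up example): format a message and hand
    # it to a collaborator.
    client.send(owner, "[alert] " + message)

class NotifyOwnerTest(unittest.TestCase):
    def test_sends_formatted_message(self):
        client = mock.Mock()   # mock the collaborator instead of calling it
        notify_owner(client, "ada", "disk is full")
        client.send.assert_called_once_with("ada", "[alert] disk is full")

if __name__ == "__main__":
    unittest.main()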

 

Static Typing January 13, 2017

Posted by PythonGuy in Uncategorized.

One of the hottest debates in programming, even today, is typing. By that, of course, I mean variable types, not the sort of typing you do on the keyboard, although the editor wars are still raging. (ViM is the best by the way.)

I’d like to try to approach this discussion with some logic. Before I engage the logic muscles, though, let me announce that I have written thousands and thousands and maybe millions of lines of code. I don’t know. The keyboard I’ve been using for the past four years has the letter “s” completely worn off, and “a” and a few others are on their way out. I am paid to write code, I am paid to make other people’s code work, and I am paid to tell other engineers how to write their code. I’m a senior engineer.

We had, about six months ago, a very lively debate about typing. And we decided to go with Python. I think that was the right decision. I have an opinion based on lots of experience, and I think I am right, even without any logic.

But let’s set that aside.

Let’s do this logically. Here is a list of logical statements.

  1. A “variable” is any named entity in your program that stores a “value”. A value can be anything from integers, floats, and strings to complex data structures like arrays or lists, and even functions, classes, modules, and stack traces.
  2. The “type” of a value tells the programmer and program alike how the value behaves. Certain behavior varies based on the type. For instance, you don’t add integers the same way as you add floats.
  3. “Strong typing” means you can tell what type a value is with no other information than the value itself. Python is an example of a “strongly typed” language, as every value is stored in memory as a PyObject, and the Python language can tell you the type of any value.
  4. “Weak typing” means you cannot tell what type a value is without additional information. C/C++ are good examples of this, as you could be looking at an int or a float or anything else. Without type declarations in the language itself, it would be impossible to keep things straight.
  5. In many languages, variables hold information on the type of the value they store. However, this is not true for all languages. For instance, in Python, the variable is simply a name-value pair, stored in a dict.
  6. “Static typing” means that you cannot change the type of the value in a variable. Some languages allow you to assign a derived class of the type of the variable, others are more strict.
  7. “Dynamic typing” means any variable can hold any value. Python is fully dynamic, but many languages are partially dynamic as they cannot store all values in a variable or some variables do contain type information.
  8. “Explicit typing” means the programmer must tell the computer what type each variable is.
  9. “Implicit typing” means the programmer does not tell the computer what type each variable is. The computer can infer the types by simple analysis.
  10. “Type system” is the way types are treated by a particular language, and includes the language used to describe its types.
  11. “Simple” and “complex” refer to the number of components and the number of sub-components in those components. I.e., the function “foo(bar, baz)” is more complex than “foo(bar)” because it takes two parameters. (Parameters are sub-components of a function.)
  12. “Correct code” means that the code accomplishes the purpose it was intended to accomplish. “Incorrect code” means it is not correct. Note that simply because a program compiles does not mean it is correct. There must be some human element to judge the correctness of the program.
  13. Simple is better than complex, but the code must be correct for it to matter at all.
  14. Explicit is better than implicit, since it helps people unfamiliar with a system understand how it works.
  15. Explicit typing requires a type system that is explicit. That is, the programmer must spell out what the types are using the language the type system uses.
  16. Implicit typing merely hides the type system from the programmer. However, there is still a type system underneath that the programmer needs to be aware of when he violates the constraints of the system.
  17. Dynamically typed languages typically have a simpler type system than statically typed languages. All variables can hold any type of value, so there are no constraints like there are in static systems.
  18. Statically typed languages must have a more complicated type system. This is because it imposes at least one constraint: Variables cannot hold a different type of value.
  19. Weakly typed languages require static typing. This is because it is impossible to manage the values without knowing what types they are, and the values themselves do not contain that information.
  20. Strongly typed languages do not require static typing. Static declarations would merely add information that has to stay consistent with what the values already carry about themselves.
  21. Complex systems are more difficult to understand and manipulate than simple systems.
  22. There is a class of error called “type mismatch errors”. They are introduced when the programmer creates incorrect code that improperly handles the values in question.
  23. Many static type systems eliminate or at least reduce type mismatch errors at compile time.
  24. Strong, dynamic type systems do not eliminate or reduce type mismatch errors at compile time.
  25. Whether errors are caught at compile time or at run time doesn’t matter, as long as the errors are caught.
  26. In order to prove your software correct, you must demonstrate that it behaves as expected. Only the simplest of programs can be analyzed by reading the code.
  27. Statement 26 requires writing what we call “tests”. The code is run against the test, and if it passes, then it is assumed the code is correct. If the test is not able to detect errors in the code, then it is not a sufficient test.
  28. Statements 23 and 24 imply that the compiler is doing some of the tests that would otherwise be handled in the testing phase.
  29. Since explicit is better than implicit, implicitly testing the code with the compiler is worse than explicitly testing the code with tests.
  30. Therefore, dynamic, strong typing is best.

Addendum

This is really the argument “strong, dynamic type systems are much simpler than any other type system; the benefit of a more complicated type system is that only one kind of error is detected during the compilation phase rather than the test phase, but this is a very small benefit compared to the cost of the complexity of having such a type system at all. Therefore, in all cases, strong, dynamic type systems are best.” I’ve just spelled out the assumptions and the logic behind it all.

I should mention that people who write code but do not write sufficient tests are cheating. Until you write tests, you cannot understand whether your code is right or wrong. The compiler can’t tell you anything. It exists to convert your code into machine instructions, that’s it. You still have to test those machine instructions for correctness.

With strong, dynamic systems you do not write tests that have already been written. For instance, I don’t need to write a test for what happens when you add a string and an integer in Python. Those tests already exist, and they cover every possible combination of types. When a new type is introduced, it should come with the tests that plug it into the ecosystem. For example, if you want it to support addition, then you need to write the tests that show it adds in some cases but not in others.
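
For example (my own illustration of the point):

# The runtime already enforces this; no test of ours required.
try:
    "a" + 1
except TypeError as exc:
    print(exc)   # e.g. can only concatenate str (not "int") to str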

Finally, I want to mention what I think is the most obvious evidence against complicated type systems. You know how we used to have a big zoo of fundamental particles until physicists figured out quarks? If a system is composed of smaller systems, learn those smaller systems and you’ll understand the bigger one. Every sufficiently complicated type system has, inside of it, a dynamic, strong type system. The dynamic, strong type system is like the quarks, and the more complicated type system is like that zoo of composite particles. If you really want to understand particle physics, study quarks, not protons and neutrons and all the other composite particles. In this way, the strong, dynamic system is the only type system you ever need to learn. Once you’ve tamed that, your job is done.

Which is why I hardly ever see a type error in Python. The one case that seems to arise is when I have more than a few arguments to a function, and I forget the order. The solution is simple: Don’t use ordered arguments!
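
In Python 3, one way to do that is with keyword-only parameters; this is just a sketch, and the function and parameter names are invented:

def connect(*, host, port, timeout=30, retries=3):
    # The bare * makes every parameter keyword-only, so callers cannot
    # silently swap the order of arguments.
    print(f"connecting to {host}:{port} timeout={timeout} retries={retries}")

connect(host="db.internal", port=5432)
# connect("db.internal", 5432)  # TypeError: takes 0 positional arguments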

 

Why You Should Never Do Multithreaded Programming November 3, 2016

Posted by PythonGuy in Uncategorized.

TL;DR: As your software gets popular, you’ll want to scale your program. The temptation is to use multi-threaded programming techniques, but this is a very complicated paradigm that has bugs that are vicious and very difficult to find and remove. As your software grows more, you’ll have to adopt the multi-process paradigm. So just skip multi-threaded programming and go to multi-process programming. If you need multiple threads, use greenlets instead.

I was first introduced to multi-threaded programming sometime in the 90s. I recall reading about it in C/C++ programming books. It was wonderfully complicated, just the sort of thing to excite my teenage mind. As I started using it, however, I soon realized that what the authors of the book said about it wasn’t a joke. Deadlock and other issues tarnished the image of multi-threaded programming.

But what was I to do? I had a single computer running a single core, and if I wanted to do work really fast, I had to use multi-threaded programming techniques.

Over time, I learned about the asynchronous programming model. In this model, you don’t have real threads. Instead, you have a central function that dispatches to various other functions which do a little bit of work on a task and return control to the central function. If the central function is built around a select() or poll() function, and they each processed a file handle, you could handle a very large number of simultaneous connections with a single thread. Provided that there wasn’t a lot of data flowing and there wasn’t a lot of work to be done, you could get tremendous throughput compared to multi-threaded programming.
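
In modern Python, that dispatch loop looks roughly like this; the sketch below is my own toy echo server built on the standard selectors module, and the port number is arbitrary:

import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)     # do a little work, then yield back
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 9000))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    for key, _ in sel.select():
        key.data(key.fileobj)  # the central function dispatches to callbacks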

I read about how Google was using Python a few years after that. They would write simple, single-threaded applications. Then when they needed the apps to do more, they would just start up more machines and run the app on those machines, and then use load-balancing techniques to direct the workload to each machine. Programming in this style is called multi-process programming, which is what we’ve been doing all along since Windows 95 introduced us to modern operating systems. The difference was that you couldn’t share memory. You had to open pipes over the network to communicate. That was your only option.

Of course, if the two processes are running on the same machine, and you match the number of processes to the number of cores, it’s like having several little virtual machines inside of a single machine. As long as you only communicated with pipes, you could see how to take these processes and move them to other services. Network latency was higher than local UNIX socket latency, but that could be dealt with.

In short, multi-threaded programming is simply unnecessary. There is no reason to use it. If you feel tempted to do it, just use multi-process programming knowing that your additional processes can be moved to another machine.

A lot of work has been done figuring out how to make one process do more work. Before I continue, let me explain why this work doesn’t really matter.

When you have an exponentially growing demand for your program, like, for instance, it is doubling every 6 months or something like that, then you’ll need to scale your operations as well. If it takes 10 machines to handle 10,000 units, how many machines to handle 20,000 units? The answer is simple: About 20. And 40,000 units? Now you need 40 machines. Each time you double, you double the number of machines.

Let’s say I could fix the code so it ran twice as fast, or rather, needed half as many compute resources. All I’ve done is bought some time. See, when you’re handling 20,000 units with 20 machines, I’m handling the same with 10 machines. 6 months from now, I’ll be at 20, and you’ll be at 40. Another 6 months and we’ll both double again. All I’ve done is said I’m going to hold off on purchasing more resources for 6 months.

When you look at things this way, the time you buy by making your code more efficient grows only logarithmically, with diminishing returns. Make my program 2x as good, and I buy 6 months. 4x as good gives me only 6 more months. 8x as good gives me only 6 more months. And 16x as good gives me just 6 more months again, two years in total.

If it takes me 6 months to make my code 2x as good, I’m wasting my time. I’d be better off writing a new program and buying more machines.

That said, some people aren’t running cash cows and do care about how much their services cost because they didn’t do a good job estimating how much profit there would be. They say things to their workers like, “We can’t afford to buy twice as many machines”. What changed is not the code quality, but the profit margin. They may have been making $1 per unit at 10,000 units, but at 20,000 units, they are only making $0.50. And so what they’re really saying is, “We have to stop growing as a company.” If your company leaders are saying that, it’s time to find a new job.

Of course, sometimes they say things like, “We can’t afford to buy those machines today (because our credit limit won’t allow it or we don’t have investors), but we will have that money tomorrow (because we know it is worth it.)” When they say things like this, then and only then should you waste time making your program run faster.

All the above said, we have learned some seriously awesome tricks to making programs work faster. It’s called asynchronous programming, and it allows a single process to behave like 10,000 processes. That’s 2x applied to itself 13 times. So if you’re growing at a rate of 2x every 6 months, that will buy you 6 or 7 years, which is longer than I’ve lasted at pretty much every job I’ve ever worked at. That’s like several lifecycles of technologies on the internet. In Python, the easiest way to manage asynchronous programming is with greenlets. (Avoid twisted. Seriously.)

So do yourself a favor, learn about greenlets, and learn how you can program in a synchronous style asynchronously.
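
Here is a minimal sketch of that style using gevent, one common greenlet library; the URL and job count are just placeholders:

from gevent import monkey
monkey.patch_all()   # make sockets, sleep, etc. cooperative

import gevent
import urllib.request

def fetch(url):
    # Looks synchronous; gevent switches greenlets whenever this blocks.
    with urllib.request.urlopen(url) as resp:
        print(url, resp.status)

jobs = [gevent.spawn(fetch, "http://example.com/") for _ in range(100)]
gevent.joinall(jobs, timeout=10)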

But don’t bother learning about multi-threaded programming. Just know that you never, ever want to go down that road.

Makefile November 3, 2016

Posted by PythonGuy in Uncategorized.

Now that I am older, I see things that the younger programmers don’t see. Among them: people refusing to use software for the wrong reasons. Just because something is old and complicated doesn’t mean it isn’t the right tool for you.

The ancient and venerated Makefile is, in my mind, the most wonderful tool you can use to make your software development smoother and simpler. Ignore it at your peril. Avoid it at your detriment.

You know how you tell your co-workers about how to do stuff? Rather than tell them, just put it into a Makefile. That way, they don’t have to remember anything but “make <the thing you want to do>”. And if they forget what they can do, they can just pop open the Makefile and see what the various targets are and how they work.
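
For instance, a tiny sketch of what that might look like (the targets and commands are made up, and remember that recipe lines must start with a tab):

.PHONY: test lint deploy

test:
	python -m pytest tests/

lint:
	flake8 src/

deploy: test lint
	./scripts/deploy.sh production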

The Makefile syntax is cryptic and is very uninviting. Any sufficiently useful Makefile will have commands that are difficult to decipher. However, with a little practice, and a few minutes with the documentation on GNU make, you too can be an expert. The best part is you know that the Makefile syntax isn’t going to change anytime soon. What you learn about it today will probably be useful 30 years from now.

I won’t bother explaining how make works. There are several similar projects out there, but none of them compare to the power of the Makefile. Why do these other programs exist when make is so much older? I believe it comes down to a few reasons.

  1. People didn’t take the time to learn make. I can’t blame them, but really, there is a reason why it is so complicated. As you learn to use make, you will grow to appreciate that complexity.
  2. Vendor lock-in. If the vendor can get you to depend on their make clone, then you’ll never go back to using the free and open source make system. Once they’ve got you in their web, you’re not going anywhere except to the land of regret and sorrows.
  3. People don’t believe make should be so complicated. They come up with their own solutions that end up being even more complicated.

The philosophy behind make is rather simple.

  • You have targets, usually files you want to build. However, targets can be “virtual”, meaning that once the target is complete, there is no file leftover.
  • Targets depend on requirements. These requirements are also targets. Ultimately, the requirements boil down to a set of files or imaginary virtual targets.
  • You have recipes that list the commands needed to build the target from the requirements.
  • You have a ton of configuration parameters that will allow you to create a flexible Makefile that is truly cross-compatible against a wide variety of platforms. These can be a rather difficult chore to get right, so typically there is the infamous “configure” script that will figure out where you keep everything on your system and make sure you have the right things installed and they are the right versions.

When you use make, you can depend on things like Ruby’s gems or Python’s pip system. You can literally put make on top of anything, since all it does is call programs with parameters. For instance, I’ve seen Makefiles in Java projects with Ant for a configuration system.

Make is a lingua franca. Or, in more modern terms, it is the English Language of building computer software. It is so universally adopted that it becomes the common way to express how to build things.

Do yourself and your co-workers and co-collaborators a favor. Write a Makefile today. Start using it. Start learning how they work. Study other people’s Makefiles. This is not wasted effort.