jump to navigation

Static Typing January 13, 2017

Posted by PythonGuy in Uncategorized.
trackback

One of the hottest debates in programming, even today, is typing. By that, of course, I mean variable types, not the sort of typing you do on the keyboard, although the editor wars are still raging. (ViM is the best by the way.)

I’d like to try and approach this discussion with some logic. Before I engage the logic muscles, though, let me announce that I have written thousands and thousands and maybe millions of lines of code. I don’t know. I have my keyboard I’ve been using for the past four years and the letter “s” is completely gone and “a” and a few others are on their way out. I am paid to write code, I am paid to make other people’s code work, and I am paid to tell other engineers how to write their code. I’m a senior engineer.

We had, about six months ago, a very lively debate about typing. And we decided to go with Python. I think that was the right decision. I have an opinion based on lots of experience, and I think I am right, even without any logic.

But let’s set that aside.

Let’s do this logically. Here are a list of logical statements.

  1. “Variable” is any named entity in your program that stores a “value”. It can be anything from integers, floats, strings, complex data structures like arrays or lists, and even functions and classes and modules and stacktraces.
  2. The “type” of a value tells the programmer and program alike how the value behaves. Certain behavior varies based on the type. For instance, you don’t add integers the same way as you add floats.
  3. “Strong typing” means you can tell what type a value is with no other information than the value itself. Python is an example of a “strongly typed” language, as every value is stored in memory as a PyObject, and the Python language can tell you the type of any value.
  4. “Weak typing” means you cannot tell what type a value is without additional information. C/C++ are good examples of this, as you could be looking at an int or a float or anything else. Without type declarations in the language itself, it would be impossible to keep things straight.
  5. In many languages, variables hold information on the type of the value they store. However, this is not true for all languages. For instance, in Python, the variable is simply a name-value pair, stored in a dict.
  6. “Static typing” means that you cannot change the type of the value in a variable. Some languages allow you to assign a derived class of the type of the variable, others are more strict.
  7. “Dynamic typing” means any variable can hold any value. Python is fully dynamic, but many languages are partially dynamic as they cannot store all values in a variable or some variables do contain type information.
  8. “Explicit typing” means the programmer must tell the computer what type each variable is.
  9. “Implicit typing” means the programmer does not tell the computer what type each variable is. The computer can infer the types by simple analysis.
  10. “Type system” is the way types are treated by a particular language, and includes the language used to describe its types.
  11. “Simple” and “complex” refer to the number of components and the number of sub-components in those components. IE, the function “foo(bar, baz)” is more complex than “foo(bar)” because it takes 2 parameters. (Parameters are sub-components of a function.)
  12. “Correct code” means that the code accomplishes the purpose it was intended to accomplish. “Incorrect code” means it is not correct. Note that simply because a program compiles does not mean it is correct. There must be some human element to judge the correctness of the program.
  13. Simple is better than complex, but the code must be correct for it to matter at all.
  14. Explicit is better than implicit sense it helps people unfamiliar with a system understand how it works.
  15. Explicit typing requires a type system that is explicit. That is, the programmer must spell out what the types are using the language the type system uses.
  16. Implicit typing merely hides the type system from the programmer. However, there is still a type system underneath that the programmer needs to be aware of when he violates the constraints of the system.
  17. Dynamically typed languages typically have a simpler type system than static type systems. All variables can be any type of value so there are no constraints like there are in static systems.
  18. Statically typed languages must have a more complicated type system. This is because it imposes at least one constraint: Variables cannot hold a different type of value.
  19. Weakly typed languages require static typing. This is because it is impossible to manage the values without knowing what types they are, and the values themselves do not contain that information.
  20. Strongly typed languages do not require static typing. This is because it is additional information that should at least be consistent with each other based on the type system.
  21. Complex systems are more difficult to understand and manipulate than simple systems.
  22. There is a class of error called “type mismatch errors”. They are introduced when the programmer creates incorrect code that improperly handles the values in question.
  23. Many static type systems eliminate or at least reduce type mismatch errors at compile time.
  24. Strong, dynamic type systems do not  eliminate or reduce type mismatch errors at compile time.
  25. Whether or not errors are caught at compile time or run time doesn’t matter as long as the errors are caught.
  26. In order to prove your software correct, you must demonstrate that it behaves as expected. Only the simplest of programs can be analyzed by reading the code.
  27. 26 requires writing what we call “tests”. The code is run against the test, and if it passes, then it is assumed the code is correct. If the test is not able to detect errors in the code, then it is not a sufficient test.
  28. 23 & 24 imply that the compiler is doing some of the tests that would be handled in the testing phase.
  29. Since explicit is better than implicit, implicitly testing the code with the compiler is worse than explicitly testing the code with tests.
  30. Therefore, dynamic, strong typing is best.

Addendum

This is really the argument “strong, dynamic type systems are much simpler than any other type system; The benefit of a more complicated type system is that only one kind of error is detected during the compilation phase rather than the test phase, but this is a very small benefit compared to the cost of the complexity of having a type system at all. Therefore, in all cases, strong, dynamic type systems are best.” I’ve just spelled out the assumptions and the logic behind it all.

I should mention that people who write code but do not write sufficient tests are cheating. Until you write tests, you cannot understand whether your code is right or wrong. The compiler can’t tell you anything. It exists to convert your code into machine instructions, that’s it. You still have to test those machine instructions for correctness.

With strong, dynamic systems you do not write tests that have already been written. For instance, I don’t need to write a test for what happens when you add a string and an integer in Python. Those tests already exist, and cover every possible combination of types. When a new type is introduced, it should include tests that would plug it into the ecosystem. IE, if you want it to have addition property, then you need to write the tests that will show it adds in some cases but not others.

Finally, I want to mention what I think is the most obvious evidence against typing systems. You know how we used to have a big zoo of fundamental particles until physicists were able to figure out quarks? See, if a system is composed of smaller systems, learn those smaller systems and ignore the bigger system, and you’ll understand the bigger system. Every sufficiently complicated type system has, inside of it, a dynamic, strong type system. The dynamic, strong type system is like quarks, and the more complicated type system is like that zoo of fundamental particles. If you really want to understand particle physics, study quarks, not protons and neutrons and all the other composite particles. If you want to do particle physics, you need to do quarks, not protons and neutrons and such. In this way, the strong, dynamic system is the only type system you ever need to learn. Once you’ve tamed that, your job is done.

Which is why I hardly ever see a type error in Python. The one case that seems to arise is when I have more than a few arguments to a function, and I forget the order. The solution is simple: Don’t use ordered arguments!

 

Advertisements

Comments»

1. codeinfig - January 13, 2017

my own language has ordered arguments, and tries to keep them minimal in count. the alternative in python to ordered arguments is named arguments, so its like learning several more commands instead of just one. in some cases, having more commands with fewer arguments works as well as fewer commands with ordered arguments– this doesnt disagree fundamentally (more in terms of what comes first) with your statement, imo.

as for typing, i personally prefer the sort of typing you do (my language translates directly to python.) i think the sort of typing we like has a great deal of merit– i dont consider it a “debate,” however. its the right-tool-for-the-job argument. for most uses, our favorite typing is going to be acceptable, and also easier (imo.)

for some very important uses, static and stricter typing (python lumps a lot of similar types together in practice) are more or less a must. the reason i say “its not a debate” is that our way is absolutely fine (perhaps ideal) for the things we do– and really not well suited to these other things that we do not do.

our tool for our jobs are the right tools. their tools for their jobs are the right tools. we are both right– its more about the context in which we are right, than any real “debate.” so long as things are kept in context, no one is really debating anything. no ones saying dynamic typing is better when it isnt. some die-hard low-level coders (who could implement dynamic typing from low-level code if they wanted to) will say their way always works– theyre not wrong. all our handy dynamic this-and-that gets turned into their low-level stuff eventually. so theyre always right, and (provided we keep the argument in context) most likely so are we. thats more like a “partitioned consensus,” than a debate.

2. Jonathan Gardner - January 14, 2017

RE named parameters in Python: If your language doesn’t support named arguments, people are going to add it by passing in objects or dicts. Rather than have everyone choose their own way of named arguments, I think it’s better to have one consistent way.

I think there’s a bigger argument. Static/weak typing is useful when your compiler isn’t very strong. Back when CPU cycles were more expensive, I can see why a little homework on the part of the programmer could save a lot of cycles. Now that PyPy is a reality, and we’re converting real 100% honest-to-gosh Python into machine code, it’s clear that we have plenty of CPU cycles to spare figuring out the optimal way to implement our code, and we never have to spend 5 seconds about thinking about types.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: