Pylons and PrefixMiddleware July 16, 2014

Posted by PythonGuy in Pylons, Python.

So Python Guy was trying to get his Pylons website working with Amazon’s Elastic Beanstalk behind its Elastic Load Balancers running SSL. Everything seemed to work fine until Python Guy noticed that redirect() was sending URLs that pointed to the wrong address.

Never fear, Python Guy is here! All that needed to be done was to put PrefixMiddleware in front of the app. See here for instructions!
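
For the curious, here is a minimal WSGI sketch of what such middleware does for the SSL case: trust the load balancer's X-Forwarded-Proto header so that redirects are built with the right scheme. This is my own illustration of the idea, not the Pylons code.

```python
def proxy_fix(app):
    """Minimal sketch (my own, not Pylons' PrefixMiddleware): trust the
    load balancer's X-Forwarded-Proto header so that the app, and hence
    redirect(), sees the scheme the client actually used."""
    def wrapped(environ, start_response):
        proto = environ.get('HTTP_X_FORWARDED_PROTO')
        if proto:
            # Overwrite the scheme the app uses when building URLs.
            environ['wsgi.url_scheme'] = proto
        return app(environ, start_response)
    return wrapped
```

The real PrefixMiddleware handles more than this (path prefixes, forwarded hosts), but the scheme fix above is the part that cures wrong redirect URLs behind an SSL-terminating load balancer.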

Probabilities December 5, 2011

Posted by PythonGuy in Beginning Programming, Python.

I used to play a game called Hamurabi, a very ancient game.

The way it works is you have a population, an amount of land, and an amount of grain. Every year, you need to choose how much grain to plant, how much grain to feed the people, and how many acres of land to buy or sell.

Each year, a certain amount of grain would grow, people would starve if they didn’t get enough food, and rats would eat your surplus grain. Random events would cause disasters to make things interesting.

The game was remarkably simple but surprisingly addictive.

I figured it was time to write a new version for the modern age, when memory and CPU are no longer real constraints to most of the problems we faced programming 30 years ago. I chose the Hamurabi game as inspiration.

Of course, a real simulation of a primitive society could track each individual within the population. People would age, and then die from starvation, random events, or just old age.

Using the US Census data, I found some actuarial tables that gave a percentage chance someone would die at a particular age.

I stored the number of people in each age group, separated by sex, in a simple array. The index was the age.

Now all I needed to do was find a scalable way of calculating how many people die in each age group every year.

The simplest way to do this is to roll the dice for each person each year. Simply use the random module’s random() function to get a number between 0.0 and 1.0. If this is less than the probability they will die, then they die.

Of course, if you have a thousand people, you have to call random() a thousand times. A million people need it called a million times. This would slow the program down as your population grew.
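
In code, the naive method is just a loop (a sketch; the function name is mine):

```python
import random

def deaths_by_trial(n, p):
    """Naive method: one roll of the dice per person.
    A person dies if their roll falls below the death probability p."""
    return sum(1 for _ in range(n) if random.random() < p)
```

This is exact, but it is O(n) per age group per year, which is what makes it unusable for large populations.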

I vaguely recalled some math from my college years that should help me work these things out more quickly. Reading some of my old textbooks, I discovered that the Binomial Distribution gives you the exact probability of seeing x events in a population of n, where each has a probability of p.

However, the binomial distribution relies on binomial coefficients, which rely on factorials, which are notoriously difficult to calculate precisely for large numbers.

Luckily, this is not a new problem. Approximating the Binomial Distribution leads to the Normal or Gaussian Distribution. This relies on simple exponentials, and Python’s random module already has a gauss() and a normalvariate() built in.

What do you use for mu and sigma, the mean and standard deviation? The mean is simply n*p, and the standard deviation is the square root of n*p*(1-p).
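
A sketch of the normal approximation, clamping the draw to the valid range (the function name is mine):

```python
import math
import random

def deaths_normal(n, p):
    """Approximate Binomial(n, p) with a single normal draw.
    mu = n*p, sigma = sqrt(n*p*(1-p)), as derived above."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    # Round to a whole number of deaths and clamp to [0, n].
    return min(n, max(0, int(round(random.gauss(mu, sigma)))))
```

One draw instead of n draws, so the cost no longer grows with the population.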

The normal distribution really doesn’t give you good approximations when n or n*p is small. If n is really small (there are only a handful of people), I can just roll random() that many times. But if n is large, and p is very small (generally, for younger ages), then my approximation breaks down.

For this, the Poisson Distribution is useful. The Poisson Distribution can give you a very good approximation of the Binomial Distribution for small numbers of deaths. What I do is roll the dice with random(), and walk from 0 deaths upward, subtracting the probability that that many deaths would occur given the Poisson Distribution. When I’ve exceeded the roll, then that’s the number of deaths that occur.
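
That walk can be sketched as follows (the function name is mine; note this only makes sense when n*p is small, since exp(-n*p) underflows for large n*p):

```python
import math
import random

def deaths_poisson(n, p):
    """Approximate Binomial(n, p) by inverting the Poisson CDF.
    Walk up from k = 0 deaths, subtracting each outcome's probability
    from a single uniform roll until the roll is used up."""
    lam = n * p                    # expected number of deaths
    roll = random.random()
    k = 0
    prob = math.exp(-lam)          # P(0 deaths)
    while roll > prob and k < n:
        roll -= prob
        k += 1
        prob *= lam / k            # P(k) = P(k-1) * lam / k
    return k
```

Again, the cost depends on the number of deaths, not the size of the population, which is exactly what you want when p is tiny.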

Using these three methods of calculating deaths leads to a very scalable algorithm for calculating the number of deaths in a population of 1, 10, hundreds, or even millions of people. I ran my simulation for several hundred years until I saw hundreds of billions of people, with tens of billions dying each year.
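
Putting the three together might look like this; the thresholds here are my own guesses, not tuned values:

```python
import math
import random

def deaths(n, p):
    """Dispatch between the three methods by scale.
    The cutoffs (30 and 10) are illustrative guesses, not tuned values."""
    if n < 30:
        # Small groups: exact per-person rolls are cheap enough.
        return sum(1 for _ in range(n) if random.random() < p)
    if n * p < 10:
        # Large n, rare event: invert the Poisson CDF.
        lam, roll, k, prob = n * p, random.random(), 0, math.exp(-n * p)
        while roll > prob and k < n:
            roll -= prob
            k += 1
            prob *= lam / k
        return k
    # Common case: normal approximation, clamped to [0, n].
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return min(n, max(0, int(round(random.gauss(mu, sigma)))))
```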

I think this shows why I like Python. Never once did I have to think very hard about how to express my ideas in Python. I was completely free to consider the algorithm. Most of my time was spent reading math books and translating complicated formulas into simple algorithms, and testing that it did what I expected.

Python Sucks (… when you suck at writing code) August 26, 2011

Posted by PythonGuy in Python.

One of the neat features of Python is how readable it is. For most of your code, you won’t have to document what the code means, just why you chose to do things a certain way.

I am poking at a long function, longer than my 1080p rotated monitor can fit on one screen at a readable font size for me. It flows off the edge of the screen, making it hard to see how different pieces fit together.

I don’t like long functions. When you reach a certain number of variables in a particular scope, or when you have bits and pieces of your code spread out like peanut butter, it becomes unreadable.

Of course, long functions are probably better than the alternatives, provided you genuinely can’t separate out their different parts.

The author of the code I am reading decided it wasn’t a good idea to put relevant pieces of code together. As a result, the code is intertwined.

Variables are created in one spot and used in another. In between, they are not relevant at all.

  the_var = the_value

  ... some irrelevant bits of code ...

  use(the_var)

Obviously, it is much more readable if you do things this way:

  ... irrelevant bits of code ...

  the_var = the_value
  use(the_var)

  ... other irrelevant bits of code ...

Or even better, eliminate the variable altogether:

   use(the_value)

By minimizing the mixing of code, it becomes much easier to read the code and put things together in my head. This is no different than writing English in paragraphs, where each paragraph has sentences that relate to each other.

In addition, you can make your code more readable by sensibly keeping the variable count down. Sometimes, reducing the number of variables means you have a more complicated expression, so you have to think carefully about how to do this.

Comments, in addition to vertical whitespace, can be used to separate “paragraphs” of code. I use single-line comments when I’m talking about the next few lines, but multi-line comments when I’m talking about entire sections of code.

# 
# This is a description of the following block of
# code. Note how it sticks out because of the white space.
#

# This is a comment about the next two lines of code.
do_something()
while you_do_something_else():
   append_this_to_that()

# This comment applies to the next few lines.
this = that + that_other_thing - 5
these = [that, that_other_thing]

I hope my point is clear. You can write wonderful code, but make it entirely illegible, even in Python. So don’t do that. Take a little bit of time to make your code readable. Your future self will thank you.

Counting Days in Python August 25, 2011

Posted by PythonGuy in Advanced Python, Python.

When generating reports and charts, it’s very common that you want to generate a sequence of days between a range.

Before we begin, let’s remind ourselves that the Python pattern for sequences is to include the first item, but exclude the last. That’s how range() and all the other functions work.
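
A quick reminder of what that half-open convention looks like:

```python
# range(0, 5) includes the first value but excludes the last,
# just like slicing: half-open intervals compose without overlap.
print(list(range(0, 5)))  # → [0, 1, 2, 3, 4]
```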

I’ve seen people often do something like this:

days = (some list-generating code)
for day in days:
   ... process that day ...

This may be how you were trained to do this in other languages. I strongly recommend not doing it this way. In general, using lists when you don’t really need them means you’re passing up some wonderful features of Python that will make your code more readable, more robust, and faster.

Some languages encourage you to avoid lists and use the 3-clause for loop that is not found in Python. I prefer lists over 3-clause for loops, mostly because I’ve never met one that was clear and obvious in meaning compared to the iterative approach.

For finite sequences of days, I prefer the generator. It gives me the highest chance of getting things right from the beginning.

for day in (... generator for a sequence of days ...):
    ... process that day ...

What is appropriate for that generator? A generic generator function should do the trick:

import datetime
def generate_days(start_day, end_day):
    day = start_day
    while day < end_day:
        yield day
        day += datetime.timedelta(days=1)

The above pattern is so common it’s a wonder why we don’t generalize xrange. The interface could be something like:

xrange(first=0, last, increment=1)
Returns first, first+increment, first+increment+increment, etc until but not including last. Requires that increment can be added to first and that the current item can be compared with last.

One can dream…

You’ll note that if we write our own general xrange generator, it is just this:

def our_xrange(first, last, step):
    cur = first
    while cur < last:
        yield cur
        cur += step

Of course, using itertools, you can combine count() and takewhile(), but count() only works with numbers. It’s trivial to write your own count() generator, though.
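
Here is a sketch of that home-made count() combined with takewhile() (the name count_from is mine):

```python
import datetime
import itertools

def count_from(start, step):
    """A count() for any type that supports +=, not just numbers."""
    cur = start
    while True:
        yield cur
        cur += step

start = datetime.date(2011, 8, 1)
end = datetime.date(2011, 8, 5)
# Half-open, like range(): includes Aug 1, excludes Aug 5.
days = list(itertools.takewhile(lambda d: d < end,
                                count_from(start, datetime.timedelta(days=1))))
```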

You’ll notice that I’m not dealing with numbers and then converting those numbers to dates. I much prefer objects that know what month and day they are in to integers that require lots of math to massage their meaning out of them.

Finally, I want to mention one special case that arises with databases. Oftentimes, you want to query data from the database for a sequence of dates. While some databases provide advanced features to allow you to do this within the database, many (most, now that NoSQL databases seem to be common) do not. I wouldn’t rely on the database doing the hard work for you. Besides, as I’ll show below, it’s really not that hard in Python, probably much easier than the solution the database provides.

There are two ways to handle this. One, we could pre-generate a list of dates, and then query the database for each date with an “in” condition (SQL). Two, we could ask for all the data between two date ranges, sort them, and then walk through the data in parallel with our date counter in Python.

The first is not desirable. This is because you have to send a lot of data across the wire—one date for each point on your graph, or row in your table. Let’s ignore it and move on.

The second is more desirable, but the problem of iterating through two lists in parallel is not something that’s obvious to most programmers. There is a simple solution, but it takes a bit of explaining. I’ll try to summarize that here.

What you want to do is to first generate the iterator that will give you all the dates you are interested in, given only the start, end, and interval. Then you want to query your database for all points of data that align with this, giving only the start, end, and interval. You’ll need to sort the data coming from the database. This will end up in another iterator of some sort.

Next, you want to iterate across your date iterator, performing some action. You’ll want to grab the row from the database only if the next row matches that date.

Here’s some sample code.

def date_result_pairs(dates, results):
    """Generates pairs of (date, result), one for each date in dates. If there is no corresponding result, then the result will be None."""
    dates = iter(dates)
    results = iter(results)
    try:
        result = results.next()
        for day in dates:
            if result.date > day:
                yield (day, None)
            else:
                yield (day, result)
                result = results.next()
    except StopIteration:
        for day in dates:
            yield (day, None)


dates = our_xrange(start, end, interval)
results = (query the DB for all data between start, end, matching interval, sorted by date.)
for date, result in date_result_pairs(dates, results):
    print "Result for date %s is %r" % (date, result)

This isn’t perfect. It assumes the database is returning dates that match exactly with the dates you have, which may or may not be correct given your ability to write the correct query.
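
For the curious, here is a self-contained Python 3 rendition of the same merge, with made-up sample data standing in for the database rows:

```python
import datetime
from collections import namedtuple

# A stand-in for a database row; real code would use your ORM's objects.
Row = namedtuple('Row', ['date', 'value'])

def date_result_pairs(dates, results):
    """Pair each date with its matching row, or None when the
    database had no row for that day. Both inputs must be sorted."""
    results = iter(results)
    result = next(results, None)
    for day in dates:
        if result is not None and result.date == day:
            yield (day, result)
            result = next(results, None)
        else:
            yield (day, None)

days = [datetime.date(2011, 8, d) for d in range(1, 5)]
rows = [Row(datetime.date(2011, 8, 1), 10), Row(datetime.date(2011, 8, 3), 7)]
pairs = list(date_result_pairs(days, rows))
# Aug 1 and Aug 3 pair with their rows; Aug 2 and Aug 4 pair with None.
```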

I hope this has demonstrated some of the elegance of Python for these kinds of problems.

As always, questions and ideas are welcome.

Some Dict Patterns August 10, 2011

Posted by PythonGuy in Advanced Python, Python.

Introduction

I have seen a lot of Python code, as well as Java, Perl, and other languages. When dealing with dicts, I have identified a few patterns that I feel are optimal, depending on the situation. These are extraordinarily simple.

NOTE: The dict data structure is known by many names: a map, a hashmap, an associative array, a hash, or even a table.

Use if Present

For instance, how many times have you written the following pseudo-code in your language of choice?

# pseudo-code
if key is in map:
    lookup value in map with key
    use value

Python provides the “get()” method, which returns None if the key is not present.

# Python
value = map.get(key)
if value is not None:
    # use value

Of course, sometimes you need to distinguish between values that are None and that are not present in the dict. You can rely on exceptional behavior for this:

# Python
try:
    value = map[key]
except KeyError:
    pass
else:
    # use value

Or you can use the “in” test:

# Python
if key in map:
    value = map[key]
    # use value

The above, of course, does 2 dict lookups.

Exceptions versus Lookups

There is some debate about whether to use exceptions or lookups. The general rule of thumb is that exceptions are not as slow as you think, since Python is pretty slow to begin with. “Slow”, of course, is a relative term that is meaningful when you compare Python to C/C++. And nowadays with PyPy, “slow” isn’t a proper word for it anymore.

Use Value or Default If Missing

Sometimes you want to use a default value if the key is not present. This is simply:

# Python
value = map.get(key, 'default')

Note that whatever expression you use as the default value will be evaluated, whether or not it is used. If the expression is expensive to calculate, then you can use this form:

# Python
value = map.get(key)
if value is None:
    value = expensive_expression()

Notice that you’re back to the previous pattern if you need to distinguish between a value of None and a missing key.

Use Value or Default and Store if Missing

Sometimes you want to store the default value in the dict if it is missing. “setdefault()” is the ideal method for this.

# Python
value = map.setdefault(key, 'default')

Of course, the caveats for expensive default expressions apply.
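
A quick demonstration of that caveat: the default expression runs even when the key is already present (expensive() here is a stand-in for any costly call):

```python
calls = []

def expensive():
    """Stand-in for a costly computation; records each evaluation."""
    calls.append(1)
    return 'computed'

d = {'a': 1}
value = d.setdefault('a', expensive())
# The key was present, so 'computed' is discarded — yet expensive() still ran,
# because arguments are evaluated before setdefault() is called.
```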

Conclusion and Summary

Those of you who are unfamiliar with Python might note how similar all of the above patterns are. Indeed, if you simply learn what the following expressions mean, you don’t have to think very hard to understand what the code does or to choose the right code:

  • key in dict
  • dict[key]
  • dict.get(key)
  • dict.get(key, default)
  • dict.setdefault(key, default)

Getting into PyOpenGL August 6, 2011

Posted by PythonGuy in OpenGL, Python.

Python Guy is getting back into OpenGL. I remember doing some basic OpenGL back in ’99/2000. Nowadays, so much has changed that I’m learning it all over again.

If you want to follow along, I’m going to try and build a Minecraft clone.

The general idea of the game is that you control a character that can manipulate the world. I’m going for a medieval RPG style game play, maybe with some networking enabled in the future.

The problem of generating the cubes is harder than it looks. My big idea is to apply a smoothing algorithm so that the data looks more natural. That is, if two blocks only differ in elevation by 1 unit, then the two blocks should form a neat slope.

Another idea is just to start with raw data describing the landscape, and allow the player to manipulate the mesh.

Anyway, OpenGL is very well documented, both in the famous Red Book and on the OpenGL wiki. I encourage you to check it out.

PyOpenGL, by the way, is the way to go.

Lua vs. Python, Embedding July 13, 2011

Posted by PythonGuy in Lua, Python.

People whine that embedding Python is a pain.

I feel their pain, but not the pain of embedding Python. I feel their pain of being forced to write code in C.

As Glyph Lefkowitz so clearly explains, the fact that Lua is easily embedded doesn’t make it a good choice. Embedding is the wrong engineering choice to begin with.

What you should be doing as a programmer is creating reusable code fragments. If the Free Software movement has taught us anything, it is that the survivability of code depends on its usefulness, and that usefulness comes from how easily other programs can tap into its power.

If you’re writing an environment within which you expect other programmers to confine themselves, you are making a big mistake. Those programmers who fall into this trap will one day realize that they’ve wasted their lives in pursuit of the impossible. If you want to know where things will go, go read about Emacs and Lisp. That’s what you’re recreating.

If, instead, you’re writing components of programming intelligence, buried into .so’s or .dll’s, then other programmers can follow after you and incorporate your program into theirs.

Lua might be a nice embedded language, but embedded languages are not what you really want.

Python v. Lua, Complexity and Simplicity July 13, 2011

Posted by PythonGuy in Lua, Python.

One of the biggest arguments against Python and for Lua is simplicity. That is, the Lua fans claim that Lua is simpler than Python, because its language is much smaller.

I agree and concede the point that Lua’s language is smaller. I also admit that Lua’s code base is much, much smaller.

However, does this translate to simplicity for the programming task?

I think Lua is wonderfully engineered. The guts of Lua are beautifully assembled. It is a wonderful structure to behold, much like the Taj Mahal.

However, people don’t live in the Taj Mahal. They live in modern homes which are an engineer’s nightmare. Modern homes are much more comfortable because the modern conveniences that make living simple are readily available.

Behind the faucet is miles of piping. Behind the outlet is an electrical system that will confuse even the most experienced electrician. But we know that if we want to live, we would rather live in a modern home than the Taj Mahal.

Yes, the Taj Mahal is simpler, being constructed out of stones that fit perfectly together. Yes, we can appreciate how a few simple components have come together to make something greater than the whole. But that’s about all we can do.

A very, very good example is how Python approaches Object-Oriented Programming versus Lua’s approach to Object-Oriented Programming.

Python provides not insignificant language constructs, embedded deep within the syntax and interpretation, that make simple classes trivial to create, and hard classes easy.

Python’s OO system is easily learned. When you need special behavior, such as attribute look up and assignment magic, the way to do so is plain and easy.

Lua’s OO system, on the other hand, is extremely limited. You only have two or three syntactic elements to support OO programming, and these are not “magic”, nor can the magic be modified to fit the task.

Where is SQLAlchemy in Lua? It cannot exist. For Lua’s “simple” system, you will never have declarative ORM, and never understand what it means to have a class backed by a database. Sure, SQLAlchemy is much more complicated than anything you would ever dream of writing in Lua, but Python has provided the modern conveniences that not only make writing SQLAlchemy possible, but almost easy.

Why does Python have so many modules? Because it is so easy to create new modules that do interesting things, whereas in Lua, this is simply not done.

Yes, Lua, the language, is simpler. But I don’t want to use a language that is simpler. I want a language that makes my job, the task of programming, easier. I don’t care how complex that language is, any more than I care how complicated English grammar is. I simply care how hard it is to do my job, and Lua makes my job harder, not easier.

Python v. Lua, Coroutines July 13, 2011

Posted by PythonGuy in Lua, Python.

I am seriously studying Lua now. I’ve realized that a better language to compare to Lua is Javascript.

Reading about Python vs. Lua on the web I’ve discovered that the deciding factors in favor of Lua are argued to be:

  1. Lua is smaller, easier to master.
  2. Lua is faster.
  3. Lua doesn’t change as rapidly.
  4. Lua supports coroutines.
  5. Lua is easily embedded.

I can argue against each of these points to my satisfaction. I am a Python fan for a reason, and none of these attack the reason I am a Python fan.

Now, one important topic would be convincing, if it were true: coroutines.

Unfortunately, it seems the Lua fans have missed out on Python’s greenlet module. If you still say that Python lacks sufficient coroutines after seeing the greenlet module, then I’d like to understand what makes you say so.

Lua vs. Python July 13, 2011

Posted by PythonGuy in Lua, Python.

I’ve seriously looked into Lua twice. The first time I was excited about some of the potential, but I quickly lost interest. This time I have forced myself to stick with it.

My impression is that Lua’s closest cousin is Javascript. The rather Laissez-faire approach to syntax means that there isn’t a whole lot of consistency, and there wasn’t a lot of time spent thinking hard about how programmers might actually use the language. There are a number of cases where a bit more consistency could have been introduced, and the syntax could have been reduced in verbosity, but the designers, for some reason, decided not to do so.

Here’s my list of (hopefully objective) differences between Lua and Python.

I’ve declared a winner for each point, based on my personal preference. Obviously, that part is not objective.

  1. “local”. Python’s vars are naturally local, whereas Lua requires the ‘local’ keyword. This is like Javascript, which requires ‘var’ to get a local variable. Since local variables should be much more common than global ones, Python wins.
  2. Table declarations. The various ways to declare tables are confusing, and the fact that you can combine them is even more so. Compare to Python’s clear distinction between lists and dicts, and you’ll understand what I mean. Python wins.
  3. Lists are dicts are tables are objects. I actually like the dot-syntax for item lookup in dicts. I do think Python could benefit from merging lists, dicts, objects, and perhaps even sets into one data type. Score one for Lua.
  4. No variable declaration needed. If you try to access an undeclared variable, Lua gives you ‘nil’. In Python, an exception is thrown. You should never access variables that have not been declared and assigned to. Score one for Python.
  5. Local scope for if / for / while blocks in Lua. Python does not create a new local scope for if / for / while blocks, so you can declare variables within if statements and you can access the for iterator variable outside of the blocks. Additional code is needed to share the values outside of the blocks in Lua, which is all too common. Python wins.
  6. Non-critical whitespace. Lua doesn’t treat whitespace special, and so programmers need not organize their code neatly for it to compile. Python wins.
  7. Block delimiters. Lua requires do / then – end combinations, while Python’s indentation is parsed by the parser. Python wins. If you’re blind (so that indentation has no meaning for your non-visual perception of the code), then Lua wins.
  8. Two different versions of for. Rather than create a function that returns an iterator given start, stop and step, Lua built it into their syntax. Simpler syntax means less work. Python wins.
  9. Statements need no separator in Lua. Although discouraged, this is clearly confusing. Python wins.
  10. Both repeat-until and while-do are in Lua, while Python only has while. Python is simpler, and repeat-until is no big advantage since it is trivial to build with a while loop. Python wins.
  11. No continue statement in Lua. You must wrap your code with an if statement. Python wins since flat is better than nested.
  12. No else block for while / for loops in Lua. Python is about the only language I know that has else blocks for the loop statements, and I actually find them useful. Python wins.
  13. Return and break must be last statement in a block. This requires you to do things like stick a return in a do-end block. Python wins.
  14. Parens optional on function calls in Lua, which makes a “special” function call syntax with a single table, which really isn’t that special. This inconsistency is very troubling because it doesn’t help anyone except those who like obscuring their code. I think this confusion is one of the biggest barriers to Lua for beginners. Python wins.
  15. Method calls with : in Lua, whereas Python has the magical binding process which can be overridden. Python wins, because once you understand the binding process, there is no magic. If you don’t understand it, it just works the way it should.
  16. Missing params or extra params in function calls are not an error. Missing or additional results are not an error. This will not expose broken code, particularly refactored code. Python wins.
  17. Function calls produce list but only if last in list. This is just confusing as heck. Python wins.
  18. 1-based indexing in Lua. The math is all wrong, and requires you to remember -1 or +1 as needed. You never have 1-off errors in Python because you never need to +1 or -1 with indexes. Python wins.
  19. No named parameters, but you can pass a single table in Lua. Python wins.
  20. No default parameters in Lua. The syntax is very simple and all but obvious. Python wins.
  21. Local function declarations require local var declaration before function declaration. This is evidence that variables should be local by default, including function declarations. Python wins.
  22. Early binding in function bodies for Lua. Although Python is slower because it has to look up the symbols for every function call, Python’s way is much more intuitive, especially for junior developers. Python wins.
  23. That’s all I have for now. I am sure there are a number of other differences that people can identify. As a junior Lua developer, I really don’t understand much about the subtleties of Lua.