Counting Days in Python
August 25, 2011
Posted by PythonGuy in Advanced Python, Python.
When generating reports and charts, it’s very common to need a sequence of days within a range.
Before we begin, let’s remind ourselves that the Python pattern for sequences is to include the first item, but exclude the last. That’s how range(), slicing, and most of the other sequence functions work.
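A quick illustration of that half-open convention, using only built-ins (written for Python 3; the variable names are just for illustration):

```python
# range() includes the start value and excludes the stop value.
numbers = list(range(2, 5))
print(numbers)  # [2, 3, 4]

# Slicing follows the same rule: index 1 is included, index 3 is not.
letters = ["a", "b", "c", "d"]
print(letters[1:3])  # ['b', 'c']
```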
I’ve often seen people do something like this:
days = (some list-generating code)
for day in days:
    ... process that day ...
This may be how you were trained to do this in other languages. I strongly recommend not doing it this way. In general, using lists when you don’t really need them means you’re passing up some wonderful features of Python that will make your code more readable, more robust, and faster.
Some languages encourage you to avoid lists and use the 3-clause for loop that is not found in Python. I prefer lists over 3-clause for loops, mostly because I’ve never met one that was clear and obvious in meaning compared to the iterative approach.
For finite sequences of days, I prefer the generator. It gives me the highest chance of getting things right from the beginning.
for day in (... generator for a sequence of days ...):
    ... process that day ...
What is appropriate for that generator? A generic generator function should do the trick:
import datetime

def generate_days(start_day, end_day):
    """Yield each day from start_day up to, but not including, end_day."""
    day = start_day
    while day < end_day:
        yield day
        day += datetime.timedelta(days=1)
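To see the half-open behavior in practice, here is a minimal, self-contained run of the generator above over a made-up three-day range. The dates are chosen purely for illustration.

```python
import datetime

def generate_days(start_day, end_day):
    """Yield each day from start_day up to, but not including, end_day."""
    day = start_day
    while day < end_day:
        yield day
        day += datetime.timedelta(days=1)

# Hypothetical range: the end date itself is excluded.
start = datetime.date(2011, 8, 25)
end = datetime.date(2011, 8, 28)

for day in generate_days(start, end):
    print(day.isoformat())
```

This should print the 25th, 26th, and 27th, but not the 28th. An empty range (start equal to end) yields nothing at all.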
The above pattern is so common it’s a wonder we don’t generalize xrange. The interface could be something like:
xrange(first=0, last, increment=1)
    Returns first, first+increment, first+increment+increment, etc., up to but not including last. Requires that increment can be added to first and that the current item can be compared with last.
One can dream…
You’ll note that if we write our own general xrange generator, it is just this:
def our_xrange(first, last, step):
    cur = first
    while cur < last:
        yield cur
        cur += step
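Because the generator only needs `+=` and `<`, the same function should work for numbers and dates alike. A small sketch with made-up values, stepping by a week:

```python
import datetime

def our_xrange(first, last, step):
    """Yield first, first + step, ... while the value is below last."""
    cur = first
    while cur < last:
        yield cur
        cur += step

# Plain integers behave like the built-in range().
print(list(our_xrange(0, 10, 3)))  # [0, 3, 6, 9]

# Dates work too, as long as step is a timedelta.
weekly = our_xrange(datetime.date(2011, 8, 1),
                    datetime.date(2011, 8, 25),
                    datetime.timedelta(weeks=1))
weeks = [d.isoformat() for d in weekly]
print(weeks)
```

Note that a zero or negative step would loop forever whenever first is below last, so a sturdier version may want to validate step first.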
Of course, using itertools, you can combine count() and takewhile(), but count() seems to accept only numbers. It’s trivial to write your own count() generator, though.
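One way that combination might look, with a hand-written `count_from()` standing in for `count()`. The name `count_from` is hypothetical and not part of itertools; `takewhile()` is the real itertools function.

```python
import datetime
from itertools import takewhile

def count_from(first, step):
    """Yield first, first + step, first + step + step, ... without end."""
    cur = first
    while True:
        yield cur
        cur += step

end = datetime.date(2011, 8, 28)
one_day = datetime.timedelta(days=1)

# takewhile() cuts the endless stream off at the first day that is not < end.
days = takewhile(lambda day: day < end,
                 count_from(datetime.date(2011, 8, 25), one_day))
day_list = list(days)
print(day_list)
```

The result is the same half-open sequence as before; the only difference is that the stopping condition lives in `takewhile()` rather than in the generator itself.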
You’ll notice that I’m not dealing with numbers and then converting those numbers to dates. I much prefer objects that know what month and day they are in to integers that require lots of math to massage their meaning out of them.
Finally, I want to mention one special case that arises with databases. Oftentimes, you want to query data from the database for a sequence of dates. While some databases provide advanced features to allow you to do this within the database, many (most, now that NoSQL databases seem to be common) do not. I wouldn’t rely on the database doing the hard work for you. Besides, as I’ll show below, it’s really not that hard in Python, probably much easier than the solution the database provides.
There are two ways to handle this. One, we could pre-generate a list of dates, and then query the database for each date with an “in” condition (SQL). Two, we could ask for all the data between two dates, sort it, and then walk through the data in parallel with our date counter in Python.
The first is not desirable. This is because you have to send a lot of data across the wire—one date for each point on your graph, or row in your table. Let’s ignore it and move on.
The second is more desirable, but the problem of iterating through two lists in parallel is not something that’s obvious to most programmers. There is a simple solution, but it takes a bit of explaining. I’ll try to summarize that here.
What you want to do is to first generate the iterator that will give you all the dates you are interested in, given only the start, end, and interval. Then you want to query your database for all points of data that align with this, giving only the start, end, and interval. You’ll need to sort the data coming from the database. This will end up in another iterator of some sort.
Next, you want to iterate across your date iterator, performing some action. You’ll want to grab the row from the database only if the next row matches that date.
Here’s some sample code.
def date_result_pairs(dates, results):
    """Generates pairs of (date, result), one for each date in dates.

    If there is no corresponding result, then the result will be None.
    """
    dates = iter(dates)
    results = iter(results)
    try:
        # The next() built-in works on both Python 2.6+ and Python 3.
        result = next(results)
        for day in dates:
            if result.date > day:
                # The next row belongs to a later date: nothing for this day.
                yield (day, None)
            else:
                yield (day, result)
                result = next(results)
    except StopIteration:
        # No rows left: every remaining date gets None.
        for day in dates:
            yield (day, None)
dates = our_xrange(start, end, interval)
results = (query the DB for all data between start, end, matching interval, sorted by date.)
for date, result in date_result_pairs(dates, results):
    print("Result for date %s is %r" % (date, result))
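Since the database query above is only a placeholder, here is a self-contained sketch of the same walk with an in-memory list standing in for the query result. The `Row` type, its `value` field, and the sample data are all hypothetical; the only thing the function relies on is that each result has a `.date` attribute and that results arrive sorted by date.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for a database row; only .date matters to the walk.
Row = namedtuple("Row", ["date", "value"])

def our_xrange(first, last, step):
    cur = first
    while cur < last:
        yield cur
        cur += step

def date_result_pairs(dates, results):
    """Yield (date, result) per date; result is None when no row matches."""
    dates = iter(dates)
    results = iter(results)
    try:
        result = next(results)
        for day in dates:
            if result.date > day:
                yield (day, None)
            else:
                yield (day, result)
                result = next(results)
    except StopIteration:
        # No rows left: every remaining date gets None.
        for day in dates:
            yield (day, None)

start = datetime.date(2011, 8, 25)
end = datetime.date(2011, 8, 30)
interval = datetime.timedelta(days=1)

# Pretend query result, sorted by date, with gaps on the 26th, 28th, and 29th.
rows = [Row(datetime.date(2011, 8, 25), 10),
        Row(datetime.date(2011, 8, 27), 30)]

pairs = list(date_result_pairs(our_xrange(start, end, interval), rows))
for day, result in pairs:
    print("Result for date %s is %r" % (day, result))
```

Every date in the range appears exactly once, and the dates without a row are paired with None.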
This isn’t perfect. It assumes the database returns rows whose dates match the dates you generated exactly, which may or may not be true depending on how well your query is written.
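One way to loosen that assumption is to discard any row dated before the current day instead of pairing it with the wrong date. This is a sketch, not a definitive fix: it still assumes the rows are sorted by date and expose a `.date` attribute, and the names `tolerant_date_result_pairs` and `Row` are hypothetical.

```python
import datetime
from collections import namedtuple

Row = namedtuple("Row", ["date", "value"])  # hypothetical row type

def tolerant_date_result_pairs(dates, results):
    """Like date_result_pairs, but drops rows dated before the current day."""
    results = iter(results)
    result = next(results, None)  # None signals that the rows have run out
    for day in dates:
        # Skip rows that fall before this day and so match no date exactly.
        while result is not None and result.date < day:
            result = next(results, None)
        if result is not None and result.date == day:
            yield (day, result)
            result = next(results, None)
        else:
            yield (day, None)

dates = [datetime.date(2011, 8, 25),
         datetime.date(2011, 8, 26),
         datetime.date(2011, 8, 27)]

# The row for the 24th lies outside the requested dates and is skipped.
rows = [Row(datetime.date(2011, 8, 24), 1),
        Row(datetime.date(2011, 8, 25), 2),
        Row(datetime.date(2011, 8, 27), 3)]

tolerant_pairs = list(tolerant_date_result_pairs(dates, rows))
for day, result in tolerant_pairs:
    print("Result for date %s is %r" % (day, result))
```

With the same input, the original function would have paired the stray row from the 24th with the 25th.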
I hope this has demonstrated some of the elegance of Python for these kinds of problems.
As always, questions and ideas are welcome.