
Shared Memory and Python May 10, 2016

Posted by PythonGuy in Advanced Python, GIL.

I’m researching shared memory and Python right now, because it seems like the only hope for a particular situation we are seeing.

Basically, we have a very high-performance web server that is designed to handle a large number of requests per second. At the same time, we want to be able to update the code that is used to process these requests in real time.

Our webserver is using microthreads (greenlets) and so it can only really do one thing at a time, even though it looks like it is doing many things. When we go to update the code, everything else must stop until the update is complete.

Obviously, we can use rolling deployments, but there are some particular issues we see with that. Namely, memory and resource management.

If we did “hard” rolling updates, we would have to turn off a node, taking it out of service. Then we’d perform the operation, and then add the node back to the service. This would take a non-trivial amount of time.

If we did “soft” rolling updates, we would spin up a new web server on a node, flip from the old to the new server, and retire the old server. This requires twice as much memory as we typically need.

Another option would involve shared memory. The code would be served from shared memory. When we’d like to do an update, we’d launch another process to create a new chunk of code in shared memory, then we’d flip from the old to the new code. This seems like the ideal option.

The problem is that Python only supports the most basic data structures in shared memory.

Perhaps there is a way to trick Python into treating shared memory as actual Python code, maybe by storing the code as an array of characters. When we go to run the code, we load it from the array, turn it into a function, and call it. I don’t know whether that approach, or any approach, would work, but I’m looking into it.
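
A rough sketch of that idea, using only the basic shared types Python already provides (the buffer size, the publish/load split, and the handle_request entry point are all invented for illustration, not a tested design): the code text lives in a shared character array, and each process compiles whatever is currently stored there.

import ctypes
import multiprocessing

CODE_BUF_SIZE = 4096  # assumed fixed-size buffer for the code text

shared_code = multiprocessing.Array(ctypes.c_char, CODE_BUF_SIZE)

def publish(source):
    """Writer side: store the new handler source in shared memory."""
    with shared_code.get_lock():
        shared_code.value = source.encode("utf-8")

def load_handler():
    """Reader side: compile whatever is currently shared and return it."""
    with shared_code.get_lock():
        source = shared_code.value.decode("utf-8")
    namespace = {}
    exec(compile(source, "<shared>", "exec"), namespace)
    return namespace["handle_request"]  # assumed entry-point name

publish("def handle_request(path):\n    return 'v1: ' + path\n")
handler = load_handler()
print(handler("/index"))

Note that this only shares the source text; each process still has to compile it, so it is closer to a shared deployment channel than to truly shared code objects.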

Suggestions welcome!


Python Killer Feature: Sets and Frozensets May 29, 2015

Posted by PythonGuy in Advanced Python.

I won’t bore you with the details of what set and frozenset do. The documentation is quite clear on that. What I will bother you with is how useful I find sets to be.

In mathematics, sets are a very specific thing. You will find them used almost everywhere. Some say that the foundation of math can be found in sets themselves. Unless you’ve been trained in mathematics to that degree, you probably can’t appreciate how useful they are. Suffice it to say that sets are one of those things you wish you had learned in kindergarten; they would’ve solved so many problems you encountered later on in your math career.

In Python, sets are a different thing. They have to be, for the same reason you can’t represent real numbers in a computer with a few bytes. In Python, you can put anything in a set: if you can put it in a list, you can put it in a set. Some objects are always unique and will never have a duplicate other than themselves. Others are not unique, and there may well be another object that is a copy of them. Regardless, sets do the right thing.

Well, I lied. It’s only hashable objects that can go in a set. In fact, think of the keys of a dict as a set, and you’re good to go. But really, if you want to stick something in a set, it’s a trivial matter to make it hashable, or put something in there in its stead that is hashable. For instance, an object’s ID, or a row’s primary key.

You’d be surprised how often you want to do the basic set operations.

Let’s say I am comparing what I have in one database with what I have in the other. I can build up a set of the primary keys of each database. Then I can do a set intersection to find which rows are in both databases and a set difference to see which is in one database but not the other. This leads to much clearer code than, say, walking through the two databases together. (But for large data sets where storing the primary keys in memory locally is less than ideal, you’re probably going to prefer walking through them.) But this sort of operation extends beyond databases. Let’s say a customer submits a form with an updated cart content. How are you going to see which items they added and which they removed unless you do this kind of operation? Or let’s say you want to compare two dicts with each other.
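
For example, with made-up key sets standing in for the primary keys pulled from each database:

db_a_keys = {1, 2, 3, 5, 8}
db_b_keys = {2, 3, 5, 7}

in_both = db_a_keys & db_b_keys      # intersection: rows present in both
only_in_a = db_a_keys - db_b_keys    # difference: rows database B is missing
only_in_b = db_b_keys - db_a_keys    # the difference the other way around
out_of_sync = db_a_keys ^ db_b_keys  # symmetric difference: rows in exactly one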

Another common use case is finding the unique items in a group with many duplicates. Build a set of the items, and it’s automatically de-duplicated for you. If you want to find items that have unique attributes, you can build a dict where the key is the attribute you want to be unique and the value is a list of the items that share that attribute, or a count, or whatever you want. (That’s a good use case for defaultdict, the topic of another post.)
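
A quick sketch of both ideas, using made-up (name, color) pairs as the items:

from collections import defaultdict

items = [("ball", "red"), ("cube", "blue"), ("ball", "red"), ("cone", "blue")]

unique_items = set(items)          # duplicates collapse automatically

by_color = defaultdict(list)       # group items by the attribute of interest
for name, color in items:
    by_color[color].append(name)

color_counts = defaultdict(int)    # or just count how many share each attribute
for _, color in items:
    color_counts[color] += 1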

Once you learn the set methods, you’re going to find many uses for them. You’re going to see a large number of your “for” loops disappear and be replaced by basic set operations. You’re going to wonder how you got along before discovering the ever-useful set. Well, I won’t hold you back any longer. Go read the docs, and happy Pythoneering!

Getting started with PySide, Part 1: Setting up on Windows March 6, 2014

Posted by PythonGuy in Advanced Python, Qt, Windows.

After a long, circuitous journey in the wonderful world of game development in Python, Python Guy decided he likes Qt best. So he is trying out PySide, the LGPL version of PyQt. Python Guy has done a lot of PyQt in the past, but that was a long, long time ago and Python, Qt, and everything have moved on since then.

Why does Python Guy like Qt? For starters, it is clean. No, it is not simple, but it is clean. You can tell that every API, every parameter, every value in an Enum has been carefully thought out. There is no more and no less than what you need.

Qt also strongly encourages a sane object model. All the things in Qt can be subclassed. Override what you need to override, and no more.

Qt’s signal/slot paradigm is beautiful. Python Guy tries to use it in everything he writes. The closest comparable thing is Javascript’s DOM event model, except that the DOM event model only applies to DOM objects. Qt has taken the concept of signals and slots and perfected it. Python Guy was pleased that there is a new syntax to do connections: Object.signal.connect(callable). That is nice and concise and clear and his preferred syntax for that sort of thing.
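
As a minimal illustration of that syntax (a bare-bones PySide snippet, not part of the game):

import sys
from PySide import QtGui

def on_click():
    print("Button clicked")

app = QtGui.QApplication(sys.argv)
button = QtGui.QPushButton("Click me")
button.clicked.connect(on_click)  # Object.signal.connect(callable)
button.show()
sys.exit(app.exec_())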

You might think Qt is a horrible platform for a game. That may be true if you intend to implement all the visual UI elements yourself. But if you’d rather focus on game play and mechanics, you’ll appreciate having an event loop, a complete set of UI elements, and a nice library that is quick to work with across platforms.

This article covers how to set up your dev environment on Windows.

For starters, you need to get used to the fact that the Windows command line is virtually useless compared to what you’re used to in the Linux world. Get over it, and don’t use it.

Install GitHub.com’s software. It will make it easy to start a new project and handle all the Git commands for you. If you get in a weird spot, there is a Git shell you can use to straighten it out. Don’t get in a weird spot. Learn how Git really works. If you don’t want to pay, make your projects public. There is no shame in showing the world the software you are working on. In fact, Python Guy doesn’t like closed-source software. So do the right thing and tell the world what you are playing with.

Install Python 2.7. Python Guy looks forward to Python 3 and encourages you to try it out today. However, Python Guy was tired and simply wanted to get something that works. Be sure to update your environment variables, setting PATH appropriately.

Install PySide. This will install Qt, Qt Designer and Qt Linguist for you under its module in C:/Python27/Lib/site-packages. It also installs pyside-uic.exe under C:/Python27/Scripts. This will be useful.

Start your new project using the GitHub interface. Then open an explorer window at that directory. Click on the gear to see the extended options for a project. There you can find the option to open an explorer window.

In the explorer, add a new file, “README.txt”. If you don’t see the “.txt” ending, you need to configure Windows to show file extensions. Otherwise, you’ll end up with things like “myprog.py.txt”, which is not what you want.

Add some text to the README file. Commit it. Push it up to your repository.

Use your favorite editor. Python Guy prefers ViM. But you can use whatever you like best.

If you are using Linux, setup is as easy as using your package system to install all the necessary dependencies.

We’ll start with actual coding in the next article. We’re going to use Qt Designer to build our widgets and MainWindow for us, and then simply write code to plug everything together.

The game will be a simple one, a sort of Hammurabi clone, so we can focus on Qt, game building, and Python.

Recommendation for decorators: Please don’t. January 22, 2013

Posted by PythonGuy in Advanced Python.

My software philosophy is simple. That’s the philosophy: simple.

See, as programmers and developers and engineers, our real job is to take really, really complicated things and make them easy. Take, for instance, spanning a chasm with a bridge. We take what used to be really, really hard and difficult and dangerous, and make it as simple as crossing the street.

So software should be simple. The simpler, the better.

Decorators have the potential to make hard code simple. They also have potential to make things complicated.

As is so often the case, as developers, we tend to get the idea that our code needs to reflect how smart we are. And what a mistake that is! Not only do we normally lack the humility to realize how smart we aren’t, but we also lack enough smarts to realize how smart we were a few moments ago, let alone weeks and years.

My advice and recommendation for using decorators in python: You probably don’t want to do that. In the rare cases where it actually makes things simpler, then go ahead.

Software Cancer February 14, 2012

Posted by PythonGuy in Advanced Python.

You’ve probably heard of “Software Bloat”. It’s where your code grows in size and complexity over time, until it becomes an unmanageable, monstrous mess.

I think I’ve discovered something new: “Software Cancer.” It’s where one part of your project is badly written, but you’re afraid to remove it completely. However, it’s so badly written that over time, it grows to be the one place where you spend all your time trying to fix the code and make it work. In the end, it grows in size and complexity until the entire project is devoted to that one aspect of the project.

I think I’ve identified an instance of software cancer in the real world. Rather than rely on the traditional and obvious ways of displaying dates and times, a certain fellow thought he was clever and could write things in a way he thought was best. In the end, several hundred lines of code later, he actually managed to get it to work.

Over time, we realized there were edge cases he hadn’t sufficiently tested or planned for. For instance, if you’re in EST, and you’re talking about a date in the future during EDT, what offset should you be using? Oops.

The fix was obvious: remove the cancer, and replace it with code that does things the right way, the way everyone else does it. Instead, we decided to make a quick patch, and move on.

A couple more patches later, and now it’s clear that that growing mass of tissue is not healthy. If we let it continue to grow, pretty soon, we’re going to have to adapt the entire code base to live with it. And that will drive development costs through the roof.

Cancers often start out as clever mutations. They never end up that way. Sure, sometimes a mutation is good, and can help the project reach its goal quicker. However, your chances of finding that good mutation, and distinguishing it from what has worked for the past few years is really, really low. If you want to experiment with something new or novel, by all means, do so, just don’t do it in a mature project that we have to deliver on schedule and under budget.

Sharing Code February 12, 2012

Posted by PythonGuy in Advanced Python.

As you rapidly develop your application, you’ve only got a few extra moments to think about how to build future applications that are but a distant concept. In preparation for that, you’ve probably been doing your best to adhere to these good standards:

  • Good code separation. Each function or class is very focused in what it does, and there isn’t a lot of smudging of the lines.
  • Good module separation. Modules tend to be as atomic as they can, meaning that the idea of borrowing a module from your project isn’t a bad one.
  • Good object interactions. Objects rely on only a small interface between them, meaning that they can be replaced with new objects without too much difficulty.
  • Limited dependence on the framework. This makes the code portable to other frameworks or even code written without a framework at all.

Granted, if we all wrote our code this way, we’d never get done.

The day arrives when you actually need to share code between two projects. There are a number of ways to approach this, each with their benefits and drawbacks.

1. Copy & Paste. Take the code you want, copy it, and paste it to where you want to use it.

The benefit of Copy & Paste is that you get exactly what you need right away. You can modify the copied code to your heart’s content.

The drawback is that you aren’t sharing code. If you make a beneficial change to the code that you want to backport, then you have to go make the same or similar change to the original code. Over time, as small changes are made at either end, the two pieces of code will diverge until they bear little resemblance to each other. At that point, code sharing stops completely.

In many cases, Copy & Paste is the preferred route. When I’m writing my HTML templates, I do a lot of Copy & Pasting, mostly because that kind of code is linear, and polluting it with method and function calls makes it impossible to work with. You have to compare Copy & Paste with the other methods before you know whether it is right for you.

Some people immediately discredit Copy & Paste. I don’t think that’s wise. Bad methods are sometimes the best methods.

2. Common function. Take the code you want to share, put it into a function, stick the function into a shared module (usually a new, third project), and have both the original location and the new one call it.

This seems rather simple to people who are used to Java’s and Ruby’s ideas about the object model. For many years, when C was the de facto standard, this was the only way to share code. It is, surprisingly, a very powerful method, and in most of my work it is the one I settle on.

The benefits are obvious: Both bits of code are literally using the same piece of code. Any improvement you make to the shared code is seen by the other.

The complexity arises when you want to dramatically change the shared code. You have to coordinate changes with both the original and new calling points. If you haven’t learned to deal with this problem in software engineering, then you need to familiarize yourself with it immediately.
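
A tiny sketch of this approach, with an invented helper; the shared module is just a third project that both codebases import:

# shared_utils.py -- the new, third project both codebases depend on
def slugify(title):
    """Turn a title into a URL-friendly slug."""
    return "-".join(title.lower().split())

Then, in both of the original projects:

from shared_utils import slugify

print(slugify("Sharing Code in Python"))  # sharing-code-in-python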

Sometimes, the way you want to share code isn’t neatly encapsulated in a single function. There are some other ways to share code that improve upon this, but they are language-specific. Since we’re dealing with Python here, I’ll show you how you can fix things in Python.

3. Common base class. In this method, you create a common base class, containing the shared code, for the two classes you are writing.

This allows you to do more complicated instances of sharing code, where you want to share all the code, but have specific chunks of it different in the new version. Those different bits are put into methods on the derived classes, and the common bits are put into the base class.

Problems arise when you want to have class A share with B and C, but in a different way. That is, it’s not simply a matter of replacing the methods that are shared between A and B with methods for C to get the result you need. Maybe you can figure out a grandfather class that will do what you need, but sometimes you can’t.
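
A sketch of the base-class shape, with invented report classes; the base class holds the common pipeline and the derived classes supply the bits that differ:

class BaseReport(object):
    def run(self):
        data = self.fetch()       # common to every report
        return self.format(data)  # the bit that differs

    def fetch(self):
        return [1, 2, 3]

    def format(self, data):
        raise NotImplementedError

class CsvReport(BaseReport):
    def format(self, data):
        return ",".join(str(x) for x in data)

class HtmlReport(BaseReport):
    def format(self, data):
        return "<ul>%s</ul>" % "".join("<li>%d</li>" % x for x in data)

print(CsvReport().run())
print(HtmlReport().run())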

4. Mixin Class. Those of you with backgrounds in more limited object models will take a while to grasp this concept, so be patient. The idea here is that you take the sharing you want between classes A and B (which already derive from different base classes), and add a new base class that is common among them. This is multiple inheritance in action.

This works, and sometimes quite well. Oftentimes, a class is a conglomeration of functionality from separate classes anyway, and so multiple inheritance is in the cards from the beginning.

The problem is, of course, increased complexity. This really isn’t an issue if you are good at keeping separate things separate, and limiting the complexity of the interactions between them. If you’re strictly working within the MVC paradigm, for instance, then Mixin classes are a natural result.
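
A sketch of a mixin, again with invented classes; JSONMixin adds one shared behavior to classes that already have their own base class:

import json

class Model(object):
    def __init__(self, **fields):
        self.fields = fields

class JSONMixin(object):
    def to_json(self):
        return json.dumps(self.fields)

class User(Model, JSONMixin):     # multiple inheritance in action
    pass

class Invoice(Model, JSONMixin):
    pass

print(User(name="PythonGuy").to_json())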

4.5 Metaclasses. I mention this because it is a possibility. When you run into the problem that lends itself to this solution, you will know, because you’ll say something like, “If only I could turn the manufacture of classes into a function of some sort, then all of my problems would go away.”

5. Decorators. In this method, what you need to share is the code around a chunk of code. That is, the stuff that happens before and after the different bits. In this scenario, you’re writing an inside-out function that you want to share via the decorator pattern.

This is really a sweet solution, particularly as you start to think of Aspect-Oriented Programming. There are a lot of things that simply don’t fit the other paradigms I’ve mentioned before (logging, debugging, performance tracking.)

The drawback is that 99% of the people who write decorators don’t write them right. There are ways to make a clean, well-behaved interface for your decorator, found in the third-party decorator module. Hopefully, this will become a standard part of Python, and people will be encouraged to use it.
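
Here is a small sketch of the “code around a chunk of code” idea, with timing as the shared part. It uses the standard library’s functools.wraps to keep the wrapped function’s name and docstring intact; the third-party decorator module mentioned above does that bookkeeping even more thoroughly.

import functools
import time

def timed(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            print("%s took %.3fs" % (func.__name__, time.time() - start))
    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

slow_add(2, 3)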

6. RPC. RPC stands for Remote Procedure Call. It’s a concept that’s been around since before the internet was invented, and it will be here forever. Take the shared code, stick it in a program running independently of your applications, and have the applications call that server to run the code.

There are a million and one ways to do this. I encourage you to start simple by building your RPC system off of HTTP. But you can explore for yourself and see what is out there.
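
As one hedged example of the simple-HTTP route, the standard library’s XML-RPC modules will do (Python 2 module names shown, and the shared function is invented):

# server.py -- the shared code lives in one long-running process
from SimpleXMLRPCServer import SimpleXMLRPCServer

def word_count(text):
    return len(text.split())

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(word_count)
server.serve_forever()

# client.py -- any application calls the shared code over HTTP
import xmlrpclib

proxy = xmlrpclib.ServerProxy("http://localhost:8000/")
print(proxy.word_count("Sharing code over RPC"))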

This is probably the most complicated yet most robust solution of all. I can’t begin to describe all of the benefits or drawbacks of it. I will say this, however: I believe that over time, we’re going to be sharing most of our code this way. The fact that we’re heading towards a parallelized paradigm means this will be thrust upon us. It seems to me it is the best way to think about parallelization with our limited, serial capacities.

Counting Days in Python August 25, 2011

Posted by PythonGuy in Advanced Python, Python.

When generating reports and charts, it’s very common to want a sequence of days between two dates.

Before we begin, let’s remind ourselves that the Python pattern for sequences is to include the first item, but exclude the last. That’s how range() and all the other functions work.

I’ve seen people often do something like this:

days = (some list-generating code)
for day in days:
   ... process that day ...

This may be how you were trained to do it in other languages. I strongly recommend not doing it this way. In general, using lists when you don’t really need them means you’re passing up some wonderful features of Python that will make your code more readable, more robust, and faster.

Some languages encourage you to avoid lists and use the 3-clause for loop that is not found in Python. I prefer lists over 3-clause for loops, mostly because I’ve never met one that was clear and obvious in meaning compared to the iterative approach.

For finite sequences of days, I prefer the generator. It gives me the highest chance of getting things right from the beginning.

for day in (... generator for a sequence of days ...):
    ... process that day ...

What is appropriate for that generator? A generic generator function should do the trick:

import datetime
def generate_days(start_day, end_day):
    day = start_day
    while day < end_day:
        yield day
        day += datetime.timedelta(days=1)

The above pattern is so common it’s a wonder why we don’t generalize xrange. The interface could be something like:

xrange(first=0, last, increment=1)
Returns first, first+increment, first+increment+increment, etc until but not including last. Requires that increment can be added to first and that the current item can be compared with last.

One can dream…

You’ll note that if we write our own general xrange generator, it is just this:

def our_xrange(first, last, step):
    cur = first
    while cur < last:
        yield cur
        cur += step

Of course, using itertools, you can combine the count() and takewhile(), but count() seems to only care for numbers. It’s trivial to write your own count() generator, though.
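
For instance, something along these lines keeps count() feeding integer offsets while a generator expression turns them into dates:

import datetime
from itertools import count, takewhile

def generate_days_itertools(start_day, end_day):
    candidates = (start_day + datetime.timedelta(days=n) for n in count())
    return takewhile(lambda day: day < end_day, candidates)

start = datetime.date(2011, 8, 1)
for day in generate_days_itertools(start, start + datetime.timedelta(days=3)):
    print(day)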

You’ll notice that I’m not dealing with numbers and then converting those numbers to dates. I much prefer objects that know what month and day they are in to integers that require lots of math to massage their meaning out of them.

Finally, I want to mention one special case that arises with databases. Oftentimes, you want to query data from the database for a sequence of dates. While some databases provide advanced features to allow you to do this within the database, many (most, now that NoSQL databases seem to be common) do not. I wouldn’t rely on the database doing the hard work for you. Besides, as I’ll show below, it’s really not that hard in Python, probably much easier than the solution the database provides.

There are two ways to handle this. One, we could pre-generate a list of dates, and then query the database for each date with an “in” condition (SQL). Two, we could ask for all the data between two date ranges, sort them, and then walk through the data in parallel with our date counter in Python.

The first is not desirable. This is because you have to send a lot of data across the wire—one date for each point on your graph, or row in your table. Let’s ignore it and move on.

The second is more desirable, but the problem of iterating through two lists in parallel is not something that’s obvious to most programmers. There is a simple solution, but it takes a bit of explaining. I’ll try to summarize that here.

What you want to do is to first generate the iterator that will give you all the dates you are interested in, given only the start, end, and interval. Then you want to query your database for all points of data that align with this, giving only the start, end, and interval. You’ll need to sort the data coming from the database. This will end up in another iterator of some sort.

Next, you want to iterate across your date iterator, performing some action. You’ll want to grab the row from the database only if the next row matches that date.

Here’s some sample code.

def date_result_pairs(dates, results):
    """Generates pairs of (date, result), one for each date in dates. If there is no corresponding result, then the result will be None."""
    dates = iter(dates)
    results = iter(results)
    try:
        result = results.next()
        for day in dates:
            if result.date > day:
                yield (day, None)
            else:
                yield (day, result)
                result = results.next()
    except StopIteration:
        for day in dates:
            yield (day, None)


dates = our_xrange(start, end, interval)
results = (query the DB for all data between start, end, matching interval, sorted by date.)
for date, result in date_result_pairs(dates, results):
    print "Result for date %s is %r" % (date, result)

This isn’t perfect. It assumes the database is returning dates that match exactly with the dates you have, which may or may not be correct given your ability to write the correct query.

I hope this has demonstrated some of the elegance of Python for these kinds of problems.

As always, questions and ideas are welcome.

SQLAlchemy Tips: Performance August 17, 2011

Posted by PythonGuy in Advanced Python, SQLAlchemy.

Sometimes, people use SQLAlchemy, see that things are slow, and blame SQLAlchemy. In my experience, SQLAlchemy is never to blame for performance problems.

There are, in general, two areas of blame:

  • Poor implementation of query code
  • Poor implementation of database schema

Granted, SQLAlchemy makes things look deceptively simple. It handles so many optimizations for you that you generally should go with the first idea that pops into your head. You can get a lot done with bad code.

However, at some point, you need to think, really hard, about what you want to get done, when, and how.

I won’t even touch on the topics of optimizing your database. There are innumerable resources out there to discuss when and when not to use indexes and how to arrange your data for the best query response.

I will say, however, that you need to think about which machine is going to do the work that needs to be done. Is your database going to do all the number crunching, or are you going to send the data to your app and have it do the number crunching? There are arguments for both ways.

The database is going to have indexes and the most recent data for you. If you try to limit the number of queries you make and the amount of data you send back and forth between the database, you’re probably doing things right.

Occasionally, the operation is better handled by having the database send the data to the app and the app works it over. These cases are rarer than the former.

With that in mind, let’s look at some simple optimizations to push the work on to the database and outside of your app.

Optimization 1: Are you making too many queries?

A good hint that you are making too many queries is when you put a query inside of a for loop iterating over the results of another query. In general, these queries can be combined. It will require, however, that you learn a little bit more about SQL and how to write those SQL statements in SQLAlchemy. You may want to visit the advanced topics of nested SELECT statements, and fully understand it, so that you can see how to optimize your nested queries.
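
Here is a hedged sketch of what that looks like, with invented User/Address models and an in-memory SQLite database so the snippet stands alone:

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Address(Base):
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    email = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# The shape to avoid: one extra query per user.
for user in session.query(User):
    addresses = session.query(Address).filter_by(user_id=user.id).all()

# One combined query instead: the database does the join and hands back pairs.
for user, address in session.query(User, Address).join(
        Address, Address.user_id == User.id):
    print("%s %s" % (user.name, address.email))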

Another problem I see from time to time is hidden queries. Look at your logs. Identify which queries match which lines of code. If you can’t point to where a query originated, then you’ve got a hidden query. You need to understand more fully what exactly your SQLAlchemy model is doing and which attributes are spawning which queries.

Optimization 2: Are you gathering too much data?

The obvious case is when you have big tables, or tables that join with many tables, and you are sucking down more data from the database than you want or need.

The way to resolve this is with a healthy dose of deferred() and grouping of those deferreds.

Spreading your data across multiple tables makes this a bit clearer. For instance, do not put your metrics data in the same table as the attributes of the object. Seldom do you want both the color of something and the number of times someone bought it at the same time.
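
A hedged sketch of what deferral looks like on an invented model; the wide description column only loads when it is touched, and the metrics columns load together as a group when you explicitly ask for them:

from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import deferred, sessionmaker, undefer_group

Base = declarative_base()

class Product(Base):
    __tablename__ = "products"
    id = Column(Integer, primary_key=True)
    color = Column(String)                                  # loaded eagerly
    description = deferred(Column(Text))                    # loaded on first access
    views = deferred(Column(Integer), group="metrics")      # the metrics group
    purchases = deferred(Column(Integer), group="metrics")  # loads as one unit

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Normal query: the initial SELECT only pulls id and color.
products = session.query(Product).all()

# When the metrics are actually needed, undefer the whole group up front.
with_metrics = session.query(Product).options(undefer_group("metrics")).all()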

Optimization 3: Pagination.

This one is so obvious for an experienced SQLAlchemy user, but is unfamiliar to novices.

SQLAlchemy provides powerful mechanisms for pagination. When you want to do pagination, follow this simple formula.

  1. Specify the query for the data you want, including only the columns you are interested in. Be sure to join with all the tables you need to join with in order to get the data you need. (NOTE: You can specify columns as the parameters to the query() method of your Session!) Be sure to include whatever sorting or filtering you need as well.
  2. Run count() on that query. This gives the total number of rows. Certain databases do not like the count() call very much, so if you need to, store this someplace safe and temporary, such as a cookie, a session, or even the HTML form.
  3. Append the modifiers limit() and offset(). Some databases do not respect these, but most of them do.
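
A sketch of those three steps, reusing the kind of invented Product model and session from the sketch above:

PAGE_SIZE = 25
page = 3  # e.g. pulled from the request

query = (session.query(Product.id, Product.color)  # only the columns we need
         .order_by(Product.id))                    # plus sorting or filtering

total_rows = query.count()  # stash this someplace cheap if count() is expensive
rows = query.limit(PAGE_SIZE).offset((page - 1) * PAGE_SIZE).all()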

Optimization 4: Sanity.

As a final check, just poke around the logs of both your app and your database and see how long each of the queries takes. Identify the slower, more common ones and attack those first.

Are the queries necessary? Do they gather too much information? Or should they be paginated?

Finally, you’ll want to revisit your schema and indexes to see if there is a better way for the database to handle that data.

Optimization is an ongoing process. Try not to get hung up on premature optimization, and try to not let your unfamiliarity with SQLAlchemy or your database limit you. Take the time to learn the more advanced features of both, so that you can apply the full power to your problems.

Objects vs. Arrays August 16, 2011

Posted by PythonGuy in Advanced Python.

I’m playing around with some 3D data. I think this is a good learning experience for those who want to understand the flexibility Python allows.

I started with the simplest case. I had three Numpy arrays. One was an array of vertices, another an array of normals, and the third an array of triangles which referenced three vertices and three normals.

Rendering the surface in OpenGL is trivial. Just glBegin(GL_TRIANGLES), then go through each triangle, calling glNormal3dv() and glVertex3dv() for each normal and vertex in the triangle.
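
A hedged sketch of that loop with PyOpenGL, assuming an OpenGL context is already current (say, inside a paint handler), the three numpy arrays are already filled, and each triangle row holds three vertex indices followed by three normal indices (that layout is an assumption for illustration):

from OpenGL.GL import GL_TRIANGLES, glBegin, glEnd, glNormal3dv, glVertex3dv

def draw_surface(vertices, normals, triangles):
    """Render triangles given as rows of (v0, v1, v2, n0, n1, n2) indices."""
    glBegin(GL_TRIANGLES)
    for v0, v1, v2, n0, n1, n2 in triangles:
        for v, n in ((v0, n0), (v1, n1), (v2, n2)):
            glNormal3dv(normals[n])
            glVertex3dv(vertices[v])
    glEnd()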

Thinking of this in the computer science sense, it’s apparent that the list of vertices and the list of normals are not truly independent. If a vertex or normal isn’t mentioned as part of a triangle, there’s no reason to hold on to it in memory.

In a sense, the arrays of vertices and normals are an artificial data structure, created for convenience, not correctness. If I manipulated the surface so that triangles changed or disappeared, there’s a good chance that the arrays of vertices and normals would become incorrect, holding information that is no longer needed.

I started down the OO path, creating a Triangle class, then a Vertex and Normal class. I realized, part way through this process, that I was moving away from the way things work in OpenGL land. Ultimately, I would load the vertices and normals into a buffer of some sort, and then I would issue render commands against the buffer of vertices. I haven’t quite seen how to do this yet, but it’s apparent that the array is the proper way to store these things, not as separate objects spread out all over creation.

The second problem was one of elegance. In the system I had devised earlier, it is quite natural that many triangles would share the same vertex, and these relationships are all but obvious. If I moved the vertex, then all of the connected triangles would likewise move, whether or not they were aware of the change. While I can implement a system where the same Vertex object is shared among many Triangle objects, I doubt it would be as elegant as the first solution I derived. Finding which triangles share the same vertex is not trivial in the OO approach, but in the indexed approach it is a simple filter for triangles that have the same index to the vertex.

I haven’t exactly resolved this dilemma in my mind. I suppose that I can create an object that uses invisible indexing into a massive array, and manages the memory within the array not unlike you’d have to do in C. I’m not convinced that such a solution is elegant or even beneficial.

In the long run, I think having a highly optimized set of data structures, each with their own unique access and modification rules, is probably the right way to go. I can “protect” these within a class, and then provide accessors and mutators to view and modify specific segments as needed. Perhaps having the elements defined as low as individual Triangles and Vertexes is simply too much detail.

Some Dict Patterns August 10, 2011

Posted by PythonGuy in Advanced Python, Python.

Introduction

I have seen a lot of Python code, as well as Java, Perl, and other languages. When dealing with dicts, I have identified a few patterns that I feel are optimal, depending on the situation. These are extraordinarily simple.

NOTE: The dict data structure is known by many names: a map, a hashmap, an associative array, a hash, or even a table.

Use if Present

For instance, how many times have you written the following pseudo-code in your language of choice?

# pseudo-code
if key is in map:
    lookup value in map with key
    use value

Python provides the “get()” method, which returns None if the key is not present.

# Python
value = map.get(key)
if value is not None:
    # use value

Of course, sometimes you need to distinguish between values that are None and that are not present in the dict. You can rely on exceptional behavior for this:

# Python
try:
    value = map[key]
except KeyError:
    pass
else:
    # use value

Or you can use the “in” test:

# Python
if key in map:
    value = map[key]
    # use value

The above, of course, does 2 dict lookups.

Exceptions versus Lookups

There is discussion about whether to use exceptions or lookups. The general rule of thumb is that exceptions are not as slow as you think, since Python is pretty slow to begin with. “Slow”, of course, is a relative term that is meaningful when you compare Python to C/C++. And nowadays with PyPy, “slow” isn’t a proper word for it anymore.

Use Value or Default If Missing

Sometimes you want to use a default value if the key is not present. This is simply:

# Python
value = map.get(key, 'default')

Note that whatever expression you use as the default value will be evaluated, whether or not it is used. If the expression is expensive to calculate, you can use this form instead:

# Python
value = map.get(key)
if value is None:
    value = expensive_expression()

Notice that you’re back to the previous pattern if you need to distinguish between a value of None and a missing key.

Use Value or Default and Store if Missing

Sometimes you want to store the default value in the dict if it is missing. “setdefault()” is the ideal method for this.

# Python
value = map.setdefault(key, 'default')

Of course, the caveats for expensive default expressions apply.

Conclusion and Summary

Those of you who are unfamiliar with Python might note how similar all of the above patterns are. Indeed, if you simply learn what the following expressions mean, you don’t have to think very hard to understand what the code does or to choose the right code:

  • key in dict
  • dict[key]
  • dict.get(key)
  • dict.get(key, default)
  • dict.setdefault(key, default)