SQLAlchemy Tips: Performance August 17, 2011Posted by PythonGuy in Advanced Python, SQLAlchemy.
Sometimes, people use SQLAlchemy, see that things are slow, and blame SQLAlchemy. In my experience, SQLAlchemy is never to blame for performance problems.
There are, in general, two areas of blame:
- Poor implementation of query code
- Poor implementation of database schema
Granted, SQLAlchemy makes things look deceptively simple. It handles so many optimizations for you that you generally should go with the first idea that pops into your head. You can get a lot done with bad code.
However, at some point, you need to think, really hard, about what you want to get down when and how.
I won’t even touch on the topics of optimizing your database. There are innumerable resources out there to discuss when and when not to use indexes and how to arrange your data for the best query response.
I will say, however, that you need to think about which machine is going to do the work that needs to be done. Is your database going to do all the number crunching, or are you going to send the data to your app and have it do the number crunching? There are arguments for both ways.
The database is going to have indexes and the most recent data for you. If you try to limit the number of queries you make and the amount of data you send back and forth between the database, you’re probably doing things right.
Occasionally, the operation is better handled by having the database send the data to the app and the app works it over. These cases are rarer than the former.
With that in mind, let’s look at some simple optimizations to push the work on to the database and outside of your app.
Optimization 1: Are you making too many queries?
A good hint that you are making too many queries is when you put a query inside of a for loop iterating over the results of another query. In general, these queries can be combined. It will require, however, that you learn a little bit more about SQL and how to write those SQL statements in SQLAlchemy. You may want to visit the advanced topics of nested SELECT statements, and fully understand it, so that you can see how to optimize your nested queries.
Another problem I see from time are hidden queries. Look at your logs. Identify which queries match which lines of code. If you can’t point to where the query originated from, then you’ve got a hidden query. You need to understand more fully what exactly your SQLAlchemy model is and which attributes are spawning which queries.
Optimization 2: Are you gathering too much data?
The obvious case is when you have big tables, or tables that join with many tables, and you are sucking down more data from the database than you want or need.
The way to resolve this is with a healthy dose of deferred() and grouping of those deferreds.
Spreading your data across multiple tables makes this a bit clearer. For instance, do not put your metrics data in the same table as the attributes of the object. Seldom do you want both the color of something and the number of times someone bought it at the same time.
Optimization 3: Pagination.
This one is so obvious for an experienced SQLAlchemy user, but is unfamiliar to novices.
SQLAlchemy provides powerful mechanisms for pagination. When you want to do pagination, follow this simple formula.
- Specify the query for the data you want, including only the columns you are interested in. Be sure to join with all the tables you need to join with in order to get the data you need. (NOTE: You can specify columns as the parameters to the query() method of your Session!) Be sure to include whatever sorting or filtering you need as well.
- Run the count() on that query. This is the total number of rows. Certain databases do not like the count() call very much, so if you need to, store this someplace safe and temporary, such as a cookie or session or even in the HTML form.
- Append the modifiers limit() and offset(). Some databases do not respect these, but most of them do.
Optimization 4: Sanity.
As a final check, just poke around the logs of both your app and your database and see how long each of the queries take. Identify the slower, more common ones, and attack those.
Are the queries necessary? Do they gather too much information? Or should they be paginated?
Finally, you’ll want to revisit your schema and indexes to see if there is a better way for the database to handle that data.
Optimization is an ongoing process. Try not to get hung up on premature optimization, and try to not let your unfamiliarity with SQLAlchemy or your database limit you. Take the time to learn the more advanced features of both, so that you can apply the full power to your problems.