jump to navigation

Java Makes Me Sad June 5, 2015

Posted by PythonGuy in Java, SQLAlchemy.
Tags: , ,
add a comment

Today, we were faced with an engineering challenge. The solution was obvious: Let’s add a table to relate these two concepts. It will solve all of our problems elegantly.

The backend developer balked. “It will take hours to write, hours more to test and debug!”

See, we decided that Java was cool and we’d write our backend services in Java. Now we are paying for it. Well, we paid for it before we even got out prototype working.

This made me really sad.

In SQLAlchemy, adding a table is extraordinarily simple. Just declare a new class, add in the members, and you’re done. Adding a relationship is super-easy as well. Just add an attribute name for the other class and set it equal to the results of the relationship() function. Using the relationships are super-duper-easy. Really, 10 lines of new code would be all that was required to get our feature working with a relationship table. In the API method, we’d need to change 1, maybe 2 lines of code to get it plugged in. Total work to production quality: Less than 10 minutes.

Morale of the Story: Development time is always the most expensive time. Minimize it.

Also: The JVM sucks. Just sayin’.

Advertisements

SQLAlchemy Tips: Performance August 17, 2011

Posted by PythonGuy in Advanced Python, SQLAlchemy.
3 comments

Sometimes, people use SQLAlchemy, see that things are slow, and blame SQLAlchemy. In my experience, SQLAlchemy is never to blame for performance problems.

There are, in general, two areas of blame:

  • Poor implementation of query code
  • Poor implementation of database schema

Granted, SQLAlchemy makes things look deceptively simple. It handles so many optimizations for you that you generally should go with the first idea that pops into your head. You can get a lot done with bad code.

However, at some point, you need to think, really hard, about what you want to get down when and how.

I won’t even touch on the topics of optimizing your database. There are innumerable resources out there to discuss when and when not to use indexes and how to arrange your data for the best query response.

I will say, however, that you need to think about which machine is going to do the work that needs to be done. Is your database going to do all the number crunching, or are you going to send the data to your app and have it do the number crunching? There are arguments for both ways.

The database is going to have indexes and the most recent data for you. If you try to limit the number of queries you make and the amount of data you send back and forth between the database, you’re probably doing things right.

Occasionally, the operation is better handled by having the database send the data to the app and the app works it over. These cases are rarer than the former.

With that in mind, let’s look at some simple optimizations to push the work on to the database and outside of your app.

Optimization 1: Are you making too many queries?

A good hint that you are making too many queries is when you put a query inside of a for loop iterating over the results of another query. In general, these queries can be combined. It will require, however, that you learn a little bit more about SQL and how to write those SQL statements in SQLAlchemy. You may want to visit the advanced topics of nested SELECT statements, and fully understand it, so that you can see how to optimize your nested queries.

Another problem I see from time are hidden queries. Look at your logs. Identify which queries match which lines of code. If you can’t point to where the query originated from, then you’ve got a hidden query. You need to understand more fully what exactly your SQLAlchemy model is and which attributes are spawning which queries.

Optimization 2: Are you gathering too much data?

The obvious case is when you have big tables, or tables that join with many tables, and you are sucking down more data from the database than you want or need.

The way to resolve this is with a healthy dose of deferred() and grouping of those deferreds.

Spreading your data across multiple tables makes this a bit clearer. For instance, do not put your metrics data in the same table as the attributes of the object. Seldom do you want both the color of something and the number of times someone bought it at the same time.

Optimization 3: Pagination.

This one is so obvious for an experienced SQLAlchemy user, but is unfamiliar to novices.

SQLAlchemy provides powerful mechanisms for pagination. When you want to do pagination, follow this simple formula.

      Specify the query for the data you want, including only the columns you are interested in. Be sure to join with all the tables you need to join with in order to get the data you need. (NOTE: You can specify columns as the parameters to the query() method of your Session!) Be sure to include whatever sorting or filtering you need as well.
      Run the count() on that query. This is the total number of rows. Certain databases do not like the count() call very much, so if you need to, store this someplace safe and temporary, such as a cookie or session or even in the HTML form.
      Append the modifiers limit() and offset(). Some databases do not respect these, but most of them do.

Optimization 4: Sanity.

As a final check, just poke around the logs of both your app and your database and see how long each of the queries take. Identify the slower, more common ones, and attack those.

Are the queries necessary? Do they gather too much information? Or should they be paginated?

Finally, you’ll want to revisit your schema and indexes to see if there is a better way for the database to handle that data.

Optimization is an ongoing process. Try not to get hung up on premature optimization, and try to not let your unfamiliarity with SQLAlchemy or your database limit you. Take the time to learn the more advanced features of both, so that you can apply the full power to your problems.

2011 Predictions for Python January 13, 2011

Posted by PythonGuy in Pylons, Python, Ruby, SQLAlchemy.
add a comment

I guess you’re supposed to make predictions in January. Here’s mine.

At the end of 2011, Python 2 will still be more popular than Python 3, but some important projects will migrate to Python 3 anyway.

Sometime in 2011, a lot of Ruby coders are going to wake up and start playing with something else. Not that Ruby’s bad, it’s just not the best. And it has some serious issues.

Sometime in 2011. Guido van Rossum will say at least one really wise thing that will completely redefine how we look at some aspect of day-to-day programming. Maybe two.

Sometime in 2011, PyPy will get a version of Python that runs faster than CPython.

Sometime in 2011, serious discussion will be had about whether we should all program in the PyPy dialect—just in case it does take off.

Sometime in 2011, Pyramid will be released and probably ignored for a long time. Who knows? Maybe it will be better than Pylons. Maybe not. It’s not really important because there won’t be a huge reason to shift to it. Regardless, the team developing Pyramid is going to learn a lot about the problems they were trying to solve, and propose something even shinier and better to solve all the problems they tried to solve in the past and a whole lot of new problems they invented because they are actually from the distant future.

Sometime in 2011, SQLAlchemy will become so intelligent it may actually be considered as a running mate in the 2012 presidential elections.

Sometime in 2011, someone’s going to give example code for their research paper in Python. And someone else is going to read that paper and understand it even though they don’t know Python. And then they are going to go through the rest of the month thinking that Python is really this cribbage people use for writing research papers. When someone finally points out it’s a real language and the example code was actually production in a working system, he’s going to do a double-take and spit his coffee all over. It’ll be a mess.

Taming SQLAlchemy January 11, 2011

Posted by PythonGuy in Pylons, SQLAlchemy.
add a comment

There is one Python module out there that is more magical than all the others combined. Those of you who know it know it as SQLAlchemy.

After working intimately with SQLAlchemy for over two years now, here are the recommendations I have for you, the SQLAlchemy newbie, before implementing your first major project.

One: Always stick all your SQLAlchemy stuff into three separate modules. One has all the specific types (these may be spread around a bit), another has the tables, and the last has the objects.

Two: Your SQLAlchemy objects are the ideal model objects in the MVC architecture. That is, you don’t need a wrapper around anything, like you do in other languages. Simply adapt the classes into your own specific needs. Add additional functions and methods as needed. Be sure that your do as much as you can in the mapper, because the SQLAlchemy mapper function is smarter than Einstein.

Three: Sometimes your queries get really hard, and it’s about that time that you fall back to your tables. Thankfully, you can combine your objects and your tables in queries. One trick I used is to have your query on the outside be for the object you want, and then have a subquery that is simply for the IDs you are interested in.

Four: Ask for help when needed! SQLAlchemy is extremely complicated, and even the experts need advice or help from time to time. The mailing list is awesome, and people there are friendly.

Five: Just like it takes a bit of time before you’re comfortable with iterators, it will take a while before you start passing SQLAlchemy Query objects around as if they were lists. The vast majority of your data work will be massaging the Query object, so get used to it as the currency in your data algorithms.

Six: Leave performance for last. You might identify things that may slow down your query that you’ll want to remember right away, however. For instance, if you are iterating through a loop and querying a table for each iteration, then ask yourself whether it makes sense to simply stick that all in a query. After all, SQL’s looping mechanism kicks butt on Python’s looping mechanism because your SQL database should be smarter than you’ll ever hope to be. (I am a huge user of PostgreSQL so this is true. MySQL? Oracle? SQL Server? Not so much.) Even if your database isn’t very smart, SQLAlchemy is. The important point is that you trust SQLAlchemy and your database to be more optimized than any code you could ever write, and then come back later to revisit performance issues.

Seven: Ignore the No-SQL databases. Your SQLAlchemy setup should be much more elegant and transparent than the best No-SQL databases out there. When you’ve identified bottlenecks that cannot be addressed by better data organization or indexing, or smarter indexing and caching, then you can think of even better ways to do all the above. A good No-SQL database can be a good cache, but that’s about as far as I’ll trust them.

In my work on Deal-Digger.com, using SQLAlchemy has meant I can implement 3 to 4 table relationships from the top of the Pylons stack to the bottom in about 10 minutes. There isn’t a whole lot to test because my job isn’t to test SQLAlchemy. The queries are legible, easy to work on, and honestly, fun to play with. I heartily endorse SQLAlchemy and encourage you to use it whenever you need to speak SQL!

Pylons Rocks! January 6, 2011

Posted by PythonGuy in Pylons, Python, SQLAlchemy.
add a comment

Python Rocks! Pylons Rocks! (Pyramid—I don’t know yet.)

Here’s what Python Guy does for work now. Go subscribe and get yourself some deals.

From 0 to Awesome: SQLAlchemy part II January 1, 2008

Posted by PythonGuy in Advanced Python, From 0 to Awesome, Python, SQLAlchemy.
add a comment

Last time, we talked about what relational databases were and what tables were. We also looked into a short SQLAlchemy script that created tables, added, modified, and deleted rows, and queried the data.

Today, I am going to talk a bit more theoretically about data. (more…)

From 0 to Awesome: SQLAlchemy part 1 December 31, 2007

Posted by PythonGuy in From 0 to Awesome, SQLAlchemy.
5 comments

I wanted to try and run a series on SQLAlchemy and databases in general. This is the first post of my new series From 0 to Awesome. I hope to have another series as time permits.

First things first: What is a database and why do you care? (more…)

SQLAlchemy Rocks! Part II August 16, 2007

Posted by PythonGuy in Advanced Python, Myghty, Pylons, SQLAlchemy, Web Technologies.
2 comments

As a follow up to the previous article, SQLAlchemy Rocks!, here’s how I used the page object.

Remember, I’m using Pylons. Here’s the controller.

    def index(self):
        page = int(request.params.get('p', 1))
        results_per_page = int(request.params.get('rpp', 10))

        c.portfolio_page = model.portfolios_page(page, results_per_page)

        return render_response('/home/index.myt')

And portfolios_page is defined as:

def portfolios_page(page, results_per_page):
    return Page(Portfolio.query().order_by(asc('name')),
            page, results_per_page)

Here, Portfolio is an ORM object, not a table.

The corresponding Myghty template (just the relevant bits):

Items <% c.portfolio_page.first+1 %>
to <% c.portfolio_page.last+1 %>
of <% c.portfolio_page.total %>
<table>
  <tr>
    <th>Portfolio</th>
    <th>Net Worth</th>
    <th>Permissions</th>
  </tr>
%   for p in c.portfolio_page:
  <tr>
    <td><a href="<% h.url_for(controller="portfolio", portfolio=p.id) %>"><%
        p.name %></a></td>
    <td><% p.net_worth |money %></td>
    <td><% p.permissions(c.user) %></td>
  </tr>
%   # end for
</table>
Results per page: <% c.portfolio_page.results_per_page %>
Page: <% c.portfolio_page.page %>
Total pages: <% c.portfolio_page.pages %>

Hope this isn’t too much code. Take your time to grok it all. Feel free to ask specific questions in the comments.

SQLAlchemy Rocks! July 27, 2007

Posted by PythonGuy in Advanced Python, SQLAlchemy, Web Technologies.
5 comments

I’ve been a Python guy for quite a while. The number of web frameworks and options is dizzying. The hardest thing about creating a website in Python is choosing the framework. Once that choice is made, the rest of the ride is easy.

I’ve recently picked up Pylons and SQLAlchemy. I chose Pylons because a good friend of mine (a Ruby fanatic) suggested I learn Rails. I can’t bring myself to pick up Ruby (the syntax is too confusing and ugly for my tastes, reminding me too much of the bad memory that perl is.) But I don’t mind playing with the next best thing, and apparently Pylons is it.

I think I’ve finally gotten over the initial learning hump. I say that because the TODO list is shrinking. I can’t think of things to do faster than I can do them now.

This article isn’t about Pylons though. This is about SQLAlchemy. SQLAlchemy is the ORM I would’ve written if I was smart enough and had enough time and motivation. I have yet to find even one thing in the framework that I would have chose to done differently. Even the naming is pleasantly consistent and predictable.

Anyway, let me share with you one of the pleasant surprises that SQLAlchemy had in store for me. I wanted to do pagination of results from a database. Normally, this is painful. That’s because SQL isn’t a fun thing to write, and it certainly isn’t fun to write a program to write SQL queries.

Well, I threw together this simple Pagination class in a few minutes.

class Subset(object):

    def __init__(self, query, first, num):
        self.query = query

        self.first = int(first)
        assert self.first>=0, 
                "first must be positive. (first=%r)" % self.first

        self.num = int(num)
        assert self.num>0, 
                "num must be greater than 0. (num=%r)" % self.num

        self.total = 0      # Should be the SQL COUNT() on the full query
        self.results = ()   # Should be the range of results.

        self.last = self.first + len(self.results) - 1

    def __len__(self): return len(self.results)
    def __getitem__(self, i): return self.results[i]
    def __iter__(self): return iter(self.results)

class Page(Subset):

    def __init__(self, query, page, results_per_page):
        self.page = int(page)
        assert self.page > 0, 
            "page must be greater than 0. (page=%r)" % self.page

        self.results_per_page = results_per_page
        assert self.results_per_page > 0, 
            "results_per_page must be greater than 0. " 
            "(results_per_page=%r)" % self.results_per_page

        Subset.__init__(self,
                query=query,
                first=(page-1)*results_per_page,
                num=results_per_page)

        self.pages = (self.total-1)/self.results_per_page + 1

I removed the __doc__ strings and other comments to condense the code a bit.

Let me walk through the code a bit. There are two classes—Subset and Page. Subset is a subset of a larger query—from first to first+num items. Page is a page of results—given the page number and the results per page.

Page is really something tacked on top of the Subset. The only difference is that you are talking about pages with Page and a range of results for Subset. They are both useful in a web framework. Page is used in traditional, non-AJAX type paged results. But Subset will be useful if I do something AJAX-y, like a scrollbar that picks up a group of results at a time.

Anyway, there is a missing bit in Subset.__init__(). I am supposed to set self.total to the total number of results. And I am supposed to set the results to just this page of results. I wrote this code like this originally, dreading the moment when I had to actually implement the SQL bits. I mean, I had been disappointed by every ORM system I had seen up until this point. Either they had no concept of LIMIT and OFFSET, or they made it all but impossible to use without using a subselect. I was prepared to be disappointed in a big way.

As I read the documentation in SQLAlchemy, I discovered that this is trivial. Read what it looks like below:

        self.total = self.query.count()
        self.results = tuple(self.query 
                .limit(self.num) 
                .offset(self.first).all())

Yes, that’s it. It really is that easy. You add a “.count()” to the query to get the count. You add a limit() and offset() clause in similar manner, and then grab the results with all().

This worked, flawlessly, and I can’t imagine anything better. I don’t know what else to say.

(The code sample above is Copyright © 2007.)