Simeon Franklin

Agile Development

Announcing the Modesto Scripting Language Meetup

Announcing the Modesto Scripting Language Meetup. I can't believe the nearest developer-oriented meetups are in Pleasanton and Livermore - so I started one myself! We may not have enough developers in the Modesto area to support a narrow focus (like a Python Meetup), but I thought that expanding to "Scripting Languages" still gives us some common ground. I hope to meet other devs, talk about typical scripting tasks like web dev, automation, testing, gaming, UI, etc., and hear from other Pythonistas, Rubyists, PHPers and so on...

We've already got 7 members and a Meet-n-Greet scheduled - if you live in the Modesto area, sign up to be updated on future meetups!

Posted January 25th, 2012 in Programming, Python


Calling all PyCon Rejects

I recently received a very polite email from the PyCon program chair telling me what I already knew: my talk proposal didn't get accepted. My talk was rejected.

Let me explain how I already knew, why being rejected is not a bad thing, and why I'm challenging you to put your rejection to good use both personally and for the Python community.

This year is going to be my first PyCon. I'm signed up for the conference and taking a couple of tutorials as well. I'm really looking forward to attending, and I decided that not only would I attend, I would also propose a talk.

I knew from the start that my talk proposal probably wouldn't be accepted but I had no idea just how low the odds in fact were.

I'm a good presenter - in my new job as a technical instructor I usually have no trouble connecting with my students, thinking on my feet and keeping the material interesting. I recently had a middle-aged engineer taking my Python Fundamentals class tell me "you're the best technical presenter I've ever heard."

I haven't done very much conference style presenting outside of a few talks at Baypiggies but I'm not afraid of public speaking, I'm organized, and like to think that I have a decent sense of technical aesthetics - what I find interesting and informative usually strikes other geeks the same way too.

Why didn't I get to speak at PyCon?

In addition to proposing a talk I also volunteered to serve on the PyCon review board. I have to admit that I mostly lurked - the review board uses a web interface to rate the talks, IRC meetings to initially screen the talks, and then after similar talks are lumped together in groups further IRC meetings are held to approve or reject talks in various categories until enough talks have been accepted.

I realized after a few reviews that not only wasn't I qualified to speak at PyCon, I probably wasn't qualified to review talks for PyCon either. I've never been to a technical conference - even a smaller regional one - and I don't know the Python community outside the Bay Area at all. Because I'd never been to a conference, I wasn't thinking about the success of the conference when I proposed my talk - I was just thinking about me and what I would have fun playing with and talking about.

Two things stood out to me in the review process. First - given the number of talks submitted (~400) and the number accepted (~100) there is no reason for a talk to be accepted without multiple reviewers saying "I've heard this submitter before and can vouch for his or her presenting skills". This is absolutely relevant because it also affects conference-goers' decisions. It certainly affects mine - I signed up for the IPython tutorial because Fernando Perez is teaching it. I've never heard Fernando talk but I can't imagine that the creator of IPython won't have something interesting to say about it. I don't know if it would have caught my eye had it been taught by someone whose name I didn't recognize.

Second - the topic needs to attract a broad set of attendees. I proposed a talk on a little-known package in a niche area. I remain absolutely confident that I could put together an interesting presentation - but would anyone come? If neither the name of the presenter nor the topic inherently draws a broad base of interest, then the talk won't be competitive. Uncompetitive talks aren't good for a conference.

The Benefits of rejection

I can see the benefits of my rejection. For instance I am grateful that I was not accepted only to end up speaking to an empty room. Rejection is always better than public suckage! More specifically I now know what I need to do to (hopefully) eventually speak at PyCon, and it turns out that it's something that will be good for me and for the Python community. Here, of course, is where you come in as well.

I remain confident that my proposal would make a good, informative and entertaining talk. I also now understand that without a track record of conference presentation skills (or a role with a popular Python tool or program) I won't be speaking at PyCon. To reach my goal I need to do a couple of things.

First I should go ahead and do my talk - and you should too, fellow rejectees. As one of the organizers of a Python UG I know that it is sometimes difficult to find and schedule interesting talks. Your local area Python group is probably no different. I'll be finishing my talk and pitching it to Baypiggies next year and I'd like to encourage you to do the same wherever you are.

This will be good for me - I can practice my presentation-fu and polish my talk. It will also be good for my immediate Python community if my talk is as interesting and informative as I think it can be.

I'm hoping to go one step further, however, and if you have no local Python UG you might have to proceed immediately to step two. Take a look at the python.org list of conferences. I'm just guessing, but I bet the competition to present at PyOhio or PyTexas would be considerably less rigorous than at PyCon. There are also other conferences that are not specifically oriented towards Python but would accept Python talks. How great for the Python community would it be if in 2012 local Python conferences had lots of great talks to choose from and other, more diverse conferences benefited from strong Python tracks?

tl;dr

If your proposal didn't get accepted for PyCon, take your talk to your local Python UG and polish your skills - this will be good for them and good for you. Propose the talk to a regional Python or Open Source conference in 2012; you've already got the proposal written! If you weren't brave enough to propose a talk this year but would like to present at PyCon "someday", start preparing now by participating in the larger Python community. When you write your PyCon proposal next year, perhaps your additional success and exposure will mean that your proposal makes it past the initial screening.

Posted December 23rd, 2011 in Programming, Python


What is "Un-Pythonic"?

Tonight's Baypiggies topic is going to be What is Pythonic - Marilyn Davis is doing the presentation. I had a few thoughts but they're all in the opposite direction. I don't feel authoritative enough to define Pythonicness but as an instructor I frequently get to see code written by students new to Python that is obviously un-pythonic, for lack of a better term.

I thought about some recent examples and found a common thread in unpythonicness. See if you can spot what the following examples of un-Pythonic code have in common:

Testing for empty values


if x == "":
    dostuff()
if len(x) == 0:
    dosomethingelse()

How about getter/setter methods?


class Stuff(object):
    def __init__(self, num):
        self.num = num
    def get_num(self):
        # Do some other stuff
        return self.num
    def set_num(self, num):
        # Do some other stuff
        self.num = num

or filtering a list


newlst = []
for val in lst:
   if somecheck(val):
       newlst.append(val)

Each of these examples is a pattern (simple or complex) that Python supports directly. Builtin datatypes have a boolean sense (a non-empty string is not True or False, but it is "truthy"), getters and setters are supplied by the @property decorator, and the filter builtin or a list comprehension is a more succinct and specific way to filter a list.
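For comparison, here is a minimal Python 3 sketch of the Pythonic counterparts to the three examples above. `somecheck` is a hypothetical predicate (the original leaves it undefined) and `describe` is just a wrapper so the truthiness test has something to return. Note that `if not x` is broader than `x == ""`: it also treats None, 0 and empty containers as empty, which is usually what you want but not always.

```python
# 1. Rely on truthiness: "", [], {}, 0 and None are all falsy.
def describe(x):
    if not x:
        return "empty"
    return "has content"


# 2. Plain attribute access, with @property wrapping any extra logic.
class Stuff(object):
    def __init__(self, num):
        self.num = num  # routed through the setter below

    @property
    def num(self):
        # Do some other stuff
        return self._num

    @num.setter
    def num(self, value):
        # Do some other stuff
        self._num = value


# 3. Filter with a list comprehension (or the filter builtin).
def somecheck(val):
    # Hypothetical predicate standing in for the one in the example.
    return val % 2 == 0


lst = [1, 2, 3, 4, 5, 6]
newlst = [val for val in lst if somecheck(val)]
also_newlst = list(filter(somecheck, lst))
print(newlst)  # [2, 4, 6]
```

Callers still write `stuff.num = 5` and `stuff.num`; the property lets you add the "other stuff" later without changing that interface.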

My unifying principle? Code that does a task manually or explicitly when Python has syntactic or builtin support for it is unpythonic. The corollary is that it is necessary to know the language well. Import __builtin__ (builtins in Python 3) and make sure you know its contents exhaustively. I recently discovered max() this way after having written if statements to return the larger of two numbers. Code that uses syntax, builtins, and the stdlib well could still be unpythonic - but reinventing the wheel definitely is!
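A short Python 3 sketch of that advice: listing the public builtins to browse through, and the hand-written comparison that max() replaces (`larger` is a hypothetical name for the reinvented wheel).

```python
import builtins  # this module is called __builtin__ in Python 2

# Every public name in the builtins namespace: max, min, sorted, any, zip...
public_builtins = sorted(name for name in dir(builtins) if not name.startswith("_"))
print(len(public_builtins), "builtins worth knowing")


def larger(a, b):
    """The hand-rolled version that max() makes unnecessary."""
    if a > b:
        return a
    return b


print(larger(3, 7), max(3, 7))  # 7 7
```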

Posted December 15th, 2011 in Programming, Python


AHAH with Django and jQuery

I was recently asked about using AJAX via jQuery with Django and mentioned that I frequently use HTML fragments and a decorator to add AJAX functionality to existing views. Let's see how that works.

I have an existing view that I want to refresh via AJAX. Let's make the simplest thing possible:


from django.shortcuts import render_to_response
from django.template import RequestContext

counter = [0]

def index(request):
    counter[0] = counter[0] + 1
    return render_to_response("index.html",
                              {'counter': str(counter[0])},
                              context_instance=RequestContext(request))
                              

My view function increments a module-level counter each time the view is loaded and then renders a template. I know, I know, my counter is reset if my server restarts and it isn't thread safe, but hey - this is just an example! The page is rendered from two templates:

index.html

{% extends "base.html" %}
{% block mytext %}
Current counter value is {{ counter }}.
{% endblock %}

and base.html

<html>
  <head>
    <title>Simple Demo</title>
  </head>
  <body>
  <h1>This is a simple page</h1>
  <div id="replace_me">
    {% block mytext %}
    This is dynamic content that should be replaced.
    {% endblock %}
  </div>
  </body>
</html>  

Each time I reload the page I see something like:

This is a simple page

Current counter value is 2.

We've got our initial case set up, so let's make it AJAX! No, on second thought let's make it AHAH! AJAX technically stands for Asynchronous Javascript And XML, but I rarely find myself using XML lately. If I actually need a data interchange format I usually use JSON - but I also frequently find myself using an Asynchronous Javascript and HTML pattern instead. I guess that comes out to AJAH, but AHAH is definitely more fun to say.

The technique is very simple - the part of my page that needs to be loaded asynchronously can be managed by loading a snippet of HTML and inserting it, instead of exchanging data and using Javascript to rebuild the page. This usually means less code (especially Javascript) and has the advantage of reusing the same views on the server side, and hopefully the same templates. It's also built into my favorite Javascript framework - so let's see it in action.

First I'm going to add the jQuery library to my base template, along with the JS and HTML necessary to trigger the asynchronous reloading. Now my template looks like:


<html>
  <head>
    <title>Simple Demo</title>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js" type="text/javascript"></script>
    <script type="text/javascript">
      $(function(){
          $("#click_me").click(function(){
              $("#replace_me").load("/ #replace_me > div");
          });
      });
  </script>
  </head>
  <body>
  <h1>This is a simple page</h1>
  <div id="replace_me">
    <div>
    {% block mytext %}
    This is dynamic content that should be replaced.
    {% endblock %}
    </div>
  </div>
  <a id="click_me" href="#">Click Me</a>
  </body>
</html>  

I'm not going to explain the Javascript in detail, but even inexperienced jQueryists can see that I added a function triggered by clicking my link, and that the function uses jQuery's built-in .load() method to make an asynchronous call to the URL "/". I also specify a CSS selector after the URL, so the loaded page (which is equivalent to refreshing the current page) is parsed and only the part matched by the selector is inserted into the current page's #replace_me div. My number changes without a browser reload! Woohoo!

But let's make this better and slightly more complicated. Maybe we don't want to render the whole page each time because our base template does complicated things like showing the logged-in user, building a menu, constructing a recent changes sidebar and so on. We'd like each asynchronous call to build only the piece of dynamic data that's changing.

To do this we can take advantage of a utility method on the Django request object called .is_ajax(). This in turn depends on the browser or Javascript framework marking the request, but in our case jQuery makes sure that an "X-Requested-With: XMLHttpRequest" header is sent with each asynchronous request. The view code now reads:


from django.shortcuts import render_to_response
from django.template import RequestContext

counter = [0]

def index(request):
    counter[0] = counter[0] + 1
    if request.is_ajax():
        template = "index_ajax.html"
    else:
        template = "index.html"
    return render_to_response(template,
                              {'counter': str(counter[0])},
                              context_instance=RequestContext(request))
                              

My new index_ajax.html template has just the fragment we're interested in and my old index.html has been modified to include it:

index_ajax.html

Current counter value is {{ counter }}.

and index.html

{% extends "base.html" %}
{% block mytext %}
{% include "index_ajax.html" %}
{% endblock %}

I also alter the base.html template to drop the CSS selector from my .load() call. Looking at the requests in Firebug confirms that only the piece of text I want is returned each time, but the page works as before - the first time, the page loads normally, and clicking the link fires an asynchronous call that returns only the fragment we want and inserts it into the parent page.

base.html

<html>
  <head>
    <title>Simple Demo</title>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js" type="text/javascript"></script>
    <script type="text/javascript">
      $(function(){
          $("#click_me").click(function(){
              $("#replace_me").load("/");
          });
      });
  </script>
  </head>
  <body>
  <h1>This is a simple page</h1>
  <div id="replace_me">
    {% block mytext %}
    This is dynamic content that should be replaced.
    {% endblock %}
  </div>
  <a id="click_me" href="#">Click Me</a>
  </body>
</html>  
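A note for newer Django versions, offered as a sketch: request.is_ajax() was deprecated in Django 3.1 and removed in 4.0, but all it did was check the X-Requested-With header. A helper like the hypothetical `is_ajax` below performs the same check; the `FakeRequest` class exists only so the sketch runs without Django.

```python
def is_ajax(request):
    """Same test the old HttpRequest.is_ajax() made.

    Django exposes request headers in request.META with an HTTP_ prefix,
    upper-cased and with dashes turned into underscores.
    """
    return request.META.get("HTTP_X_REQUESTED_WITH") == "XMLHttpRequest"


class FakeRequest(object):
    """Hypothetical stand-in for HttpRequest, for running the sketch."""

    def __init__(self, meta):
        self.META = meta


print(is_ajax(FakeRequest({"HTTP_X_REQUESTED_WITH": "XMLHttpRequest"})))  # True
print(is_ajax(FakeRequest({})))  # False
```

Inside a real view you would call `is_ajax(request)` wherever the example calls `request.is_ajax()`.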

Things now work as I want, but we can still clean up our code a bit. I usually use the render_to decorator from django-annoying to give myself a nice shortcut. You can read the code to see how it works, but using it is simple - either specify your template in the call to the render_to decorator or return it under the "TEMPLATE" key in the dict that your view returns - and render_to will handle creating a RequestContext and rendering your template for you. Now my view looks like this (with the linked decorator copied to decorators.py next to my views.py):


from decorators import render_to

counter = [0]

@render_to()
def index(request):
    counter[0] = counter[0] + 1
    if request.is_ajax():
        template = "index_ajax.html"
    else:
        template = "index.html"
    return {'TEMPLATE': template, 'counter': str(counter[0])}

Much nicer, and everything works as before. This is such a common pattern for me, however, that I put the template-picking logic in the decorator itself, right after it pops the TEMPLATE variable:


    tmpl = output.pop('TEMPLATE', template)
    # For AJAX requests, swap in the fragment template if the view supplied one
    if request.is_ajax():
        if "AJAX_TEMPLATE" in output:
            tmpl = output.pop("AJAX_TEMPLATE")

This allows me to just specify my two templates in my returned dict and the right one will automatically be picked. One last time for views.py:


from decorators import render_to

counter = [0]

@render_to()
def index(request):
    counter[0] = counter[0] + 1
    return {'TEMPLATE': 'index.html',
            'AJAX_TEMPLATE': 'index_ajax.html',
            'counter': str(counter[0])}

And we're done. A single line of Javascript (more or less) enables our asynchronous call, and at the price of one more template we can render either fragments or the whole page for our view, with the decorator cleaning up our view code and handling the template choice for us.
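For reference, here is a self-contained sketch of what a render_to-style decorator with the AJAX_TEMPLATE tweak might look like. It is not django-annoying's actual code. The `render` parameter is an assumption added so the sketch can be exercised without Django (left as None, it falls back to django.shortcuts.render, which takes the request, template name and context), the request is assumed to have an is_ajax() method as Django's did at the time, and the fake request and render function in the usage section are hypothetical stand-ins.

```python
import functools


def render_to(template=None, render=None):
    """Sketch of a render_to-style decorator that also picks an AJAX template.

    The view returns a dict: TEMPLATE (or the decorator's `template` argument)
    names the full-page template, and the optional AJAX_TEMPLATE names the
    fragment used when request.is_ajax() is true.
    """
    if render is None:
        # Only needed inside a real Django project.
        from django.shortcuts import render

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            output = view(request, *args, **kwargs)
            if not isinstance(output, dict):
                return output  # the view built its own response
            context = dict(output)  # don't mutate the view's dict
            tmpl = context.pop("TEMPLATE", template)
            ajax_tmpl = context.pop("AJAX_TEMPLATE", None)
            if ajax_tmpl and request.is_ajax():
                tmpl = ajax_tmpl
            return render(request, tmpl, context)
        return wrapper
    return decorator


# --- Usage with hypothetical stand-ins, so this runs without Django ---
class FakeRequest(object):
    def __init__(self, ajax):
        self._ajax = ajax

    def is_ajax(self):
        return self._ajax


def fake_render(request, template_name, context):
    return (template_name, context)


counter = [0]


@render_to(render=fake_render)
def index(request):
    counter[0] += 1
    return {'TEMPLATE': 'index.html',
            'AJAX_TEMPLATE': 'index_ajax.html',
            'counter': str(counter[0])}


print(index(FakeRequest(False)))  # ('index.html', {'counter': '1'})
print(index(FakeRequest(True)))   # ('index_ajax.html', {'counter': '2'})
```

This version pops AJAX_TEMPLATE on every request, not only AJAX ones, so the key never leaks into the template context on normal page loads.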

Posted August 22nd, 2011 in Programming, Python


PDB Howto

Cool! Thanks to Max's awesome video editing skills, my pdb howto is relatively stumble-free! If you're curious about how to use the built-in Python debugger, this brief video tutorial should get you going.

Posted August 11th, 2011 in Programming, Python


The best way to OR a list of Django ORM Q objects

A co-worker asked me today about the best way to OR together a list of Q objects in the Django ORM. Usually you have a specific number of conditions, and you can use Q objects and the bitwise OR operator (|) to combine the query conditions with a logical OR. The Q object is necessary because multiple arguments to .filter(), or successive calls to it, are ANDed. But what if you have an arbitrary number of Q conditions?

One suggestion is to use the undocumented .add method of the Q object in a loop to add multiple query conditions together. I thought this might be a good use case for reduce and the operator module:



# Normal usage with the | operator
from django.db.models import Q
qs = MyModel.objects.filter(Q(cond1=1) | Q(cond2="Y"))

# but given a list of Q conditions
from functools import reduce  # a builtin in Python 2; lives in functools in Python 3
from operator import or_ as OR  # same function as operator.__or__
lst = [Q(...), Q(...), Q(...)]
qs = MyModel.objects.filter(reduce(OR, lst))

Is this the most Pythonic approach?
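One way to see the mechanics without Django, sketched in Python 3: reduce with operator.or_ works on anything that overloads |, so sets (where | means union) can stand in for Q objects. `or_all` is a hypothetical helper name. One caveat worth guarding against: reduce raises TypeError on an empty list, so check for that before building a filter from a list that might be empty.

```python
from functools import reduce
from operator import or_


def or_all(items):
    """Combine items left to right: items[0] | items[1] | ...

    reduce() raises TypeError when given an empty sequence and no
    initial value, so callers should handle the empty case themselves.
    """
    return reduce(or_, items)


# Sets overload | as union, standing in for Q objects here.
print(or_all([{1, 2}, {2, 3}, {5}]))  # {1, 2, 3, 5}
```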

Posted June 14th, 2011 in Programming


Teaching Retrospective

As you can probably figure out from the last post, I have a new gig.

Thanks to the good folks at marakana.com I recently taught a Python Fundamentals course in San Francisco.

This was my first time teaching for Marakana and I enjoyed the experience immensely - as did my students, judging by their class reviews at the end! I had a blast and am looking forward to more Python classes, including a Pro Django course that's under development. More details to come in this space - and thanks to Jas, Brenda, Mike, Chris, and Robert for being great first-time students. I hope you all go on to Pythonic success.

Posted June 14th, 2011 in Programming


Python Fundamentals Resources

This will just be a grab bag of extra resources and notes for my students taking the Python Fundamentals Course at Marakana. Don't forget to grab the labs.

Additional Resources

Important Basics You Should Know

Python builtin functions: see the docs at http://docs.python.org/library/functions.html. Or:


import __builtin__  # Python 2; in Python 3 this module is named builtins
dir(__builtin__)

Keywords: see the docs at http://docs.python.org/reference/lexical_analysis.html#keywords. Or:


import keyword
print(keyword.kwlist)

Tools

Additional Documentation Referred to in Class

Don't forget the old labs - new labs and samples coming soon!

Posted May 30th, 2011 in Programming


WSGI Mysteries

Recently I had a mysterious error cropping up on a Satchmo Store I set up for a client. Every so often I would have an HTTP 500 error with the error log indicating that a template could not be read because the specified charset wasn't available. I edited the Django template loader code to produce more information and only managed to push the exception down into the Python source code.


str = codecs.open(filepath, encoding="utf-8").read()
File "/opt/python2.6/lib/python2.6/codecs.py", line 865, in open
    file = __builtin__.open(filename, mode, buffering)
LookupError: unknown encoding: ANSI_X3.4-1968

The template that was being opened was valid UTF-8 and even stranger - as the same user and using the same virtualenv as the WSGI app I can open a file specifying an ANSI_X3.4-1968 encoding.
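One small, verifiable piece of the puzzle, sketched below: ANSI_X3.4-1968 is the formal name for US-ASCII (it's the codeset glibc reports under the C/POSIX locale), and a normal Python interpreter resolves it to the ascii codec. That matches the observation above that the same name works fine outside the WSGI app.

```python
import codecs

# Python's codec registry treats "ANSI_X3.4-1968" as an alias of ascii.
info = codecs.lookup("ANSI_X3.4-1968")
print(info.name)  # ascii

# So encoding with that name behaves exactly like plain ASCII.
print("abc".encode("ANSI_X3.4-1968"))  # b'abc'
```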

It occurred to me that it must be a problem with the environment somehow, so I spent some time reading the mod_wsgi documentation. Eventually I changed one line in my mod_wsgi config for the app and resolved my problem. The configuration:


    WSGIDaemonProcess sitename user=sitename threads=1 display-name=%{GROUP}
    WSGIProcessGroup sitename
    WSGIApplicationGroup sitename
    # Previous setting:
    # WSGIApplicationGroup %{GLOBAL}

If I understand correctly, all my WSGI applications configured using %{GLOBAL} were sharing the same interpreter. Something else must somehow have been messing up the state of that interpreter - giving WSGIApplicationGroup its own value forces this application to run in its own sub-interpreter, and that cleared up my problem. What I don't have is any insight into why my original error was occurring. Any thoughts?

Posted April 28th, 2011 in Programming


Hidden Django QuerySet Features II

Recently, while reading a co-worker's code, I discovered that Django QuerySets can be OR'ed together - but this may not always be a good idea. Django's ORM overloads the bitwise OR operator to express logical OR, and the documentation demonstrates this with a Q object. Let's query the ORM to find recently active users - users who joined this year or who have logged in this year:


>>> from django.db.models import Q
>>> from django.contrib.auth.models import User
>>> from datetime import date
>>> jan_1st = date(2011, 1, 1)
>>> recent = User.objects.filter(Q(last_login__gte=jan_1st)
...                              | Q(date_joined__gte=jan_1st))
>>> print(recent._as_sql())
('SELECT U0."id"
  FROM "auth_user" U0
  WHERE (U0."last_login" >= %s
          OR U0."date_joined" >= %s )',
 (u'2011-01-01 00:00:00', u'2011-01-01 00:00:00'))

The where clause in the generated SQL is exactly what we wanted. It turns out, however, that you can use the | operator on querysets directly to yield the same thing.


>>> from django.contrib.auth.models import User
>>> from datetime import date
>>> jan_1st = date(2011, 1, 1)
>>> recent_login = User.objects.filter(last_login__gte=jan_1st)
>>> recent_join = User.objects.filter(date_joined__gte=jan_1st)
>>> recent = recent_login | recent_join
>>> print(recent._as_sql())
('SELECT U0."id"
  FROM "auth_user" U0
  WHERE (U0."last_login" >= %s
          OR U0."date_joined" >= %s )',
 (u'2011-01-01 00:00:00', u'2011-01-01 00:00:00'))

Sweet! It's always bothered me that chained calls were AND'ed together and there was apparently no way to do a simple OR without importing an additional class. However this feature has to be used with care - you are OR'ing entire queries instead of clauses so careless usage might lead to expensive queries. For instance imagine that we want to only look at certain automatically created recent users whose usernames start with "applicant". OR'ing entire querysets works. Sort of.



>>> from django.contrib.auth.models import User
>>> from datetime import date
>>> jan_1st = date(2011, 1, 1)
>>> applicants = User.objects.filter(username__startswith="applicant")
>>> recent_login = applicants.filter(last_login__gte=jan_1st)
>>> recent_join = applicants.filter(date_joined__gte=jan_1st)
>>> recent = recent_login | recent_join
>>> print(recent._as_sql())
('SELECT U0."id"
  FROM "auth_user" U0
  WHERE
   ((U0."username"::text LIKE %s  AND U0."last_login" >= %s )
     OR
    (U0."username"::text LIKE %s  AND U0."date_joined" >= %s ))',
 (u'applicant%',
  u'2011-01-01 00:00:00',
  u'applicant%',
  u'2011-01-01 00:00:00'))

Note that this did exactly what we asked - OR'ed two querysets. This results in duplication in the where clause - we really only want to run the LIKE search on the username once and then filter the results by OR'ing the two date criteria. The results from this query will be correct, but execution (depending on your DB backend) may be slower, since the LIKE condition appears twice and may be evaluated twice across the data set. In my case I noticed an excessively complicated query in the SQL panel of my django-debug-toolbar. Because two complicated querysets had been OR'ed directly, four or five expensive operations were being duplicated on a table with a million records. Switching to a Q object produced a much shorter (and faster) query with the same results.
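To see why factoring the shared condition out matters, here is a toy sketch in plain Python: predicates stand in for the SQL clauses, and a counter records how often the "expensive" username check runs under each shape of WHERE clause. The rows are made up, and a real database planner may evaluate clauses differently, so treat this as an illustration of the logic rather than a benchmark.

```python
from datetime import date

jan_1st = date(2011, 1, 1)
like_calls = [0]  # how many times the "expensive" check has run


def username_like(user):
    # Stand-in for the costly LIKE 'applicant%' clause.
    like_calls[0] += 1
    return user["username"].startswith("applicant")


def logged_in_recently(user):
    return user["last_login"] >= jan_1st


def joined_recently(user):
    return user["date_joined"] >= jan_1st


# Hypothetical sample rows standing in for the auth_user table.
users = [
    {"username": "applicant1", "last_login": date(2010, 6, 1), "date_joined": date(2011, 2, 1)},
    {"username": "bob", "last_login": date(2011, 3, 1), "date_joined": date(2009, 1, 1)},
    {"username": "applicant2", "last_login": date(2011, 4, 1), "date_joined": date(2011, 4, 1)},
]


def run(where):
    """Apply a row predicate; return matching usernames and the LIKE-check count."""
    like_calls[0] = 0
    matches = [u["username"] for u in users if where(u)]
    return matches, like_calls[0]


# Shape of two OR'ed querysets: (A AND B) OR (A AND C)
ored = run(lambda u: (username_like(u) and logged_in_recently(u))
           or (username_like(u) and joined_recently(u)))
# Shape of one filter with a Q object: A AND (B OR C)
factored = run(lambda u: username_like(u)
               and (logged_in_recently(u) or joined_recently(u)))
print(ored)      # (['applicant1', 'applicant2'], 5)
print(factored)  # (['applicant1', 'applicant2'], 3)
```

Both shapes return the same rows; only the number of times the shared condition is checked differs.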

Posted February 20th, 2011 in Programming


Hidden Django QuerySet Features I

I've been paying attention to the SQL the Django ORM generates lately. I've had to do a bit of performance tuning on some apps I wrote, so I started out with the django-debug-toolbar app, which shows you all the queries run on a given page and their runtimes. This is an indispensable tool in my daily toolkit, but I wanted to play with creating queries in my console. How do I see the SQL that is generated?

Somewhere (probably just through introspection) I found the semi-private _as_sql() method.


>>> User.objects.filter(is_staff=True)._as_sql()
('SELECT U0."id" FROM "auth_user" U0 WHERE U0."is_staff" = %s ', (True,))

Notice that this doesn't return all the fields ("select id from..."). This method also won't work on ValuesQuerySets and shouldn't be depended on, as it is not part of the public interface of the QuerySet class. It does helpfully return a two-part tuple of the query string with placeholders and a tuple containing the query parameters. This is useful if you want to tweak the parameters on the fly and paste them into a db console. A better way to see the SQL that will actually be executed is to access the .query attribute of a QuerySet (note that str() interpolates the parameters without proper quoting, so the result isn't always valid SQL):


>>> users = User.objects.filter(is_staff=True)
>>> users.query
<django.db.models.sql.query.BaseQuery object at 0x9df63ac>
>>> str(users.query)
'SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", 
"auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", 
"auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", 
"auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."is_staff" = True '

Finally you can see all the queries that have run by looking at the connection object.


>>> from django.db import connection
>>> print(connection.queries)
[{'sql': 'SELECT ....',
  'time': '0.009'},]

I redacted the actual query, but this returns a list of dicts with the SQL that was run and how long it took to run. Note that Django only records these queries when DEBUG = True.

Posted February 14th, 2011 in Programming


Python Code Quality

(I'm presenting the Newbie Nugget tonight @ Baypiggies. My topic is Python Code Quality - read on for the scoop.)

Code quality sometimes seems like an inherently subjective term - you like OOP, I like procedural, you prefer CamelCase and I like delimited_identifiers. Explicit self is ugly, explicit self is explicit and therefore pythonic. And some areas of code quality are even harder to quantify - what makes an API elegant? How do you measure Pythonic-ness? Ok, that's not even a word - but I just want to issue the disclaimer at the beginning - high quality code will continue to be a matter of opinion. Low quality code - well that we can measure.

One more quick disclaimer here - why do we care about code quality? Let's face it - there are two reasons that we need to improve the quality of our code. The first reason is that I suck. It's true! Sometimes I write really poor code; usually, even. There are lots of reasons; excuses really - maybe I'm exploring the problem space. Maybe I thought it would be a one-off script. Maybe I'm new to the language or the library. Maybe I'm under time pressure so I just want it to work. Maybe I just don't care.

The second reason is that you suck too. In fact the only code worse than my code is other people's code - and I'm confident that you all could make that statement yourselves.

Seriously though - every working programmer at some point deals with maintenance, with bugfixing, with legacy code, and has to start refactoring. I can't recommend Martin Fowler's Refactoring book strongly enough - it isn't just for Java programmers, and it will help you think practically about how to get from "here" (a large code base of varying quality) to "there" (better quality code that's easier to maintain, bugfix, etc).

So let's look at a couple of tools for finding crappy code in your Python projects. Just for fun I ran these on the current Django trunk to see if I could find any dusty corners.

First up is a tool called clonedigger. Clonedigger is really cool and does exactly what it says - it looks for clones or regions of similar code. Often these are evidence of copy-n-paste style programming and should generally be refactored. DRY!

Clonedigger installs via easy_install; running it on the Django trunk took over an hour and produced an HTML report. It found 1323 clones and reported that 6,143 of 50,782 lines are duplicates (12%), but most of these were legitimate duplication; locale files, for instance.

You can see an example of the output here. Basically clonedigger has detected that the classes that define the widgets for the DateInput, DateTimeInput, and TimeInput are 18 lines of code apiece but differ only by the classname in the call to super. Introducing a common parent class, or having two of the three subclass the third, would eliminate the duplication (reducing the code by 36 lines) and, more importantly, make clear that currently the widgets for all three classes function in exactly the same way - something that isn't instantly obvious when you scan the code.

There might be good reasons not to introduce another class in the hierarchy, and arguably you shouldn't have DateTime subclass Date (or vice versa). Similarly, the clones found in django/db/models/fields/__init__.py might best be left alone (should IntegerField subclass FloatField, or should it be the other way around?). Clonedigger did find some repetition in the generic views, however, and if you're interested you can download the whole output (137k gzipped) here.
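Clonedigger works on syntax trees, so it can spot clones that differ only in names. As a much cruder illustration of the same idea, here is a hypothetical text-based sketch that reports runs of identical (whitespace-stripped) lines appearing in more than one place. `find_duplicate_blocks` and its `window` parameter are made up for this example, and a long clone shows up as several overlapping windows.

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=4):
    """Map each run of `window` identical stripped lines that occurs in more
    than one place to the (filename, first_line_number) pairs where it appears.

    `files` is a dict of filename -> source text.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [line.strip() for line in text.splitlines()]
        for start in range(len(lines) - window + 1):
            chunk = tuple(lines[start:start + window])
            if all(chunk):  # ignore windows that contain blank lines
                seen[chunk].append((name, start + 1))
    return {chunk: places for chunk, places in seen.items() if len(places) > 1}


files = {
    "a.py": "def f(x):\n    y = x + 1\n    z = y * 2\n    return z\n",
    "b.py": "def g(x):\n    y = x + 1\n    z = y * 2\n    return z\n",
}
for chunk, places in find_duplicate_blocks(files, window=3).items():
    print(places, chunk[0])  # [('a.py', 2), ('b.py', 2)] y = x + 1
```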

OK - the Django codebase certainly has less duplication than the stuff I produce; clonedigger has found some nice areas needing refactoring in my own code. I've also used another tool to find different sorts of problems to good effect, so let's take a look at the Cyclomatic Complexity in Django.

Cyclomatic Complexity is basically a measure of how complicated a unit of code is - it counts the independent paths through a unit of code to produce a single score. Obviously a function with a very high cyclomatic complexity score (say 100) needs to be refactored. It's doing too much for you to get your head around, it can't practically be unit tested, and it can't safely be changed. The refactoring necessary might simply be to extract a lot of methods or functions, but I frequently find that an area of high cyclomatic complexity indicates a problem whose approach needs some rethinking.

I've used David Stanek's tool pygenie to scan python code and report on Cyclomatic Complexity - it isn't released but you can check it out of svn and use it to scan your python source.

Running it on the Django trunk produces a text report listing any functions with a CC score above the ideal of 7. Django is actually outstandingly well written by this measure - the high scores are in some dense thickets of third-party code. The doctest and pure-Python Decimal implementations have functions with scores in the 20's, 30's, and one 52!

The highest scores in code that originated in the Django project look like they live in the utils module. The normalize method in django/utils/regex_helper.py is pretty scary, with a CC score of 25! To be fair, it's reversing regex patterns - a legitimately complicated task, and it may be written as it is for performance reasons.

A more likely candidate for refactoring by mere mortals is the truncate_html_words function in django/utils/text.py - although with a CC score of only 14, and using regexes to parse HTML and close any tags left open in the truncated portion, it's also legitimately complicated. The _html_output method of the BaseForm class could probably be tackled safely, but even it doesn't look too bad.

Pygenie is actually more useful than this demonstration shows - on my own codebases it picked up some unmaintained messes of procedural code that were complex only because I was lazy. You can look at the report for the rest of the Django codebase, but I encourage you to run this tool on your own code - it runs fast, and you ignore it at your own peril.

Code quality is subjective. It's possible to have crappy code with low duplication and low cyclomatic complexity. But removing duplication from your code and making sure that the codebase stays in discrete (testable) chunks definitely helps.

Posted February 25th, 2010 in Programming Python (Comment)


Pydelatt

Part of what I do for my clients is manage their software/hardware infrastructure. Most of my clients are not large enough to have dedicated sys-admin staff so in addition to wearing the software developer hat I sometimes get to wear the sysadmin hat as well. This is not always a good thing and sometimes I end up writing software (what I like to do) to fix a sys-admin style problem (the stuff I don't like to deal with).

So recently a colocated box I manage for a real estate company started to run low on disk space. The main culprit was the mailbox accounts - realtors frequently email large documents (pictures, contracts, flyers, etc.) and most of the mail accounts held a gig or two of mail. I decided to set a policy of deleting old attachments and looked for a tool to accomplish this task.

No luck - Dan Born's Delatt looked like it would do what I wanted but I couldn't actually get it to work. This was probably my fault but trying to figure out what wasn't working meant debugging Perl. Not my favorite language, and more to the point my Perl chops are about a decade rusty now. So I wrote a tool in python to do what I want. Pydelatt accepts a maildir filename and strips out any attachments whose mime type is not text/*.
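Pydelatt's own source isn't reproduced here, but the core idea (walk the MIME tree and drop every leaf part that isn't text/*) fits in a few lines of Python 3's standard email package. A sketch under these assumptions: nested multipart containers are kept, message/* parts count as non-text and are dropped, single-part messages are left alone, and the file is rewritten in place (not atomically) only if something was removed. The function names are hypothetical.

```python
import email
from email import policy


def strip_attachments(msg):
    """Drop every non-text leaf part from a multipart message, in place.

    Returns the number of parts removed. Nested multipart containers are
    kept (and cleaned recursively) so the MIME structure stays valid.
    """
    if msg.get_content_maintype() != "multipart":
        return 0  # single-part messages are left untouched
    payload = msg.get_payload()
    if not isinstance(payload, list):
        return 0  # malformed multipart (e.g. missing boundary): leave it alone
    removed = 0
    kept = []
    for part in payload:
        maintype = part.get_content_maintype()
        if maintype == "multipart":
            removed += strip_attachments(part)
            kept.append(part)
        elif maintype == "text":
            kept.append(part)
        else:
            removed += 1  # image/*, application/*, message/*, ...
    msg.set_payload(kept)
    return removed


def strip_file(path):
    """Rewrite one maildir message file without its non-text attachments."""
    with open(path, "rb") as fh:
        msg = email.message_from_binary_file(fh, policy=policy.default)
    removed = strip_attachments(msg)
    if removed:
        with open(path, "wb") as fh:
            fh.write(msg.as_bytes())
    return removed
```

A production version would probably write to a temporary file and rename it over the original, so a crash mid-write can't truncate a message.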

All the usual caveats apply (use at your own risk, attachments are deleted irrecoverably, and user error may cause your hair to burst into flame), but I'm using it as a policy tool (`find -mtime 120 -size +3M | xargs -ix pydelatt.py 'x'`) and I've successfully run it on a couple hundred gigs of email without incident for a month now...
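For anyone who'd rather not remember find's flags, here's a rough Python equivalent of the selection half of that pipeline. One detail worth knowing: find's `-mtime 120` (no sign) matches files whose age, rounded down to whole days, is exactly 120, which works as a policy when the job runs daily, while `-mtime +120` matches anything older. This sketch uses "at least N days old" and "more than N bytes"; the function name and defaults are illustrative assumptions.

```python
import os
import time

DAY = 24 * 60 * 60


def find_candidates(root, min_age_days=120, min_size=3 * 1024 * 1024, now=None):
    """Yield files under root at least min_age_days old and larger than min_size bytes."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            age_days = int((now - st.st_mtime) // DAY)  # whole days, like find
            if age_days >= min_age_days and st.st_size > min_size:
                yield path
```

Each yielded path would then be handed to pydelatt.py, just as xargs does in the one-liner.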

Posted December 21st, 2009 in Programming (Comment)


Off to present at Baypiggies again

I'm off to present again. My topic is Fixing Django with 3rd party apps and it's some best practices advice plus dev oriented apps I think are useful. The slides are here in s5 format (hit the spacebar to advance).

Update: The presentation went well - a few additional notes. The slides don't show it but I live demoed Rob Hudson's django-debug-toolbar and the command-extension runserver_plus/werkzeug debugger. My slide on South is non-informative because I followed Glen Jarvis' presentation on South... I had fun and I'll post links to the videos when they get posted.

I had follow-up questions afterwards about finding cool 3rd party apps, and was trying to remember a recent blog post I'd seen with a nice list. For anybody still looking for it, check out Kevin Fricovsky's post on the apps that power mingus.

Posted October 22nd, 2009 in Programming (Comment)


Jutda Helpdesk

I mostly use small 3rd party django apps that provide discrete pieces of functionality. sorl-thumbnail and django-mptt, for example, don't provide any views; they are helpers that add dynamic thumbnailing and tree operations to existing models.

I do use a few more "stand-alone" apps (like django-filebrowser and the basic-apps suite) but I tend not to use apps that provide a ton of functionality or try to run the whole site. I used to have only one exception to this rule (satchmo, about which I'll have more to say in the future) but I recently added a second exception.

If you need a simple standalone helpdesk, Jutda Helpdesk is your one stop shop. Recently I needed a workflow with a particular client that had more structure than CC'ed emails or even basecamp todos. Jutda worked out of the box (after a couple of one-line fixes, patches for which were immediately accepted and applied) and provides an interface for users to report issues, admins to assign them and everybody to get email notifications as status changes occur.

This is a substantial project (there are features I didn't explore like ticket creation from an email inbox, an API and customizable RSS feeds) which just works. My compliments to Ross Poulton!

Posted May 14th, 2009 in Django Apps Recommends (Comment)


Sample fabfile

Dan asked in the comments on my Baypiggies post if I could post a sample fabfile. I'll post the fabfile of the project I'm working on right now along with an explanation since it's doing a few different things... First the code - then the commentary:

import os


config(
    project = 'apple',
    fab_hosts = ['redacted.com'],
    fab_user = 'apple',
    django = '/opt/django1.0/django',
    package_file = "pyenv/lib/python2.5/site-packages/easy-install.pth",
    package_location = "pyenv/lib/python2.5/site-packages/",
    pth = """/usr/lib64/python2.3/site-packages
/usr/lib64/python2.3/site-packages/PIL
/home/$(fab_user)/django_site
/home/$(fab_user)/pyenv
/home/$(fab_user)/django_site/$(site)
/opt/django1.0
/opt/python-packages"""
    )

# Local convenience functions not related to deployment


def local_django():
    """ Link to django. Not virtualenv installed since shared install
    on server """
    local("ln -s /web/django_src/Django-1.0/django ./$(package_location)")


def syslibs():
    """ Link packages from local sitepackages I don't want to build
    via pip. Installed already on dev box, installed in global
    environment on the server"""

    for f in ["_mysql_exceptions.py", "_mysql.so", "MySQLdb"]:
        local("ln -s /var/lib/python-support/python2.5/%s $(package_location)" % (f, ),
              fail="ignore")
    local("ln -s /usr/lib/python2.5/site-packages/PIL $(package_location)", fail="ignore")


def setup():
    """ Assuming you copied another site's requirements file,
    initialises pyenv """
    local("pip -E ./pyenv install -r requirements.txt")


# Deployment commands
# Select test or production, build the package to transport, then deploy:
# $ fab production build deploy


def test():
    config.site="test_site"


def production():
    config.site="prod_site"


@requires('site', provided_by = ['test', 'production'])
def remote_env():
    """Not deploying virtualenvs - just setting up python env for shell via ~/.python/django.pth
    files."""

    run("mkdir -p /home/$(fab_user)/.python")  # -p: don't fail if it already exists
    lines = open(config.package_file).readlines()
    # limit to all the packages in the src dir and rewrite path for server
    pkg_lines = [l.replace('/web/', '/home/') for l in lines if "/src" in l]
    # We just manage server environment with ~/.python/django.pth file
    run("""echo "$(pth)\n%s" > /home/$(fab_user)/.python/django.pth""" % "".join(pkg_lines))


@requires('site', provided_by = ['test', 'production'])
def build():
    """Build tarballs for transfer of the "libraries" (external django apps)
    and the site (settings, media, template and site-specific django apps)"""

    #local("pip -E ./pyenv/ freeze requirements.txt") # Not using till bundles work better
    local("tar -czvf pyenv.tar.gz pyenv/src") # not wanting to checkout on server, just tar src dir
    local("cd django_site/mysite;bzr export ../../$(site).tar.gz")


def django():
    """Remotely link in appropriate django on the server"""
    run("ln -s $(django) /home/$(fab_user)/pyenv/src")


@requires('site', provided_by = ['test', 'production'])
def deploy_pluggables():

    # Hmm. Bundles are buggy w/ bzr, don't want to have to install from req (and check out)
    #put("requirements.txt", "requirements.txt")
    #run("pip -E ./pyenv install -r requirements.txt")
    # So just tar pyenv/src, transfer over and untar
    put("pyenv.tar.gz", "/home/$(fab_user)/")
    run("tar -xzf /home/$(fab_user)/pyenv.tar.gz")
    run("rm /home/$(fab_user)/pyenv.tar.gz")


@requires('site', provided_by = ['test', 'production'])
def local_settings():
    """Put the remote local_settings.py file (not the one in /django_site/mysite)"""
    put("django_site/local_settings.py", "/home/$(fab_user)/django_site/$(site)")


@requires('site', provided_by = ['test', 'production'])
def deploy_site():
    """Deploy the django project and custom apps in django_site/mysite"""

    run("mkdir -p /home/$(fab_user)/django_site")  # -p: safe on repeat deploys
    put("$(site).tar.gz", "/home/$(fab_user)/django_site")
    run("cd /home/$(fab_user)/django_site/; tar -xzf $(site).tar.gz")
    run("rm /home/$(fab_user)/django_site/$(site).tar.gz")
    # Put the local settings file for the remote server
    local_settings()


@requires('site', provided_by = ['test', 'production'])
def deploy():
    """Transfer pluggables and django site to remote server and verify
    that remote environment is prepared"""

    deploy_pluggables()
    deploy_site()
    remote_env()

Ok - what does all that do? Let me explain the flow for this project and then show how the fabfile supports it. In this case I'm developing locally with pip and virtualenv but not using either on the deployment server. The deployment server has a few libraries installed globally (primarily the MySQL driver and PIL), but I want my 3rd party django apps and my custom app for the individual site to live in the virtualenv. On the server I build the environment by including ~/.python in the $PYTHONPATH environment variable and adding paths to the django.pth file in ~/.python. Supervisor is in charge of running each process as the appropriate user. Eventually I hope to migrate to mod_wsgi and use virtualenv on the server, but I'm not there yet... I'm also not installing a virtualenv copy of Django for this project - I'm trying to stick to releases for any substantial project, so I just symlink in the 1.0 release. Similarly, on the server I've got several releases of Django in /opt and just symlink to the appropriate version for each project.
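One detail of that setup worth double-checking: CPython's site module only reads .pth files from site directories (site-packages and the user site directory), not from every entry on $PYTHONPATH. A directory like ~/.python gets that treatment once it's passed to site.addsitedir(), for example from a sitecustomize module. A minimal sketch; the helper names are hypothetical.

```python
import os
import site
import sys


def write_pth(pth_dir, paths, name="django.pth"):
    """Write one directory per line into pth_dir/name (hypothetical helper)."""
    os.makedirs(pth_dir, exist_ok=True)
    pth_file = os.path.join(pth_dir, name)
    with open(pth_file, "w") as fh:
        fh.write("\n".join(paths) + "\n")
    return pth_file


def activate(pth_dir):
    """Add pth_dir to sys.path and process its .pth files like site-packages.

    site.addsitedir() silently skips listed entries that don't exist on disk.
    """
    site.addsitedir(pth_dir)
    return sys.path


# e.g. in a sitecustomize.py on the server:
# activate(os.path.expanduser("~/.python"))
```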

The first few functions in the fabfile after the call to config are just conveniences for local development. I've found fabric a great place to stash frequently run shell commands and save on typing. For example, instead of issuing a series of find calls to clear out compiled and temporary files (.pyc, .py~, etc.), I could put several calls to local() in a function called cleanup and just run `fab cleanup`.
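As an illustration, a hypothetical cleanup helper could do the deleting in pure Python with os.walk and fnmatch instead of shelling out to find through local(); a fab task could simply call it. The patterns below are assumptions; adjust them to whatever your editor and interpreter leave behind.

```python
import fnmatch
import os

# Hypothetical patterns for compiled files and editor backups.
JUNK_PATTERNS = ("*.pyc", "*.pyo", "*~")


def cleanup(root=".", patterns=JUNK_PATTERNS):
    """Delete files under root matching any pattern; return the removed paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```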

Starting at test() and production(), I'm building and deploying my project. The test() and production() functions just pick my destination directory - I usually deploy to a test directory and run the built-in server with sqlite to test. If everything checks out, I deploy to the production directory and restart my django process in supervisorctl. Next, remote_env() builds the remote environment described above by making sure ~/.python exists and writing a .pth file into it. The .pth file gets the hard-coded paths from the pth setting in config plus every src checkout listed in the local easy-install.pth file (with /web/ rewritten to /home/). This is pretty hacky - hence the desire to move to virtualenvs on the server...
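The path rewriting in remote_env() is easiest to check as a pure function. This sketch mirrors the fabfile's list comprehension (keep easy-install.pth lines containing "/src", swap /web/ for /home/) and prepends the hard-coded block from config's `pth` setting; the function name and the sample easy-install.pth lines are made up for illustration.

```python
def build_pth(base_pth, easy_install_lines, src_marker="/src",
              local_prefix="/web/", remote_prefix="/home/"):
    """Return django.pth contents for the server.

    base_pth: the hard-coded paths (one per line), like config's `pth`.
    easy_install_lines: lines read from the local easy-install.pth.
    Only lines mentioning src_marker (editable checkouts) are kept, and
    the local path prefix is rewritten to the server's.
    """
    pkg_lines = [
        line.strip().replace(local_prefix, remote_prefix)
        for line in easy_install_lines
        if src_marker in line
    ]
    return "\n".join([base_pth.strip()] + pkg_lines) + "\n"


# Hypothetical contents of a local easy-install.pth:
EASY_INSTALL_LINES = [
    "import sys; sys.__plen = len(sys.path)\n",
    "/web/apple/pyenv/src/django-tagging\n",
    "./setuptools-0.6c9-py2.5.egg\n",
]
print(build_pth("/opt/django1.0\n/opt/python-packages", EASY_INSTALL_LINES))
```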

The actual build process in build() just uses bzr to export my django site's source directory to a tarball and tars up the virtualenv's src directory for transport. The deploy commands transfer and untar the files and copy the remote site's local_settings.py file over. Breaking out my "pluggables" (eg: apps, possibly 3rd party, that I'm not editing for this project) into a separate step from my site means my "mysite" directory only contains code I'm directly working on, and lets me run `fab production build deploy_site` to transfer only the site-specific code...
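If you'd rather not depend on the tar binary, the standard library's tarfile module can produce the same kind of archive as `tar -czvf pyenv.tar.gz pyenv/src`. A sketch; the function name and the arcname default are assumptions.

```python
import os
import tarfile


def make_tarball(src_dir, archive_path, arcname=None):
    """Gzip src_dir (recursively) into archive_path.

    arcname controls the path stored in the archive; by default the
    directory is stored under its basename.
    """
    if arcname is None:
        arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=arcname)
    return archive_path


# e.g. make_tarball("pyenv/src", "pyenv.tar.gz", arcname="pyenv/src")
```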

If this doesn't make sense and you haven't looked at the presentations in my previous post be sure to do so...

Posted May 2nd, 2009 in Programming (Comment)


Baypiggies Presentations

Last night I participated in the Baypiggies Tools Night - I ended up in charge of the evening and listened to interesting presentations by Sandrine Ribeaux on Pylint, JJ on ... well ... random stuff in the Unix way, and Drew on a bunch of different tools (depgraph makes cool pictures of your code's dependency graph, and kcachegrind makes cool pictures of your profiling output).

When all that (plus the newbie nugget on Big-O notation and python container types) was over, we were almost out of time. I had three presentations prepared: one on using virtualenv to isolate python environments, one on using pip (Ian Bicking's easy_install replacement), and one on fabric (the pythonic remote deployment tool). Due to the time limitations I did an abbreviated run through the first two and spent most of my time on fabric. I think a video of the audio and slides will be up at some point - in the meantime you can see my slides on virtualenv here, the pip slides here, and the fabric slides here - hit the space bar once the slides load to move through them.

I also ended up talking afterward about how I prepared my slides: I used the rst2s5 tool that's included in docutils to turn my slides' rst source into the html slides I used in my presentation. Any modern browser will show a nice click-through slide show using Eric Meyer's S5 slide format...

Posted March 28th, 2009 in Programming (Comment)


Supervisord 3.0?

I'm starting to run into a problem with the excellent supervisord. I currently use it to keep my Django processes alive on my VPS and now that I have a couple dozen managed processes I'm realising the shortcomings in the design of supervisord.

Supervisord is basically a friendly init system written in python. Rather than writing init scripts in shell, I just edit my supervisord.conf files, run supervisord as root, and all my long-running processes (mostly Django instances) are started and managed by supervisord. This works well until I need to add another process: currently, reloading the config file means restarting the supervisor daemon, which means restarting all the processes it controls (and waiting through heavy server load while they all start simultaneously).

I'm aware that there are some patches (twiddler) to allow you to dynamically add tasks without editing the .conf file. What I really want, however, is to be able to reload the conf file and only affect tasks that aren't already running (so adding a new process to the config file and reloading would only affect the newly added task.) It makes me very happy to see some discussion of this on the supervisor mailing list (see here, for example) towards the end of 2008. Of course now I'm just waiting anxiously for a 3.0 release - and wondering if I should stop complaining that my free ice-cream isn't being delivered fast enough and pitch in and help instead...
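The reload behavior I'm wishing for boils down to diffing the old and new [program:x] sections and only touching the ones that changed. Here's a sketch of that comparison using configparser with interpolation disabled (supervisord has its own %(...)s expansions). It illustrates the idea only; it isn't supervisord's code, and acting on the result (starting, stopping, restarting) is left out. The function names are hypothetical.

```python
import configparser

PREFIX = "program:"


def program_sections(conf_text):
    """Map program name -> dict of its settings from a supervisord-style .conf string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        name[len(PREFIX):]: dict(parser.items(name))
        for name in parser.sections()
        if name.startswith(PREFIX)
    }


def diff_programs(old_text, new_text):
    """Return (added, removed, changed) program names between two configs."""
    old, new = program_sections(old_text), program_sections(new_text)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed
```

With that in hand, a reload would start only `added`, stop only `removed`, restart only `changed`, and leave every other running process alone.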

Posted March 5th, 2009 in Recommends (Comment)


Django Tree Menu

I plan to regularly highlight Django apps I've found useful. I know there are some pluggable app review sites springing up - but I think it's one way of thanking authors in a small way for sharing their code with the Django Community.

With that in mind - I recently switched from my own menu app to Django TreeMenus - mostly because they have a nice admin (I have to check out how they implemented the ordering buttons in the admin; it's very nice!) I do wish they'd use the indispensable mptt to add the tree management features. It would be nice to have one really polished reusable hierarchical tree app, instead of many custom re-implementations, but this is a small nit to pick. This is definitely worth your while if you want your menus to be adminable... It's just a `pip install -E env -e svn+http://django-treemenus.googlecode.com/svn/trunk#egg=treemenus` away :)

Posted March 1st, 2009 in Django Apps Recommends (Comment)


The business of software

I've got a post coming out about sprinting - its value or lack thereof for both the developer and the client. I should clarify that by sprinting I mean the practice of working extra hours or dropping best practices (design, testing) in order to keep an unrealistic development schedule.

Of course I'm thinking about the disadvantages of sprinting because I've got some recent experience - I took on a project via the Sparq Group that I knew going in had a ridiculous schedule (due to pressures on the Client).

It was a classic sprint (and would have degenerated into a death march if my contact at Sparq hadn't done such a good job of staying on top of things with the client). It's taken me a couple of weeks since the main part of the job was completed to catch up on my sleep, my family, and my other clients...

I'm finally feeling more rested though, and ready to start communicating again. One piece of writing I saw lately that I'd like to point out is Squeejee.com's article Why We Bill By the Hour. Good stuff - and it sounds familiar, given my own thoughts on Why I Don't Do Bids on Big Projects...

Posted August 28th, 2008 in Business (Comment)

