Accepted to Summer of Code 2007!

19 04 2007

It actually happened a week ago, but I was really busy and had no time to share this happy news with the world. But here I am again, a lucky student ready to hack some Python code during the summer. This time I will not only write in Python, but also for Python: I’m going to implement a verification system for patches contributed to the Python project via its patch tracker. See the application for a detailed description. I have already set up a Trac instance for this project, so go ahead and post your comments there.

Last summer I learnt quite a bit about Python and its toolset, made some great friends and got involved in even more projects. I will be happy if this year is no different.

A Logic File System and the web

1 03 2007

A Logic File System is essentially a keyword-based file system extended with good navigation capabilities. There is an official home page with much more information and sample implementations. Although the implementations work nicely with standard UNIX tools like ls, mkdir or find, the concept itself represents a major shift in information management philosophy, which IMHO cannot be introduced overnight. In fact, I don’t think we’ll ever see this on our desktops, as too many things rely on hierarchical structure right now (software development, along with version control systems, to name one). Current solutions (like Beagle) are simply good enough. What will probably work is creating a new platform which does the right thing from the start. The happy news is that this new platform is already being built and (surprisingly!) heavily used: popular web services like flickr have these concepts built in, in the form of tags. As we slowly move our data from the desktop to the web, we’ll use logic file systems more and more without even knowing it.
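To make the idea concrete, here is a toy sketch of a keyword-based store in which a path like video/google means “files carrying both keywords”, and navigation offers only those keywords that would actually narrow the result. The KeywordFS class and its methods are hypothetical names chosen for illustration; this is not the API of any LISFS implementation.

```python
class KeywordFS(object):
    """Toy keyword-based file store (hypothetical, for illustration only)."""

    def __init__(self):
        self.files = {}  # file name -> set of keywords

    def add(self, name, *keywords):
        self.files.setdefault(name, set()).update(keywords)

    def _keywords(self, path):
        # "video/google" -> {"video", "google"}; order does not matter.
        return {part for part in path.split("/") if part}

    def query(self, path):
        """Files carrying every keyword in a path such as 'video/google'."""
        wanted = self._keywords(path)
        return sorted(n for n, kws in self.files.items() if wanted <= kws)

    def refinements(self, path):
        """Keywords that would narrow the current result further.

        A keyword is offered only if some, but not all, matching files
        carry it, so every suggested "subdirectory" is non-empty and
        actually shrinks the listing.
        """
        wanted = self._keywords(path)
        matching = [self.files[n] for n in self.query(path)]
        counts = {}
        for kws in matching:
            for kw in kws - wanted:
                counts[kw] = counts.get(kw, 0) + 1
        return sorted(kw for kw, c in counts.items() if c < len(matching))


fs = KeywordFS()
fs.add("talk.avi", "video", "google", "python")
fs.add("intro.avi", "video", "google")
fs.add("notes.txt", "python")

print(fs.query("video/google"))        # ['intro.avi', 'talk.avi']
print(fs.refinements("video/google"))  # ['python']
```

Offering refinements is what lets such a system feel like a directory tree: each suggested keyword behaves like a non-empty subdirectory, yet a file can live under many of them at once.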

Getting back to the original paper, one very interesting idea is the concept of intrinsic and extrinsic keywords. Intrinsic keywords describe inherent qualities of a file (like size or last modification time), while extrinsic keywords are labels set by the user. By unifying these two kinds of properties into a single entity (a keyword), the system becomes more expressive while its semantics stay simple. A small part of this concept has already been implemented in the form of system:filetype tags. With both types of keywords in place you can execute ls length:>20min/type:video/google to list videos about Google longer than 20 minutes. An interesting consequence is that some intrinsic keywords are defined only for specific file types. For example, the length keyword makes sense only for media files, like audio or video. Through a set of extensions (or plugins, if you like), the system can gain new search and navigation capabilities without any user intervention or a single change to the user’s data.
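A hedged sketch of how intrinsic and extrinsic keywords could share one query syntax. Assumptions of this sketch, not of the paper: each file carries a small metadata dict, lengths are stored in whole minutes, and intrinsic keywords come from plugin functions (media_plugin here is hypothetical) that simply contribute nothing for file types they don’t understand. A path component containing a colon is treated as an intrinsic test; anything else is a user tag.

```python
import operator
import re

# Hypothetical metadata; "tags" holds the extrinsic (user-set) keywords.
FILES = {
    "keynote.avi": {"type": "video", "length_min": 45, "tags": {"google"}},
    "ad.avi": {"type": "video", "length_min": 1, "tags": {"google"}},
    "song.mp3": {"type": "audio", "length_min": 4, "tags": {"google"}},
    "notes.txt": {"type": "text", "tags": {"google"}},
}


def media_plugin(meta):
    """Hypothetical plugin: defines `length` only for audio and video."""
    if meta["type"] in ("audio", "video"):
        return {"length": meta["length_min"]}
    return {}


PLUGINS = [media_plugin]


def intrinsic_keywords(meta):
    """Collect intrinsic attributes: built-ins plus whatever plugins add."""
    attrs = {"type": meta["type"]}
    for plugin in PLUGINS:
        attrs.update(plugin(meta))
    return attrs


COMPARISON = re.compile(r"^(\w+):([<>=])(\d+)(?:min)?$")  # e.g. length:>20min
EQUALITY = re.compile(r"^(\w+):(\w+)$")                   # e.g. type:video
OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def matches(component, meta):
    attrs = intrinsic_keywords(meta)
    m = COMPARISON.match(component)
    if m:
        key, op, value = m.group(1), m.group(2), int(m.group(3))
        # An undefined intrinsic keyword (no length for text) never matches.
        return key in attrs and OPERATORS[op](attrs[key], value)
    m = EQUALITY.match(component)
    if m:
        return attrs.get(m.group(1)) == m.group(2)
    return component in meta["tags"]  # extrinsic keyword


def ls(path, files=FILES):
    """Names of files satisfying every '/'-separated component of `path`."""
    parts = [p for p in path.split("/") if p]
    return sorted(name for name, meta in files.items()
                  if all(matches(p, meta) for p in parts))


print(ls("length:>20min/type:video/google"))  # ['keynote.avi']
```

Appending another function to PLUGINS extends what queries can express without touching a single user tag, which is the property described above.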

I hope the way current web applications evolve will eventually lead them to the ideas described in the LISFS paper, so all of us can benefit from them without having to struggle with the backward-compatibility problems our old hierarchical file systems impose.

BTW, all of this reminds me of an article on platforms and software evolution I read some time ago. Recommended reading.

Free your blog!

27 02 2007

While searching for free blogs today, I was surprised to find that only a small percentage of Python blogs (on either Planet Python or Unofficial Planet Python) are licensed under a free license (full list below). I believe it’s caused mostly by omission or unawareness, as in my own case: I added a CC button only recently, although my intention from the very beginning was to share my knowledge with others.

Most of you already publish your code in open source projects, so you know the benefits of openness. The next step is freeing your ideas and the hard work you’ve put into your blog. Choose a license that works for you and open your blog content for collaboration.

For a quick explanation of CC licensing, watch this video:

What follows is a list of Planet Python and Unofficial Planet Python blogs that are licensed under Creative Commons. If I omitted your blog or you’ve added a CC license recently, please let me know in the comments so I can update this list.

Copying methods in Python

17 02 2007

Doctests should be self-explanatory:

class Copycat(object):
    def extract(self, klass, method):
        """Extract a `method` from given `klass` so it can be used on this object.

        >>> class A(object):
        ...     def inc(self, n):
        ...         return n + 1
        >>> p = Copycat()
        >>> p.extract(A, 'inc')(5)
        6
        """
        # klass.__dict__ holds the plain function, which skips the
        # unbound-method type check; the lambda passes self in by hand.
        return lambda *args, **kwds: klass.__dict__[method](self, *args, **kwds)

    def copy_method(self, klass, method):
        """Copy a method from `klass` into this object.

        >>> class A(object):
        ...     def inc(self, n):
        ...         return n + 1
        >>> p = Copycat()
        >>> p.inc(5)
        Traceback (most recent call last):
        AttributeError: 'Copycat' object has no attribute 'inc'
        >>> p.copy_method(A, 'inc')
        >>> p.inc(5)
        6
        """
        # Store the wrapper in the instance dict so p.<method>(...) finds it.
        self.__dict__[method] = self.extract(klass, method)

    def copy_methods(self, klass, *methods):
        """Copy methods from `klass` into this object."""
        for method in methods:
            self.copy_method(klass, method)

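Normally you can’t call a foreign method on a given object. In Python 2, a method accessed through its class is an unbound method that checks the type of its first argument, so this won’t work: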

>>> class A(object):
...     def inc(self, n):
...         return n + 1
>>> class B(object): pass
>>> b = B()
>>> A.inc(b, 7)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unbound method inc() must be called with A instance as first argument (got B instance instead)

Using the trick with __dict__ you can overcome the type check and use the power of duck typing:

>>> A.__dict__['inc'](b, 7)
8
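The lambda in Copycat.extract is one way to bind the raw function to another object. The standard library’s types.MethodType does the same binding explicitly and returns a real bound method. The sketch below shows it; bind_foreign is a hypothetical helper name, not part of Copycat.

```python
import types


class A(object):
    def inc(self, n):
        return n + 1


class B(object):
    pass


def bind_foreign(obj, klass, name):
    """Hypothetical helper: bind `klass`'s method `name` to an unrelated `obj`.

    klass.__dict__[name] is the plain function stored on the class, so no
    isinstance check happens; MethodType turns it into a bound method of obj.
    """
    return types.MethodType(klass.__dict__[name], obj)


b = B()
inc = bind_foreign(b, A, "inc")
print(inc(7))  # 8
```

Unlike Copycat.copy_method, this stores nothing on b; assign the result to an attribute (b.inc = inc) if you want b.inc(7) to work.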

I used this technique with the Mock class (from the python-mock module) to easily test particular methods of a class while mocking others. Let me give an example.

Imagine you’re testing a class which has a few methods that touch the filesystem/database (so they are slow and rely on external, unpredictable state) and a few that are pure and do some logic, while delegating all the dirty work to the first group of methods. For example:

class ClassWeWantToTest(object):
    def __init__(self):
        self.that = self._init_that()

    def _init_that(self):
        # Touching the filesystem/database/...
        pass

    def dirty_work(self, argument):
        # Touching the filesystem/database/...
        pass

    def referentially_transparent(self, argument):
        # Working on argument and calling self.dirty_work()
        #  from time to time.
        pass

_init_that and dirty_work are “dirty”, while referentially_transparent is “pure”. Now, we want to test referentially_transparent. We can’t simply create a mock and call referentially_transparent on it:

def test_it():
    m = Mock()
    m.that = [1,3,5,42]

    ClassWeWantToTest.referentially_transparent(m, 13)

This won’t work for the same reason the example with A and B classes didn’t work:

Traceback (most recent call last):
  File "<stdin>", line 7, in ?
  File "<stdin>", line 5, in test_it
    ClassWeWantToTest.referentially_transparent(m, 13)
TypeError: unbound method referentially_transparent() must be called with ClassWeWantToTest instance as first argument (got Mock instance instead)

This is where the Copycat class comes in. With it, we can rewrite the test code like this:

class CopycatMock(Mock, Copycat): pass

def test_it():
    m = CopycatMock()
    m.that = [1,3,5,42]
    m.copy_method(ClassWeWantToTest, 'referentially_transparent')

    m.referentially_transparent(13)

Now everything seems to work. We can mock return values and set expectations as usual. Happy mocking!

Note: To make it work with the current (0.1.0) version of python-mock, use this small patch.
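For comparison, Python 3 removed unbound methods: ClassWeWantToTest.referentially_transparent is a plain function that accepts any object as self, so the original test_it idea works directly. The sketch below uses the standard library’s unittest.mock rather than python-mock, and the method bodies are hypothetical stand-ins for the elided “dirty” and “pure” logic.

```python
from unittest import mock


class ClassWeWantToTest(object):
    def __init__(self):
        self.that = self._init_that()

    def _init_that(self):
        # Hypothetical stand-in for touching the filesystem/database/...
        raise RuntimeError("needs external state")

    def dirty_work(self, argument):
        # Hypothetical stand-in for touching the filesystem/database/...
        raise RuntimeError("needs external state")

    def referentially_transparent(self, argument):
        # Hypothetical pure logic: filter self.that, delegate the rest.
        small = [x for x in self.that if x < argument]
        return self.dirty_work(small)


def test_it():
    m = mock.Mock()
    m.that = [1, 3, 5, 42]
    m.dirty_work.return_value = "stored"

    # No unbound-method check on Python 3: any object can serve as self.
    result = ClassWeWantToTest.referentially_transparent(m, 13)

    m.dirty_work.assert_called_once_with([1, 3, 5])
    assert result == "stored"
    return result


test_it()
```

The real class is never instantiated, so neither “dirty” method runs; only the pure logic is exercised against the mock.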

Make your tests self-documenting

14 02 2007

Inspired by RSpec (which in turn was inspired by TestDox).

A piece of code that transforms standard unittest output:

test_doesnt_raise_and_exception_when_None_was_passed (test_webui.TestSomeArbitraryMethod) ... ok
test_returns_this_and_that_when_string_was_passed (test_webui.TestSomeArbitraryMethod) ... ok

into this:

Some arbitrary method doesn't raise and exception when None was passed ... ok
Some arbitrary method returns this and that when string was passed ... ok

The test case that generated the output above looks like this:

class TestSomeArbitraryMethod(TestCaseWithSpec):
    def test_returns_this_and_that_when_string_was_passed(self):
        assert "this and that" == some_arbitrary_method("string")

    def test_doesnt_raise_and_exception_when_None_was_passed(self):
        try:
            some_arbitrary_method(None)
        except Exception:
            assert False

If you like it, grab the source code of TestCaseWithSpec.
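The idea behind such a class fits in a few lines. This is not the linked TestCaseWithSpec source, just one possible approach resting on an assumption about unittest: its verbose runner prints str(test) for tests that have no docstring, so overriding __str__ on a TestCase subclass is enough to change the report. CONTRACTIONS and the sentence rules are hypothetical choices.

```python
import re
import unittest

# Hypothetical contraction table; extend as needed.
CONTRACTIONS = {"doesnt": "doesn't", "dont": "don't", "isnt": "isn't"}


class TestCaseWithSpec(unittest.TestCase):
    """Sketch: describe each test as a sentence instead of a method name."""

    def __str__(self):
        # "TestSomeArbitraryMethod" -> "Some arbitrary method"
        subject = self.__class__.__name__
        if subject.startswith("Test"):
            subject = subject[len("Test"):]
        words = re.findall(r"[A-Z][a-z0-9]*", subject) or [subject]
        subject = " ".join([words[0]] + [w.lower() for w in words[1:]])

        # "test_doesnt_raise_when_None_was_passed" -> "doesn't raise when ..."
        method = self.id().split(".")[-1]
        if method.startswith("test_"):
            method = method[len("test_"):]
        sentence = " ".join(CONTRACTIONS.get(w, w) for w in method.split("_"))
        return "%s %s" % (subject, sentence)


class TestSomeArbitraryMethod(TestCaseWithSpec):
    def test_returns_this_and_that_when_string_was_passed(self):
        pass  # placeholder body

    def test_doesnt_raise_when_None_was_passed(self):
        pass  # placeholder body
```

Running python -m unittest -v on a module containing these classes should then print lines such as “Some arbitrary method returns this and that when string was passed ... ok”. Note that on Python 3, if a test does have a docstring, its first line is printed on an extra line after this description.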

Cheesecake for all

9 02 2007

If you maintain a Python package that is registered on PyPI, go check out the Cheesecake service now! We automatically test new releases, so if you have released a new version of your code recently, you can check its Cheesecake score right away.

Tasty cheesecake photo by Sharyn Morrow

Cheesecake is a tool that gives you feedback about the state of your Python package. Unit testing gives you feedback about the behaviour of your code, while Cheesecake tells you things like whether your package can be easily installed, how well it is documented and how strictly your code adheres to common coding standards (like PEP-8).

Cheesecake defines three indexes: installability, documentation and code kwalitee. In short, the installability index tells you whether your package can be easily found, downloaded and installed using distutils/setuptools facilities. The documentation index tells you how many of your code objects (modules/classes/functions) have docstrings and whether you remembered to create files like README or INSTALL (which users tend to look for first after unpacking the source). The code kwalitee index checks your unit tests and runs pylint on the whole package. Combine all of these aspects of a package, check their conformance to common practice, and you get the Cheesecake score. Want more details? Check out the description of the algorithm for computing the Cheesecake index.
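As a rough illustration of the documentation part only, the sketch below counts how many modules, classes and functions in a source string carry docstrings, and which of README/INSTALL are missing. It uses the standard ast module; the function names, the percentage and the file checklist are assumptions of this sketch, not Cheesecake’s actual algorithm or weights.

```python
import ast

EXPECTED_FILES = ("README", "INSTALL")  # hypothetical checklist


def docstring_coverage(source):
    """Return (documented, total) for the module and all classes/functions."""
    tree = ast.parse(source)
    definitions = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    nodes = [tree] + [n for n in ast.walk(tree) if isinstance(n, definitions)]
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)
    return documented, len(nodes)


def documentation_report(source, filenames):
    """Combine docstring coverage with a check for expected top-level files."""
    documented, total = docstring_coverage(source)
    # "README.txt" and "README" both count as README.
    present = {name.split(".")[0].upper() for name in filenames}
    return {
        "docstring_percent": 100.0 * documented / total,
        "missing_files": [f for f in EXPECTED_FILES if f not in present],
    }


SAMPLE = '''"""Module docstring."""

def documented():
    """Has a docstring."""

def undocumented():
    pass

class Thing(object):
    def method(self):
        return 1
'''

print(documentation_report(SAMPLE, ["README.txt", "setup.py"]))
# {'docstring_percent': 40.0, 'missing_files': ['INSTALL']}
```

Here two of five code objects (the module and documented) have docstrings, giving 40%.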

The score isn’t meant to divide packages into “better” and “worse”; it is only a helpful estimate of progress as you make efforts to make your package easier to install, understand and modify. The more work you put into your distribution, the higher the Cheesecake score you should get. We tried hard to make the correlation between good packaging practice and the Cheesecake score high, but chances are we made some mistakes. If you think we scored some parts of your package wrongly or missed some of your effort, please send us a bug report. The whole Python community will benefit, as the definition of a good Python package is still not well crystallized. We want Cheesecake to be a useful tool for all Python programmers who seek guidance on how to improve their distributions. The benefit is mutual: developers learn more about good coding practices and potential distribution problems, while their improved packages get used more often, for the benefit of the whole Python community.

So, check out the Cheesecake service or try Cheesecake on your computer. Bon Appétit!

Curly bracket strikes back

1 02 2007

With decent web frameworks for Python and Ruby, writing web applications is a real pleasure. You no longer have to clutter your screen with ugly Java/PHP/whatever mess; you can even write AJAX code directly in your language of choice (which is a hack in its own right, but we can’t argue with a so-called de facto standard), so most of your UI needs can be satisfied.

Flex demo

But there comes another threat, and it’s called Flex (no, not the GNU lexer). On the surface it may look really nice, but under the hood some serious code bloat is going on. I’m not saying this particular code is bad (it is quite nice, actually), but the general pattern is clear: lots and LOTS of typing. The effect is large amounts of code to read, with the implementor’s intentions hidden inside. I believe code can be kept clean. I wonder how on earth Bruce Eckel, a proponent of Python and a man who not so long ago wrote that this kind of code bloat costs time and money, could support this technology. We can write (and maintain) code with text editors. With the emergence of dynamic languages, this trend was finally gaining ground. Today, Flex just seems backwards compared to Ruby or Python. So please don’t use Flex. Otherwise we will again have to come up with pieces of middleware for automatic ActionScript generation, and that’s no fun. We don’t need another hack for the web. And no excuses.