Devon

26 06 2006

DevonThis iteration took two weeks to complete because of exams I had. Fortunately exams will soon end; I have the last one on Thursday.

Most important thing about past iteration was testing Cheesecake on all PyPI packages, which revealed a few issues. First, lots of packages don’t have correct download URLs listed on their PyPI pages, what makes scoring impossible. setuptools try hard to find a download link, but it fails for many packages. PJE don’t want to include any more screen scraping code in setuptools, so now it’s package maintainers duty to update their download links. Without this manual work of Python developers, the whole idea of PyPI and setuptools is useless.

There is also a group of packages that have only an egg uploaded, without a link to project sources. Eggs are intended to be a binary distribution, and as such, will be harder to score. Cheesecake have support for eggs now, but scoring techniques for scoring them will have to be improved at some point. We still have many ideas waiting to be implemented, so please be patient.

Of course there is a reason I’ve done all these tests. My goal is to start PyPI integration soon, so I want to be sure I’m heading in the right direction and that all changes in Cheesecake are for the good of Python community. I’ve already fixed some scoring methods and there are more improvements on their way. If you’re a Python developer, please try running Cheesecake on your project. I would love to hear your comments.

You can read in more detail about stories completed in Devon on Grig’s blog.

Advertisements




Camembert summary

13 06 2006

Since camembert milestone is complete, few summary words is due.

Let’s start with the mistakes. Coverage statistics I’ve published few days ago are bogus. All because I didn’t remove .coverage file from my cheesecake development directory. During next run statistics got messed up. I should have suspected an error because line numbers in coverage output were obviously bad. I have to be careful the next time. Fortunately buildbot pointed out my laziness and I was able to fix the bug. So, remember to always use -cover-erase option! Actual number is 79% (with cheesecake_index scoring 75%).

Now about the good things. In this iteration I’ve managed to prepare buildbot setup for automatic generation of Cheesecake documentation (using wonderful epydoc package) and its coverage statistics (using coverage.py and cover2html script I’ve written). So, if you want to check out what’s currently going on, you have two more sources of up-to-date information (existing one was Trac of course).

Last thing I wrote was a simple tool for converting our ReST README file into Trac Wiki format. This way we are able to maintain a single file and automatically convert it to any necessary format on demand. HTML output is supported by standard docutils distribution, but there wasn’t any solution for Trac Wiki. With a bit of reading through docutils code I’ve come up with a working script that successfully converts Cheesecake README from ReST format into Trac Wiki. Current revision available for download is 14. If you’re interested in writing a custom ReST->anything converter, you may want to read that sources. docutils have quite verbose but self-explanatory API, so it’s not hard to start developing your own script. I also have few simple advices that may help you in your way:

  • Everything you have to do is to write a custom Writer. For examples you may want to look at the HTML Writer code.
  • To make your writer usable use the publisher interface to easily create a command line tool, like standard rst2html or rst2latex.
  • rst2pseudoxml is very helpful during debugging. This tool produces a document tree as seen by the ReST parser.
  • Very important thing is having good unit tests. Because we’re writing a converting engine, most of tests include feeding our application with sample data and validating output it produces. For this simple task the power of unittest or doctest will be getting in your way. I’ve prepared special testing script, run_text_tests.py, which can be found in the rest2trac package. It finds all files with .in extension in a directory and injects their contents to Trac Writer. After each test it checks if Writer output is the same as corresponding .out file contents. If not – it exits with an error. It should be very easy to use it in our project.

That’s it. Feel free to comment on everything I’ve written. I am happy to hear any feedback.





This episode is sponsored by the __getitem__ and __metaclass__

8 06 2006

Yesterday I committed quite large piece of code. Cheesecake is much more modularized now, as you will see below. The general idea was to move all scoring out of Cheesecake class, making each index as much self-contained as possible.

I started with a base Index class. Each cheesecake index that scores separate element of a package inherits from this base class. This way we have IndexUrlDownload (add points if a package has been successfully downloaded from provided URL), IndexUnpack (add points for successful unpacking of a package), IndexRequiredFiles (rise score for existence of useful files, like README or INSTALL) and much more, all inheriting from Index. Each index can be a container for other indices, so that the value of this index will be a sum of these child indices values.

Life of an index have three stages. First, we define the index, subclassing from Index and defining some general parameters. Most important attributes are max_value number and compute method. First defines maximum score this index can give a package, while the second is used to compute index value for given package. For index to be actually used during Cheesecake score computation it has to be put into a list of indices that cheesecake script use. This list is called CheesecakeIndex and currently consist of three indices representing three different views on a package “goodness”: installability, documentation and code “kwalitee”. If you add your Index instance to CheesecakeIndex list of subidices, it will be automatically called and its value will affect overall score.

Each index have to base its score on some package characteristics. I’ve used special convention here that in my opinion simplifies whole indices implementation. From the Index point of view you name the compute method parameters in a way that corresponds to Cheesecake class attributes’ names. For example, when looking at get_pkg_from_pypi method you may notice it defines three instance variables: download_url, distance_from_pypi and found_on_cheeseshop. So, if you want to use the value of found_on_cheeseshop inside your index, define compute method like this:

def compute(self, found_on_cheeseshop):
    pass # your code goes here...

Cheesecake code that will call your index will take care of the rest. For explanation check out get_method_arguments function.

But there’s more than this to indices. Check out this line of test code:

index = self.cheesecake.index["INSTALLABILITY"]["url_download"]

It clearly shows that you can get to index children by this dictionary-like syntax. This magic comes from proper use of __getitem__. But that’s not all. Go ahead and grep cheesecake_index.py for “url_download”. You won’t find anything! So, where the name comes from? I’ve followed a DRY rule here. I have already defined a name in the class definition, so why I should define it again? Getting this done was a bit tricky. I’ve used NameSetter meta-class that during class definition takes its name and using index_class_to_name function injects name attribute. Thanks to this mechanism, it’s enough to say IndexUnpackDir and I get “unpack_dir” name for free.

I’ve refactored most of the code I wanted to change, so I’m pretty happy with it now. Side effect of rewrite is much better coverage score (especially when you compare it to the last results):

Name Stmts Exec Cover
cheesecake/config 35 29 82%
cheesecake/logger 80 70 87%
cheesecake/cheesecake_index 592 531 89%
cheesecake/_util 85 80 94%
cheesecake/codeparser 92 90 97%
TOTAL 884 800 90%

I’ll get to automatic coverage generation soon, so you’ll be able to browse these statistics for each build.





The worst project ever

6 06 2006

Some people have very strong emotional relationship with Cheesecake:

for the record, the “cheesecake” mandatory-inane-code-metrics-invented-by-bozos project is the worst project ever in Python’s history.

“we don’t have any creative talent whatsoever, so let’s come up with a way to fuck with all the creative people that has made Python into what it is.”

seriously, if the only thing you find interesting in computing is the chance to control what other people do, and how they do it, please go work in some other field.

Well, I guess Cheesecake is right in front of pylint, setuptools, doctest, or even Python interpreter itself, as these tools also don’t hesitate to point programmer his/her mistakes. With a company of these projects we feel perfectly well.





Stage two: refactoring

5 06 2006

CamembertTo make Cheesecake easier to maintain and more extensible I’ve decided to do some major refactoring this week. You may expect things to be broken inside mk branch for some time. But at the end of the week you’ll hopefully be able to hack your own index and easily plug it into Cheesecake. I will write about it in more detail later. And for all agile fans I will also incorporate automatic creation of documentation and coverage statistics into buildbot.

Oh, and check out what Titus Brown has to say about Cheesecake. If you’re interested in explanation how Cheesecake development process looks like, read last Grig post.





What Cheesecake is all about…

5 06 2006

This is copy&paste from the project comments page that summarize my view on Cheesecake.

What do you mean there should be a README file? If I add an empty README file I get a good cheesecake score, while somebody who has excellent information in a file DOC, gets a bad score? I’m afraid cheesecake’s kwalitee is going to become a factor that won’t be respected by developers. If cheesecake gives a bad score to a project it may mean that either the project is bad, *or* cheesecake is bad and the project is good. And if there are many good projects that get a bad score cheesecake will render itself wrong!

And that’s exactly the reason we’re asking the community for feedback. We don’t want to implement “book standards” but real working and useful practices that good Python developers use. If there will be a few good projects with DOC instead of README file, we’ll add this check to Cheesecake. What I could call most common misunderstanding about Cheesecake is a misconception that we are trying to come up with our own standards and try to enforce them on Python developers. What we’ll actually do is listen to what developers have to say and come up with a greatest common divider for all these good programming advices. Cheesecake score will represent project compliance to this common and established way of doing things. Important part of this is that Cheesecake won’t be ever complete. It will have to incorporate changes in the same way the community and methodology changes over time. Having a tool to point you to current trends and suggest good advices on trivial things like file naming convention or distribution method is invaluable. This way you can focus on coding, leaving boring stuff to Cheesecake. One of our goals is to create an easy reference for all factors that affect Cheesecake score, so that every developer can easily look up why he’s loosing points. And that doesn’t mean these rules are set in stone – we encourage all developers to question them.

The point I’m trying to make is that Cheesecake is written to help programmers, not to put blame on them. It’s one of agile development advices: “Criticize Ideas, Not People”. If most developers use README it’s probably a good thing to have a README in your project. It tells nothing about the quality of your documentation, it only suggest you don’t have file that most people will try to look for first, right after package has been unpacked. It also make it easier for package managers in different open source distributions, like Debian or Gentoo, as they don’t have to check manually which of your DOC/UserManual/anything file contains actual documentation they can, for example, incorporate into a man. Having one good way of doing things is very efficient for collaboration and project maintaince. Python was built upon this principle, so Cheesecake is merely a continuation of this thinking, but on different level.





Make your code covered

1 06 2006

I committed a big chunk of code to Cheesecake yesterday, along with a bunch of tests. But how can I be sure I actually covered all of the code? Of course nothing will substitute good suite of tests, but I thought trying out a coverage tool won’t be a bad idea. To install coverage simply download a one-file module and drop it into your PYTHONPATH. You may also think about making an alias for it, so you can easily call it from the command line:

$ alias coverage='python /usr/lib/python2.3/\
site-packages/coverage.py'

I wanted to check what code is executed by current set of tests that Cheesecake have. I’ve used nose runner, with a special option:

$ nosetests --with-coverage

After about 20 seconds it finished its work. Results can be seen by invoking coverage directly. For the Cheesecake package I wanted to check only few modules, because not all are written by us. So the command line went like that:

$ coverage -m -r cheesecake/_util.py\
cheesecake/cheesecake_index.py cheesecake/codeparser.py\
cheesecake/config.py cheesecake/logger.py

Results were as follows (stripped out – irrelevant here – line numbers of not-covered statements):

Name Stmts Exec Cover
cheesecake/cheesecake_index 633 410 64%
cheesecake/config 35 29 82%
cheesecake/_util 47 40 85%
cheesecake/logger 80 70 87%
cheesecake/codeparser 89 87 97%
TOTAL 884 636 71%

As you can see, codeparser code (which I was working on yesterday) is covered quite well. Results also clearly show we have a lot of work to make Cheesecake tested appropriately. 64% is certainly not enough.

HTML table was generated by cover2html.py script.