Martin Probst's weblog

Atom XML Schema

Oct. 8, 2005, 8:53 p.m.

Does anyone know of an up-to-date XML Schema definition for Atom 1.0? I only found this one, which is nice, but it doesn't match the current spec very well, and I'm too lazy to fix it ;-). Surely there is a (non-RELAX NG) schema out there somewhere?

I'm currently playing around with the Atom Publishing Protocol, Atom itself and the idea of an Atom-based web storage facility. I'm not completely convinced that it's useful; I mainly wanted to see how hard it would be to implement something like that using X-Hive/DB. Or maybe it's just that XQuery is being finalized, and I need a new fast-moving target to complain about spec changes ...

Speaking of which, we just released X-Hive/DB 7.0, which is really cool. I'll probably write more about it once the website has been properly updated.

Collaborative Editing with Gobby

Sept. 28, 2005, 12:47 p.m.

There is a text editor for Macs called SubEthaEdit that allows multiple users to edit files collaboratively. Quite cool, but while the editor is free, you first have to get yourself at least the smallest hardware dongle, a $500 Mac mini.

Now there is Gobby, an editor that does roughly the same for Linux, Windows and Mac. I just tried it out, and it does work on a single machine. Unfortunately I didn't have a second box to try the advertised Zeroconf support etc., but it looks very promising!

Now all we need is a generic protocol for all real-time collaborative editors ...

Media-less Linux installation

Sept. 7, 2005, 2:11 p.m.

Install Linux without any media. If I had known about this a little earlier (I don't know how long it has been around, though), it would have saved me a lot of trouble. Installing Linux on a ThinkPad X30 without any external drive can get quite difficult.

When installing Gentoo on it, I managed by booting a kernel that had its root filesystem on an NFS share on a second box. It works, but setting up the server is quite a hassle. Plus you learn a lot of things about TFTP, NFS etc. that you never really wanted to know.

When installing Ubuntu, I found out that you only need the kernel and initrd. I formatted my USB key, marked the primary partition as bootable using fdisk, installed GRUB and the Ubuntu kernel on it, and it actually worked, pulling the whole installer from the net. The only catch was that the installer sometimes recognized my wireless LAN card and sometimes didn't. That probably works better by now.

The method described by Marc Herbert seems a little more complicated than the USB key approach, but if you don't have a Linux system to set up the key drive, or your notebook can't boot from USB key drives, it's definitely the way to go.

[via Ben Maurer]

Java Unit Test Coverage

Sept. 4, 2005, 11:16 a.m.

I spent a day and a half last week setting up code coverage for our Java unit tests. That surprised me; something like that shouldn't take so long. The main problem was the state of the available tools. I first wanted to find out whether any usable open source tools exist, so I avoided Clover, JCover & Co. Instead I tried:

* jcoverage GPL - http://www.jcoverage.com/
  Doesn't work with Java 1.5; not updated for ages.
* Quilt - http://quilt.apache.org
  Doesn't work, and I'm not quite sure why. Not updated for ages either.
* ucovered - http://jxcl.sourceforge.net/
  Doesn't work. The Javadoc even mentions the error in question, but the project seems to be abandoned, too.
* cobertura - http://cobertura.sourceforge.net/
  Works (hurray!) and seems to be under active development. It creates really nice coverage reports, but its overhead is a problem: even with some tricks, the tests ran roughly 4 to 5 times slower, and that just doesn't work for a test suite of more than 10,000 tests that already takes several hours.
* EMMA - http://emma.sourceforge.net/
  This is what we are running now. EMMA is based on basic block coverage rather than true line coverage. A basic block is a straight sequence of statements that is not interrupted by any control flow (branches, loops, etc.). I'm not sure whether that's the reason, but EMMA is a lot faster than cobertura. The drawback is that its reports don't look as nice.
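
To make the idea of block coverage concrete, here is a hand-instrumented sketch. It is not how EMMA is implemented (EMMA inserts its probes into the bytecode automatically); the probe array `hit` and the method `clampToZero` are made up for illustration. Each basic block sets its own flag, and coverage is simply the fraction of flags that are set.

```java
import java.util.Arrays;

// Hand-instrumented illustration of block coverage; probe array and method are hypothetical.
class BlockCoverageDemo {
    // One flag per basic block of clampToZero().
    static final boolean[] hit = new boolean[3];

    static int clampToZero(int x) {
        hit[0] = true;             // block 0: method entry, runs up to the branch
        if (x < 0) {
            hit[1] = true;         // block 1: only runs for negative input
            x = 0;
        }
        hit[2] = true;             // block 2: join point after the if
        return x;
    }

    // Fraction of blocks that were executed at least once.
    static double blockCoverage() {
        int covered = 0;
        for (boolean b : hit) {
            if (b) covered++;
        }
        return (double) covered / hit.length;
    }

    // Clears all probes, e.g. between test runs.
    static void reset() {
        Arrays.fill(hit, false);
    }

    public static void main(String[] args) {
        clampToZero(5);
        System.out.println(blockCoverage()); // 2 of 3 blocks covered
        clampToZero(-5);
        System.out.println(blockCoverage()); // all 3 blocks covered: 1.0
    }
}
```

Had the branch been written on a single line as `if (x < 0) x = 0;`, a purely line-based count would report that line as covered after `clampToZero(5)`, even though the assignment never ran; the per-block flags tell the two cases apart.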

So now we have something that somewhat works. "Somewhat" because I ran into what is presumably a bug in Ant 1.6.3: custom result formatters for the junit task don't get their extension passed along if the <junit/> task is set to forkmode='once'. This currently makes it impossible to view the unit test results when the tests run with code coverage enabled, which makes hunting down errors quite difficult. I still have to check whether this bug is fixed in a later version of Ant.

Setting forkmode='once' also led to quite a number of test failures, because our test machinery relies on static fields in several places, and a previous test may leave them set to the wrong value. That's arguably our own fault, but annoying nonetheless. Still, forkmode='once' is necessary, as anything else slows testing down horribly.
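
A minimal sketch of the static-field problem, assuming a hypothetical `TestSettings` class and two hypothetical tests written as plain methods (no JUnit, to keep it self-contained). When both run in the same JVM, the first test's change to a static field is still there when the second one runs, unless something resets it afterwards; in a JUnit test case that would typically be the job of a `tearDown()` method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical global configuration shared by all tests in the JVM.
class TestSettings {
    static String defaultEncoding = "UTF-8";
}

class StaticLeakDemo {
    // A test that changes global state and forgets to restore it.
    static boolean testLatin1Import() {
        TestSettings.defaultEncoding = "ISO-8859-1";
        return "ISO-8859-1".equals(TestSettings.defaultEncoding);
    }

    // A test that silently assumes the default value.
    static boolean testDefaultEncoding() {
        return "UTF-8".equals(TestSettings.defaultEncoding);
    }

    // Runs both tests one after another in the same JVM and collects their results.
    static List<Boolean> runInOneJvm(boolean resetAfterEachTest) {
        TestSettings.defaultEncoding = "UTF-8"; // state of a freshly started JVM
        List<BooleanSupplier> tests = List.of(
                StaticLeakDemo::testLatin1Import,
                StaticLeakDemo::testDefaultEncoding);
        List<Boolean> results = new ArrayList<>();
        for (BooleanSupplier test : tests) {
            results.add(test.getAsBoolean());
            if (resetAfterEachTest) {
                TestSettings.defaultEncoding = "UTF-8"; // what a tearDown() would do
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runInOneJvm(false)); // [true, false]: second test fails
        System.out.println(runInOneJvm(true));  // [true, true]
    }
}
```

Forking a JVM per test would hide the leak by starting every test from fresh static state, which is exactly the isolation that forkmode='once' gives up in exchange for speed.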

All in all, coverage testing is quite nice, and the results are not as horrible as I expected. In most packages coverage is above 90%. Most of the untested code is in generated classes, and I presume most of that is untestable and not used at all. Code coverage in terms of lines or blocks is of course a very poor criterion for test completeness, and path coverage wouldn't be much better either, but it can at least give you good pointers to areas that are under-tested or not tested at all. Another step towards better software development ;-)
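
An example of why full block (or line) coverage says little about completeness: in the hypothetical method below, two calls execute every block, yet the one combination of branches that neither call takes divides by zero. Path coverage would catch this particular case, but the number of paths doubles with every independent branch, so it quickly becomes impractical for real code.

```java
// Hypothetical method: two tests reach every block, but one path is never taken.
class PathCoverageDemo {
    static int ratio(boolean emptyInput, boolean wantRatio) {
        int divisor = 10;
        if (emptyInput) {
            divisor = 0;            // only reached when emptyInput is true
        }
        if (wantRatio) {
            return 100 / divisor;   // only reached when wantRatio is true
        }
        return divisor;
    }

    public static void main(String[] args) {
        // These two calls together execute every block (and every line) of ratio() ...
        System.out.println(ratio(true, false)); // 0
        System.out.println(ratio(false, true)); // 10
        // ... but the untested path (true, true) divides by zero.
        try {
            ratio(true, true);
        } catch (ArithmeticException e) {
            System.out.println("untested path fails: " + e.getMessage());
        }
    }
}
```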

PS: Another plus for EMMA is that it's self-contained: only two jars, as opposed to other projects that require 6-8 libraries on your classpath. That's usually just a little more setup work, but wait until tool A requires a different version of a library than tool B. DLL hell for Java, but that's another story ...

BOM of death

Aug. 4, 2005, 4:40 p.m.

Note to self: next time you get really strange XML parse and comparison errors, try running this before spending an hour looking at XML and Java files and cursing at XSLT, JUnit, Eclipse & the world in general:

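find . -name .svn -prune -o -type f -print0 | xargs -0 -r sed -i.from-bom '1s/^\xEF\xBB\xBF//'

(Unix shell command, assuming GNU find, xargs and sed, that strips the UTF-8 byte order mark from the start of every file below the current directory, skips Subversion's .svn directories and keeps a .from-bom backup of every file it processes.)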

Afterwards start cursing about Notepad, Windows, and Microsoft's use of the BOM in general.
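
When the files can't simply be cleaned up, the BOM has to be handled in code. Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF instead of dropping it, which is exactly the kind of thing that makes string comparisons fail mysteriously. A minimal sketch that skips it before decoding, assuming Java 9 or later (for `readNBytes` and `readAllBytes`); the class and method names are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned after the UTF-8 BOM, if the input starts with one.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pushback.readNBytes(head, 0, head.length);
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    // Decodes the bytes as UTF-8, ignoring a leading BOM.
    static String decodeWithoutBom(byte[] data) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        String naive = new String(xml, StandardCharsets.UTF_8);
        System.out.println(naive.equals("<a/>"));                 // false: starts with U+FEFF
        System.out.println(decodeWithoutBom(xml).equals("<a/>")); // true
    }
}
```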

Writing strategy

July 15, 2005, 10:41 p.m.

Bennaco tells us How to Ruin a Writing Project in 10 Easy Steps. After that, he explains how to actually do it. Step 1:

1. Decide that you're not going to really "do it". Which is to say, decide that you are not going to approach the whole big, terrifying, thing in one go. Instead, you're going to do some noodling around, some very small, easy, graspable, low-intensity, and non-threatening things, one at a time, until the project gets done.

While he is talking about writing in the literary sense, I think this applies to programming just as much. If you're starting to write a big, complex piece of software, don't try to approach the whole thing at once. Just start writing the little parts that, glued together, form the whole.

Start off by dissecting the problem into smaller building blocks. This is the most important step, and it takes quite some time. Break the blocks down further until you can describe what each block really does in two sentences, without "and then something magic happens". Really figure out how the individual parts work together, otherwise you'll be screwed later. Discuss every detail with your team members, if you have any, to make sure it really works this way.

Then start writing the smaller parts. Don't start with your "public static void main(String[] args)", but rather with the small helper routines, the data model you're working on, conversions etc. If you have a proper development environment, you can test those parts using your favorite unit testing framework. The important thing is not to implement something that does the full job partially, but to do a small part completely. Otherwise you will end up with a codebase cluttered by feature after feature added without a bigger plan. That results in rewrite after rewrite and lots of bugs, not to mention the maintenance nightmare.
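
As an illustration of finishing a small part completely, here is a hypothetical helper one might write early in an XML database project: it escapes text for use as XML element content or in a double-quoted attribute value, and it comes with its own checks long before the real program has a `main`. The class and its exact scope are assumptions made for the example.

```java
// Hypothetical helper: a small part, done completely and checked in isolation.
class XmlText {
    // Escapes text for XML element content or a double-quoted attribute value.
    // The input must not be null.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Minimal self-check; in a real project these would live in unit tests.
    public static void main(String[] args) {
        check(escape(""), "");
        check(escape("plain"), "plain");
        check(escape("a < b && c > d"), "a &lt; b &amp;&amp; c &gt; d");
        check(escape("\"quoted\""), "&quot;quoted&quot;");
        System.out.println("all checks passed");
    }

    private static void check(String actual, String expected) {
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

Once a piece like this is finished and tested, later code can simply rely on it instead of each feature growing its own half-done variant.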

If you keep doing this, at some point you will start using these components and gluing them together, more or less automatically. By then you should be finished with all the low-level work and only need to assemble the system as a whole.

Finally putting the blocks together and seeing the whole thing take off can be quite rewarding. On the XMLDB project I did at the end of my bachelor studies, the senior developers supervising us recommended that we just get something working as fast as possible. We didn't go that way, but instead spent 2 of our 7 months on planning. Then we started writing and testing components and slowly putting them together. The system didn't run a single query until 4 1/2 months in. But at that point most of the hard work was done, and we managed to deliver a working XQuery and XUpdate implementation, including a persistent storage backend, on time. And that with seven students, one of whom was busy writing a GUI for the server while another did documentation, infrastructure and other related work.

I only quoted Bennaco's first point, but the rest is quite similar to what I wrote: refine an abstract plan again and again until writing the individual steps and gluing them together becomes trivial.

XML Editor for Eclipse

July 8, 2005, 11:55 a.m.

I just installed the Eclipse Web Tools Project stuff. It's not that I'm doing any web development, but these tools include something I've been looking for for ages:

### A decent XML Editor

Finally. I tried about 8 different tools, open source and commercial alike. All of them sucked in one way or another: some were merely text editors with syntax highlighting, a lot were simply defunct, and not a single one got basic editing right (proper indentation, proper cursor placement, etc.). The only tolerable one was the <oXygen/> editor, but well over $1000 * is far too much if you're only using the XML editor.

It's still a little strange to install a full-blown web development environment just to get something as basic as an XML editor (shouldn't the editing platform provide this by default?), but whatever.

* Update: I stand corrected, <oXygen/> is in fact a lot cheaper. I must have confused it with some other tool. Anyway, judging from a first glimpse, I prefer the WTP XML Editor over <oXygen/>, mainly because editing feels smoother.

Discussion of Apple's RSS extensions

July 6, 2005, 12:26 p.m.

Sam Ruby asks people to link to the discussion of Apple's RSS extensions on his blog. It's a worthwhile read on how (and especially how not) to extend existing XML formats.

The topic is quite interesting. I'd like to see a more general discussion of non-breaking extensions to existing XML formats; that might make worthwhile reading.

Evolution & Spam filtering

June 29, 2005, 10:50 a.m.

After quite a long and annoying hunt, I think I have found out why Evolution refuses to filter spam for me. Evolution uses SpamAssassin as its backend, and SpamAssassin has a feature called bayes_auto_learn.

It basically means that every message classified as definitely spam (score > 15) or definitely not spam (score <= 0.1) is also automatically used to train the Bayesian filter.

I really wonder what this is good for. If I'm not mistaken, the Bayesian filter will just learn the same rules that SpamAssassin already implements.

Apart from that, in my case this turned into a nice bug. When you mark a message as spam in Evolution, it's supposed to train the filter. But the spam I'm getting (stock option advertisements and such) always gets a score of 0.1 from SpamAssassin and is then automatically learned as not spam. Evolution would have to call sa-learn with the --forget option to force the message to be learned as spam, because SpamAssassin tries to avoid learning the same message twice.
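
A simplified, hypothetical model of how the two mechanisms described above interact: auto-learning with the quoted thresholds, plus a learner that skips messages it has already learned unless they are forgotten first. This is not SpamAssassin's code, the real auto-learner checks additional conditions, and all names here are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the behaviour described in the post, not SpamAssassin's implementation.
class AutoLearnModel {
    enum Label { SPAM, HAM, NONE }

    static final double SPAM_THRESHOLD = 15.0; // "definitely spam" if the score is above this
    static final double HAM_THRESHOLD = 0.1;   // "definitely not spam" if at or below this

    // Label that auto-learning would pick for a score (NONE = not auto-learned).
    static Label autoLearnLabel(double score) {
        if (score > SPAM_THRESHOLD) return Label.SPAM;
        if (score <= HAM_THRESHOLD) return Label.HAM;
        return Label.NONE;
    }

    // What each message has been learned as so far.
    private final Map<String, Label> learned = new HashMap<>();

    // Learns a message unless it was already learned; returns whether it was learned.
    boolean learn(String messageId, Label label) {
        if (label == Label.NONE || learned.containsKey(messageId)) {
            return false;
        }
        learned.put(messageId, label);
        return true;
    }

    // Forgets a message so that it can be learned again.
    void forget(String messageId) {
        learned.remove(messageId);
    }

    Label labelOf(String messageId) {
        return learned.getOrDefault(messageId, Label.NONE);
    }

    public static void main(String[] args) {
        AutoLearnModel bayes = new AutoLearnModel();
        // Stock spam scored 0.1 is auto-learned as ham when it arrives ...
        bayes.learn("stock-spam", autoLearnLabel(0.1));
        // ... so marking it as spam later is ignored,
        System.out.println(bayes.learn("stock-spam", Label.SPAM)); // false
        // unless the message is forgotten first.
        bayes.forget("stock-spam");
        System.out.println(bayes.learn("stock-spam", Label.SPAM)); // true
        System.out.println(bayes.labelOf("stock-spam"));           // SPAM
    }
}
```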

So basically the spam filtering worked, but all the spam I got was automatically learned as ham, no matter how much I clicked. I wish spam filtering in Evolution were as easy and helpful as in Thunderbird...

Off to XIME-P

June 15, 2005, 4:23 a.m.

I'm off to XIME-P, the International Workshop on XQuery Implementation, Experience and Perspectives. There will be a number of talks about the direction and future development of XQuery. I'm especially interested in the upcoming update language.

Also, I'll be spending 4 days in Baltimore, so I'll have two free days. Everybody told me Baltimore isn't that interesting, so I'll try to get to Washington and do some sightseeing.