Martin Probst's weblog

Techwriter Wiki

Nov. 27, 2004, 6:54 p.m. — 1 comment

Post for Lars:

[Translation aid] The Techwriters Wiki aims to become a knowledge base for technical writers and translators. It is still young and can use all the support it can get. via Der Schockwellenreiter

XML Size Statistics

Nov. 27, 2004, 6:41 p.m. — 1 comment

When storing XML in a database, the individual nodes are put into containers and stored on pages. Because it's generally easier to have fixed-size containers (representing objects), it's quite nice to do this with a default size plus overflow containers.

But what default size should be used for what kind of nodes? We have to get some statistics on that point, but I get the impression that most text nodes are really small, i.e. no more than about 100-200 characters. Other nodes like elements are not that important, as their name is usually stored exactly once.
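
One way to gather such statistics is sketched below. It is only a minimal illustration, not a measurement: it uses Python's standard `xml.etree.ElementTree` parser, and the function names and bucket bounds are assumptions made up for this example. Whitespace-only text between elements is counted as well, since a store that keeps it has to size containers for it too.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def text_node_lengths(xml_source):
    """Return the length of every non-empty text node in document order."""
    root = ET.fromstring(xml_source)
    lengths = []
    for elem in root.iter():
        # ElementTree exposes text nodes as .text (before the first child)
        # and .tail (directly after the element's end tag).
        for chunk in (elem.text, elem.tail):
            if chunk:
                lengths.append(len(chunk))
    return lengths


def bucket_counts(lengths, bounds=(10, 50, 100, 200)):
    """Count how many lengths fall into each bucket (bounds are illustrative)."""
    counts = Counter()
    for n in lengths:
        label = next((f"<={b}" for b in bounds if n <= b), f">{bounds[-1]}")
        counts[label] += 1
    return dict(counts)
```

Running `bucket_counts(text_node_lengths(doc))` over a representative set of documents would show how many text nodes fit into each candidate container size.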

It seems the best solution would be incremental growth for the containers. Usually text nodes will be rather small (<50 chars), but if they are bigger than that they will probably be bigger than 100 or even 200 chars too. Most text nodes will be below 10 chars though, at least for data-oriented XML as opposed to document-oriented XML (think of formatted XML with line breaks and tabs between the elements). So the first text node container should be something like 20 chars, the next size maybe 100, and thereafter really big ones. But these are only guesses - I need statistics on that.
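
The tiered scheme guessed at above could look roughly like the following sketch. The 20 and 100 character tiers come from the guess in this post; the 4096 character overflow page and all function names are assumptions chosen purely for illustration.

```python
TIERS = (20, 100)      # small container sizes guessed in the post
OVERFLOW_PAGE = 4096   # assumed size of one "really big" overflow unit


def container_size(text_length, tiers=TIERS, overflow_page=OVERFLOW_PAGE):
    """Pick the smallest container that can hold a text node of this length."""
    for size in tiers:
        if text_length <= size:
            return size
    # Too big for any fixed tier: round up to whole overflow pages.
    pages = -(-text_length // overflow_page)  # ceiling division
    return pages * overflow_page


def wasted_space(lengths, tiers=TIERS, overflow_page=OVERFLOW_PAGE):
    """Total unused characters if each text node gets its own container."""
    return sum(container_size(n, tiers, overflow_page) - n for n in lengths)
```

Feeding real length statistics into `wasted_space` with different `tiers` would make it possible to compare candidate sizes instead of guessing.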

DHTML Lemmings

Nov. 13, 2004, 10:56 p.m. — 0 comments

DHTML Lemmings. Plain unbelievable.

Namespace prefixes in XML

Nov. 7, 2004, 6:34 p.m. — 0 comments

In the latest W3C XQuery Working Draft the type xs:QName was altered. In former specifications it represented a qualified name as the namespace URI in combination with the local name; now the XQuery processor has to keep track of the user-defined namespace prefix too. This seems to be a minor change that is useful for converting xs:QNames into strings, but in my opinion it's a major change of the data model.

The question is whether to see an XML document as a text document or whether to interpret it as a tree of nodes. The former has the advantage that users editing XML documents with Notepad will usually be less surprised by the actual results of queries. While this would be nice, I think it's a horrible idea for a structure-oriented query language, especially in a database context.

While designing an XQuery database we stumbled over such questions very often. What about whitespace and indentation, what about character references, what about XML namespace prefixes, etc.? I'm sure there are still things to come. Others have run into this kind of problem too, as you can read in this post from Dare Obasanjo.

I think the only clean solution is to draw a line clearly separating the text representation and the tree representation of XML documents. In the tree representation, namespaces are just unique IDs and the prefixes are completely ignorable. Each qualified name has a namespace ID, but once it has been transformed from text to tree representation the namespace prefix is gone. The same goes for ignorable whitespace, character references and CDATA sections. Otherwise it becomes really tedious to store such things as where namespaces were declared and with which prefix, or you would even need to store texts twice: once in a normalized format usable for full text search and once in the representation the user expects. But what happens if these contents are updated?
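
As a small illustration of this tree view, Python's standard `xml.etree.ElementTree` parser already behaves this way: element names become the namespace URI plus the local name, and CDATA sections and character references turn into plain text. The documents and helper names below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in the namespace prefix they use.
DOC_A = '<a:item xmlns:a="http://example.org/ns"><a:name>x</a:name></a:item>'
DOC_B = '<b:item xmlns:b="http://example.org/ns"><b:name>x</b:name></b:item>'


def expanded_names(xml_source):
    """Return every element name as '{namespace-uri}local-name'."""
    return [elem.tag for elem in ET.fromstring(xml_source).iter()]


def root_text(xml_source):
    """Return the text content directly inside the root element."""
    return ET.fromstring(xml_source).text


# The prefix is gone after parsing; only URI and local name remain.
same_tree_names = expanded_names(DOC_A) == expanded_names(DOC_B)

# CDATA, entity references and character references all yield the same text.
variants = ["<t><![CDATA[a<b]]></t>", "<t>a&lt;b</t>", "<t>a&#60;b</t>"]
texts = [root_text(v) for v in variants]
```

Here `same_tree_names` is true and all three entries of `texts` are the same string, which is exactly the information loss (or normalization, depending on the point of view) discussed above.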

Assigning prefixes, escaping non-representable characters as character or entity references, inserting CDATA sections and many other things are presentational logic. This might be handled by the XML editor or by an output filter when converting XML documents into their text representation; everything else clearly borks things outside the text representation. And even worse, it keeps people thinking of XML as text with a bunch of angle brackets, as opposed to tree-structured data.

The Backside of GUIs

Sept. 16, 2004, 7:28 p.m. — 0 comments

Don Park blogs about the Backside of GUIs. It sounds a little bit funny, but to me it seems to be a rather intuitive place for settings within a GUI.

Why not use the powerful features of upcoming windowing systems like Avalon or Looking Glass to provide settings on the backside of GUIs? No more searching through obscure settings dialogs, just turn around the very window whose behaviour you want to alter. This could also be done for single dialogs of an application, like advanced search settings or similar things.

Choosing the right technology

Sept. 1, 2004, 7:50 p.m. — 0 comments

Mark Pilgrim blogs about how to choose the right technology, no matter what for. Finally we can make great fact-based decisions automatically.

Dependency Injection

Aug. 20, 2004, 8:15 a.m. — 0 comments

Rickard Oberg writes about Dependency Injection. This is completely new to me but it looks like an interesting approach. Oberg is using a Container called Pico to manage his Java objects, their lifecycle and their dependencies. He writes about new design goals he adopted after writing code using Pico.

Pico can be used to group objects together in a container. If you have dependencies between your objects (especially objects which need other objects in their constructor), Pico will try to resolve them using reflection. After adding several classes to a PicoContainer you can start the whole container, and Pico will create your objects and start those which are capable of doing something.

By grouping objects together in nested containers you can manage a whole cloud of dependent objects. This looks nice, but it obviously needs a lot of thinking in the design phase as well as a really different approach to OO software design. Pico (and possibly other similar frameworks) hides the dependencies, object configuration and lifecycle issues from the programmer. This might be a good idea in many situations, but I'm pretty sure that you can blow an impressively big hole into your own foot with this.
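
To make the idea concrete, here is a minimal sketch of constructor injection via reflection. It is written in Python rather than Java and is not PicoContainer's API: `TinyContainer`, `Database` and `UserService` are hypothetical names invented for this example, and it assumes every constructor parameter is annotated with a registered class. Circular dependencies are not handled.

```python
import inspect


class TinyContainer:
    """Toy container: registers classes, builds them on demand, starts them."""

    def __init__(self):
        self._classes = []
        self._instances = {}

    def register(self, cls):
        self._classes.append(cls)

    def resolve(self, cls):
        """Return the single instance of cls, creating its dependencies first."""
        if cls in self._instances:
            return self._instances[cls]
        if cls not in self._classes:
            raise LookupError(f"{cls.__name__} is not registered")
        # Reflection step: read the constructor's parameter annotations.
        params = inspect.signature(cls).parameters.values()
        deps = [self.resolve(p.annotation) for p in params]
        instance = cls(*deps)
        self._instances[cls] = instance
        return instance

    def start(self):
        """Create all registered objects and start those that can be started."""
        for cls in self._classes:
            obj = self.resolve(cls)
            if hasattr(obj, "start"):
                obj.start()


class Database:
    def __init__(self):
        self.started = False

    def start(self):
        self.started = True


class UserService:
    def __init__(self, db: Database):
        self.db = db
```

Registering `UserService` and `Database` in any order and calling `start()` creates the database first, hands it to the service, and starts the database, without either class knowing about the container.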

Recovering overview of classes with lots of methods

Aug. 5, 2004, 10:10 p.m. — 0 comments

I think everybody knows this. Editing a fairly large class within Eclipse can get tedious because one loses the overview of the methods. Eclipse provides an outline of the methods, but this isn't sorted in a sensible manner.

The Eclipse Protocols Plug-In helps with this by grouping methods into categories (marked by comments) and letting the user browse them. The grouping is quite easy using drag & drop. This is a nice tool to enhance the readability of your code, which is in turn a good idea as code is usually read a lot more often than it is written ...

[via Manageability]

IBM's Common Public License

July 20, 2004, 12:28 p.m. — 0 comments

As posted before, I took a look at the CPL, IBM's open source license.

It seems to be some kind of middle way between the BSD-style licenses (e.g. the Apache license) and the GPL (or similar licenses). With a BSD-style license one may use and modify the source in one's own projects and distribute the resulting applications in binary form without providing the full source code and under one's own licensing terms. You only have to give credit to the authors. This is IMHO a nice thing as it might help open source software to spread. Company lawyers are probably less afraid of such licensing terms than of those of the GPL, which requires full redistribution of source code.

On the other hand, this might result in some Evil Company™ taking over your nice open source project and making big bucks while suppressing their user base. There is nothing wrong with companies making big bucks, but I wouldn't really feel happy with that. Look for example at TransGaming's WineX - if I were one of the Wine developers I would be fairly pissed. They took most of the Wine work, only added features for gaming support, and didn't give their improvements back to the community. This seems somewhat unfair as their product seems to be heavily dependent on Wine.

With the CPL it's possible to have some kind of middle way. If others modify your source and redistribute it, they have to provide the source code with it. But if they only use the program (e.g. link against it, the way one would use an XML parser), it's OK for them to include it in closed source distributions while providing appropriate credit to the original authors. This might be a way to go.

The only obstacle is that the definition of "uses the source code" versus "is a derivative of the source code" is rather weak or nonexistent. One would also have to look at the compatibility of the CPL with other licenses. This matters if one ever needs to change the license of one's product.

Bug hunting with AOP and Eclipse

July 20, 2004, 12:04 p.m. — 0 comments

Today I stumbled across an Eclipse Plugin called Bugdel. Bugdel provides an Aspect Weaver to include debugging code into your existing Java applications. This results in a clear separation of debugging/logging code and real application logic which is generally regarded to be a Good Thing ™.

Bugdel supports the common set of points to weave aspects into, such as method calls, method executions, field gets and sets etc. It's also possible to weave aspects to line numbers. Bugdel supports wildcards in method and class names, so you can easily weave certain debugging aspects into a lot of methods.

What looks really good is the integration into Eclipse. Bugdel adds its own editor to Eclipse (based on the standard Java editor) where you can easily add the "pointcuts" (Bugdel term for AOP join points).
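
The general idea of weaving debugging code into methods selected by a wildcard can be sketched in a few lines. This is a Python illustration of the technique only, not Bugdel's mechanism or API: `weave`, the `before` callback and the `Account` class are hypothetical names, and only plain instance methods are handled.

```python
import fnmatch
import functools


def _wrap(method, before):
    """Return a method that runs the 'before' advice, then the original."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        before(method.__name__, args)
        return method(self, *args, **kwargs)
    return wrapper


def weave(cls, pattern, before):
    """Attach 'before' to every method of cls whose name matches the wildcard."""
    for name, attr in list(vars(cls).items()):
        if name.startswith("__") or not callable(attr):
            continue
        if fnmatch.fnmatchcase(name, pattern):
            setattr(cls, name, _wrap(attr, before))
    return cls


class Account:
    def __init__(self):
        self.balance = 0

    def set_balance(self, amount):
        self.balance = amount

    def get_balance(self):
        return self.balance


# Debugging code lives outside the class and is woven in from here.
calls = []
weave(Account, "set_*", lambda name, args: calls.append((name, args)))
```

After weaving, every call to a method matching `set_*` is recorded in `calls`, while `Account` itself contains no logging code at all.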

I have to look into this to find out whether the Bugdel pointcuts can only be used from within Eclipse. It would be great if they could also be compiled into the app using Ant or something similar.

Bugdel is distributed under the terms of the CPL (IBM's open source license). I only glanced at the CPL, but it looks kind of strange to me. I'll read some more on it.

(via Eclipse Plugins)