Martin Probst's weblog

Prevayler

January 9, 2005 at 15:05 #

I stumbled across a persistence framework for Java called "Prevayler". It's a rather interesting approach to persistence.

The basic idea is to keep all information in RAM in ordinary Java objects. These objects are persisted using some arbitrary Java serialization technology - implementing "Serializable" will do. All operations on the objects are performed through objects representing transactions. These transaction objects are passed through the system and serialized to a log. If something crashes the application, the serialized transactions are replayed starting from an initial state of the system. To keep the logs small, the system saves the current state to disk (using serialization) at regular intervals, like once a day. This is done by keeping a hot standby server running in sync with the real server. If a backup is requested, the standby server stops syncing with the server, dumps its objects and resyncs.
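To make the log-and-replay idea concrete, here is a minimal sketch in Python rather than Java. It is not Prevayler's API: the names `PrevalentSystem`, `TRANSACTIONS`, `take_snapshot` and `recover` are made up for illustration, JSON stands in for Java serialization, and a plain list stands in for the log file.

```python
import json


# Transactions are plain data (a name plus arguments) so they can be logged.
def deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount


def withdraw(state, account, amount):
    state[account] = state.get(account, 0) - amount


# Hypothetical registry mapping logged names back to executable transactions.
TRANSACTIONS = {"deposit": deposit, "withdraw": withdraw}


class PrevalentSystem:
    """Keeps all state in RAM and logs every transaction before applying it."""

    def __init__(self):
        self.state = {}
        self.log = []  # a list of serialized entries stands in for the log file

    def execute(self, name, **args):
        self.log.append(json.dumps({"name": name, "args": args}))  # log first
        TRANSACTIONS[name](self.state, **args)                     # then apply

    def take_snapshot(self):
        """Serialize the current state; older log entries are no longer needed."""
        snapshot = json.dumps(self.state)
        self.log.clear()
        return snapshot


def recover(snapshot, log):
    """Rebuild the state after a crash: load the snapshot, replay the log."""
    state = json.loads(snapshot) if snapshot else {}
    for line in log:
        entry = json.loads(line)
        TRANSACTIONS[entry["name"]](state, **entry["args"])
    return state


system = PrevalentSystem()
system.execute("deposit", account="alice", amount=100)
snapshot = system.take_snapshot()
system.execute("withdraw", account="alice", amount=30)
system.execute("deposit", account="bob", amount=5)

# Pretend the process died here: only the snapshot and the log survive.
recovered = recover(snapshot, system.log)
print(recovered == system.state)
```

Replay only works if transactions are deterministic, which is why the sketch passes every input as an explicit argument instead of reading clocks or other outside state.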

This is a very smart way of achieving reasonably good persistence for objects without much hassle. While it is nice, it has a lot of limitations. The developers themselves seem to think they have found a universal solution that replaces relational databases in general. Does it really do that?

With Prevayler, programmers have to be quite smart to really get their objects to be persistent - while they are completely unrestricted in their use of Java features, they have to pay attention to certain things, like not keeping references to objects. Also, the speed of the system relies on the ability of the programmer to create a sensible data structure to access her data - if she chooses the wrong data structure, things will get really slow. SQL and RDBMS were invented to solve these problems, and they are quite good at it, even though the conversion from the data represented in the relational world back to your business application takes some (mostly trivial) effort.

The next thing is that Prevayler doesn't do anything in parallel. No transaction may run in parallel with another unless the user synchronizes it himself. This is a major con. Programming concurrent applications isn't trivial, and one of the biggest benefits of RDBMS is that they provide concurrency control, different levels of transactional isolation etc.

Also, Prevayler won't help you with large amounts of data. The authors claim that falling RAM prices will solve that problem, but that does not seem plausible. Real business applications can have several hundred GB of "live" data - that's not an area you will reach with cheap RAM anytime soon. These amounts of data can be managed with a (rather) cheap x86 computer and big hard drives, even if it might get slow. Prevayler simply fails.

I think Prevayler is a smart idea that solves a limited problem - persistence of data in small applications. If they can add intelligent means of swapping objects to disk, it might get really useful. But this would again put limitations on the programmer, like forcing him to inherit from special classes, implement certain interfaces etc.

The end of the story is that database management systems, relational or not, provide a lot of features Prevayler doesn't give you. Prevayler is suited for applications that are written by really good programmers, won't produce too much data and don't require any concurrency.

I can't really see how the programmers of Prevayler come to the conclusion that they have made DBMS obsolete - and why do they think that thousands of capable programmers and scientists have just overlooked the Prevayler approach? That seems quite arrogant to me.


Joel on the HPI

January 9, 2005 at 11:41 #

Joel on Software writes this:

The moral of the story is that computer science is not the same as software development. If you're really really lucky, your school might have a decent software development curriculum, although, they might not, because elite schools think that teaching practical skills is better left to the technical-vocational institutes and the prison rehabilitation programs. You can learn mere programming anywhere. We are Yale University, and we Mold Future World Leaders. You think your $160,000 tuition entitles you to learn about while loops? What do you think this is, some fly-by-night Java seminar at the Airport Marriott? Pshaw.

The trouble is, we don't really have professional schools in software development, so if you want to be a programmer, you probably majored in Computer Science. Which is a fine subject to major in, but it's a different subject than software development.

Which basically sounds like the very idea of my university, the Hasso-Plattner-Institut for Software Systems Engineering. I wonder how long it will take until this insight leads to a large-scale change in our CS education systems. It's not really new, and everyone who has finished a CS degree and started to work knows it - when will the universities start to do something about it?


Project management under GNOME

December 18, 2004 at 00:09 #

I just gave Imendio Planner a try. It's a simple (compared to MS Project) yet very useful project planning application. With Planner you can easily define resources and tasks and compile them in a Gantt chart. The most important features are available, such as four types of end-start conditions within the Gantt chart, assigning resources to tasks, sub-tasks and more.

The tool has a nice, simple GNOME GUI and seems to be a lot easier to use than MS Project. It lacks the (IMHO) important feature of displaying your use of resources, which is a pity. While it exports to HTML and prints nicely, an option to export the Gantt chart to some graphics format might be helpful too.

Anyway, I would use this in upcoming projects as it's very simple to use, free, and fits into my usual development environment.


C++ builds the easy way with scons

December 6, 2004 at 19:50 #

One of the minor but still annoying pitfalls of development with C++ is Makefiles. The syntax is rather cryptic, large Makefiles have to be maintained as the dependencies grow, and more complex tasks require really dirty hacks.

There are a lot of make replacements out there. Today I took a look at SCons which looks really nice. It's written in Python and does not invent a new syntax but rather uses Python as the language to write build files in. Build files are declarative using calls to functions SCons provides to tell the system which targets have to be built. After executing the build script the targets are made using a set of implicit rules.

The major pros of SCons are the smart helper functions. You don't have to define dependencies between source files - SCons takes care of that by scanning the files itself (supporting quite a nice set of languages already). Implicit rules are available for compiling executables, libraries (shared and static) and some other files. The developers claim it should be easily extensible (maybe I'll try with antlr when I get some spare time). SCons doesn't just look at file modification times but uses md5 hashes by default, which avoids the whole mess applications create when touching files accidentally. Also, SCons keeps track of the state of intermediate files - a change in a source file that doesn't lead to a change in the object file won't lead to re-linking libraries or executables. Because SCons does not recurse into nested directories (it rather "includes" sub-build files), it should also be quite good with multiple build jobs and/or distributed compiling - recursive makefiles are a major obstacle for this, as each make execution only sees a few source files at a time.
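The hash-based up-to-date check is easy to illustrate. What follows is a toy sketch in Python of the idea, not SCons' actual implementation: `SignatureCache`, `compile_source` and `build` are invented names, and the "compiler" just drops comment lines so that a comment-only edit produces an identical object file.

```python
import hashlib


def signature(content: bytes) -> str:
    """Content signature: an md5 digest instead of a modification time."""
    return hashlib.md5(content).hexdigest()


class SignatureCache:
    """Remembers the signature each file had when it was last seen."""

    def __init__(self):
        self._seen = {}

    def needs_rebuild(self, name: str, content: bytes) -> bool:
        current = signature(content)
        changed = self._seen.get(name) != current
        self._seen[name] = current
        return changed


def compile_source(source: bytes) -> bytes:
    """Toy compiler: the 'object file' is the source without // comment lines."""
    lines = [l for l in source.splitlines() if not l.strip().startswith(b"//")]
    return b"\n".join(lines)


def build(source: bytes, cache: SignatureCache) -> list:
    """Return the steps that actually had to run for this source content."""
    steps = []
    if cache.needs_rebuild("main.c", source):
        steps.append("compile")
        obj = compile_source(source)
        # Re-link only if the intermediate file really changed.
        if cache.needs_rebuild("main.o", obj):
            steps.append("link")
    return steps


cache = SignatureCache()
first = build(b"int main() { return 0; }", cache)
untouched = build(b"int main() { return 0; }", cache)
comment_only = build(b"// a comment\nint main() { return 0; }", cache)
real_change = build(b"int main() { return 1; }", cache)
print(first, untouched, comment_only, real_change)
```

A file that is merely touched keeps its signature, so nothing runs; a comment-only edit triggers a compile but no link, because the object signature is unchanged.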

The biggest pro is probably also a con - using Python as the Makefile language. This enables users to easily manage complex build problems using a real programming language. On the other hand it enables people to create really cryptic build files as the syntax does not have any concept of order, grouping etc. It should be possible to overcome this by employing templates, coding standards etc. but it adds another thing to control and manage.

Another con is that a POSIX compliant make should be available nearly anywhere while SCons would be another dependency. However if you distribute binary packages anyway this shouldn't be that important.

The pros seem to outweigh the cons, at least for me. I think I'll use it in future smaller C/C++ projects. If it turns out to be evil despite the good impression, I'll find out all too soon I guess ...


More XML sizes

November 29, 2004 at 08:14 #

Yes Lars, you don't really have to have size statistics if you've got growing containers. But what should the growth rate be? How big should the smallest container be?

I can imagine that most XML text nodes will be below 20 characters/bytes (only whitespace separating other tags), but what will be the next size step? Some size statistics about real documents from the wild would be nice to have. This will also vary a lot across different document types and uses of XML. A smart XML DBMS would try to adjust its storage settings to the document in question.


Techwriter Wiki

November 27, 2004 at 18:54 #

Post for Lars:

[Translation aid] The Techwriters Wiki aims to become a knowledge base for technical writers and translators. It is still young and can use all the support it can get. via Der Schockwellenreiter

XML Size Statistics

November 27, 2004 at 18:41 #

When storing XML in a database the single nodes are put into containers and stored on pages. Because it's generally easier to have fixed-size containers (representing objects) it's quite nice to do this with a default size and overflow containers.

But what default size should be used for what kind of nodes? We have to get some statistics on that point, but I get the impression that usually most text nodes are really small, e.g. not more than about 100-200 characters. Other nodes like elements are not that important as their name is usually only stored exactly once.

It seems as if the best solution is incremental growth for the containers. Usually text nodes will be rather small (<50 chars), but if they are bigger than that they will probably be bigger than 100 chars or even 200 chars too. Most text nodes will be something below 10 chars though, at least for data-oriented XML as opposed to document-oriented XML (think of formatted XML with breaks and tabs between the elements). So the first text node container should be about 20 chars, the next size maybe 100, and thereafter really big ones. But these are only guesses - I need statistics on that.
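Gathering such statistics for a given document is not hard. Below is a rough Python sketch that counts text node lengths with the standard library's ElementTree and sorts them into the guessed size classes. The 20 and 100 come from the guess above; the 4096 for "really big" and the names `SIZE_CLASSES`, `container_size` and `text_node_lengths` are assumptions made up for the example.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import Counter

# 20 and 100 are the guessed sizes; 4096 is an assumed "really big" container.
SIZE_CLASSES = [20, 100, 4096]


def container_size(length):
    """Smallest size class that fits, or None if overflow containers are needed."""
    i = bisect.bisect_left(SIZE_CLASSES, length)
    return SIZE_CLASSES[i] if i < len(SIZE_CLASSES) else None


def text_node_lengths(xml_text):
    """Yield the length of every non-empty text node, whitespace-only ones included."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores text before the first child in .text and text
        # following the element's end tag in .tail.
        for text in (elem.text, elem.tail):
            if text:
                yield len(text)


doc = "<order>\n  <id>42</id>\n  <note>" + "x" * 60 + "</note>\n</order>"
lengths = list(text_node_lengths(doc))
histogram = Counter(container_size(n) for n in lengths)
print(lengths, dict(histogram))
```

Running this over a corpus of real documents, split by document type, would give the numbers needed to pick the steps instead of guessing them.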


DHTML Lemmings

November 13, 2004 at 22:56 #

DHTML Lemmings. Plain unbelievable.


Namespace prefixes in XML

November 7, 2004 at 18:34 #

In the latest W3C XQuery Working Draft the type xs:QName was altered. In former specifications it represented a qualified name as the namespace URI in combination with the local name, now the XQuery processor has to keep track of the user defined namespace prefix too. This seems to be a minor change which is useful to convert xs:QNames into strings, but in my opinion it's a major change of the data model.

The question is whether to see an XML document as a text document or to interpret it as a tree of nodes. The former has the advantage that users editing XML documents with Notepad will usually be less surprised by the actual results of queries. While this would be nice, I think it's a horrible idea for a structure-oriented query language, especially in a database context.

While designing an XQuery database we stumbled over such questions very often. What about whitespace and indentation, what about character references, what about XML namespace prefixes etc. I'm sure there are still things to come. Others have run into this kind of problem too, as you can read in this post from Dare Obasanjo.

I think the only clean solution is to draw a line clearly separating the text representation and the tree representation of XML documents. In the tree representation, namespaces are just unique IDs and the prefixes are completely ignorable. Each qualified name has a namespace ID, but once it has been transformed from text to tree representation the namespace prefix is gone. Same goes for ignorable whitespace, character references and CDATA sections. Otherwise it becomes really tedious to store such things as where namespaces were declared with which prefix or you would even need to store texts twice, once in a normalized format usable for full text search and once in the representation the user expects. But what happens if these contents are updated?
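As a small illustration of the tree view, here is a sketch in Python. Its standard ElementTree parser happens to follow this model: prefixes are resolved while parsing, and element tags come out as `{namespace-uri}local-name`. The `QName` class and the `qname` helper are made up for the example, as is the `urn:example:books` namespace.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class QName:
    """A qualified name in the tree representation: no prefix, just URI + local name."""
    namespace: str
    local: str


def qname(tag: str) -> QName:
    """Split ElementTree's '{uri}local' tag notation into a QName."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return QName(uri, local)
    return QName("", tag)


# The same tree written with two different prefix choices.
prefixed = ET.fromstring(
    '<x:book xmlns:x="urn:example:books"><x:title>T</x:title></x:book>'
)
default_ns = ET.fromstring(
    '<book xmlns="urn:example:books"><title>T</title></book>'
)

names_prefixed = [qname(e.tag) for e in prefixed.iter()]
names_default = [qname(e.tag) for e in default_ns.iter()]
print(names_prefixed == names_default)
```

Both documents yield the same list of names, because the prefix never makes it into the tree. A data model in which xs:QName carries the prefix would have to treat the two as different, or at least remember the difference.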

Assigning prefixes, escaping non-representable characters to (character or entity) references, inserting CDATA sections and many other things are presentational logic. This might be handled by the XML editor or by an output filter when converting XML documents into their text representation; everything else clearly borks things outside the text representation. And even worse, it keeps people thinking of XML as text with a bunch of angle brackets, as opposed to tree-structured data.


The Backside of GUIs

September 16, 2004 at 19:28 #

Don Park blogs about the Backside of GUIs. It sounds a little bit funny, but to me it seems to be a rather intuitive place for settings within a GUI.

Why not use the powerful features of upcoming windowing systems like Avalon or Looking Glass to provide settings on the backside of GUIs? No more searching through obscure settings dialogs, just turn around the very window whose behaviour you want to alter. This could also be done for single dialogs of an application, like advanced search settings or similar things.