Martin Probst's weblog

Division of Labour

April 29, 2010 at 13:59 #

Weird: everyone who has worked in small companies/startups and large companies knows that you spend a lot more time in large companies on overhead. That can be all sorts of things, like buying a piece of software, taking time off, etc.

The strange thing is that in theory, one would think that a large organisation is structured better and caters to many more people, thus has a better division of labour, and thus the overhead for individuals goes down.

But it's exactly the opposite. In a startup, you spend time on things that might not be your exact role, like administering a web server when you are strictly speaking a developer. But that actually contributes to the company's success. And even if you count that time as overhead, I still feel I had less overhead in smaller organisations.

I wonder if this only holds for software companies and office workers, as opposed to assembly line workers. It still seems like a weird failure; economies of scale would suggest the opposite effect.


Reducing XPath

January 5, 2010 at 14:43 #

Michael Kay writes on his blog: "Could XPath have been better", suggesting XPath would have been a nicer language without all the little inconsistencies. Instead, he'd rather map more or less everything to built-in functions and their application to sequences, including the axes, predicates, and so on.

This very much sounds like an implementor's pipe dream: remove all the annoying inconsistencies and make it easier to create fast implementations.

Once you are done replacing all implicit syntax with function calls, I think you might find that you have written a LISP interpreter with built-in functions (some with funny or punctuation names) for DOM navigation. Not that that would be a bad thing.

Though this makes one wonder what the actual proposed value of XPath is, once you have reduced it to a LISP dialect. Probably the restricted expressiveness, and with it the ability to analyze the function applications to produce a clever execution strategy.

This always reminds me of Erik Meijer and his presentation on LINQ at VLDB 2005 (?), where he demonstrated how LINQ effectively maps certain function applications (selection, projection) to different repositories. I still like the approach: provide a somewhat unified syntax, hand over an Abstract Syntax Tree at run time to the data source/repository, and let that find a good way of executing the query. Integrating the query language into the programming language very much reduces the pain for users, and creates a uniform interface for many different data sources.

This is of course limited to the .NET platform and effectively SQL only, afaik, and I have never actually used LINQ, so I have no idea how well it works out in practice. I might imagine that tool support (profiling! indexes!) can be difficult.


Generating Eclipse build files with XQuery

November 16, 2009 at 20:27 #

A friend of mine had a problem today. He was trying to make a huge Ant-based project usable from within Eclipse. The build file would manage dependencies through XML property files within about 30 subdirectories, each declaring which sub-projects need to be compiled before itself.

Writing all those .project and .classpath files is a very tedious task, and clicking them together in Eclipse is even more tedious. XQuery to the rescue!

I simply imported all the property files into xDB and ran this query:

import module namespace fw = 'java:java.io.FileWriter';
import module namespace xn = 'java:com.xhive.dom.interfaces.XhiveNodeIf';

for $project in /project
let $name := substring-before( substring-after(document-uri($project/root()), '/build/'), '/')
where $name != 'swami'
return
(:let $name := if ($name = 'daogen') then 'DaoGenerator' else $name:)
let $dep-internal := tokenize($project/property[@name='${module.name}.depend.internal']/@value, ',')
let $dep-common := tokenize($project/property[@name='${module.name}.depend.internal']/@value, ',')[not(ends-with(., '.jar'))]
let $deps := ('libs', distinct-values(($dep-internal, $dep-common)))
let $fw := fw:new(concat('/Users/martin/tmp/build/', $name, '/.classpath'))
let $pw := fw:new(concat('/Users/martin/tmp/build/', $name, '/.project'))
let $classpath :=
<classpath>
{ comment { $name } }
	<classpathentry kind="src" path="src"/>
	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
{ for $dep in $deps return 
	<classpathentry combineaccessrules="false" kind="src" path="/SW {$dep}"/>
}
	<classpathentry kind="output" path="bin"/>
</classpath>
let $project := 
<projectDescription>
	<name>SW {$name}</name>
	<comment></comment>
	<projects>
	</projects>
	<buildSpec>
		<buildCommand>
			<name>org.eclipse.jdt.core.javabuilder</name>
			<arguments>
			</arguments>
		</buildCommand>
	</buildSpec>
	<natures>
		<nature>org.eclipse.jdt.core.javanature</nature>
	</natures>
</projectDescription>

return (
  fw:write($fw, xn:to-xml($classpath)), 
  fw:write($pw, xn:to-xml($project)),
  fw:close($fw),
  fw:close($pw)
)

What does this do? It iterates through all project descriptions, takes the project name from the document URI, extracts the dependency information (those are the tokenize calls), and creates a .project and a .classpath XML snippet for each sub-project.

xDB does not include functions to write to the file system out of the box. We could create our own custom extension functions in Java and put them on the classpath, but there is a much easier way through the Java Module Import functionality.

We simply import java.io.FileWriter, create a FileWriter through the fw:new(...) calls for the right file location, serialize the XML snippets using xn:to-xml(...), and then make sure to close the file writers.

We still needed to do some fixups in Eclipse (mainly adding .jar files to the build path and fixing some wrong circular dependencies), but this certainly saved us hours. The world is a much nicer place when you have effective XML tools at your hands :-)


Mac Mini Media Center (M³C)

September 12, 2009 at 19:23 #

So, after much procrastination I bought myself a new Mac Mini and have set up my Mac Mini Media Center (M³C).

Some observations:

  • While equinux' TubeStick has a cute name and is much cheaper than Elgato's EyeTV, the hardware didn't work well for me (it only received 16 of about 30 DVB-T stations), and the software is just no comparison to EyeTV: both usability and features are much better in EyeTV.
  • Snow Leopard completely messes up the Apple Remote, and full screen video display seems to lag sometimes. Hopefully they'll fix this.
  • I got myself an Apple wireless desktop, which works, but next time I think I'd rather go for a keyboard with an integrated touchpad.
  • Getting my Toshiba TV to display the digital video over a digital connection without cutting off the screen or showing black borders was hard. I used SwitchResX, and all looks good with these exact settings. Create a new custom screen resolution and enter the values for horizontal and vertical in the text boxes. Then reboot and see if you can select the resolution (good luck!).
    Vendor 5262
    Type 103.00
    
    H 48
    V 24
    
    Horizontal / Vertical
    1832		1024
    572		32
    44		10
    192		58
    28,1		50
    
  • Running my own webserver (i.e. this weblog) on the Mac works fine so far. I just have to understand launchd some day so that I can make the services start at boot time. Well, another day ;-)

Server move

September 6, 2009 at 19:16 #

As you might know, I used to run this weblog on a virtual server hosted by Hosteurope, the smallest possible configuration. However a small virtual server doesn't seem to be enough for even the smallest weblog possible (at least when it's written in Rails...), so I moved this weblog to my own server today.

The trick is that I got a DynDNS domain and point my real domain (martin-probst.com) to that one through a CNAME record, and the media server / TV in my living room happily serves the files.

$ dig www.martin-probst.com
[... snip ...]
;; ANSWER SECTION:
www.martin-probst.com.	81215	IN	CNAME	martinpr.homeip.net.
martinpr.homeip.net.	60	IN	A	92.225.50.126

The server is a new Mac mini, so it will certainly not have the dreaded out of memory problems. I think I'm even saving money - Apple claims an idle energy consumption of about 14 Watt, which should be slightly cheaper than my server hosting in total. Of course this calculation doesn't include the hardware, but I wanted that media server anyway ;-)

In the process I also upgraded Rails to 2.3.4, which was a bit painful. But I came from 1.2.something, so some friction probably has to be expected.


Hardware on Ubuntu, once again

March 11, 2009 at 17:54 #

This is really getting ridiculous. Today I wanted to scan a document, and after some googling and searching I found out that the old & crappy USB scanner I have here (Mustek Bearpaw 1200 CU) doesn't work on Mac OS X at all. It does theoretically work on Windows XP, but the driver is so bad it crashes the OS all the time. Getting it to work on Ubuntu, however, is trivial.

I still remember the times when hardware support on Linux was really bad, and getting your Wifi to work was a matter of luck. For Wifi I hear it's still not totally easy, but my experience with Windows is that it's no better on the Wifi front...


Mobile phone contracts

February 17, 2009 at 20:55 #

Recently, I changed my mobile phone provider from O2 to Simyo. It's quite funny - the regular, contract-based mobile phone providers should be delivering a premium service for the fact that you pay them a monthly fee and bind yourself to a contract that commonly runs two years. But it's quite the opposite. With Simyo, I can now actually understand my bills, they have web tools that are actually useful, and I'm paying a lot less. O2 and the other providers appear to be investing the premium money mostly into commercials and sales (all these mobile phone shops in the towns must be really expensive...).

To me, usable web tools and understandable bills are a major feature in providers of anything, even at a potential slight premium. The complete failure of most phone-related companies at this is really a shame. I would actually happily switch my fixed line provider (Alice) for another one, if I just knew a German telephone company that was actually any better.


Shell meta programming

December 8, 2008 at 11:08 #

I'm currently reworking X-Hive/DB's command line startup scripts for various utilities, and I'm facing an interesting challenge with shell programming.

The issue is that I want to have a ".xhiverc" file that contains various settings in a Java property file style. Normally, I would simply read those settings from within Java, and everything would be nice and fine. But this file is supposed to contain, amongst others, the memory settings for the virtual machine - and once the JVM is running, it's of course too late to read those.

So I need to somehow read the file from the shell. That should be easy, right? ". ~/.xhiverc" and everything is fine - or maybe not. What if the user wants to override those settings from the environment? E.g., we have XHIVE_MAX_MEMORY defined in the .xhiverc, but the user has exported XHIVE_MAX_MEMORY="2G". This is where the meta programming comes in: we have to operate on variables whose names we don't know statically.

Current solution: iterate through all legal variable names, save their state in ${VARNAME}_BACKUP, source the .xhiverc, and then re-set them to the previous value if they were non-empty. As the scripts need to be POSIX compliant (i.e., no bashisms), we don't have ${!VARNAME}, so this already involves some interesting eval scripts (eval export ${var}_BACKUP=\"\$${var}\" - the backslashes are not a Wordpressian/PHP escaping problem).
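
A minimal sketch of that backup/source/restore dance in plain POSIX sh. The variable list XHIVE_VARS, the function name load_xhiverc, XHIVE_MIN_MEMORY and the temporary rc file are made up for this illustration; they are not taken from the real scripts:

```shell
#!/bin/sh
# Hypothetical list of settings the rc file is allowed to define.
XHIVE_VARS="XHIVE_MAX_MEMORY XHIVE_MIN_MEMORY"

# load_xhiverc FILE: source FILE, but let non-empty environment values win.
load_xhiverc() {
    rcfile=$1
    # 1. Save the current value of every known variable in <NAME>_BACKUP.
    for var in $XHIVE_VARS; do
        eval "${var}_BACKUP=\"\$${var}\""
    done
    # 2. Source the rc file (the path needs a slash, else "." searches PATH).
    if [ -r "$rcfile" ]; then
        . "$rcfile"
    fi
    # 3. Put back every value that was non-empty before sourcing.
    for var in $XHIVE_VARS; do
        eval "backup=\"\$${var}_BACKUP\""
        if [ -n "$backup" ]; then
            eval "${var}=\"\$backup\""
        fi
        unset "${var}_BACKUP"
    done
}

# Demo: the rc file sets both values, the environment overrides one of them.
rc="${TMPDIR:-/tmp}/xhiverc.$$"
printf 'XHIVE_MAX_MEMORY=512M\nXHIVE_MIN_MEMORY=64M\n' > "$rc"
XHIVE_MAX_MEMORY=2G
unset XHIVE_MIN_MEMORY
load_xhiverc "$rc"
rm -f "$rc"
echo "$XHIVE_MAX_MEMORY $XHIVE_MIN_MEMORY"   # prints "2G 64M"
```

Note that this sketch shares the weakness discussed next: an override that is set but empty is indistinguishable from no override at all.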

Now the next interesting thing: how to test if a variable is set? Testing if it's non-empty is [ -n "${VARNAME}" ], but what if someone wants to override a default setting with an empty value? If you know the name, it's "${XHIVE_MAX_MEMORY+x}" = "x". If you don't, it's again some horrible eval combination - maybe I'm missing it, but there doesn't seem to be a standard "defined" command/test.
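
One way such an eval combination could look, wrapped in a helper so the ugliness stays in one place. The name is_defined and the XHIVE_A/XHIVE_B variables are my own invention for this sketch, not a standard command; the name check is there because eval would otherwise happily execute whatever is passed in:

```shell
#!/bin/sh
# is_defined NAME: succeeds if the variable called NAME is set,
# even if it is set to the empty string.
is_defined() {
    # Refuse anything that is not a plain variable name before eval sees it.
    case $1 in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 2 ;;
    esac
    # After expansion, eval runs: [ "${NAME+x}" = x ]
    eval "[ \"\${$1+x}\" = x ]"
}

# Demo: one variable set to the empty string, one not set at all.
XHIVE_A=""
unset XHIVE_B
is_defined XHIVE_A && a=set || a=unset
is_defined XHIVE_B && b=set || b=unset
echo "$a $b"   # prints "set unset"
```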

I have the feeling I'm doing something wrong - this should be easier (tm). Maybe I should just forget about the whole thing, and have an XHIVE_DEFAULT_MAX_MEMORY and a second XHIVE_MAX_MEMORY, same for the other variables...
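
For comparison, a rough sketch of that simpler scheme, assuming the .xhiverc would only ever set XHIVE_DEFAULT_* variables. The values, XHIVE_MIN_MEMORY and the max_memory/min_memory names are placeholders for illustration. No eval is needed, because every name is known statically:

```shell
#!/bin/sh
# Stand-in for sourcing ~/.xhiverc, which in this scheme only sets defaults.
XHIVE_DEFAULT_MAX_MEMORY=512M
XHIVE_DEFAULT_MIN_MEMORY=64M

# Stand-in for the user's environment: one override, one variable left unset.
XHIVE_MAX_MEMORY=2G
unset XHIVE_MIN_MEMORY

# ${VAR-default} falls back only when VAR is unset, so an explicitly empty
# override is respected (unlike ${VAR:-default}, which also replaces empty).
max_memory=${XHIVE_MAX_MEMORY-$XHIVE_DEFAULT_MAX_MEMORY}
min_memory=${XHIVE_MIN_MEMORY-$XHIVE_DEFAULT_MIN_MEMORY}

echo "max=$max_memory min=$min_memory"   # prints "max=2G min=64M"
```

The price is twice as many variable names to document, but the set-versus-empty question answers itself.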

What surprised me along the way: this of course also has to work in Windows batch, and everyone knows that Windows batch is probably one of the most horrible programming environments ever "invented". But this particular problem is actually not too difficult there. Once someone on StackOverflow.com enlightened me about the byzantine details of the Windows batch FOR loop, it's a relatively simple loop containing an IF NOT DEFINED %%i:

  FOR /F "eol=# tokens=1,2* delims==" %%i in ('type "!XHIVERC!"') do (
    REM only set variables if not already defined as environment variables (they take precedence)
    IF NOT DEFINED %%i (
      SET %%i=%%~j
    )
  )

SSD is the new disk, disk is the new tape

November 21, 2008 at 09:34 #

Tim Bray has some very interesting performance numbers for storage systems.

There is this saying that memory is the new disk, disk is the new tape. I think we have to insert something there - SSD is the new disk, disk is the new tape, and memory is somewhere between the CPU cache and the SSD.

The problem is then, how to benefit from these enhancements. If you have ye olde database system, you could simply put all of the data on SSD. This will be fast, but quite a bit of a waste. DBMSes currently manage the cache hierarchy on their own, having a memory cache for the really hot data, a disk storage for the not-so-hot, and tapes for backups.

It would be really nice if the DBMS was aware of the wildly different seek times of SSDs and disks, and if it thus could manage this aspect of the storage hierarchy, too. Ideally, it would lazily remember which data was accessed recently, and move the old stuff to disk. For example, in everyone's favorite running performance example - called "Twitter" - presumably next to no one cares about tweets that are older than a month or so, so you could move them to disk.

This is again a good example of a change in requirements for databases which, as it is now, requires developers to implement the smarts themselves. Let's hope databases will learn this...


Java & Ruby complexity

November 19, 2008 at 07:33 #

Patrick Mueller writes:

Same sort of nutso thinking with Java. A potentially decent systems-level programming language, it could have been a successor to C and C++ had things worked out a bit differently. But as an application programming language? Sure, some people can do it. But there's a level of complexity there, over and above application programming languages we've used in the past - COBOL and BASIC, for instance - that really renders it unsuitable for a large potential segment of the programmer market. [...] We're seeing an upswing in alternative languages where Java used to be king: Ruby, Python, Groovy, etc

I really don't agree with the notion of complexity in Java. Complexity as a term is IMHO highly imprecise, so maybe we're just thinking differently about it here.

Much of the stuff people don't like about Java is actually its verbosity (compared to e.g. Ruby), but that's nearly the opposite of complexity. The inventors of Java explicitly left a lot of features out - like closures - because they feared they would create too complex a programming language.

Ruby & Co have all these features, plus a lot of nice meta programming, and a somewhat weird module/inclusion/inheritance system. I personally think that Ruby is much more complex than Java in the long term. The interesting question is whether people will be happy with the added complexity in the long term.

I see this as a trade-off in programming languages: language features like cool meta programming, closures, or a really worked out type system (a la OCaml & Haskell) can remove a lot of accidental complexity. With them, you're able to write programs much more succinctly, or have proofs of global properties of your program that weren't possible before.

On the other hand, language features can create a lot of complexity if not done really well. I'm reading the Scala mailing list, and I remember discussions of the sort "is this code legal Scala? and if it is, what does it mean?" (usually from a type system point of view), and if I remember correctly the language designers weren't quite sure about it either. This is exactly what you don't want in a language: unclear or ambiguous expressions, and unexpected "side effects" of expressions.

Quite a lot of Ruby/Rails code one happens to see is clever in very interesting ways. But I really see that cleverness as a problem: who will understand the tricks that made the code a bit shorter in five years? Probably someone, but it might take them a long time to do so. Already now it's sometimes quite difficult to find documentation on a particular library method/class in Ruby, as the documentation system is apparently not up to handling the language's module inclusion features.

At what point do all these clever tricks sum up to something that is no longer understandable? Are we really sure that the modularization works out well enough that we don't have to be afraid of it all ending up as a large meta-closure-soup? ;-)

Don't get me wrong: I like dynamic languages for a lot of features. I'm just wary of some of the effects. Pushing accidental complexity out of the application and into the programming language (now as feature complexity) should normally be a good thing: it sounds reasonable that this should reduce overall complexity, and give programmers a broader understanding of what's happening. We need a good modularity system and proper abstractions to have a real positive effect from this - and I'm not sure I see this in e.g. Ruby.