Martin Probst's weblog

J2EE not ready for the enterprise

Dec. 17, 2007, 11:45 a.m. — 1 comment

Well, not really. But it's surprising how just about anything I've ever used gets i18n and especially Unicode support wrong, in some way or another.

After a lot of searching I just found out that you have to put <%@ page contentType="text/html; charset=UTF-8" %> on every single JSP page to get the correct encoding; putting it in one include doesn't work. How annoying. Thanks go to Cagan.

Puny Java Webstart Log Analyzer

Dec. 14, 2007, 11:17 a.m. — 0 comments

Just for kicks I wrote a trivial Java Web Start application. I wanted to analyze my web server access logs, and while grep plus some Ruby code works, it's nicer to have something graphical.

I couldn't find any existing JWS app, so I wrote one that parses a log in Apache's combined format and produces some charts (using the nice JFreeChart library).

So here is JLogAnalyzer. It takes about one second per megabyte of logfile, which is quite awful, but it works. It displays the top 20 requested URIs, the top 20 User Agents and requests per day.

[Image: screenshot of JLogAnalyzer]

Doing some Java work after a long time gave me mixed feelings. Available libraries like JFreeChart are really nice, and Web Start is also great. Writing the code to parse and aggregate information from the log wasn't, at least compared to the equivalent Ruby code. Collections without closures simply suck. On the other hand, running the parsing in a background task, including a progress meter and all the bells and whistles, is ridiculously easy using SwingWorker and ProgressMonitorInputStream.
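For comparison, here is a minimal sketch of what the "equivalent Ruby code" for the top-URI chart might look like. It is not the code behind JLogAnalyzer; the regular expression, the helper name top_uris and the sample lines are assumptions made for illustration.

```ruby
# Sketch: count the most requested URIs in an Apache combined-format log.
# The regular expression and the sample lines are illustrative assumptions.
COMBINED = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"/

def top_uris(lines, limit = 20)
  counts = Hash.new(0)
  lines.each do |line|
    match = COMBINED.match(line)
    next unless match            # skip lines that do not parse
    counts[match[4]] += 1        # group 4 is the request URI
  end
  # Sort by descending count, then by URI so ties are deterministic.
  counts.sort_by { |uri, n| [-n, uri] }.first(limit)
end

SAMPLE_LOG = [
  '127.0.0.1 - - [14/Dec/2007:11:17:00 +0100] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
  '127.0.0.1 - - [14/Dec/2007:11:17:05 +0100] "GET /feed HTTP/1.1" 200 2048 "-" "FeedReader/1.0"',
  '127.0.0.1 - - [14/Dec/2007:11:18:10 +0100] "GET /blog/ HTTP/1.1" 304 - "http://example.com/" "Mozilla/5.0"',
  'garbage line'
]

top_uris(SAMPLE_LOG).each { |uri, n| puts "#{n}\t#{uri}" }
```

The block passed to sort_by is the kind of closure that the Java collections of the time lacked; counting user agents or requests per day would only change which match group is used as the hash key.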

My only gripe with JFreeChart is its use of a custom data set hierarchy that doesn't use generics, instead of providing an easy interface to the Java Collections API. E.g., why require a PieDataset if you could simply accept a List of Map.Entry objects?

Use Spotlight instead of find/locate

Dec. 5, 2007, 10:16 a.m. — 0 comments

Might be really obvious, but you can use the command line Spotlight tool instead of find, locate, and grep -R. The cool thing is that it combines the speed of locate with the up-to-dateness of grep and find. It's not a complete replacement, but useful anyways.

You can limit mdfind to a certain directory with the -onlyin switch, e.g. mdfind -onlyin foo '...' substitutes for grep -R '...' foo. It doesn't have full regular expressions (as far as I know), but I don't need those most of the time anyways.

Finding something by filename, as with find, is a bit ugly: you have to use a Spotlight-specific attribute, e.g. mdfind 'kMDItemDisplayName="comment*"' will look for files starting with 'comment'. Beware that this will also look through the Apple translated names, e.g. mdfind 'kMDItemDisplayName="Öffent*"' will find /Users/martin/Public, at least on a German Mac. Looking for the correct key, kMDItemFSName, works, but is apparently not answered from a cache - it's faster than a global find, but still quite slow on my system.

mdimport -X gives a list of known attributes that can be used in searches. With some bash scripting one could probably get quite close to the command line interface of locate or grep.
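As a rough sketch of such a wrapper (in Ruby rather than bash), the helpers below only build the mdfind argument lists. The helper names are hypothetical, and the sketch uses nothing beyond the -onlyin switch and the kMDItemFSName attribute mentioned above; patterns containing double quotes are not escaped.

```ruby
# Sketch: build mdfind argument lists that mimic locate and grep -R.
# The helper names are hypothetical; only -onlyin and the kMDItemFSName
# attribute from the text above are used.

# Arguments for a file name search, similar to `locate PATTERN`.
def mdfind_name_args(pattern, dir = nil)
  args = ['mdfind']
  args += ['-onlyin', dir] if dir
  args << %(kMDItemFSName="#{pattern}")
end

# Arguments for a content search, similar to `grep -R TEXT DIR`.
def mdfind_text_args(text, dir = nil)
  args = ['mdfind']
  args += ['-onlyin', dir] if dir
  args << text
end

p mdfind_name_args('comment*', 'foo')
p mdfind_text_args('capistrano')
```

On a Mac the search itself could then be started with system(*mdfind_name_args('comment*')); passing the arguments as a list avoids another round of shell quoting.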

Content sanitation, html5lib and Iñtërnâtiônàlizætiøn

Nov. 29, 2007, 4:13 p.m. — 0 comments

As I wrote, I migrated to a handwritten blog engine mainly because I was unsatisfied with the way Wordpress handled my content*. So one of the goals was to properly handle any input HTML and Unicode characters.

Unicode

Unicode support turned out to be trickier than expected. I decided that for anything written mostly in a western European language, there is only one encoding and it's called UTF-8. Debugging Unicode issues can be ugly, because it is often hard to find out how something is actually encoded. Useful utility: a hex editor, to find out what these bytes really are. Sadly, that doesn't help much with MySQL.
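When no hex editor is at hand, a few lines of Ruby do a similar job. This is a minimal sketch; hex_bytes is a hypothetical helper name, and the second half merely simulates one common way for text to end up double-encoded.

```ruby
# Sketch: show the raw bytes of a string as hex, like a tiny hex editor.
def hex_bytes(string)
  string.bytes.map { |byte| format('%02x', byte) }.join(' ')
end

umlaut = "\u00DC"                      # Ü as a proper Unicode string
puts hex_bytes(umlaut)                 # UTF-8 uses two bytes for Ü

# Double encoding: the UTF-8 bytes are misread as Latin-1 and encoded again.
double = umlaut.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts hex_bytes(double)                 # four bytes instead of two
```

Seeing four bytes where two are expected is a strong hint that some step in the tool chain treated UTF-8 data as a single-byte charset.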

First thing is to make sure that really every single part of the tool chain is Unicode-aware. There is a nice collection of tips here. In my case, LC_ALL on my server had to be set to en_US.UTF-8, and my MySQL tables had somehow been created as non-Unicode. The original WordPress database had a totally bizarre mix of Unicode and non-Unicode columns in every table.

A very useful command in MySQL is

mysql> show create table <tablename>;

Watch out for the table's DEFAULT CHARSET and for per-column CHARACTER SET clauses.

It is also important to run all mysql scripts with the proper charset set, as the client appears to default to some latin charset:

mysql -u ... -p --default-character-set=utf8

HTML

I'm preprocessing all data from the outside using html5lib. html5lib will parse anything and produce a DOM tree that is similar to what a browser would create. I added some code to wrap plain text outside of block level elements in <p/> containers.
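The wrapping step can be sketched roughly as follows. Element is a toy stand-in for a DOM node, not html5lib's tree API, and the BLOCK_LEVEL list is deliberately short; both are assumptions for illustration, not the code used in the blog engine.

```ruby
# Sketch: wrap stray text and inline nodes in <p> elements.
# Element is a toy stand-in for a DOM node, not html5lib's tree API,
# and BLOCK_LEVEL is a deliberately short, assumed list.
Element = Struct.new(:name, :children)
BLOCK_LEVEL = %w[p div ul ol pre blockquote h1 h2 h3 table].freeze

def block?(node)
  node.is_a?(Element) && BLOCK_LEVEL.include?(node.name)
end

def wrap_inline_runs(children)
  result = []
  run = []                               # pending text/inline nodes
  flush = lambda do
    # Empty or whitespace-only runs between blocks are dropped, not wrapped.
    unless run.all? { |n| n.is_a?(String) && n.strip.empty? }
      result << Element.new('p', run)
    end
    run = []
  end
  children.each do |node|
    if block?(node)
      flush.call                         # close the pending paragraph first
      result << node
    else
      run << node
    end
  end
  flush.call
  result
end

body = ['Hello ', Element.new('em', ['world']), Element.new('pre', ['code']), "\n", 'Bye']
wrapped = wrap_inline_runs(body)
puts wrapped.map(&:name).join(' ')
```

Block level elements pass through untouched, while each run of text and inline elements between them becomes one paragraph.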

It works nicely, although it's quite slow. One caveat: to html5lib, UTF-8 is called 'utf-8', not 'utf8'. You won't notice your Babylonian problems until a German U-Umlaut Ü shows up as the character '端' - probably some broken auto detection.
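One plausible explanation, which nothing in this post confirms, is that the auto detection guessed a Japanese encoding. The sketch below assumes the affected character was the lowercase ü (UTF-8 bytes C3 BC) and that the guess was EUC-JP; under those assumptions the same character comes out.

```ruby
# Sketch: reproduce the mojibake by decoding UTF-8 bytes as EUC-JP.
# Assumptions: lowercase ü (bytes C3 BC) and EUC-JP as the wrong guess.
utf8_bytes = "\u00FC".b                     # raw bytes of ü in UTF-8
misread = utf8_bytes.force_encoding('EUC-JP').encode('UTF-8')
puts misread
```

The two bytes of a single umlaut happen to form one valid EUC-JP character, which is why a wrong guess produces no error, only wrong text.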

Anyways, now my database contents look good :-)

MacBook Pro defects summary

Nov. 29, 2007, 9:12 a.m. — 0 comments

Since I bought my MacBook Pro (1st generation) in April 2006 I've been a very happy Mac OS user. The operating system and most applications on it are really nice.

On the other hand, the hardware dongle (i.e. the MacBook) I bought on that day has been horrible. The hardware itself is nice, it looks good, it's reasonably fast, etc., but I've never experienced so many quality problems in any electronics equipment.

When it arrived, the keyboard (which is btw. much worse than the keyboard on my old Thinkpad X30) was broken, some keys didn't work. You would think that Apple knows how to build keyboards by now...

Then in June 2006, on a business trip to Chicago, the whole system broke down and the motherboard had to be exchanged. Luckily, that was on the last day.

Next thing was that the right fan died, sometime in September.

Then I noticed that the notebook started to give me electric shocks when it was connected to AC power. Apple refused to repair this, as the voltage was apparently below 50 V. I really find this development in the American torture debate alarming. The problem has stopped by now, though.

Early this year, the other fan - left side - died. I was just out of the 1 year guarantee and Apple first refused to repair that. After I insisted, they did, as it's obviously a manufacturing failure if both fans break in such a short time frame.

And now yesterday I opened Disk Utility and saw that my internal hard drive reports a S.M.A.R.T. failure. Great.

For me this has been mostly annoying, as all repairs so far have been covered by guarantee. For Apple, selling me this sort of hardware must have been quite a financial loss. Assuming they make 1000 € out of a 3000 € MacBook Pro, the four repairs, including one that required expensive hardware, have probably eaten that up.

I'd be interested to know whether the quality record of later models is similar. Several of my friends had quite a lot of hardware trouble with their models, too. I can't imagine how Apple makes money from this if their hardware fails so badly within the guarantee period.

RubyGems upgrade 0.9.5

Nov. 21, 2007, 7:56 p.m. — 0 comments

RubyGems is available in version 0.9.5. And installing it with gem update --system broke just about everything on my system :-(

Gems couldn't find any already installed gem anymore. The fix was easy, though a bit time consuming: re-install all gems you have.

$ export installed_gems="`ls /Library/Ruby/Gems/gems/ | sed 's/-[^-]*$//' | sort -u`"
$ gem install $installed_gems

(with /Library/Ruby/Gems/gems/ being the path to your gems)
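The same name extraction can be sketched in Ruby, which makes the rule easier to see: everything after the last dash is treated as the version. gem_names is a hypothetical helper and the directory entries are made-up sample data. Like the sed expression, it would mangle directory names that carry a platform suffix after the version.

```ruby
# Sketch: derive gem names from directory names such as "rails-1.2.6".
# Mirrors the sed expression above: strip everything after the last dash.
def gem_names(directory_entries)
  directory_entries.map { |entry| entry.sub(/-[^-]*\z/, '') }.uniq.sort
end

entries = %w[rails-1.2.5 rails-1.2.6 activerecord-1.15.6 highline-1.4.0]
puts gem_names(entries).join(' ')
```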

InputManagers in Leopard

Nov. 20, 2007, 9:12 p.m. — 1 comment

In Leopard, InputManagers need to be installed in /Library and owned by root, for security reasons. Tutorials on how to re-enable them can be found e.g. on Mac OS X hints or in the TextMate blog.

Something not mentioned in those tips is that not only the input managers themselves but also the InputManagers directory must be owned by root and only writable by the owner (go-w).

These two commands did the trick for me:

$ sudo chown -R root:wheel /Library/InputManagers
$ sudo chmod -R go-w /Library/InputManagers
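To verify the result, the two conditions can be checked with a few lines of Ruby. This is only a sketch of the checks described above (owner is root, no write bit for group or others); input_manager_problems is a hypothetical helper, and whether Leopard checks anything beyond this is not something this post establishes.

```ruby
# Sketch: check the two conditions described above for a directory.
# Returns a list of problems; an empty list means the directory looks fine.
def input_manager_problems(path)
  stat = File.stat(path)
  problems = []
  problems << 'not owned by root' unless stat.uid.zero?
  # 0o022 masks the write bits for group and others.
  problems << 'writable by group or others' unless (stat.mode & 0o022).zero?
  problems
end

path = '/Library/InputManagers'
puts input_manager_problems(path) if File.directory?(path)
```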

By the way, does anyone know what the '@'-sign after the rights in a directory listing means? As in drwxr-xr-x@ 11 root wheel 374B 11 Sep 04:25 SafariBlock?

MySQL backup/restore task for Capistrano

Nov. 20, 2007, 10:59 a.m. — 0 comments

This is a simple backup task for Capistrano (which could really use a lot more documentation...).

The tool reads database configuration from the local database.yml. This Works For Me (tm) as I keep the local and remote database configuration identical - YMMV.

While the script doesn't require you to type the database user's password, it will echo it to the console for the restore task. Avoiding that seems to be quite tricky - I tried sending the backup directly over the stream and piping in the password before, but that gives an obscure error.

So the following will have to do for now, but I'm quite pleased with it. I should probably include a warning/confirmation before restoring, but hey, command lines are for experts ;-)

require 'yaml'

$config = YAML.load_file(File.join('config', 'database.yml'))

desc "Backup the database to db/" + Time.now.strftime("backup_#{$config['production']['database']}_%Y-%m-%d.sql")
task :backup, :roles => :db, :only => { :primary => true } do 
  # %d zero-pads the day of month, which keeps spaces out of the file name
  backup_path = File.join('db', Time.now.strftime("backup_#{$config['production']['database']}_%Y-%m-%d.sql"))
  # the dump is written to the local machine, so remove the local file on rollback
  on_rollback { File.delete(backup_path) if File.exist?(backup_path) }
  backup_file = File.new(backup_path, 'w+')
  run "mysqldump --default-character-set=utf8 " +
    "--user=#{$config['production']['username']} " +
    "--password " +
    "-B #{$config['production']['database']}" do |channel,stream,data|
    if stream == :out
      backup_file.write(data)
    else
      if data =~ /^Enter password:/
        channel.send_data($config['production']['password'])
        channel.send_data("\n")
      else
        raise Capistrano::Error, "unexpected output from mysqldump: " + data
      end
    end
  end
  # flush and close the local dump file before reporting success
  backup_file.close
  logger.info "Database dumped to #{backup_path} successfully."
end

desc "Restore the database from backup"
task :restore, :roles => :db do
  backups = Dir[File.join('db', "backup_#{$config['production']['database']}_*.sql")]
  raise Capistrano::Error, "no backup found!" if backups.size == 0
  last_backup = backups.sort[-1]
  put(File.read(last_backup), "#{current_path}/db/restore.sql")
  logger.info "Restoring from #{last_backup}"
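  # feed the uploaded dump to mysql; the dump was created with -B, so it selects the database itself
  run "mysql --default-character-set=utf8 " +
    "--user=#{$config['production']['username']} " +
    "--password=#{$config['production']['password']} " +
    "< #{current_path}/db/restore.sql" do |channel, stream, data|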
    raise Capistrano::Error, "unexpected output from mysql: " + data
  end
  logger.info "Restored successfully."
end

New blog engine

Nov. 20, 2007, 10:59 a.m. — 1 comment

I ported my old WordPress blog over to a hand-written Ruby solution. You probably already noticed that my permalinks were not that perma, so apologies for re-appearing entries in your feed readers.

I decided to move away from WordPress after taking a look in my archives. Through various import/export operations and the liberal re-formatting of entries - done by WordPress itself or various plugins - the data in the database was a complete mess. Corrupt UTF-8, double, triple and quad escaped anything, mixed encoded and non-encoded HTML... took me quite some time to clean it up (thank God for RegExps).

Writing a simple blog in Ruby on Rails is an easy exercise, at first. It gets a lot more complicated once you consider trackbacks/pingbacks, proper permalinks, comment spam, etc., but more on that in separate entries.

Migrating to Google Apps (copying IMAP mails)

Oct. 29, 2007, 12:35 p.m. — 0 comments

Now that Google has announced IMAP support for Gmail I’m migrating my email to Google Apps.

I’ve always had a HostEurope WebPack that provides some webspace, PHP, MySQL and IMAP. Some time ago I also ordered a virtual root server, to have some fun with Rails and as a general space for experimentation. Then I wanted to take the WebPack down as I didn’t need it anymore. But to be honest, I soon figured out that configuring and properly maintaining a whole email setup (MTA, IMAP, various spam filters, …) is indeed a lot of work.

So I moved all my email related stuff to Google Apps. So far it looks quite nice. It’s a bit strange that my regular Google user account didn’t integrate with the new one, but I simply dropped the old account.

Now I’m copying all my IMAP emails over to Google Mail. Surprisingly, I couldn’t find an easy to use, readily working script to copy IMAP messages from one host to another. There are several, but they seem to be unmaintained, to require obscure dependencies, or to need a bizarrely complicated setup.

So in a first class wheel reinvention act I wrote my own IMAP copy tool in Ruby: imapcopy.rb. The only dependency is highline for the password prompt, but if you don’t want that, you can easily adapt the code.

I really like it: it does everything I needed, doesn’t require any configuration, it only copies messages that are not present on the new host, and even prints a nice spinner ;-). Sample usage:

ruby imapcopy.rb user1@somehost.com user2@gmail.com@gmail.com
Password for user1@somehost.com:
Password for user2@gmail.com@gmail.com:
...
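The sample usage suggests that each argument is split into user and host at the last "@", so the user name may itself contain one. The sketch below illustrates that, plus one way to skip messages already present on the new host. It is not taken from imapcopy.rb: both helper names and the idea of comparing Message-ID headers are assumptions.

```ruby
# Sketch: split "user@host" on the LAST "@", so the user part may itself
# contain an "@" (as in user2@gmail.com@gmail.com).
def parse_account(spec)
  user, _, host = spec.rpartition('@')
  raise ArgumentError, "expected user@host, got #{spec.inspect}" if user.empty? || host.empty?
  { user: user, host: host }
end

# Sketch: pick the messages whose Message-ID is not on the target yet.
# Comparing Message-IDs is an assumed strategy, not necessarily the tool's.
def missing_messages(source, target_ids)
  source.reject { |message| target_ids.include?(message[:message_id]) }
end

p parse_account('user2@gmail.com@gmail.com')

source = [{ message_id: '<a@example>' }, { message_id: '<b@example>' }]
p missing_messages(source, ['<a@example>'])
```

In a real copy run the message list and the target IDs would come from the two IMAP connections; the sketch leaves that part out.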