Martin Probst's weblog

Higher order functions in XQuery

August 1, 2008 at 07:39 #

The requirements for XQuery 1.1 contain a MAY item called "higher order functions". I'm really fond of the idea of higher order functions in XQuery, and as there are currently no use cases for that, I'll contribute some in here:

Simple predicates

A simple use case is to pass a predicate to another function, as shown below:

(:: Selectively only copy those elements for which $pred returns true 
    TODO works recursively, but copys all attributes regardless of $pred
  :)
declare function local:selective-copy($nodes as element()*, $pred as function($node as element()) -> xs:boolean)
{
	for $n in $nodes
	return 
	  (: call $pred :)
	  if ($pred($n)) then element { node-name($n) } { $n/@*, local:selective-copy($n/*, $pred) }
	  else ()
};

declare function local:mypred($node as element()) as xs:boolean
{
  let $name = node-name($node)
  return namespace-uri-from-QName($name) = ('http://www.example.com/example', 'http://www.foo.com/bar')
};

let $xml :=
  <user xmlns="http://www.example.com/example">
    <name><first>Fritz</first> <last>Müller</last></name>
    <password xmlns="http://www.example.com/secure">vj/b5ZaUYQ6kU</password>
    <preferences> ... </preferences>
  </user>
(: user the curry operator & to get a function "handle" :)
let $predicate := &local:mypred
return local:selective-copy($xml, $predicate)

Currying

The next use case would be the natural extension - real currying of functions. local:selective-copy is as above, but we'll generalize our predicate a bit:

(: this function compares the namespace of a given node against a set of legal namespaces :)
declare function local:namespace-matches($namespaces as xs:anyURI*, $node as element()) as xs:boolean
{
  let $name = node-name($node)
  return namespace-uri-from-QName($name) = $namespaces
};

let $xml :=
  <user xmlns="http://www.example.com/example">
    <name><first>Fritz</first> <last>Müller</last></name>
    <password xmlns="http://www.example.com/secure">vj/b5ZaUYQ6kU</password>
    <preferences> ... </preferences>
  </user>
(: user the curry operator & to get a function "handle"
   we pass in a namespace to check against, and get an unary function in return :)
let $predicate := &local:mypred("http://www.example.com/example")
return local:selective-copy($xml, $predicate)

The & operator

The & operator ("curry operator") returns a handle to the function given by name, possibly setting values for parameters by specifying them in parentheses. This provides the reification for functions, i.e., the way from a function to a value.

Currying is only allowed from left to right, i.e., one cannot curry the second argument, but leave the first argument to a function unbound. I don't think this is a large restriction, and it makes the syntax much easier.

Calling function handles

The $someval(...) syntax allows straight forward calling of a function handle, so it is somewhat of the inverse of the & operator.

Types for higher order functions

I think the static typing feature of XQuery never really gained much traction. However it would of course be possible to introduce a new type, called "function()", and specify parameters and return values as shown above. I think it should be possible to type-check that, but I'm not an expert.

Implementation

I didn't implement this yet (I currently don't have access to an XQuery implementation), but it should not clash with any existing syntax, so from a grammar point of view it should be ok. It does require some changes to the runtime system, but that shouldn't be too difficult, IMHO.

The nice thing about higher order functions is that they can allow some method of dynamic dispatch. That is, they allow to write programs that decide at runtime which code is going to be executed, in an elegant way.

This is of course not complete. It doesn't support a terse syntax for lambdas, which would also be nice (not having to declare all those pesky one-line functions). XQuery should also have something like a "fn:resolve-function($name, $argc)" that provides dynamic access to the curry operator by specifying the QName and argument count of the function.

I think the example shows that this little extension can get you a lot of nice functionality and a lot less typing. Please leave a comment with your opinion!


XQuery Scripting Extensions and Use Cases

July 9, 2008 at 07:48 #

In march, the W3C published a first draft of the XQuery Scripting Extensions use cases and a working draft.

The XQSE propose to extend XQuery to add a defined expression evaluation order, variable assignments, a while loop, and some other control flow statements. This worries me a lot - I've spent considerable time implementing and using XQuery, and I really feel that this extension will break XQuery as a language.

The problem is that XQuery was intended to be a functional, declarative language. This allows implementations to reorder statements, executing them in a (hopefully) optimal order, benefiting from index lookups and the like. Now that they add side effects and state to the language, this is no longer possible in the general case. It will also greatly complicate XQuery implementations.

Of course, the question is, what benefits might this extension of the language give. The use cases document provides some insight.

Use cases section R Q1-3 define queries that perform some modification of a persistent document and at the same time return a result (the new bid, number of deleted accounts, ...).

Use case Q4 describes the use of a while loop to constantly poll the current highest bidder on some item and perform an action if it changes. I'm not sure what this seemingly strange scenario is supposed to solve, and the editors appear to be weary of this, too.

Q1-3 can easily be solved by allowing queries to perform modifications and return values at the same time. I've implemented this in X-Hive/DB, and it was actually simpler than what the W3C prescribed. I've since argued against this arbitrary limitation of the XQuery Updates specification, and maybe now would be a good time to fix that issue? Simply drop the separation between modifying and non-modifying queries and be done with it. About Q4, I'm not really sure. It looks like something that could be easily solved with some event based or message sending system. The fact that the editors are unsure how to implement this should probably be taken as a warning sign. Are we sure someone really needs this?

The use cases XHTML / AJAX both describe a scenario where a script first has to show a "busy" notice to the user, and then look up some data. I'm not sure if they envision XQuery running on the client, but otherwise this is perfectly solvable today. The client side JavaScript execution allows this trivially, and I really see no reason why to mix this client side GUI stuff with the server side operation. It breaks the MVC pattern for no really good reason.

Use case WS is again a variation of the "I want to perform changes and report results" theme. Drop the limitation in the XQuery Updates spec and be done with it.

In summary, it seems like they want to solve an actual problem, but approach it at a really complex angle. 5 use cases can trivially be solved by modifying the updates spec, 2 use cases (XHTML) should probably be considered harmful anyways, and one (R-Q4) I fail to understand. The collateral damage caused by the XQSE spec in complexity is not worth the net effect, if it can also be achieved by simplifying another spec!

I think we should rather concentrate on XQuery 1.1 with the grouping and windowing functionality that many people really need. Which, by the way, could also really use some simplification.


BlogImpl implements Blog

May 20, 2008 at 15:27 #

There's a weird thing you'll see all over the place in Java frameworks and applications of all kind. One might call it Implitis.

The symptoms are that each and every class implements some interface, and it's always the sole implementor of the interface. So you have a Blog and a BlogImpl, a Window and a WindowImpl, and so forth.

While in some cases there may be reasons for this pattern (e.g. Java's limited visibility rules force you to make some methods public that really shouldn't be), I think in many cases this is just someone who has read some book, and now wants to decouple everything and everyone.

If there is only one thing that implements your Foo interface, and you can't even give it a better name than FooImpl, I declare this a code smell. If the class is conceptually any different from it's interface, there should be a qualifying name addition for that, like a DatabaseFoo vs. a FileFoo, or a MailFoo vs. an HTMLFoo. If not, you're probably just complicating you application for no apparent reason.

I was reminded of this by the Spring criticism, see last post. This really ticks me off, people talking with glazing eyes off decoupled everything (do they realize that decoupled means no connection whatsoever? how does that work then?), and in practice only writing horribly complicated, ugly, implitis-infected code.


Spring criticism

May 20, 2008 at 15:06 #

A nice collection of criticisms on Spring, via Stefan Tilkov. The Spring part is highly interesting, it reproduces a lot of my frustration and annoyances with it.

The part about PHP isn't aggressive and derogative enough. When discussing PHP, one should always swear.

And finally, the part about Rails appears dubious, I feel it slides into marketing. The SQL injection problems, and the model object to database coupling things appear to be highly construed.


Java 3

May 20, 2008 at 07:23 #

Ola Bini has posted an interesting list of features he'd like to have in a hypothetical Java 3.

There are some points I definitely wouldn't like, such as "No primitive arrays." - I know they are a pain to have in the type system, but getting rid of them and thus making it nearly impossible to implement custom data structures is really not the way forward. One should try to come up with a primitive collection-like data structure that better fits into the language and type system, but dropping them altogether is not a good idea.

What I'd really like to see in Java 3 are two things: more syntactic sugar for common things, some important platform features, identifiers as first class citizens, a structured way out of the type system, and named parameters. Which is roughly the order of difficulty in implementation and unlikeliness of having these features actually appear :-)

  • Syntactic sugar would be stuff like allowing the [] and []= operators on everything that implements List or Map, respectively. Plus collection literals (Maps and Lists, or Arrays) with type inference for the generic collection type. This really doesn't harm at all, as it totally fits with the language as-is. I know it's "just syntactic sugar", but this could really make a lot of stuff easier.
  • Platform features - we need a way to reliable and possibly forcibly reload code, possibly modifying the runtime layout of objects, adding methods and fields. Plus, we need a way to reliably kill rampant threads. And we need a way to reliably isolate certain application/platform layers, such that they can be independently reloaded, garbage collected, etc. The latter one might be the first step in a solution for the first ones. I know this is hard, but if Java-as-a-platform is supposed to be cool (and to grow), we really need this. There must be a way to achieve this - somewhere along the lines of passing immutable message objects between layers.
  • Identifiers as first class citizens. Everybody wants methods/functions as first class citizens, and I agree, but something else we will need for that, and what would be really useful in other areas, would be identifiers as first class citizens. Proposed syntax would be something like this: Identifier id = &java.lang.String.toString;
    id = &com.corp.MyClass.myField;
    . These should come in three flavors: locally scoped, fully resolved, and attached to a certain instance (Method handles alike). This would simplify a lot of the meta programming, in particular all the APIs like JPA that need to identify certain fields of a class. This is somewhat similar to the Methods-as-first-class citizens, but a more general feature that could coexist nicely with these.
  • Escaping the type system is done all the time by Java frameworks, even today. Everyone is using some sort of reflection or XML configuration (shudder) to invoke methods on unknown types, inject fields, or whatever. But it's an unnecessary pain to do that, so I'd propose to introduce a "var" type that allows calling anything on it:
    var foo = somebla();
    foo.myMethod(); foo.somethingElse(); ...
    
    If a method doesn't exist, a well-known method could be called (think "method missing"). And if all breaks, this would simply give a NoSuchMethodException or similar. This would allow leaving the type system without paying such a large tax for it. Also, we should allow a much simpler way of reflection, e.g.:
    String methodName = "run";
    Object x = ...;
    x.(methodName)(args);
    
  • Named parameters are actually the best thing since bread came sliced. Our way of calling methods just by putting arguments in parenthesis in some order is wrong, wrong wrong. This comes from the bad C inheritance where you'd simply push stuff onto the stack, with no checking whatsoever. We don't need this today, and having named parameters makes a whole class of nasty errors highly unlikely. E.g. think about calling MyConstructor(String,String,String,int,int,double,String). How often do you mix up the order? This would also allow for much better error messages from the compiler. Of course, this feature comes from good old Smalltalk (at least for me): Point moveToX: 12 withY: 15 andTitle: "Hello".. Much better.

Maybe one should start a project to actually try this stuff out. I know that many people have stated that Ola's list is just a description for a subset of Scala (or some other language), and they are right, in some sense. But this is an important point: I don't want all the complexity of Scala. And I don't like their implicits, the "object" feature - which is syntactic sugar for something that shouldn't be there at all, static must die -, the sometimes cryptic syntax, the weird rules about operator/method precedence, and so forth. This deserves further qualification (a lot), but I'm just not happy with Scala.

I guess one should also look at the work done by Gilad Bracha (Newspeak) and possibly Ian Piumarta/Alan Key in their COLA stuff. The former, because Bracha introduces some really nice and useful features, the latter just because it's totally awesome :-)


Google Groups being spammed

May 19, 2008 at 06:17 #

I'm getting lots of comment spam attempts for some Google Groups pages, e.g. "http://groups.google.us/group/dyt-cheaptickets". The mentioned groups actually exist, and the spam content is being hosted by Google.

There is no "report abuse" link, so I have no idea how to tell the Google guys about this.

Also, I wonder how the spammers make it past the CAPTCHA?


Implementing a blog

May 13, 2008 at 13:10 #

It's quite funny. Implementing a basic blog is basically the "hello world" of web application frameworks. You simply shove titles, body text and some timestamps into a database and retrieve it from there, again. Next lesson is 1-to-many mapping with comments.

The funny thing is that implementing a blog that way is not really implementing a blog. Implementing a blog is writing a tiny handler that shoves stuff into the database and out of it, and then spending ridiculous amounts of time filtering comments for spam. Which is one of the really difficult things to do on the internet - validating users as real users, without being inaccessible.


XQuery pretty printer and grammar open source

May 7, 2008 at 10:11 #

I just released my XQuery pretty printer, and with that the XQuery grammar, as open source, Apache 2.0 license. I've created a Google Code project for that: xqpretty.

I hope somebody finds this useful. I have a hunch that at least one company will take a look at it :-)


More XQuery pretty printing

April 23, 2008 at 17:57 #

I've upgraded the XQuery pretty printer once again. It's actually surprisingly difficult to get whitespace handling and indentation only close to 'right', at least in a language that is syntactically as complex as XQuery.

This version should insert whitespace at the correct places, properly handle long lines in many more cases (before, you'd get long runs of empty lines in some cases), and remove whitespace in other cases.

I've also added some options for the curious. You can now display regular HTML, pure HTML with no other tags around it, plain text, indented, the parse tree before formatting, and the HTML before it is run through the indenter. The options are available through the XQuery formatter form, too.

Of course you can get all the other formatting modes without HTML decoration around it, too. And as you can see you can easily construct URLs for the formatter. So the only thing left is to wire this up to my JavaScript syntax highlighting in this blog, and I'll have nicely formatted XQuerys!


JSPs suck, episode 217

April 22, 2008 at 13:41 #

Turns out that the JSP parser doesn't even try to be an HTML parser (I figured it's not an XML parser long time ago...). If you comment out JSP code like this:

<!--c:if> ... bla ... </c:if-->

You'll get this:

org.apache.jasper.JasperException: /WEB-INF/jsp/task/showTask.jsp(109,4) The end tag "&lt;/c:if-" is unbalanced

I truly have to find myself another Java templating engine/language. JSPs come out of the box, but they are simply to annoying in the long run. I guess the effort to integrate something - anything! - else would already have payed off.