Martin Probst's weblog

XQuery Scripting Extensions and Use Cases

July 9, 2008 at 07:48 #

In march, the W3C published a first draft of the XQuery Scripting Extensions use cases and a working draft.

The XQSE propose to extend XQuery to add a defined expression evaluation order, variable assignments, a while loop, and some other control flow statements. This worries me a lot - I've spent considerable time implementing and using XQuery, and I really feel that this extension will break XQuery as a language.

The problem is that XQuery was intended to be a functional, declarative language. This allows implementations to reorder statements, executing them in a (hopefully) optimal order, benefiting from index lookups and the like. Now that they add side effects and state to the language, this is no longer possible in the general case. It will also greatly complicate XQuery implementations.

Of course, the question is, what benefits might this extension of the language give. The use cases document provides some insight.

Use cases section R Q1-3 define queries that perform some modification of a persistent document and at the same time return a result (the new bid, number of deleted accounts, ...).

Use case Q4 describes the use of a while loop to constantly poll the current highest bidder on some item and perform an action if it changes. I'm not sure what this seemingly strange scenario is supposed to solve, and the editors appear to be weary of this, too.

Q1-3 can easily be solved by allowing queries to perform modifications and return values at the same time. I've implemented this in X-Hive/DB, and it was actually simpler than what the W3C prescribed. I've since argued against this arbitrary limitation of the XQuery Updates specification, and maybe now would be a good time to fix that issue? Simply drop the separation between modifying and non-modifying queries and be done with it. About Q4, I'm not really sure. It looks like something that could be easily solved with some event based or message sending system. The fact that the editors are unsure how to implement this should probably be taken as a warning sign. Are we sure someone really needs this?

The use cases XHTML / AJAX both describe a scenario where a script first has to show a "busy" notice to the user, and then look up some data. I'm not sure if they envision XQuery running on the client, but otherwise this is perfectly solvable today. The client side JavaScript execution allows this trivially, and I really see no reason why to mix this client side GUI stuff with the server side operation. It breaks the MVC pattern for no really good reason.

Use case WS is again a variation of the "I want to perform changes and report results" theme. Drop the limitation in the XQuery Updates spec and be done with it.

In summary, it seems like they want to solve an actual problem, but approach it at a really complex angle. 5 use cases can trivially be solved by modifying the updates spec, 2 use cases (XHTML) should probably be considered harmful anyways, and one (R-Q4) I fail to understand. The collateral damage caused by the XQSE spec in complexity is not worth the net effect, if it can also be achieved by simplifying another spec!

I think we should rather concentrate on XQuery 1.1 with the grouping and windowing functionality that many people really need. Which, by the way, could also really use some simplification.


BlogImpl implements Blog

May 20, 2008 at 15:27 #

There's a weird thing you'll see all over the place in Java frameworks and applications of all kind. One might call it Implitis.

The symptoms are that each and every class implements some interface, and it's always the sole implementor of the interface. So you have a Blog and a BlogImpl, a Window and a WindowImpl, and so forth.

While in some cases there may be reasons for this pattern (e.g. Java's limited visibility rules force you to make some methods public that really shouldn't be), I think in many cases this is just someone who has read some book, and now wants to decouple everything and everyone.

If there is only one thing that implements your Foo interface, and you can't even give it a better name than FooImpl, I declare this a code smell. If the class is conceptually any different from it's interface, there should be a qualifying name addition for that, like a DatabaseFoo vs. a FileFoo, or a MailFoo vs. an HTMLFoo. If not, you're probably just complicating you application for no apparent reason.

I was reminded of this by the Spring criticism, see last post. This really ticks me off, people talking with glazing eyes off decoupled everything (do they realize that decoupled means no connection whatsoever? how does that work then?), and in practice only writing horribly complicated, ugly, implitis-infected code.


Spring criticism

May 20, 2008 at 15:06 #

A nice collection of criticisms on Spring, via Stefan Tilkov. The Spring part is highly interesting, it reproduces a lot of my frustration and annoyances with it.

The part about PHP isn't aggressive and derogative enough. When discussing PHP, one should always swear.

And finally, the part about Rails appears dubious, I feel it slides into marketing. The SQL injection problems, and the model object to database coupling things appear to be highly construed.


Java 3

May 20, 2008 at 07:23 #

Ola Bini has posted an interesting list of features he'd like to have in a hypothetical Java 3.

There are some points I definitely wouldn't like, such as "No primitive arrays." - I know they are a pain to have in the type system, but getting rid of them and thus making it nearly impossible to implement custom data structures is really not the way forward. One should try to come up with a primitive collection-like data structure that better fits into the language and type system, but dropping them altogether is not a good idea.

What I'd really like to see in Java 3 are two things: more syntactic sugar for common things, some important platform features, identifiers as first class citizens, a structured way out of the type system, and named parameters. Which is roughly the order of difficulty in implementation and unlikeliness of having these features actually appear :-)

  • Syntactic sugar would be stuff like allowing the [] and []= operators on everything that implements List or Map, respectively. Plus collection literals (Maps and Lists, or Arrays) with type inference for the generic collection type. This really doesn't harm at all, as it totally fits with the language as-is. I know it's "just syntactic sugar", but this could really make a lot of stuff easier.
  • Platform features - we need a way to reliable and possibly forcibly reload code, possibly modifying the runtime layout of objects, adding methods and fields. Plus, we need a way to reliably kill rampant threads. And we need a way to reliably isolate certain application/platform layers, such that they can be independently reloaded, garbage collected, etc. The latter one might be the first step in a solution for the first ones. I know this is hard, but if Java-as-a-platform is supposed to be cool (and to grow), we really need this. There must be a way to achieve this - somewhere along the lines of passing immutable message objects between layers.
  • Identifiers as first class citizens. Everybody wants methods/functions as first class citizens, and I agree, but something else we will need for that, and what would be really useful in other areas, would be identifiers as first class citizens. Proposed syntax would be something like this: Identifier id = &java.lang.String.toString;
    id = &com.corp.MyClass.myField;
    . These should come in three flavors: locally scoped, fully resolved, and attached to a certain instance (Method handles alike). This would simplify a lot of the meta programming, in particular all the APIs like JPA that need to identify certain fields of a class. This is somewhat similar to the Methods-as-first-class citizens, but a more general feature that could coexist nicely with these.
  • Escaping the type system is done all the time by Java frameworks, even today. Everyone is using some sort of reflection or XML configuration (shudder) to invoke methods on unknown types, inject fields, or whatever. But it's an unnecessary pain to do that, so I'd propose to introduce a "var" type that allows calling anything on it:
    var foo = somebla();
    foo.myMethod(); foo.somethingElse(); ...
    
    If a method doesn't exist, a well-known method could be called (think "method missing"). And if all breaks, this would simply give a NoSuchMethodException or similar. This would allow leaving the type system without paying such a large tax for it. Also, we should allow a much simpler way of reflection, e.g.:
    String methodName = "run";
    Object x = ...;
    x.(methodName)(args);
    
  • Named parameters are actually the best thing since bread came sliced. Our way of calling methods just by putting arguments in parenthesis in some order is wrong, wrong wrong. This comes from the bad C inheritance where you'd simply push stuff onto the stack, with no checking whatsoever. We don't need this today, and having named parameters makes a whole class of nasty errors highly unlikely. E.g. think about calling MyConstructor(String,String,String,int,int,double,String). How often do you mix up the order? This would also allow for much better error messages from the compiler. Of course, this feature comes from good old Smalltalk (at least for me): Point moveToX: 12 withY: 15 andTitle: "Hello".. Much better.

Maybe one should start a project to actually try this stuff out. I know that many people have stated that Ola's list is just a description for a subset of Scala (or some other language), and they are right, in some sense. But this is an important point: I don't want all the complexity of Scala. And I don't like their implicits, the "object" feature - which is syntactic sugar for something that shouldn't be there at all, static must die -, the sometimes cryptic syntax, the weird rules about operator/method precedence, and so forth. This deserves further qualification (a lot), but I'm just not happy with Scala.

I guess one should also look at the work done by Gilad Bracha (Newspeak) and possibly Ian Piumarta/Alan Key in their COLA stuff. The former, because Bracha introduces some really nice and useful features, the latter just because it's totally awesome :-)


Google Groups being spammed

May 19, 2008 at 06:17 #

I'm getting lots of comment spam attempts for some Google Groups pages, e.g. "http://groups.google.us/group/dyt-cheaptickets". The mentioned groups actually exist, and the spam content is being hosted by Google.

There is no "report abuse" link, so I have no idea how to tell the Google guys about this.

Also, I wonder how the spammers make it past the CAPTCHA?


Implementing a blog

May 13, 2008 at 13:10 #

It's quite funny. Implementing a basic blog is basically the "hello world" of web application frameworks. You simply shove titles, body text and some timestamps into a database and retrieve it from there, again. Next lesson is 1-to-many mapping with comments.

The funny thing is that implementing a blog that way is not really implementing a blog. Implementing a blog is writing a tiny handler that shoves stuff into the database and out of it, and then spending ridiculous amounts of time filtering comments for spam. Which is one of the really difficult things to do on the internet - validating users as real users, without being inaccessible.


XQuery pretty printer and grammar open source

May 7, 2008 at 10:11 #

I just released my XQuery pretty printer, and with that the XQuery grammar, as open source, Apache 2.0 license. I've created a Google Code project for that: xqpretty.

I hope somebody finds this useful. I have a hunch that at least one company will take a look at it :-)


More XQuery pretty printing

April 23, 2008 at 17:57 #

I've upgraded the XQuery pretty printer once again. It's actually surprisingly difficult to get whitespace handling and indentation only close to 'right', at least in a language that is syntactically as complex as XQuery.

This version should insert whitespace at the correct places, properly handle long lines in many more cases (before, you'd get long runs of empty lines in some cases), and remove whitespace in other cases.

I've also added some options for the curious. You can now display regular HTML, pure HTML with no other tags around it, plain text, indented, the parse tree before formatting, and the HTML before it is run through the indenter. The options are available through the XQuery formatter form, too.

Of course you can get all the other formatting modes without HTML decoration around it, too. And as you can see you can easily construct URLs for the formatter. So the only thing left is to wire this up to my JavaScript syntax highlighting in this blog, and I'll have nicely formatted XQuerys!


JSPs suck, episode 217

April 22, 2008 at 13:41 #

Turns out that the JSP parser doesn't even try to be an HTML parser (I figured it's not an XML parser long time ago...). If you comment out JSP code like this:

<!--c:if> ... bla ... </c:if-->

You'll get this:

org.apache.jasper.JasperException: /WEB-INF/jsp/task/showTask.jsp(109,4) The end tag "&lt;/c:if-" is unbalanced

I truly have to find myself another Java templating engine/language. JSPs come out of the box, but they are simply to annoying in the long run. I guess the effort to integrate something - anything! - else would already have payed off.


Wiring up Jersey JAX-RS & Hibernate

April 22, 2008 at 11:42 #

Recently I started playing with JAX-RS, and in particular Jersey's implementation thereof. JAX-RS is specified in JSR311. The goal is to create a common annotation-based library to ease the development of RESTful Java web applications.

So far, this is really nice. The basic idea is that you annotate your Java classes that represent resources with URI templates through the @Path annotation, and the framework takes care of mapping requests, parameters and so on. At the moment, this approach seems a lot more natural to me than other stuff I tried (Spring MVC for example).

Of course, some assembly is required ;-) My goal was to wire up Jersey's resource classes with Hibernate (JPA flavor), so that resources can directly map to persistent classes. This should allow for a pretty natural development process - decide on your resources, annotate them for persistence, and publish them on the web through Jersey annotations.

I had to jump through three hoops to get this working.

Opening a transaction for web requests

This was fairly simple. If you resources are persistent classes, you'll need a database session to load and manipulate them. I created a class ("Persistence") that stores Hibernate Sessions in a ThreadLocal variable, so that resources can access it.

The only trouble was that Jersey's default ServletContainer class has it's service() method marked final, so I couldn't override it to open a transaction before the request and close it afterwards. After some playing with AspectJ, I figured that it's easier to simply create my own HttpServlet that only wraps Jersey's servlet. All methods forward to Jersey's ServletContainer, but the service() method looks like this:

  @Override
  public void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException,
      IOException {
    System.out.println("Start transaction");
    String method = req.getMethod();
    boolean readonly = "GET".equals(method) || "HEAD".equals(method);
    try {
      persistence.open(readonly);
      container.service(req, resp);
      if (readonly) {
        System.out.println("Read only - rollback");
        persistence.rollback();
      } else {
        System.out.println("Transaction commit");
        persistence.commit();
      }
    } catch (RuntimeException e) {
      System.err.println("Transaction rollback");
      persistence.rollback();
      throw e;
    }
  }

Paul Sandoz of Jersey changed service() to be non-final, so in the future you should be able to simply inherit from ServletContainer for this.

Loading persistent resources from Hibernate

By default, Jersey simply instantiates resource classes (those with a @Path annotation) using the default constructor. Of course this doesn't work when your classes are supposed to be loaded from a persistent store.

There are two possibilities here. First is to always access persistent classes through a container class that represents the whole collection, and that isn't persistent itself:

Persistent sub-resources

@Path("books")
class Books {
  @Path("{id}")
  Book getBook(@PathParam("id") long id) { ... }
}

Pro: fairly simple. We'll see how to properly inject the Hibernate session into resource classes in the next session.

Con: your paths are slightly less cool. You'll have /books/12 instead of /book/12, and also you'll be in trouble with implicit views where /books/new is supposed to map to Books, but then use some new.jsp to render an HTML input form. Also, if you have many sub resources it might be a bit ugly to force all requests through some initial container class, instead of only manipulating the nested resource.

Hooking into resource creation

The second method is to hook into Jersey's way of instantiating resources. Instead of using the plain SerlvetContainer from Jersey, you can override it and tell it to use your own ComponentProvider, which is basically your own IoC container.

I created a HibernateComponentProvider like this. Persistence is our persistence class from above:

public class HibernateComponentProvider implements ComponentProvider {

  private final Persistence persistence;

  public HibernateComponentProvider(Persistence persistence) {
    this.persistence = persistence;
  }

  public <T> T getInjectableInstance(T instance) {
    return instance;
  }

  public <T> T getInstance(Scope scope, Class<T> c) throws InstantiationException,
      IllegalAccessException {
    T o = c.newInstance();
    inject(o);
    return o;
  }

  @SuppressWarnings("unchecked")
  public <T> T getInstance(Scope scope, Constructor<T> constructor, Object[] parameters)
      throws InstantiationException, IllegalArgumentException, IllegalAccessException,
      InvocationTargetException {
    Class<T> clazz = constructor.getDeclaringClass();
    Entity entity = clazz.getAnnotation(Entity.class);
    if (entity != null && parameters.length == 1) {
      // assume find by id
      T instance = (T) persistence.getSession().get(clazz, (Serializable) parameters[0]);
      if (instance == null) {
        throw new WebApplicationException(Response.Status.NOT_FOUND.getStatusCode());
      }
      inject(instance);
      return instance;
    } else {
      T instance = constructor.newInstance(parameters);
      inject(instance);
      return instance;
    }
  }

  public void inject(Object instance) {
    // no-op for us
  }

}

This is a bit hackish. If a resource has a visible default constructor, getInstance(Scope, Class) will be called. If we instead have a public constructor that takes a single long parameter, getInstance(Scope, Constructor, parameters) will be called. At that point, we can check if the class is a persistent class (marked by the Entity annotation), and if so, load it from Hibernate instead of instantiating it the normal way.

The trouble is that I don't quite understand how Jersey picks the proper constructor yet, so this might break. Also, it kind of sucks that we have to provide a constructor that is never going to be used, just for Jersey to give us access to the parameters. But maybe this will get better in the future?

Anyways, you can now write your persistent class like this:

@Path("book/{id}")
@Entity
class Book {
  public Book(@PathParam("id") long id) { /* no-op, or whatever */ }
}

And you'll get the properly loaded persistent instance automagically.

Pro: nice URLs, more natural development, somehow seems right to me.

Con: some more work, resource classes might get injected twice, no idea if this will break due to the constructor picking stuff. Also, you may not provide another public constructor, which is kind of against the spirit of non-invasive annotation based frameworks. Hopefully we'll hear more about this.

Injecting persistent resource classes

One more thing we have to do is inject our persistent classes after load with the fields managed by Jersey. The problem is that if we simply load a class from Hibernate, e.g. as the result of a query, it will bypass Jersey's injection, and injected properties such as UriInfo or HttpContext will not be set.

Hibernate provides hooks into the object life cycle via the org.hibernate.Interceptor interface. I've changed my Persistence class to extend org.hibernate.EmptyInterceptor, and override onLoad() like this:

@Override
public boolean onLoad(Object entity, Serializable id, Object[] state, String[] propertyNames,
    Type[] types) {
  provider.inject(entity);
  return super.onLoad(entity, id, state, propertyNames, types);
}

Provider is a ComponentProvider we obtained from the WebApplication (see next section), and through the inject() call the loaded entity will get it's UriInfo and friends.

Combining this with the above HibernateComponentProvider will mean that in some cases (directly loaded root resources) classes will get injected twice, but that shouldn't be a problem. Well. I hope :-)

Wiring it all up

The only thing left is to wire all this stuff together. As I said, we have our own servlet that simple wraps a ServletContainer from Jersey. But we'll have to extend this ServletContainer to override it's configuration methods to supply our own component providers etc. Enter HibernatingServletContainer, a nested class of our custom Servlet:

  private final class HibernatingServletContainer extends ServletContainer {
    @Override
    protected void configure(ServletConfig sc, ResourceConfig rc, final WebApplication wa) {
      super.configure(sc, rc, wa);

      wa.addInjectable(Session.class, new Injectable<Context, Session>() {
        @Override
        public Class<Context> getAnnotationClass() {
          return Context.class;
        }

        @Override
        public Session getInjectableValue(Context a) {
          return persistence.getSession();
        }
      });

      wa.addInjectable(ComponentProvider.class, new Injectable<Context, ComponentProvider>() {
        @Override
        public Class<Context> getAnnotationClass() {
          return Context.class;
        }

        @Override
        public ComponentProvider getInjectableValue(Context a) {
          return provider;
        }
      });
    }

    @Override
    protected void initiate(ResourceConfig rc, WebApplication wa) {
      persistence = new Persistence();
      wa.initiate(rc, new HibernateComponentProvider(persistence));
      provider = wa.getComponentProvider();
      persistence.setComponentProvider(provider);
    }
  }

This basically creates our persistence instance (which loads the Hibernate configuration and provides transactions), registers our own HibernateComponentProvider with the web application and retrieves the ComponentProvider from Jersey's WebApplication (which wraps our own provider) so we can use it later.

First, configure() is called. This method delegates to it's parent, but then adds two additional injectable classes: the Hibernate Session, retrieved from persistence, and the ComponentProvider itself, which we'll need later.

Then initiate is called, which sets up our Persistence layer and our ComponentProvider. All happy and fine, now we can start to write our actual application code :-)

Using it

Using this code you will have to remember two things: classes that need Hibernate sessions, e.g. for queries or to persist instances, will need to get them injected like this:

@Path("books")
class Books {
  @Context Session session;
}  

And when you need to instantiate sub-resources, for example when a resource is created by a POST request, you'll need to create them through the ComponentProvider. This way, their own @Context annotated fields will be set up properly:

@Path("books")
class Books {
  @Context Session session;
  @Context ComponentProvider provider;
  
  @POST
  @ConsumeMime("...")
  public Book createBook(...) {
    Book book = provider.getInstance(Scope.ApplicationDefined, Book.class)
    book.setFields(...);
    session.persist(book);
    return book;
  }
}

It might be easier to create resources from their representations through a BookProvider or similar, but I'll have to look into that another time.

I hope this will help people to get started with Jersey and Hibernate. I'm still researching this stuff, so maybe I'll come up with a better solution later. I'm also not yet sure if the whole thing - directly accessing persistent classes as resources - turns out to be a good idea, but I'm quite confident.