Monday, March 14, 2011

Implement toString with Xtext's Serializer

Xtext uses EMF to generate the model API of the abstract syntax tree (or graph) of a DSL. For all implementation classes, the toString() method is generated. For a simple model element, this default implementation returns a string looking similar to this:

my.dsl.impl.SomeElement@67cee792 (attr1: SomeValue)

Well, this is not too bad. However this looks completely different to my DSL syntax, which may look like this:

SomeValue { 
    The content;

Especially for debugging and logging, I prefer that DSL like output. Since Xtext does not only generates a parser for reading such a text, but also a seralizer for creating the text from a model, I was wondering if that mechanism could be used for the toString method as well. (Actually, Henrik Lindberg pointed out the serializer class -- thank you, Henrik!)

In the following, I describe how to do that. Actually, this is a little bit tricky, and it will cover several aspects of Xtext and the generation process:
  • use the generated serializer for formatting a single model element
  • tweak the generation process in order to add a new method
  • define the body of the newly added method

We will do that by adding a post processor Xtend file, which adds a new operation to the DSL model elements. The body of the operation is then added using ecore annotations. But first, we will write a static helper class implementing the toString method using the serializer.

Use the serializer

Xtext provides a serializer class, which is usually used for writing a model to an Xtext resource. The Serializer class (in org.eclipse.xtext.parsetree.reconstr) provides a method serialize(EObject obj), which returns a String---this is exactly what we need. This class requires a parse tree constructor, a formatter and a concrete syntax validator. Thanks to google Guice, we do not have to bother about these things. Xtext generates everything required to create a nicley configured serializer for us. What we need is the Guice injector for creating the serializer:

Injector injector = Guice.createInjector(new  my.dsl.MyDslRuntimeModule());
Serializer serializer = injector.getInstance(Serializer.class);

Now we could simply call the serialize method for a model element (which is to be an element of the DSL):
String s = serializer.serialize(eobj);

Since this may throws an exception (when the eobj cannot be successfully serialized, e.g., due to missing values), we encapsulate this call in a try-catch block. Also, we create a helper class, providing a static method. We also use a static instance of the serializer.
Since this helper class is only to be used by the toString methods in our generated implementation, we put it into the same package.

package my.dsl.impl;

import org.eclipse.emf.ecore.EObject;
import org.eclipse.xtext.parsetree.reconstr.Serializer;

public class ToString {
 private static Serializer SERIALIZER = null;

 private static Serializer getSerializer() {
  if (SERIALIZER == null) { // lazy creation
   SERIALIZER = Guice.createInjector(new my.dsl.MyDslRuntimeModule())
  return SERIALIZER;

 public static String valueOf(EObject eobj) {
  if (eobj==null) {
   return "null";
  try {
   return getSerializer().serialize(eobj);
  } catch (Exception ex) { // fall back:
   return eobj.getClass().getSimpleName()+'@'+eobj.hashCode();


Post processing

Now we have to implement the toString() method of our model classes accordingly. That is, instead of the default EMF toString method, we want to call our static helper method for producing the String.

A generic solution, which can not only be applied for adding the toString method but for all kind of operations, is to use a post processor extension (written in Xtend) to add new operations to the generated ecore model. The overall mechanism is described in the Xtext documentation. We have to write an Xtend extension matching a specific naming convention: <name of DSL>PostProcessor.ext. In our exampel that would be MyDslPostProcessor.

The easy thing is to add a new operation to each classifier:
import ecore;
import xtext;

process(GeneratedMetamodel this) :

create EOperation addToStringOperation(EClassifier c):
    ... define operation ... ->

For defining the operation, we need:
  • the return type of the operation
  • the body of the operation

The return type is an EString (which will result in a simple Java String). In EMF, we have to set the type via EOperation.setEType(EClassifier). That is, we need the classifier of EString. With Java, this would be no problem: EcorePackage.eINSTANCE.getEString().
Unfortunately, we cannot directly access static fields from Xtend. At least, I do not know how that works. Fortunately, we can substitute EcorePackage.eINSTANCE with calling a static method of EcorePackageImpl. This static method can then be defined as a JAVA extension in Xtend:

EPackage ecorePackage(): 
 JAVA org.eclipse.emf.ecore.impl.EcorePackageImpl.init();

Note that we return an EPackage instead of the EcorePackage. I assume this is necesssary because we use the EMF metamodel contributor and EcorePackage is not available then. We can now set the EString classifier as return type of the operation: setEType(ecorePackage().getEClassifier("EString"))

Now, we need the body of the operation. Ecore does not directly support the definition of a body, that is there is no field in EOperation for setting the body. Fortunately, we can exploit annotations for defining the body. The default EMF generator templates look for annotations marked with the source value "". The key of the annotation must be "body", and the value of the annotation is then used as the body of the operation. In the body, we simply call our static helper method for producing the DSL-like string representation.

The complete post processor extensions looks as follows:
import ecore;
import xtext;

process(GeneratedMetamodel this) :

EPackage ecorePackage(): 
 JAVA org.eclipse.emf.ecore.impl.EcorePackageImpl.init();

create EOperation addToStringOperation(EClassifier c):
 setName("toString") ->
 setEType(ecorePackage().getEClassifier("EString")) ->
  'if (eIsProxy()) return super.toString(); return ToString.valueOf(this);')) ->

create EAnnotation addBodyAnnotation(EOperation op, String strBody):
 setSource("") ->
 createBody(strBody) ->
create EStringToStringMapEntry createBody(EAnnotation annotation, String strBody): 
 setValue(strBody) ->

If you (re-) run the GenerateMyDSL workflow, the EMF toString() implementations are replaced by our new version. You can test it in a simple stand alone application (do not forget to call doSetup in order to configure the injector):

public static void main(String[] args) {
 MyElement = MyDslFactory.eINSTANCE.createElement();

Closing remarks

You probably do not want to really replace all toString methods with the serializer output, as this would create rather long output in case of container elements. In that case, you can add the new operation only to selected classifiers, or use the (generated) Switch-class to further customize the output.

Although the solutions looks straight forward, it took me some time to solve some hidden problems and get around others:
  1. How to create the serializer using the injector -- and how to create the injector in the first place
  2. How to access a static Java method from Xtend without too much overhead. Would be great if static fields could be accessed from Xtend directly.
  3. How to use the post processor with the JavaBeans metamodel contributor. If I switch to the JavaBeans metamodel, my extension didn't get called anymore.
  4. I'm still wondering where "EStringToStringMapEntry" is defined. I "copied" that piece of code from a snippet I wrote a couple of months ago, and I have forgotten how I found that solution in the first place.
  5. Sorry, but I have to say it: The Xtend version 1.0.1 editor is crap (e.g., error markers of solved problems do not always get removed). But I've heard there should be a better one available in version 2 ;-)

Friday, March 4, 2011

Traverse DAGs with Xtend

Like OCL, Xtend (a sublanguage as part of the Xpand project for model queries) provides some really powerful collection operations. These operations allow to easily retrieve elements from arbitrary models, i.e. graphs. Searching a single element in a graph is often very simple with these operation. However, when a collection is to be returned (which is not stored in some attribute), this might be a little bit more complicated, especially if the order matters. In the following, I have assembled some examples showing how to traverse a directed acyclic graph (DAG) with some common traversal strategies:

1) unsorted search traversing directed connections
2) depth-first search traversing directed connections
3) breadth-first search traversing directed connections
4) traverse a directed connection in counter-direction
5) unsorted search, traversing directed connections in counter-direction
6) depth-first search traversing directed connections in counter-direction
7) breadth-first search traversing directed connections in counter-direction

Actually, traversing a connection in counter-direction is not really possible. What I mean by that is, that the query is to return the element of the non-navigable end of the connection. E.g., for a given directed connection x->y, x is to be returned for a given y.

Example: a type hierarchy.

For the examples, I use a "real" world example: a type hierarchy. Our (very simple) model looks like that (in pseudo Java code):
Type {
    String name;
    Collection<Type> super;
Let's also assume a container in which all types are stored, e.g.
Collection<Type> allTypes;
In OCL, you can even retrieve all types by a simple query, however we often have some type of container defined in our model anyway (and we use Xtend here ;-) ).
For testing the code, I have defined a concrete example. The following type hierarchy visualizes some type instances, the super types attributes are drawn as connections: That is, we have a base type A. B, C, and D are direct subtypes of A. E is a subtype of B. F is a subtype of B and C (we allow multi inheritance ;-) ), and so on.

Preliminary remarks:

  • I haven't looked into any algorithm book to find the best algorithm (what ever "best" means). The solutions below are simply the one I implemented when I needed it, without much thinking about the algorithm. If you know a better solution, please let me know!
  • I'm using create extensions for simplicity and performance reasons here. If you have small (or mid sized) models, I'd assume this would be ok.
  • Although Xtend is quite similar to OCL, the algorithms will probably not work with OCL, as OCL is side effect free (and I modify collections in the algorithms, which will not work that way with OCL).

Super type queries (or: traverse directed connection)

Since we store the super types in the model, the easiest query is to retrieve the direct super types of a type. E.g., A.super returns an empty list; F.super returns B, C. Now, lets calculate the transitive closure of super types. This is very easy with Xtend:

1) Transitive closure of super types, unsorted:

create Set[Type] superTypesTransitive(Type type):
 this.addAll(type.supers) ->

We use a set in order to avoid duplicates, which would be added in case of multi-inheritance. This solution is straight forward. If you are not used to OCL syntax, you may be a little bit confused by the implicit syntax for the collect operation: type.supers.superTypesTransitive() returns superTypesTransitive() for all type.supers elements.
J.superTypesTransitive() will return F,B,C,A, just as expected. (J.superTypesTransitive() is just more OO-like way for writing superTypesTransitive(J). Frankly, I don't know if this is better readable, but it definitely looks cooler ;-)). The returned collection is not ordered, and often enough this is sufficient. However, sometimes we need the collection to be ordered. There are two often used strategies for traversing a tree or DAG: depth first search and breadth first search. We will implement both.

2) Transitive closure of super types with depth first search order:

create List[Type] superTypesTransitiveDFS(Type type):

As we need a sorted collection, we have to use a list instead of a set. However, we have to reject duplicates in our code now. Since Xtend does not provide a loop statement, I have used the collection operation forAll here. Since the forAll operation expects a boolean expression inside, the !=null part "casts" our chained expression into a boolean expressions. If you know of a nicer solution, please let me know.
J.superTypesTransitiveDFS() will return F,B,A,C.

3) Transitive closure of super types with breadth first search order:

create List[Type] superTypesTransitiveBFS(Type type):
 let todo = new java::util::ArrayList :
  todo.addAll(type.supers) -> 
  bfsSuper(this, todo);
private Void bfsSuper(List[Type] result, List[Type] todo):
 if todo.isEmpty then
  result.add(todo.first()) ->
  todo.addAll(todo.first().supers.reject(e|todo.contains(e) || result.contains(e))) -> 
  todo.remove(todo.first()) ->
  bfsSuper(result, todo);  

The breadth first search is a little bit more complicated, as we need a helper list, and a helper method. Note that we cannot create an Xtend type of List here, instead we have to use the Java ArrayList, which is available when we use the JavaBeans Metamodel in the project's Xpand/Xtend settings.
J.superTypesTransitiveBFS() will return F,B,C,A.

Sub type queries (or: traverse connections in counter direction)

So, we have three different queries for super types. Now we want to write the very same queries for sub types. Unfortunately, the sub type information is not directly stored in the model but must be derived instead. First, we write a simple query for the direct sub types:

4) Computes direct sub types (navigate in counter-direction):

create Set[Type] subTypes(Type type, Collection[Type] allTypes):

This solution relies on the afore described queries. Frankly, I needed some time to figure it out (as I'm more an imperative and OO guy ;-) ) and I was really surprised how short (one line) it is. If you have to write that with Java, you will need a lot more lines. So, if you ever wondered why to use a special transformation language -- this is at least one argument.
Now, let's compute the transitive closure:

5) Transitive closure of sub types, unsorted:

create Set[Type] subTypesTransitive(Type type, Collection[Type] allTypes):

Note that this query does not rely on the subTypes extension, but only on the super type queries. Since we use a Set, we do not have to take care of duplicates. Again: A Java solution would be much longer.
A.superTypesTransitive(allTypes) will return B,C,E,G,H,I,J,D,F
Now, let's implement the depth first search for sub types:

6) Transitive closure of sub types with depth first search order:

create List[Type] subTypesTransitiveDFS(Type type, Collection[Type] allTypes):

Since we have implemented an extension for subTypes, the solution is quite similar to the super kind depth first search algorithm.
A.subTypesTransitiveDFS(allTypes) will return B,E,F,I,J,C,D,G,H. Note that we cannot simply use the unsorted solution with a list instead of a set, as this will not result in a depth first search order (in our case, it would return something like B,C,D,E,F,I,J,G,H). The same is true for superTypesTransitiveDFS, by the way. Last but not least the breadth first search for sub types.

7) Transitive closure of sub types with breadth first search order:

create List[Type] subTypesTransitiveBFS(Type type, Collection[Type] allTypes):
 let todo = new java::util::ArrayList :
  todo.addAll(type.subTypes(allTypes)) -> 
  bfsSub(this, todo, allTypes);
private Void bfsSub(List[Type] result, List[Type] todo, Collection[Type] allTypes):
 if todo.isEmpty then
  result.add(todo.first()) ->
        reject(e|todo.contains(e) || result.contains(e))) -> 
  todo.remove(todo.first()) ->
  bfsSub(result, todo, allTypes); 

This algorithm also resembles the one for super types. By the way: I didn't find the let expression explained in the documentation of Xtend (however, some examples use it). Did I miss it or is there really more OCL in Xtend as told in the docs?
A.subTypesTransitiveBFS(allKinds) will return B,C,D,E,F,G,H,I,J, which is easily validated as this is the order of the types as shown in the little figure.