October 2009 Archives

OOPSLA Thursday

DSL -- A Programmer's View

This was a survey of different techniques for implementing domain specific languages (DSLs) given by an academic. I thought it was a good blend of principles and practice, summarizing the advantages and disadvantages of different approaches.

Of the tools discussed, I had only seen ANTLR before. There was also a tool called SableCC that really seemed to offer little advantage over ANTLR. The main difference I saw was that you could embed little snippets of your target language in ANTLR but not SableCC.

The core examples were done in Haskell and, although I'm barely familiar with the language, I think it works really well in this kind of scenario because its syntax is clean and therefore does not distract one from the points being made. Both the embedded parser (Parsec) and embedded language worked well.

My recollection of the major points:

  • compilers offer optimization and high performance (if you put enough effort into them) but are very inflexible and the source tends to get "lost" in compilation, leading to problems understanding the resulting system, particularly if it needs to be debugged
  • interpreters are quite flexible and can be made to generate high quality error messages customized for the domain; performance is often poor
  • DSLs implemented with either compilers or interpreters are hard to compose, hard to decompose; on the other hand you can easily add new semantics for the language just by writing a new compiler or interpreter for it
  • for the presenter, a very interesting option is "polymorphic embedding"; unfortunately he did not have time to present this, so I'll have to read the slides when I get them

That's it for OOPSLA for me this year. Next year it's SPLASH in Reno. I haven't yet decided if I want to go again or try something else instead.

Random Observations

The Jeopardy-style quiz was interesting. None of the three teams could identify a design pattern projected as a diagram on the big screen. Even the audience, which included big names from the pattern community, had problems. It looked a bit like the Proxy pattern, a bit like Strategy. It turned out to be the Bridge, which I think is a good choice for a quiz. It is not one that ever struck me as one I'll be eager to apply. Perhaps it's a good interview question if you want to deliberately make a candidate uncomfortable.

I finally did get to talk to some people informally on Wednesday, but for an outsider, the "hallway track" seems pretty mythical. I had the impression there were cliques and tribes and I felt very much the outsider. Perhaps this is an argument for going to the same conference multiple times, though I could see the danger of it getting incestuous after a while. And OOPSLA has been going for decades now...

OOPSLA Tuesday / Wednesday

More tutorial notes

Realtime Programming on the Java Platform

This was an information-dense walkthrough of the realtime spec and how to use realtime implementations. The realtime spec makes no changes to the language as such, however it does require a special version of the runtime and the underlying OS and hardware must also support realtime. Typically, an application will get roughly a 30% reduction in throughput when it is ported to realtime.

For real-world applications, it's infeasible to provide cast-iron real-time guarantees. However, for applications that justify it, a system can be written (with a lot of effort) that meets realtime goals (which will typically firm bounds on things like response times) with high probability.

A lot of the effort goes into ensuring that non-realtime threads and garbage collection do not interfere with realtime threads. At the highest level, you can define realtime threads that are not allowed to read from the Java heap at all (any attempt causes a runtime exception). These threads are only allowed to use specially defined storage that is allocated before they run and is effectively immortal (and immune from garbage collection). The API for defining special memory contexts as specified is more over-elaborate.

The Sun realtime Java product offers realtime garbage collection. This works by running GC as a realtime thread and ensuring that the GC operation never needs to move objects. Thus the GC can be pre-empted at any time by a higher-priority thread with no corruption of heap state.

Of note: apparently financial companies think they care about "soft" realtime until they understand they need to pay a penalty in throughput. When they do, they nearly always decide they'll accept rare high latency in exchange for constantly high throughput.

Erlang / OTP

Really, OTP is an important part of the Erlang story. Despite the name, OTP has little to do with telephony and everything to do with abstracting common usage scenarios. There's a common pattern whereby the OTP module (e.g. genserver, genfsm) controls the lifecycle, and the application code is implemented in callback functions that are called appropriately. Thus, OTP can take care of setup and (particularly important) error handling. The OTP philosophy for handling errors is to kill the offending process. A supervisor process is then responsible for handling the crash, often by restarting the process (in a known good state). Application design is largely about designing trees of supervisor and so-called worker processes.

Combinatorial Testing

This was about the problem of choosing combinations of input parameters (considered abstractly, so they could include things like machine state, OS version, etc.) Full coverage of all combinations is infeasible, but you want to cover some combinations. Covering all combinations of all pairs turns out to be good enough in practice much of the time. Surprisingly, choosing even these is not a solved problem. Various tools exist that come up with reasonable but often varying results. The presenter's favourites were PICT and (as honorable mention) Jenny. For all the tools, the results they generate are pretty basic and some effort is required to use them to generate actual test data.

OOPSLA Tuesday Morning

It's an interesting being "industry" at a somewhat academic conference (although I think this is less academic than most). Less than quarter of attendees are from industry and they seem to be a very diverse bunch. Students and faculty are more cohesive groups, so I think they're easier to cater to.

The Keynote

Barbara Liskov got well deserved applause for her reprise of her Turing award acceptance speech, talking about her search for ways to raise the level of abstraction in order to enhance the expressiveness of programs. She put herself clearly in the Dijkstra camp, so she believes that ease of reading source code (especially other people's code) is more important than ease of writing it and she believes that we want to make it easier for people to reason about the behaviour and correctness of the code.

She talked about some of the papers that really influenced her research, starting (if I recall correctly) with Dijkstra's famous "Goto Considered Harmful" that was published as a letter to CACM (back when it was a serious research journal). It was interesting to see how there were nuggets of insight and how much of a struggle it was for researchers to come up with ideas we take for granted today, such as how to do data abstraction with encapsulation.

The message I heard was that there are still many areas where the level of abstraction we work with is still too low.

Onward! Papers

These lived up to my expectations -- vaguely plausible to borderline crazy.

Jonathan Edwards' work on Coherence (a language where statements inside a "co-action" all run simultaneously from the point of view of the programmer) looks brilliant. We'd all like to be rid of ordering-related bugs in our code. Unfortunately there's no implementation -- he says it's more important to get the definition right first. The idea of a "co-action" works perfectly in the rigged example but how does it work for something that is more realistic?

"Traditional assignment considered harmful" seemed mostly provocative, as it seemed to be saying a "swap values" operator would be easier to work with than a traditional assignment operator. A member of the audience trenchantly argued that the thing on the right of the assignment operator is an expression, not just a simple value (and it seems to me that blows a huge hole in the argument).

The "Silhouette" talk wins top marks for eccentric presentation style. The professor made the slides in the presentation deck from one to the next on behalf of the grad student presenter, but the presenter was in a film (possibly Quicktime?) inside the slides. It worked somewhat. The idea is to explore the concept of using shapes and the way they nest and overlap to explore program design in a highly visual way. It's extremely fuzzy at the moment, so it's hard to judge if it might be worthwhile. Every other visual programming metaphor has pretty much failed, but this one at least looks a little different.

Finally, there was something called "PI", where PI is of course the Greek letter. This is a kind of a meta programming language, although the presenter insisted on calling it a pattern language. It was built to scratch in itch (always a good sign) that the presenter ran into while trying to build languages for domain experts. It seems to be in the Lisp/Scheme family in the sense that you type expressions at a prompt and some expressions cause changes to the runtime. However, it goes much further in allowing new syntax to be added just by adding definitions to be added to the grammar that PI understands. It looks like a lot of existing ideas packaged slightly differently, but from what I can tell there is an implementation.

OOPSLA Monday

Some more notes on the tutorials I attended

Smalltalk

I was a bit disappointed in this tutorial. It never got much beyond installing an image and getting used to the syntax (which is admittedly a bit different from the curly brace syntax I grew up with). The slide deck was the old-fashioned kind where every point has a bullet point. I felt many of them could have been skimmed over with little loss. I was particularly unhappy about getting several slides at the start about why I might be interested in learning about Smalltalk. Isn't flying to Florida and paying to attend the tutorial enough evidence to show that I am already interested?

The image used was Pharo (apparently a fork of the Squeak project) and the meatier part of the tutorial was spent exploring the Class Browser, Test Runner, inspectors and the Monticello Browser.

The presenter, James Foster, was obviously very knowledgeable about Smalltalk but a little fuzzy on Java and other curly brace languages he was comparing it with. I mentioned that I had some exposure to Ruby and he responded with a money quote from Ward Cunningham that "I always expected Smalltalk to come back, I just didn't expect it to be called Ruby". Well, yes.

Parameterized Unit Testing

This should have been called "demo of the PEX tool". At least two other people who attended were annoyed because the tool could only be used with C# in Windows. Even though I do a lot of C# on Windows these days I was somewhat concerned at licensing terms (pick academic non-commercial or temporary evaluation). Java barely got a mention: apparently Junit4 has some support for parameterized unit tests and Agitar has a product that can be used to generate values.

The tool itself is impressive. You provide a unit test that takes parameters for the input values. It generates initial values (generally, the simnplest possible) and uses profiler hooks to instrument the bytecode at runtime to provide an extremely detailed trace of actual execution. The tool sees what paths are available at each branch and which are taken. It then uses a constraint solver to decide what values to try to make the execution take other branches. (The constraint solver comes from research into theorem proving and it understands restricted domains like integer arithmetic and simple program constructs such as tuples and arrays.) Within seconds it generates thousands of runs with different test data and keeps only those that caused a new execution path to be taken.

The amount of work required to get this functional is impressive. For example, they built into the tool a formal understanding of the semantics of every single byte code in the CLR (including what exceptions it could throw, including arithmetic overflow, which is possible if you configure Visual Studio the right way).

The question arises: if my unit test is going to get unknown (to me) input values how can I make meaningful assertions about its behaviour? In the tutorial, they presented some "patterns". In many cases, it boils down to identifying the group properties of your code: are there inverse operations, commutative operations, etc? If so, you can build a sequence of operations that should always have the same result, no matter what the input. Others relate to seeing that state invariants are preserved, for example if you insert something into a collection, you always expect to find it there afterwards.

This kind of tool looks promising as a way to alleviate the tedious task of coming up with plausible inputs. It does come at a cost to maintainability of unit tests. It could be very useful in generating regression tests.

It's just a shame that this is Microsoft only and, at that, not available for production use unless Microsoft decides to include it in their Visual Studio offering at some time in the future.

OOPSLA Day One

My notes from the tutorials I attended on Sunday.

Dependency Injection in Dynamic Languages

Miško Hevery and an uncredited cohort named (I think) Coery, who seemed to be the Javascript expert.

DI can make large programs in dynamic languages more tractable. The same argument holds as for static languages: "wiring/construction" and business logic are separate concerns. By separating them you make the code easier to read and modify. Sometimes, when people say dynamic languages don't scale to large programs, they really mean the way they're written.

The example program was tic-tac-toe in Javascript. Even this small application had a main method that was starting to get out of control and consisted mainly of wiring. The DI version reduced main to essentially one line.

It turns out that DI is not all that hard to do in a dynamic language.

  • You choose what you're going to use as your "lookup symbol". This could be a string or it could be more sophisticated, approximating the type information you would get in a statically typed language, such as a Java interface.

  • You implement a chain of "providers", each of which knows about one kind of dependency resolution and delegates to the next if it doesn't know how to handle the request. At the end of the chain is a dummy provider that just errors out.

If you're building your own DI, can build custom providers specific to your domain. In Javascript, for example, it makes sense to wire listeners, so that the boilerplate code that usually is required falls away.

In the tutorial, they wrote a simple, short DI implementation in a couple of dozen lines of Javascript, test driven using the neat JsTestDriver framework.

Functional Programming in Object Oriented Code

Phil Goodwin gave this talk.

You can, with enough effort, use almost all features of functional programming in Java.

I found the ideas of how functional ideas could be incorporated into OO code thought provoking. OO and functional complement each other: one's weaknesses are generally the other's strengths and vice versa. Taking an idea from Bertrand Meyer, you can use commands to change things (OO) and use queries to find things out (functional).

Functional programming means reversing one common workflow in OO programming. In OO, you look for case statements and consider replacing them with polymorphism. In functional programming, you remove polymorphism in favour of explicit case statements. However, these case statements are much more agnostic about what types they work with.

(The example library is written in Java. A brave choice because, in many ways, Java is terrible for functional programming. Functions are not first class values (and it's hard to fake it). The type system is concerned about values, whereas functional programming languages are more concerned about the type of the functions. (Besides, the lack of type inferencing makes nesting of types painful and functional programming is all about nesting.)

The Google collections framework is a helpful starting point: you get the hooks with which you can use filter/map/reduce, etc..

Implementation: you can avoid blowing the stack using a trampoline to provide tail recursion removal and continuations.