Some Editor Tricks

Here's the scenario (fictional, but not too far removed from real life): I want to munge some data I have into code. Let's say I have this in my editor:

The Comedy of Errors
Much Ado About Nothing
Twelfth Night

and the result I want is this:

THE_COMEDY_OF_ERRORS = "The Comedy of Errors"
MUCH_ADO_ABOUT_NOTHING = "Much Ado About Nothing"
TWELFTH_NIGHT = "Twelfth Night"

My editor of choice is vim and I like to run in a Unix (these days that usually means Linux) environment. Given those tools, here's one way to do it.

In command mode, I type 3yy (yank three lines), move the cursor to the line where I want to put them and hit p (for put).

The Comedy of Errors
Much Ado About Nothing
Twelfth Night

The Comedy of Errors
Much Ado About Nothing
Twelfth Night

I use the bang command to pipe the first three lines through sed 's/ /_/g'. (To pipe three lines through a shell command, type 3!! in command mode, then type the command at the colon prompt at the bottom of the screen.) This sed command replaces all spaces with underscores, leaving:

The_Comedy_of_Errors
Much_Ado_About_Nothing
Twelfth_Night

The Comedy of Errors
Much Ado About Nothing
Twelfth Night

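I pipe the first three lines through perl -pe 's/\w+/uc($&)/e'. (There is no /g flag, but none is needed: \w matches underscores, so by now each line is a single "word".) This forces all letters to uppercase, leaving: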

THE_COMEDY_OF_ERRORS
MUCH_ADO_ABOUT_NOTHING
TWELFTH_NIGHT

The Comedy of Errors
Much Ado About Nothing
Twelfth Night

I pipe the last three lines through awk '{ print " = \"" $0 "\""}'. This inserts an equals sign and surrounds the original lines with (simple) double quotes, leaving:

THE_COMEDY_OF_ERRORS
MUCH_ADO_ABOUT_NOTHING
TWELFTH_NIGHT

 = "The Comedy of Errors"
 = "Much Ado About Nothing"
 = "Twelfth Night"

At this point, it's a simple process to move the lines to their correct places using a combination of dd (delete line) and p (put line) and J (join lines) to get my desired result.
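For comparison, here is a minimal sketch of the same transformation done in one pass with a small Ruby script instead of three separate filters. The helper name constant_line is made up for illustration, and the sketch assumes titles contain only letters and spaces (an apostrophe or a double quote in a title would need extra handling).

```ruby
# Hypothetical helper: turn one title into a constant assignment line.
# Assumes the title contains only letters and whitespace.
def constant_line(title)
  clean = title.strip
  name = clean.gsub(/\s+/, "_").upcase  # spaces -> underscores, then uppercase
  %(#{name} = "#{clean}")               # NAME = "Original Title"
end

titles = [
  "The Comedy of Errors",
  "Much Ado About Nothing",
  "Twelfth Night",
]

lines = titles.map { |title| constant_line(title) }
puts lines
```

The step-by-step approach in the editor still has the advantage that I can see and undo each intermediate result; a script like this makes more sense once the same munging has to be repeated.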

Things that really help when using this kind of approach:

  • an open shell window nearby to try one-liners on test data
  • trustworthy multiple levels of undo/redo
  • lots of practice with editor basics

Getting Back Online

Recently, we had a lightning strike that hit a tree in the garden. We also got an electricity spike that looks like it came in through the phone line. That spike took out the DSL modem, the router and the motherboard on my PC.

While recovering from that I made some mistakes that I'll summarize here. Maybe other people can avoid those mistakes or maybe I'll remember better next time.

Lesson One -- Buy a Surge Protector

...*and use it*. We had surge protectors on all the PC equipment. The DSL modem was in another room and there was nothing between it and the phone jack. Since the incident I've bought one of the power strips that includes the protector for the phone line. It's too late for this time of course, but it's early in the year yet and we will have many storms during the summer.

Keep the DSL Account Information

... in an accessible place. I was mostly okay here, but the password I had was wrong or out of date.

Avoid "Wizard" Setups

I bought a replacement modem in a store, in part because I've had problems before with deliveries that require a signature: I've ended up driving to some remote depot to pick up the item. I hooked it up, went through the setup "wizard" web interface and ran into an authentication problem because the password was wrong.

For the record, the modem was a D-Link DSL-2320B. After I ran through the "wizard" setup once, several of the original options were no longer available. It turned out that this was the key to my problems.

Again, for the record, a way to configure the modem for AT&T DSL that works is as follows:

  1. untick the "DSL Auto-connect" checkbox
  2. leave the VPI setting at 0
  3. leave the VCI setting at 35
  4. choose PPPoE if connecting directly to a PC; choose Bridging if connecting to a home router
  5. leave "Encapsulation Mode" as "LLC/SNAP BRIDGING"
  6. leave LAN settings at default or change them to taste if you know what you're doing

Use the Pinhole Reset Button If You Run Into Trouble

It's on the back of the modem and you press it with a pen or a pin for ten seconds to reset everything to factory defaults. It seems that this is what it took to get the modem to really forget the original, incorrect password and use the new, reset password I was entering through the web interface.

Beware of Premium Cost Services

... unless you enjoy paying for someone to read you their script over the phone. AT&T's initial phone support was actually pretty good -- I got through to a second level support person who knew what he was talking about and was helpful, even though I was not using a supported modem. When I called again a couple of days later, I allowed myself to be browbeaten into signing up for the Tech Connect service, at well over $100. I figured I'd get even better support from a networking pro. Instead I was connected to a newbie who clearly didn't know anything beyond the script. After a long time going through the usual rigmarole of turning the modem off and on and checking the lights, the script eventually had me do the pinhole reset, which fixed the problem.

It's possible that, had I spent even more time on the phone and still not fixed the problem, I would have got to a second level of (paid) support. As it was, I was left kicking myself for not trying this myself.

The Internet is Your Friend

... even when the problem is connecting to the Internet. If I had made better use of other ways to connect, I could have saved myself some trouble. After I got my initial connection, I figured out how to get the home router up and running with hints from forum postings. (Basically, make the modem as stupid as it can be, just a bridge, and let the router take care of the PPPoE authentication.)

Phone support in general is slow and limited. The world wide web is, for DSL troubleshooting as for other things, vast and quick to search.

In Sum

Don't do like I did. First of all, protect yourself against lightning. Don't rely on your provider for support beyond the basics (connection information etc.).

Oh, and if you do run into trouble, post so others can learn from your experience.

Sinatra Hints

I've been caught by this for at least the second time: haml views and CSS files not being found in a brand new Sinatra app. The configuration is (to me) somewhat obscure, so I'm documenting the fix here.

To get the views and public directories working properly, be sure to configure app_file (set to nil by default in Sinatra::Base):

configure do
  set :app_file, __FILE__
end

To get static files working properly, configure static (set to false by default in Sinatra::Base):

configure do
  set :app_file, __FILE__
  set :static, true
end

Notes on Talking with Slides

I gave an internal company talk reporting on my trip to OOPSLA this year. Here are some notes on the experience.

Things I Would Do Again

Focus on Some Small, Interesting Topics

The conference lasted for five days (including the Sunday before the "official" start). I attended mostly tutorials but also a limited number of conference talks. Instead of trying to summarize everything I saw, I picked five topics that interested me. In the end I only presented four.

Avoid Wasting Time With Slideware

This time I created my slides using something called S5, so it was all one big HTML file. This meant I was editing using a tool I know well (my favourite text editor, vim). S5 has its quirks -- notably, it seems to behave differently in different browsers -- but the file format is relatively transparent and easy to modify.

Allow Plenty of Time for Preparation

I decided in advance that I was not going to rehash the material I had already seen. I had a CD with the slides of the presentations I had seen. Instead of copying those I researched the background and came up with my own take on the ideas. This meant many evenings and weekends googling for and reading references.

Break Up the Talk

I gave four mini talks, not directly related to one another. They varied in length and style. There was a good chance that most people found at least one part interesting.

Involve the Audience

I did steal two questions from a "Jeopardy"-style quiz that the OOPSLA organizers staged one evening. I started out the talk with these two questions. If nothing else, it gave me a chance to ensure that at least a few people were awake.

Put the Question on a Slide But Not the Answer

One effective way to motivate the audience is to pique their curiosity. Putting the answers on the slides makes them complacent. If the answer isn't on a slide they have to listen to get the answer.

Move Material From Slides to Notes

There is a strong temptation to make the slides an outline of the entire talk. This has a powerful soporific effect on the audience. The slides should be a small part of the talk, only drawing attention from time to time.

Changes I Would Make If I Did It Again

Cut Ruthlessly

This means fewer slides and less material on each slide. I figured initially that a minute a slide ought to be about right on average. I didn't run out of time but I did rush the last part of the talk with one eye on the clock.

Avoid Abstraction and Generalities

People tend to understand specific, concrete examples better than more general, abstract statements. This seems to be even truer when they are listening to a talk. Even worse, the general and abstract seem to send people to sleep (not literally, but their attention drifts).

Plan More Occasions to Engage the Audience

This would give me a chance to wake them up if necessary. It would also be a chance for me to see how well they are following.

Make Sure I Understand Everything I Plan to Say

Towards the end of my talk I came across a slide that included a point that didn't make sense to me when I read it in front of my audience. I stumbled and it was noticed and I'm sure it cost me some credibility. The emphasis is on everything. This means reviewing everything, especially the stuff that's written last thing at night. During the review, I want to make sure I agree with and can support everything I plan to say.

Break Up Monotony in the Slides

I had very few images in my slides, mostly because I found the amount of work to create my own images burdensome and I found it hard to think of existing images that I could use that would be relevant. In future I'll look into something like Inkscape.

Ruby Notes

Mongo

In order to install the mongo_ext gem, I had to first install the ruby1.8-dev package.

Explicit Breakpoints

A nice way to examine a failing rspec test: just insert require "rubygems"; require "ruby-debug"; debugger at the relevant point in the code. (From an rspec mailing list)

Custom Error Handling in Sinatra

Quick note for future reference:

Sinatra will ordinarily enable :raise_errors when :environment is :production. From my reading of the code, this causes Sinatra not to attempt to handle exceptions but to let them bubble up to the surrounding Rack container. In this situation, error-handling directives like

error do
    @error_msg = request.env['sinatra.error']
    haml :error_page
end

have no effect.

In order to fix the problem, just explicitly disable :raise_errors in the Sinatra application.
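As a minimal sketch, assuming a classic (top-level) Sinatra application like the ones in the earlier notes, the fix could look like the fragment below. It only uses the standard configure and disable helpers; whether other settings (for example the development-mode exception page) also need adjusting will depend on the Sinatra version and environment.

```ruby
configure do
  # Assumption: with :raise_errors off, Sinatra handles the exception itself
  # and runs the `error do ... end` block instead of re-raising to Rack.
  disable :raise_errors
end
```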

OOPSLA Thursday

DSL -- A Programmer's View

This was a survey of different techniques for implementing domain specific languages (DSLs) given by an academic. I thought it was a good blend of principles and practice, summarizing the advantages and disadvantages of different approaches.

Of the tools discussed, I had only seen ANTLR before. There was also a tool called SableCC that really seemed to offer little advantage over ANTLR. The main difference I saw was that you could embed little snippets of your target language in ANTLR but not SableCC.

The core examples were done in Haskell and, although I'm barely familiar with the language, I think it works really well in this kind of scenario because its syntax is clean and therefore does not distract one from the points being made. Both the embedded parser (Parsec) and embedded language worked well.

My recollection of the major points:

  • compilers offer optimization and high performance (if you put enough effort into them) but are very inflexible and the source tends to get "lost" in compilation, leading to problems understanding the resulting system, particularly if it needs to be debugged
  • interpreters are quite flexible and can be made to generate high quality error messages customized for the domain; performance is often poor
  • DSLs implemented with either compilers or interpreters are hard to compose, hard to decompose; on the other hand you can easily add new semantics for the language just by writing a new compiler or interpreter for it
  • for the presenter, a very interesting option is "polymorphic embedding"; unfortunately he did not have time to present this, so I'll have to read the slides when I get them

That's it for OOPSLA for me this year. Next year it's SPLASH in Reno. I haven't yet decided if I want to go again or try something else instead.

Random Observations

The Jeopardy-style quiz was interesting. None of the three teams could identify a design pattern projected as a diagram on the big screen. Even the audience, which included big names from the pattern community, had problems. It looked a bit like the Proxy pattern, a bit like Strategy. It turned out to be the Bridge, which I think is a good choice for a quiz. It is not one that ever struck me as one I'll be eager to apply. Perhaps it's a good interview question if you want to deliberately make a candidate uncomfortable.

I finally did get to talk to some people informally on Wednesday, but for an outsider, the "hallway track" seems pretty mythical. I had the impression there were cliques and tribes and I felt very much the outsider. Perhaps this is an argument for going to the same conference multiple times, though I could see the danger of it getting incestuous after a while. And OOPSLA has been going for decades now...

OOPSLA Tuesday / Wednesday

More tutorial notes

Realtime Programming on the Java Platform

This was an information-dense walkthrough of the realtime spec and how to use realtime implementations. The realtime spec makes no changes to the language as such; however, it does require a special version of the runtime, and the underlying OS and hardware must also support realtime. Typically, an application will get roughly a 30% reduction in throughput when it is ported to realtime.

For real-world applications, it's infeasible to provide cast-iron real-time guarantees. However, for applications that justify it, a system can be written (with a lot of effort) that meets realtime goals (which will typically be firm bounds on things like response times) with high probability.

A lot of the effort goes into ensuring that non-realtime threads and garbage collection do not interfere with realtime threads. At the highest level, you can define realtime threads that are not allowed to read from the Java heap at all (any attempt causes a runtime exception). These threads are only allowed to use specially defined storage that is allocated before they run and is effectively immortal (and immune from garbage collection). The API for defining special memory contexts, as specified, is rather over-elaborate.

The Sun realtime Java product offers realtime garbage collection. This works by running GC as a realtime thread and ensuring that the GC operation never needs to move objects. Thus the GC can be pre-empted at any time by a higher-priority thread with no corruption of heap state.

Of note: apparently financial companies think they care about "soft" realtime until they understand they need to pay a penalty in throughput. When they do, they nearly always decide they'll accept rare high latency in exchange for constantly high throughput.

Erlang / OTP

Really, OTP is an important part of the Erlang story. Despite the name, OTP has little to do with telephony and everything to do with abstracting common usage scenarios. There's a common pattern whereby the OTP module (e.g. gen_server, gen_fsm) controls the lifecycle, and the application code is implemented in callback functions that are called appropriately. Thus, OTP can take care of setup and (particularly important) error handling. The OTP philosophy for handling errors is to kill the offending process. A supervisor process is then responsible for handling the crash, often by restarting the process (in a known good state). Application design is largely about designing trees of supervisor and so-called worker processes.
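To make the "let it crash, then restart" idea concrete for myself, here is a rough analogy in Ruby using threads. This is not OTP's API and Ruby threads are not Erlang processes; the supervise method and its max_restarts parameter are invented for illustration. The worker runs in its own thread, and the supervisor restarts it from scratch if it dies, giving up after a fixed number of restarts.

```ruby
# Hypothetical supervisor: runs the worker block in a fresh thread and
# restarts it if it crashes, up to max_restarts times.
def supervise(max_restarts: 3, &worker)
  restarts = 0
  loop do
    thread = Thread.new do
      # The supervisor deals with crashes, so keep the thread quiet.
      Thread.current.report_on_exception = false
      worker.call
    end
    begin
      return thread.value # worker finished normally; pass its result on
    rescue StandardError => e
      raise e if restarts >= max_restarts # give up: escalate the crash
      restarts += 1                       # otherwise start a fresh worker
    end
  end
end

# Usage: a worker that crashes twice before it succeeds.
attempts = 0
result = supervise(max_restarts: 3) do
  attempts += 1
  raise "simulated crash" if attempts < 3
  :done
end
puts "worker returned #{result.inspect} after #{attempts} attempts"
```

The point of the pattern is that the worker contains no error-recovery code at all; recovery policy lives entirely in the supervisor.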

Combinatorial Testing

This was about the problem of choosing combinations of input parameters (considered abstractly, so they could include things like machine state, OS version, etc.). Full coverage of all combinations is infeasible, but you want to cover some combinations. Covering every combination of values for each pair of parameters turns out to be good enough in practice much of the time. Surprisingly, choosing even these is not a solved problem. Various tools exist that come up with reasonable but often varying results. The presenter's favourites were PICT and (as honorable mention) Jenny. For all the tools, the results they generate are pretty basic and some effort is required to use them to generate actual test data.
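As a sketch of what "all pairs" means, here is a naive greedy selection in Ruby. It is not how PICT or Jenny work internally (I don't know their algorithms); the parameter names and values are invented, and the approach enumerates every full combination, so it is only practical for small inputs.

```ruby
# Hypothetical parameters, for illustration only.
PARAMS = {
  os:      %w[linux windows mac],
  browser: %w[firefox chrome],
  locale:  %w[en de],
}

# Every value pairing, for every pair of parameters, that must be covered.
def required_pairs(params)
  params.keys.combination(2).flat_map do |a, b|
    params[a].product(params[b]).map { |va, vb| { a => va, b => vb } }
  end
end

# The pairings covered by one complete test case (a hash of param => value).
def pairs_in(test_case)
  test_case.keys.combination(2).map do |a, b|
    { a => test_case[a], b => test_case[b] }
  end
end

# Naive greedy selection: repeatedly pick the full combination that covers
# the most pairings not yet covered, until nothing is left uncovered.
def pairwise_suite(params)
  first, *rest = params.values
  candidates = first.product(*rest).map { |values| params.keys.zip(values).to_h }
  uncovered = required_pairs(params)
  suite = []
  until uncovered.empty?
    best = candidates.max_by { |c| (pairs_in(c) & uncovered).size }
    suite << best
    uncovered -= pairs_in(best)
  end
  suite
end

suite = pairwise_suite(PARAMS)
suite.each { |test_case| p test_case }
puts "#{suite.size} cases instead of #{PARAMS.values.map(&:size).reduce(:*)}"
```

Different tie-breaking choices in the greedy step give different (equally valid) suites, which may be one reason the real tools produce varying results.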

OOPSLA Tuesday Morning

It's interesting being "industry" at a somewhat academic conference (although I think this is less academic than most). Fewer than a quarter of attendees are from industry and they seem to be a very diverse bunch. Students and faculty are more cohesive groups, so I think they're easier to cater to.

The Keynote

Barbara Liskov got well deserved applause for her reprise of her Turing award acceptance speech, talking about her search for ways to raise the level of abstraction in order to enhance the expressiveness of programs. She put herself clearly in the Dijkstra camp, so she believes that ease of reading source code (especially other people's code) is more important than ease of writing it and she believes that we want to make it easier for people to reason about the behaviour and correctness of the code.

She talked about some of the papers that really influenced her research, starting (if I recall correctly) with Dijkstra's famous "Go To Statement Considered Harmful" that was published as a letter to CACM (back when it was a serious research journal). It was interesting to see how there were nuggets of insight and how much of a struggle it was for researchers to come up with ideas we take for granted today, such as how to do data abstraction with encapsulation.

The message I heard was that there are still many areas where the level of abstraction we work with is too low.

Onward! Papers

These lived up to my expectations -- vaguely plausible to borderline crazy.

Jonathan Edwards' work on Coherence (a language where statements inside a "co-action" all run simultaneously from the point of view of the programmer) looks brilliant. We'd all like to be rid of ordering-related bugs in our code. Unfortunately there's no implementation -- he says it's more important to get the definition right first. The idea of a "co-action" works perfectly in the rigged example but how does it work for something that is more realistic?

"Traditional assignment considered harmful" seemed mostly provocative, as it seemed to be saying a "swap values" operator would be easier to work with than a traditional assignment operator. A member of the audience trenchantly argued that the thing on the right of the assignment operator is an expression, not just a simple value (and it seems to me that blows a huge hole in the argument).

The "Silhouette" talk wins top marks for eccentric presentation style. The professor advanced the slides in the presentation deck from one to the next on behalf of the grad student presenter, but the presenter was in a film (possibly Quicktime?) inside the slides. It worked somewhat. The idea is to explore the concept of using shapes and the way they nest and overlap to explore program design in a highly visual way. It's extremely fuzzy at the moment, so it's hard to judge if it might be worthwhile. Every other visual programming metaphor has pretty much failed, but this one at least looks a little different.

Finally, there was something called "PI", where PI is of course the Greek letter. This is a kind of a meta programming language, although the presenter insisted on calling it a pattern language. It was built to scratch an itch (always a good sign) that the presenter ran into while trying to build languages for domain experts. It seems to be in the Lisp/Scheme family in the sense that you type expressions at a prompt and some expressions cause changes to the runtime. However, it goes much further in allowing new syntax to be added just by adding definitions to the grammar that PI understands. It looks like a lot of existing ideas packaged slightly differently, but from what I can tell there is an implementation.

OOPSLA Monday

Some more notes on the tutorials I attended

Smalltalk

I was a bit disappointed in this tutorial. It never got much beyond installing an image and getting used to the syntax (which is admittedly a bit different from the curly brace syntax I grew up with). The slide deck was the old-fashioned kind where every point has a bullet point. I felt many of them could have been skimmed over with little loss. I was particularly unhappy about getting several slides at the start about why I might be interested in learning about Smalltalk. Isn't flying to Florida and paying to attend the tutorial enough evidence to show that I am already interested?

The image used was Pharo (apparently a fork of the Squeak project) and the meatier part of the tutorial was spent exploring the Class Browser, Test Runner, inspectors and the Monticello Browser.

The presenter, James Foster, was obviously very knowledgeable about Smalltalk but a little fuzzy on Java and other curly brace languages he was comparing it with. I mentioned that I had some exposure to Ruby and he responded with a money quote from Ward Cunningham that "I always expected Smalltalk to come back, I just didn't expect it to be called Ruby". Well, yes.

Parameterized Unit Testing

This should have been called "demo of the PEX tool". At least two other people who attended were annoyed because the tool could only be used with C# in Windows. Even though I do a lot of C# on Windows these days I was somewhat concerned about the licensing terms (pick academic non-commercial or temporary evaluation). Java barely got a mention: apparently JUnit 4 has some support for parameterized unit tests and Agitar has a product that can be used to generate values.

The tool itself is impressive. You provide a unit test that takes parameters for the input values. It generates initial values (generally, the simplest possible) and uses profiler hooks to instrument the bytecode at runtime to provide an extremely detailed trace of actual execution. The tool sees what paths are available at each branch and which are taken. It then uses a constraint solver to decide what values to try to make the execution take other branches. (The constraint solver comes from research into theorem proving and it understands restricted domains like integer arithmetic and simple program constructs such as tuples and arrays.) Within seconds it generates thousands of runs with different test data and keeps only those that caused a new execution path to be taken.

The amount of work required to get this functional is impressive. For example, they built into the tool a formal understanding of the semantics of every single byte code in the CLR (including what exceptions it could throw, including arithmetic overflow, which is possible if you configure Visual Studio the right way).

The question arises: if my unit test is going to get unknown (to me) input values how can I make meaningful assertions about its behaviour? In the tutorial, they presented some "patterns". In many cases, it boils down to identifying the group properties of your code: are there inverse operations, commutative operations, etc? If so, you can build a sequence of operations that should always have the same result, no matter what the input. Others relate to seeing that state invariants are preserved, for example if you insert something into a collection, you always expect to find it there afterwards.
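Two of these patterns can be sketched in plain Ruby. This is only an illustration of the assertion style, not of the tool: the inputs here are seeded random integers rather than values chosen by a constraint solver, and encode, decode and the check_ methods are invented names.

```ruby
require "set"

# Stand-in for tool-generated inputs: seeded random integers (an assumption;
# the tool described above picks values by analysing execution paths).
rng = Random.new(42)
inputs = Array.new(200) { rng.rand(-1_000_000..1_000_000) }

# Hypothetical code under test: a pair of inverse operations.
def encode(n)
  n.to_s(36)
end

def decode(s)
  s.to_i(36)
end

# Pattern: inverse operations. decode(encode(n)) must equal n for any n.
def check_roundtrip(n)
  raise "roundtrip failed for #{n}" unless decode(encode(n)) == n
end

# Pattern: state invariant. After inserting x, the collection contains x.
def check_insert_then_find(items, x)
  set = Set.new(items)
  set.add(x)
  raise "#{x} not found after insert" unless set.include?(x)
end

inputs.each { |n| check_roundtrip(n) }
inputs.each_slice(2) { |a, b| check_insert_then_find([a], b) }
puts "all properties held for #{inputs.size} inputs"
```

Neither check mentions a specific expected value, which is what lets the inputs come from somewhere other than the test author.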

This kind of tool looks promising as a way to alleviate the tedious task of coming up with plausible inputs. It does come at a cost to maintainability of unit tests. It could be very useful in generating regression tests.

It's just a shame that this is Microsoft only and, at that, not available for production use unless Microsoft decides to include it in their Visual Studio offering at some time in the future.