Domain-specific languages – Initial impressions of ANTLR4.

I’ve always been into writing my own domain-specific languages — I firmly believe that used right, they are a massive win for developers. Your system is different from everyone else’s, so the only person able to write the perfect language for your problem is you. General-purpose programming languages look awfully crufty compared to an elegant, dedicated language.

The problem, of course, is that a dedicated language is something of a beast to write, at least if you start from scratch. I’ve been programming DSLs for a few years now, and I’ve even written the Canto toolkits, a pair of open-source parsing toolkits for JavaScript and C#. But that kind of small-scale craftsman approach is nothing compared to the industrial-strength behemoth that is Antlr. Antlr is a toolkit some 25 years in the making, and I’ve been tracking its progress, on and off, since about version 2. In the past, I’ve had trouble with it; it imposed a strong idiom on your way of writing, and it wasn’t one that gelled with my own way of approaching the problem. However, it’s recently moved to version 4, so I thought I’d give it another look.

The truth is, Antlr4 is good. Really good. Depressingly so, for someone who’s writing parsing toolkits. In fact, I love it. I love it more than my own projects.

The idea with Antlr is that you write a `grammar` for your language — describing the syntax that your language has. Here’s a snippet of a grammar that recognises JSON;


grammar JSON;

json: object | array ;

object : '{' pair (',' pair)* '}'
       | '{' '}' ;

pair: STRING ':' value ;

So if you know JSON, you should be able to read this without difficulty. The first rule says “a JSON document is either an object or an array”. The second implements this line from the JSON spec: “An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).”

You write the grammar, and Antlr builds you code to recognise that language. It handles two main jobs for you. The first is lexing — splitting a file into discrete words, or tokens — and the second is parsing — recognising syntactic structures like statements, lists, loop declarations, etc. Previous versions did that, but not in a way I ever really grokked. Until Antlr4…
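To make that split concrete, here’s the lexing half sketched in a few lines of TypeScript. This is purely illustrative — it’s not ANTLR output, and the token names are ones I’ve made up:

```typescript
// The lexing half, sketched: split raw text into discrete tokens.
// Illustrative only; this is not ANTLR-generated code.
type Token = { type: "OP" | "CL" | "ATOM"; content: string };

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    // try each token shape in turn at the front of the remaining input
    const m = /^(\s+|\(|\)|[a-z]+)/.exec(rest);
    if (!m) throw new Error("Unrecognised input: " + rest);
    const text = m[0];
    rest = rest.slice(text.length);
    if (text.trim() === "") continue; // skip whitespace
    const type = text === "(" ? "OP" : text === ")" ? "CL" : "ATOM";
    tokens.push({ type, content: text });
  }
  return tokens;
}
```

Run over the input `(a bc)`, this produces four tokens (OP, ATOM, ATOM, CL); recognising that those tokens form a list is the parser’s job.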

So what makes Antlr4 so good?

First, the integration with Visual Studio is superb. There’s a VS extension which basically makes ANTLR part of C#; just do “Add | New Item… | ANTLR Combined Grammar” and you’ve started developing your language. Every time you build, it builds a parser from the grammar. This is a big deal; no messing with the guts of .csproj files, or running hokey batch scripts. It’s better integrated than tools like T4, for example, which are actually part of Visual Studio.

Second, the structure of grammars has changed, making the description of your language a distinct act from writing an application like an interpreter. Previously, you had to embed scraps of C# code throughout the grammar to make it do anything. That always left something of a bad taste in my mouth: the code was powerful, but it got really ugly. By that I mean, you couldn’t look at an ANTLR grammar and say ‘Ah, I see clearly what the language looks like!’ There were bits and bats all over the grammar itself, obscuring the definition of the language. Now, Antlr takes the approach of providing post-parsing tools like listeners and visitors, which help you separate the act of parsing from the act of interpreting. This is the way a good parser should be structured, so I’m really glad to see Antlr encourage that kind of structure.
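To illustrate what that separation buys you, here’s a little TypeScript sketch of the visitor idea. The node shapes and names here are hypothetical, not real ANTLR4 classes; the point is that the parse tree is pure data, and each interpretation lives in its own visitor:

```typescript
// Sketch: the parse tree is plain data; visitors interpret it separately.
// These node shapes are hypothetical, not ANTLR4's generated classes.
type JsonNode =
  | { kind: "object"; pairs: { key: string; value: JsonNode }[] }
  | { kind: "string"; value: string };

interface JsonVisitor<T> {
  visitObject(node: Extract<JsonNode, { kind: "object" }>): T;
  visitString(node: Extract<JsonNode, { kind: "string" }>): T;
}

// dispatch a node to the right visitor method
function accept<T>(node: JsonNode, visitor: JsonVisitor<T>): T {
  return node.kind === "object" ? visitor.visitObject(node) : visitor.visitString(node);
}

// One visitor counts keys; another could pretty-print.
// The grammar and the tree never change.
const keyCounter: JsonVisitor<number> = {
  visitObject: (n) => n.pairs.reduce((sum, p) => sum + 1 + accept(p.value, keyCounter), 0),
  visitString: () => 0,
};
```

You could add a pretty-printing visitor alongside `keyCounter` without touching the parsing code at all; that’s the structure Antlr4’s generated visitors give you for free.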

Just those two things meant that I felt confident letting other devs in on the secret. It doesn’t take a language wonk to be able to extend a language or interpreter. I know this because my pair-programming partner at work extended the language and the interpreter I’d been working on, with literally no reading on the subject, in the time it took me to attend a meeting; she just looked at the code, saw what it was doing, and extended it. Part of that is that she’s just a very smart dev; the other part is that it’s an intuitive system to play with. If you understand the Visitor pattern, and if regular expressions don’t make you cry, you’ll pick it up.

Anyway, I can’t recommend it enough. Go try it for yourself.


Canto34-cs; a lexing and recursive-descent parsing toolkit for .Net

Introducing Canto34-cs – another OSS project I probably won’t maintain!

Alright, folks. I’ve just uploaded another library which has served as the base of language projects I’ve run over the years. It’s called Canto34-cs, and it’s available on both GitHub, where you can get the source under the very permissive MIT License, and on NuGet, where it’s currently on an Alpha 1 release. I’ve used it in a lot of places, so it’s pretty useful despite the version.

This blog post takes you round the basics. By the time you get to the end, you should be able to write a program to recognise simple languages.

Whenever you write a language — say, a DSL — you need to write a lexer and a parser. Lexers are pretty much identical the world over, and basically just need to recognise tokens like ‘for’, ‘$foo’, and ‘int’. You do this by inheriting `Canto34.LexerBase` and making calls in the constructor, like this;

public class SExpressionLexer : LexerBase
{
    public const int OP = 1;
    public const int CL = 2;
    public const int ATOM = 3;

    public SExpressionLexer(string input): base(input)
    {
        this.AddLiteral(OP, "(");
        this.AddLiteral(CL, ")");
        this.AddPattern(ATOM, "[a-z]+", "atom");
    }
}

So this is a lexer which recognises three token types – an open paren, a close paren, and an ‘atom’ token comprising strings like ‘a’, ‘abc’, or ‘abbbbbbbc’. So now you can split the contents of your file into tokens like this;

var lexer = new SExpressionLexer(input);
var tokens = lexer.Tokens().ToList();

So now you have a list of Token objects, and you can start doing the real work to recognise what’s going on inside the file. In our case, we’re going to recognise lisp-style s-expressions; that means either an individual item;

a
or a list;

(a b c)

or a list of lists;

(def foo (lambda (x y) (+ x y)))

How’s that achieved? A program that recognises patterns in a sequence of tokens is a parser. In Canto34, you inherit from ParserBase;

public class SExpressionParser : ParserBase
{
    internal dynamic SExpression()
    {
        // an atom on its own?
        if (!LA1.Is(SExpressionLexer.OP))
        {
            var atom = Match(SExpressionLexer.ATOM).Content;
            return atom;
        }

        // otherwise, a parenthesised list of s-expressions
        Match(SExpressionLexer.OP);
        var array = new List<dynamic>();
        while (!EOF && !LA1.Is(SExpressionLexer.CL))
        {
            array.Add(SExpression());
        }
        Match(SExpressionLexer.CL);
        return array;
    }
}

So look at this code. You’ll see the parser has one method — SExpression() — which recognises an s-expression. There are two basic ideas here. One, look at the next token. Here, `LA1` is a class-level property representing the next token — `LA1` stands for “look ahead 1 token”. By looking at the next token in the stream, you can perform appropriate logic; “if the next token is an open bracket, start a new list”, or “if the next token is `for`, start looking for a for-loop.”

The second concept is matching a token — the `Match(tokenType)` method looks at the next token; if the next token isn’t the expected one, an exception is thrown, indicating a syntax error. But if it’s the right type, it returns the token and advances one token in the file.
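Canto34 provides these primitives for you, but they’re easy to picture. Here’s a hypothetical TypeScript sketch of how `LA1` and `Match` might be implemented; it’s not Canto34’s actual code, just the idea:

```typescript
// A hypothetical sketch of the LA1/Match primitives, not Canto34's internals.
type Token = { type: number; content: string };

class ParserSketch {
  private index = 0;
  constructor(private tokens: Token[]) {}

  // LA1, "look ahead 1 token": peek at the next token without consuming it.
  get LA1(): Token | undefined {
    return this.tokens[this.index];
  }

  get EOF(): boolean {
    return this.index >= this.tokens.length;
  }

  // Match: demand the next token is of the given type. Throw a syntax
  // error if not; otherwise return it and advance one token in the file.
  Match(tokenType: number): Token {
    const next = this.tokens[this.index];
    if (next === undefined || next.type !== tokenType) {
      throw new Error("Syntax error: expected token type " + tokenType);
    }
    this.index++;
    return next;
  }
}
```

Notice that `LA1` only peeks, while `Match` is the one place the parser moves forward through the token stream.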

So let’s say you’re trying to match expressions like

x = 3;
y = 10;

The pattern is;

ID EQUALS NUMBER SEMICOLON;
And your parser will have code like this;

public void Assignment()
{
    var nameToken = Match(CalcLexer.ID);
    Match(CalcLexer.EQUALS);
    var numberToken = Match(CalcLexer.NUMBER);
    Match(CalcLexer.SEMICOLON);
    Console.WriteLine("Recognised {0} = {1}", nameToken.Content, numberToken.Content);
}

See how every call to Match() is advancing through the file? As we keep matching tokens, we’ll move from the start of the file to the end, reading and interpreting the file as we go.

The last piece of the puzzle is calling parsing methods from inside other parsing methods. This is where the real power of this kind of parser comes from. Let’s take that assignment method and make it more powerful. Let’s say we want to handle both straight assignments, and addition expressions, like this;

x = 4;
y = 3 + 4;
y = 1 + 2 + 3 + 4;

We first need to recognise these open-ended maths expressions like “1 + 2 + 3 + 4”. We can define the language as;

ID EQUALS additionexpression SEMICOLON;

And our `additionexpression` is a recursively-defined expression;

NUMBER [PLUS additionexpression];

See how that works? Either it’s a number, or it’s a number followed by ‘+’ and another addition expression.

So we define a parsing method that reflects that recursive definition, like this;

public int AdditionExpression()
{
    // get the number
    var numberToken = Match(CalcLexer.NUMBER);
    var number = int.Parse(numberToken.Content, CultureInfo.InvariantCulture);

    // look for a plus; recurse.
    if (!EOF && LA1.Is(CalcLexer.PLUS))
    {
        Match(CalcLexer.PLUS);
        number += AdditionExpression();
    }
    return number;
}

See how it matches the definition? Read a number, look for a plus; if you see one, read a number, look for a plus…

So now we can finish the job by calling this method in the original Assignment method;

public void Assignment()
{
    var nameToken = Match(CalcLexer.ID);
    Match(CalcLexer.EQUALS);
    var number = AdditionExpression();
    Match(CalcLexer.SEMICOLON);
    Console.WriteLine("Recognised {0} = {1}", nameToken.Content, number);
}

And now we can recognise everything we expect;

x = 12; // "Recognised x = 12"
x = 12 + 34; // "Recognised x = 46"
x = 12 + 34 + 56; // "Recognised x = 102"

So, we’re coming to the end of this intro to Canto34. Imagine the call stack when we are parsing that final expression. It’s going to look like this;

CalcParser.Assignment();              // x =
  CalcParser.AdditionExpression();    // 12 +
    CalcParser.AdditionExpression();  // 34 +
      CalcParser.AdditionExpression();// 56

This shape that the call stack makes is what gives this style of parser its name: a *recursive-descent* parser.
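Canto34-cs is C#, but the recursive-descent shape carries over to any language. To see the whole example in one self-contained place, here’s a sketch of the calculator transcribed into TypeScript; the tokenizer and names here are mine, not Canto34’s API:

```typescript
// The calculator example as a self-contained TypeScript sketch.
// Token types and the tiny tokenizer are illustrative, not Canto34's API.
enum T { ID, EQUALS, NUMBER, PLUS, SEMICOLON }
type Tok = { type: T; content: string };

function lex(input: string): Tok[] {
  const specs: Array<[T, RegExp]> = [
    [T.ID, /^[a-z]+/],
    [T.EQUALS, /^=/],
    [T.NUMBER, /^[0-9]+/],
    [T.PLUS, /^\+/],
    [T.SEMICOLON, /^;/],
  ];
  const toks: Tok[] = [];
  let rest = input.trim();
  while (rest.length > 0) {
    let matched = false;
    for (const [type, re] of specs) {
      const m = re.exec(rest);
      if (m) {
        toks.push({ type, content: m[0] });
        rest = rest.slice(m[0].length).trimStart();
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unrecognised input: " + rest);
  }
  return toks;
}

class CalcParser {
  private i = 0;
  constructor(private toks: Tok[]) {}
  private get LA1(): Tok { return this.toks[this.i]; }
  private get EOF(): boolean { return this.i >= this.toks.length; }
  private match(t: T): Tok {
    if (this.EOF || this.LA1.type !== t) throw new Error("syntax error");
    return this.toks[this.i++];
  }

  // additionexpression: NUMBER [PLUS additionexpression]
  additionExpression(): number {
    let n = parseInt(this.match(T.NUMBER).content, 10);
    if (!this.EOF && this.LA1.type === T.PLUS) {
      this.match(T.PLUS);
      n += this.additionExpression(); // the recursive descent
    }
    return n;
  }

  // assignment: ID EQUALS additionexpression SEMICOLON
  assignment(): [string, number] {
    const name = this.match(T.ID).content;
    this.match(T.EQUALS);
    const value = this.additionExpression();
    this.match(T.SEMICOLON);
    return [name, value];
  }
}
```

Parsing `x = 12 + 34 + 56;` with `new CalcParser(lex("x = 12 + 34 + 56;")).assignment()` walks exactly the nested call stack shown above.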

Introducing Blink: An Entity Framework Database-reset tool for TDD.

I’ve just started a new open source project you might be interested in if you use both TDD and Entity Framework 6.1.0.

It’s called Blink, and you can get hold of the pre-release binaries on NuGet and the source on GitHub.

Here’s the problem it tries to solve. (If you have this problem, let me know!)

When performing automated testing, it can be very expensive to initialize a fresh, real database. So expensive that you avoid testing against the real database at all costs. For example, the project that inspired me to start this library takes about a minute to build its database; that’s fine in a deployment scenario, but intolerable if you want to write tens or hundreds of integration tests. Blink re-initialises the DB in ~3s. That’s fast enough for TDD, if you’re careful about which tests you run.

It’s a very young project, currently so young it’s not really designed to be used on other people’s machines quite yet — there are some hard-coded strings that need replacing before it’ll work on anything other than a default instance of SQL Server 2012 x64, for instance. That’ll come soon, though.

This blog post is more of an announcement, though. If you’re interested, get in touch via the comments. Let me know if the project looks useful to you. We’ll see if we can’t make something good.

Here’s roughly what the code looks like;

// Create a new BlinkDBFactory, maybe inside [TestInitialize] or [SetUp]
var factory = Blink.BlinkDB.CreateDbFactory<TestDbContext, TestDbConfiguration>(
     () => new TestDbContext());

// Execute code, inside a transaction;
factory.ExecuteDbCode(context =>
{
    // use the context here;
});

// db edits are rolled back automatically

Dropbox hack; linking dropbox folders elsewhere on a disk

Dropbox is just about the best way to share files. I’ve tried things like source control, but to be honest, Dropbox just keeps things simple.

However, sometimes you need things like configuration files or data files to live in a particular place on disk, and Dropbox just doesn’t let you sync files to multiple places. Here’s how you can use symbolic links in Windows to work around that.

Here’s the example. I use some PowerShell scripts for building software. PowerShell scripts need to live in

    <my documents>\WindowsPowerShell

So, I store them in

    <dropbox>\Apps\WindowsPowerShell
And then create a symbolic link using the mklink command, like this;

    > mklink /j <my documents>\WindowsPowerShell <dropbox>\Apps\WindowsPowerShell

That is, you specify the new directory first, and the existing directory second. And that’ll give you a folder that appears to be outside dropbox, but which syncs using dropbox.

Now you can do that on multiple PCs and share any number of folders between any number of machines.

Cloning a TFS branch to Git in three simple steps

So it’s pretty straightforward to transfer TFS code, with history, into Git. This post shows a simple way to ‘clone’ TFS into a new Git repo.

1) Install Chocolatey

@powershell -NoProfile -ExecutionPolicy unrestricted -Command "iex ((new-object net.webclient).DownloadString(''))" && SET PATH=%PATH%;%systemdrive%\chocolatey\bin

2) Use Chocolatey to install GitTfs. Close Visual Studio first, though.

cinst gittfs

Then restart the command window.

3) Choose a branch and get cloning;

git tfs clone http://tfsserver:8080/tfs/TfsInstanceName $/TeamProjectName/Path/To/Branch

That should sort you out. 

TypeScript, Knockout, TypeLite; strongly typed knockout development

TypeScript’s a great environment for development, and the strong typing is a real benefit, but in an MVC app it’d be good to integrate things between the server side and the client side. Let’s say you’ve got a data transfer object defined on the server side, in C#, like so;

// C# Data Transfer Object
namespace TypeLiteDemo.ViewModels
{
  public class MyListItem
  {
    public int Id { get; set; }
    public string Title { get; set; }
  }
}

It would be great to request them from the server as JSON, and then on the client side deal with them in a strongly-typed way with TypeScript. We want a matching declaration like this when dealing with objects from the server;

// TypeScript Data Transfer Object
declare module TypeLiteDemo.ViewModels {
  interface MyListItem {
    Id: number;
    Title: string;
  }
}
So that we can do this;

$.getJSON('MyListItems', function(data: TypeLiteDemo.ViewModels.MyListItem[]) {
  // do something with strongly-typed data from the server
});
Even better, it would be great, if you’re using knockout mapping, to have the mapping viewmodel version, too;

declare module TypeLiteDemo.ViewModels {
  interface MyListItemViewModel {
    Id: KnockoutObservable<number>;
    Title: KnockoutObservable<string>;
  }
}
So that you can do this;

$.getJSON('MyListItems', function(data: TypeLiteDemo.ViewModels.MyListItem[]) {
  // convert to knockout viewmodels;
  var vms = <TypeLiteDemo.ViewModels.MyListItemViewModel[]>ko.mapping.fromJS(data);
  // now do something with the viewmodels;
  _.each(vms, function(vm) { /* extend the view models here */ });
});
Well, it should be no shock to learn that this blog post introduces a way of doing just that.

First, you’ll want to install TypeLite using NuGet. This gets you a default T4 template for generating the TypeScript definitions, but I’ll provide an alternative version below which generates the knockout viewmodels, too.

Copy and paste the code below into the file, modify the line that starts ‘var definitions’, and save. When you save, it’ll scan your assembly for the DTOs you mention, and produce two parallel sets of TypeScript definitions — one set for the DTOs, one for Knockout viewmodels.

<#@ template debug="false" hostspecific="True" language="C#" #>
<#@ assembly name="$(TargetDir)TypeLite.dll" #>
<#@ assembly name="$(TargetDir)TypeLite.Net4.dll" #>
<#@ assembly name="$(TargetDir)$(TargetFileName)" #>
<#@ import namespace="TypeLite" #> 
<#@ import namespace="TypeLite.Net4" #> 
<#@ import namespace="TypeLite.TsModels" #> 
<#@output extension=".d.ts"#>
<# bool generateKnockoutFiles = true;
   var definitions = TypeScript.Definitions().For<TypeLiteDemo.ViewModels.MyListItem>(); #>
///<reference path='typings/knockout/knockout.d.ts' />
///<reference path='typings/knockout.mapping/knockout.mapping.d.ts' /><#= definitions #>
<# if (generateKnockoutFiles) { #>
<#= definitions
 .WithFormatter(KnockoutMemberIdentifierConverter) #>
<# } #>
<#+
public string KnockoutTypeConverter(TsType type, ITsTypeFormatter formatter)
{
  return type.ClrType.Name + "ViewModel";
}

public string KnockoutMemberTypeConverter(string memberTypeName, bool isMemberCollection)
{
  if (isMemberCollection)
    return string.Format("KnockoutObservableArray<{0}>", memberTypeName);
  return string.Format("KnockoutObservable<{0}>", memberTypeName);
}

public string KnockoutMemberIdentifierConverter(IMemberIdentifier identifier)
{
  return identifier.Name;
}
#>

Integration Testing EF6 — aggressively rebuild your database for an integration test

Sometimes you need to do a really end-to-end automated test involving your Entity Framework database. In cases like this, it’s important to be able to reset the database to a known state, but this can be fraught with difficulties — apps hold onto connections, and the code for re-building the database isn’t obvious. Here’s what I’m using at the moment.

This is a really aggressive database (re)initializer for EF code-first with migrations; use it at your peril but it seems to run pretty repeatably for me. It will;

  1. Forcibly disconnect any other clients from the DB
  2. Delete the DB.
  3. Rebuild the DB with migrations, and run the Seed method
  4. Take ages! (watch the timeout limit for your test framework; a default 60 second timeout might not be enough)

Here’s the class;

public class DropCreateAndMigrateDatabaseInitializer<TContext, TMigrationsConfiguration>
    : IDatabaseInitializer<TContext>
    where TContext: DbContext
    where TMigrationsConfiguration : System.Data.Entity.Migrations.DbMigrationsConfiguration<TContext>, new()
{
    public void InitializeDatabase(TContext context)
    {
        if (context.Database.Exists())
        {
            // set the database to SINGLE_USER so it can be dropped
            context.Database.ExecuteSqlCommand(TransactionalBehavior.DoNotEnsureTransaction, "ALTER DATABASE [" + context.Database.Connection.Database + "] SET SINGLE_USER WITH ROLLBACK IMMEDIATE");

            // drop the database
            context.Database.ExecuteSqlCommand(TransactionalBehavior.DoNotEnsureTransaction, "USE master DROP DATABASE [" + context.Database.Connection.Database + "]");
        }

        // rebuild it with migrations, which also runs the Seed method
        var migrator = new MigrateDatabaseToLatestVersion<TContext, TMigrationsConfiguration>();
        migrator.InitializeDatabase(context);
    }
}

Use it like this;

public static void ResetDb()
{
    // rebuild the database
    Console.WriteLine("Rebuilding the test database");
    var initializer = new DropCreateAndMigrateDatabaseInitializer<MyContext, MyEfProject.Migrations.Configuration>();
    Database.SetInitializer(initializer);
    using (var ctx = new MyContext())
    {
        ctx.Database.Initialize(force: true);
    }
}

You should also set up your connection string in a particular way. In your integration test project,

1. set up your connection string to have “Pooling=false;” This doesn’t help the speed of the test, but helps mitigate problems with multiple tests running against the integration test db. (Thanks to Ladislav Mrnka for this.)

2. set the initial catalog to a different DB from your production one — I add ‘IntegrationTests’ to the end of the name of the database. This ensures you’re not going to delete the database which is, say, underlying the web app you’re building.

    <add name="MyContext" connectionString="Pooling=false;data source=localhost;initial catalog=MyContextIntegrationTests;[...]" providerName="System.Data.SqlClient" />


Lastly, you’ll need to make sure that your tests run in series, not in parallel. I use NCrunch, and needed to use the NCrunch.Framework.SerialAttribute to make sure that tests don’t overlap.

Invoking the TypeScript compiler when a project builds

If you’re using TypeScript, it’s good to set up your .csproj file so that it builds all the TypeScript files and compiles them to JS files. This makes sure that when someone else checks in a .TS file, you’ll get the corresponding .JS file without having to regenerate them manually, and that means you can avoid checking the .JS files into source control.

Here’s what you need to do to set up TypeScript for both developers and your build server.

1) Install the TypeScript extension for VS2012/VS2013. All your developers need to do this, because the changes in this post make the TS compiler run every time the project builds.

2) If your developers or your build server use VS2012, or a mix of VS2012 and VS2013, you’ll need to edit your .csproj file to include some defaults. (We currently develop in VS2013 but the build server is VS2012.) Open the .csproj file in a text editor and insert the following segment. It goes very near the top of the file — in our project, it goes on line 5, after the <Import> elements, before all the other <PropertyGroup> elements.

<PropertyGroup Condition="!Exists('$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)\TypeScript\Microsoft.TypeScript.Default.props')">
 <!-- These are the defaults introduced in VS2013, duplicated here for VS2012 builds -->
</PropertyGroup>

3) Near the end, after <Import Project="$(MSBuildBinPath)\Microsoft.CSharp.targets" />, insert this;

<Import Project="$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)\TypeScript\Microsoft.TypeScript.targets" />

4) If you have a build server set up, you’ll need to install the TypeScript compiler on the build server. Otherwise, your build server is going to fail to generate the JS files. So head to your dev machine, at either

For a VS2012 build server: C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v11.0\TypeScript
For a VS2013 build server: C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\TypeScript

And copy the TypeScript folder to the same location on the build server. This installs the compiler into the location mentioned in the <PropertyGroup> element from Step 2.

This should do you. Save the .csproj and reload it in Visual Studio. Now, whenever you build the project, all the .JS files should be generated automatically. You may also need to exclude the .JS files from source control.

StackOverflow-driven development; ask the internet first, code second (but not really)

I’ve noticed a technique that’s been very helpful for me when writing software. Hopefully it’s a tool you might use, too.

When I come to an impasse — when I’m not quite sure what the next step in my work is — I find that I often hit a particular technical problem I can’t trivially solve. The problem that prompted this blog post was one that mashed together problems in Entity Framework, Generics, and Entity SQL. I realised this wasn’t going to be trivial, and I wasn’t sure how to proceed.

So, I hit Stack Overflow, clicked ‘Ask a question‘, and got to writing. I spent a while formulating and re-writing the question for the unknown programmers of the internet, trying to state the core of the problem and the way I wanted it solved. It included the problem, my approaches, my half-baked architectures.

Interesting alchemy starts to happen when you do this. You start to realise your problem in more detail. You start to see the flaws in your own thinking. You come to understand how you may have limited yourself, driving for the wrong kind of solution. When you get your own thoughts out on paper, you can see, then correct, the flaws.

So a question that starts;

“I’m having trouble querying for tables…”

transmutes through

“Entity Framework doesn’t allow generics…”

and

“I’m thinking of using code generation…”

into

“If I can get hold of the EDMX model, I can run a code generator…”

And you start to realise the shape of the problem you really need help with.

In the end, stating the problem clearly for the folks on Stack Overflow (and this is a minimum standard for asking a question) is often all you need to see the solution to your own problem. So you may not need to post the question in the end. In this respect, it’s very similar to Rubber Duck Debugging, but with the additional benefit that, should you not figure it out, you can immediately post a high-quality, thoughtful question to Stack Overflow.

Anyway, you probably use something like this already, but it’s just a reminder that the technique is available.

Angular.ts – Angular and TypeScript

Just a short one, but I’ve started writing an Angular app with TypeScript.

There is a fairly big cognitive gap between the two systems. This seems to be typical of anything that works through naming conventions, and Angular.js is more full of conventions and magic strings than anything I’ve seen for a while.

This meant I had a fairly serious bit of fiddling to do to get Angular working with my TypeScript controllers. Setting up modules isn’t a terribly natural for, at least to my brain. Luckily, this guy has already walked the same ground, and has figured out some decent patterns for structuring your app. Check him out if you want Angular magic with strong typing.