The first few weeks of Sublime Text package development, and what I learned.

Alright, this one is a bit of nostalgia and a bit of a brag.

When Sublime Text came out, I jumped on it because, well, I wanted TextMate and I didn’t have a Mac, and I wanted something with the power and programmability of Emacs, but Emacs Lisp is wonky and kinda evil and too different from Scheme to be enjoyable. Sublime Text was native Windows, and Python-powered, and Python is a great choice for short, readable scripts.

Sublime Text was very young when I first found it. It shipped with Python scripts to do certain basic operations, like re-flowing text, but the only person who had ever written one was Jon Skinner, the developer of Sublime Text himself. Early on, I wrote a feature request asking for plugins, and Jon was keen and supportive.

What was surprising to me was how quickly things can explode if you’re willing to just ask for them and fiddle on some basic code. I wrote the feature request for plugins on 20 Mar 2008. Jon lent a hand almost immediately. Within 11 days, I had;

This helped turn some enthusiastic developers and their scattered code into a proper community. The package repository eventually ended up with 40 packages, and the documentation became pretty good. Both have since been superseded, but that’s good; I think what I established was more of a cheap prototype, and other people have gone on from there. I’m happy to have taken the first steps on several fronts.

So, I took a few disparate lessons from this;

Almost everything is free. I got wiki software, source control, libraries, and such, for nothing. I suppose we all know there’s lots of free stuff on the net, but because it comes in bits, I sometimes forget that you can get a proper thing (running, deployed software available at a real domain) just by asking for it. And this was 2008; it’s even better now. If you have an MSDN account and aren’t using your Azure credits, you need to!

Allow Minute Programs. A Python plugin can be three lines long, which makes it easy to dip your toe in. If people can develop something useful before having to take on ‘big-software’ features like namespacing and package definitions, they’ll develop more of them.

People improve on first versions. I think people are great at commenting on, improving, and altering existing content. By establishing an initial version, even a rubbish one, you give people something to alter. A buggy plugin, a wiki with two short pages: people will correct bad things more quickly than they’ll establish new ones. Everything I did has been superseded, which is great, because it’s all miles better than what I did. Whatever pearls now exist, I’m quite happy to have been some of the grit in the oyster.

Work in public. Asking questions, developing, then showing the results creates a feedback loop. People like what you’re doing, you feel good, you share, they get better, they feel good, and so on. Nice. Most of my open source projects probably come out of that ‘see what I did!’ feeling.

Small-project devs work hard for strangers who write good feature requests. I’ve found this over and over: communicate your need clearly, along with a possible approach or solution, and you give the developer a route to go down, a kind of ‘permission’ or justification. Tom Kerrigan, the developer of t-chess, is another great example of someone who works closely with his customers.


Debugging Entity Framework Seed methods

In Entity Framework projects, sometimes you need to debug your Seed methods. This can seem a little tricky, since the code tends to run in the Package Manager Console, and it’s a fire-and-forget kind of deal; any runtime errors get swallowed. However, you can debug into your own EF code like this;

1) Find the project that contains your DbContext and Configuration.cs file, and set it as your startup project.

2) Open the properties page for the project.

3) Set the external program to the EF migrate.exe tool. For EF 6.1, it’s at

($mySourceDirectory)\packages\EntityFramework.6.1.0\tools\migrate.exe

4) Set the command line arguments to point to the name of your EF assembly, the bin folder for that EF assembly, and the web.config or app.config which contains the connection string for the DB you’ll be updating, like so;

MyProject.EntityFramework.dll /startupConfigurationFile="..\..\..\MyWebsite\Web.config" /startupDirectory="full\path\to\MyProjectEntityFramework\bin\Debug"

5) Set breakpoints and press F5!
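For orientation, the breakpoints go in the Seed override of your migrations Configuration class. Here’s a minimal sketch; the context and entity names (MyContext, Customer) are made up for illustration;

// needs a using for System.Data.Entity.Migrations
internal sealed class Configuration : DbMigrationsConfiguration<MyContext>
{
    protected override void Seed(MyContext context)
    {
        // set a breakpoint on the line below; when migrate.exe
        // runs under F5, the debugger stops here
        context.Customers.AddOrUpdate(
            c => c.Name,
            new Customer { Name = "Example Customer" });
    }
}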

Drawing graphs in Visual Studio

Just a quick one, but I find a picture is often a much better way to communicate than words, and Visual Studio has a built-in tool I had no idea about until today: the directed graph editor.

Turns out VS has this built-in, nicely-featured graph editor which saves to an XML dialect called DGML. The feature seems to have been included to support Visual Studio’s code analysis tools, but it can quite happily edit plain files with no connection to any kind of code. The simple upshot is that you get a free tool for building these sorts of pictures, which I find really useful for thinking through a complex problem.

The quickstart guide is;

  1. Create a blank file with a DGML extension.
  2. In a text editor, add a simple example. The Wikipedia page on DGML has a three-node graph you can copy, or use the sketch below.
  3. Open the file in Visual Studio.
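For reference, a minimal graph looks something like this; the node names here are arbitrary;

<?xml version="1.0" encoding="utf-8"?>
<DirectedGraph xmlns="http://schemas.microsoft.com/vs/2009/dgml">
  <Nodes>
    <Node Id="Lexer" />
    <Node Id="Tokens" />
    <Node Id="Parser" />
  </Nodes>
  <Links>
    <!-- Source and Target refer to Node Ids -->
    <Link Source="Lexer" Target="Tokens" />
    <Link Source="Tokens" Target="Parser" />
  </Links>
</DirectedGraph>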

It’s pretty straightforward to draw your graphs, and the toolbar above the editor window does things like auto-layout.

The resulting file is just XML, so it can be checked into source control, too.

dotless.NamedThemes – file-based theme switching for .less CSS preprocessing

I’ve just released a new project on github and at nuget.org which helps you provide switchable themes when using the dotless LESS handler.

Here’s what you need to do. First, add it to your project using the NuGet package;
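From the Package Manager Console, that’s something like the line below; I’m assuming the package id matches the assembly name;

Install-Package dotless.NamedThemes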

Then enable the plugin by adding it to the <dotless> element in web.config;

<dotless minifyCss="false" cache="false" web="false">
    <plugin name="NamedThemesPlugin" assembly="dotless.NamedThemes" />
</dotless>

Tell the web.config about the folder that holds your theme files;

<configuration>
    <appSettings>
        <add key="dotless.NamedThemes:ThemeBaseUrl" value="~/Content/Themes" />
    </appSettings>
</configuration>

Write files named after your themes;

// ~/Content/Themes/blueTheme.less
@theme-base-color:#003d7d;
@theme-base-secondary-color:#003d7d;
@theme-font-size-base:14px;
@theme-header-font-size-base:12px;
@theme-nav-bar-color:#3a3a3a;
@theme-nav-bar-bg-color:#FFFFFF;
@theme-link-color:#003d7d;

Now, the clever bit; in your actual .less file, use the getThemeColor function;

// ~/Content/themeAware.less
@theme-base-color:getThemeColor(@theme,theme-base-color);
@theme-base-secondary-color:getThemeColor(@theme,theme-base-secondary-color);

And make sure you pass the @theme variable in the URL;

http://mysite/content/themeAware.less?theme=blueTheme

Voila! The theme name and the theme-specific colour are retrieved by the plugin.
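In a page, that just means pointing a stylesheet link at the themed URL. A hypothetical example;

<link rel="stylesheet" href="/Content/themeAware.less?theme=blueTheme" />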

BomSquad; library to detect byte-order marks in files

BomSquad is a small utility for detecting byte-order marks in files. This often proves useful when using Java utilities, which tend to deal with BOMs very badly. I’ve been bitten by both JSCover and ANTLR, both of which are excellent utilities, but which are written in Java and so can choke on some perfectly good input files.

At the moment, it’s just a straight bit of code you can use to detect whether a file has a BOM, and what kind of encoding is implied. Use it like this;

var dog = new BomSquad.SnifferDog();

// true if the file begins with a byte-order mark
var hasBom = dog.DetectBom(filePath);

// the implied encoding, eg "UTF-16 (LE)"
var bomType = dog.DetectBomType(filePath);

It’s available as a pre-release NuGet package at https://www.nuget.org/packages/BomSquad/, with the source at https://github.com/stevecooperorg/BomSquad/.

Soon, I’ll add an MSBuild task, so you can automatically check your files as part of a build. This will help you never check in a broken file.
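In the meantime, a hand-rolled check is easy enough. A minimal sketch, assuming DetectBom returns a bool as above; you’ll need System.IO and System.Linq;

// find every JavaScript file under src which starts with a BOM
var dog = new BomSquad.SnifferDog();
var offenders = Directory
    .EnumerateFiles("src", "*.js", SearchOption.AllDirectories)
    .Where(path => dog.DetectBom(path))
    .ToList();

foreach (var path in offenders)
{
    Console.WriteLine("BOM found: {0} ({1})", path, dog.DetectBomType(path));
}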

Domain-specific languages – Initial impressions of ANTLR4.

I’ve always been into writing my own domain-specific languages — I firmly believe that used right, they are a massive win for developers. Your system is different from everyone else’s, so the only person able to write the perfect language for your problem is you. General-purpose programming languages look awfully crufty compared to an elegant, dedicated language.

The problem, of course, is that a dedicated language is something of a beast to write, at least if you start from scratch. I’ve been programming DSLs for a few years now, and I’ve even written the Canto toolkits, some open-source parsing toolkits for JavaScript and C#. But that kind of small-scale craftsman approach is nothing next to the industrial-strength behemoth that is Antlr, a toolkit some 25 years in the making, whose progress I’ve been tracking, on and off, since about version 2. In the past, I’d had trouble with it; it imposed a strong idiom on your way of writing, and not one that gelled well with my own way of approaching the problem. But it’s just moved to v4, so I thought I’d give it another look.

The truth is, Antlr4 is good. Really good. Depressingly so, for someone who’s writing parsing toolkits. In fact, I love it. I love it more than my own projects.

The idea with Antlr is that you write a `grammar` for your language, describing its syntax. Here’s a snippet of the grammar that recognises JSON, from https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4;


grammar JSON;

json: object | array ;

object : '{' pair (',' pair)* '}'
       | '{' '}' ;

pair: STRING ':' value ;

So if you know JSON, you should be able to read this without difficulty. The first rule says “a JSON document is either an object or an array”. The second implements this line from JSON.org; “An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).”

You write the grammar, and Antlr builds you code to recognise that language. It handles two main jobs for you: lexing, which splits a file into discrete words, or tokens; and parsing, which recognises syntactic structures like statements, lists, and loop declarations. Previous versions did that too, but not in a way I ever really grokked. Until Antlr4…

So what makes Antlr4 so good?

First, the integration for Visual Studio is superb. There’s a VS extension which basically makes ANTLR part of C#; just do “Add | New Item… | ANTLR Combined Grammar” and you’ve started developing your language. Every time you build, it builds a parser from the grammar. This is a big deal: no messing with the guts of .csproj files, or running hokey batch scripts. It’s better integrated than tools like T4, which are actually part of Visual Studio.

Second, the structure of grammars has changed to make describing your language a distinct act from writing an application like an interpreter. Previously, you had to embed scraps of C# code throughout the grammar to make it do anything, which left something of a bad taste in my mouth. It was powerful, but it got really ugly; you couldn’t look at an ANTLR grammar and say ‘Ah, I see clearly what the language looks like!’ There were bits and bats all over the grammar itself, obscuring the definition of the language. Now, Antlr takes the approach of providing post-parsing tools like listeners and visitors, which separate the act of parsing from the act of interpreting. This is the way a good parser should be structured, so I’m really glad to see Antlr allow it.
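To give a flavour, here’s a sketch of a listener over the JSON grammar above, using the C# target. The JSONLexer, JSONParser, and JSONBaseListener classes are generated from the grammar; I’m assuming the standard Antlr4.Runtime package;

using Antlr4.Runtime;
using Antlr4.Runtime.Tree;

// a listener which counts the name/value pairs in a JSON document
public class PairCounter : JSONBaseListener
{
    public int Pairs { get; private set; }

    public override void EnterPair(JSONParser.PairContext context)
    {
        Pairs++;
    }
}

// usage: lex, parse, then walk the tree with the listener
var input = new AntlrInputStream("{ \"a\": \"b\", \"c\": \"d\" }");
var parser = new JSONParser(new CommonTokenStream(new JSONLexer(input)));
var counter = new PairCounter();
ParseTreeWalker.Default.Walk(counter, parser.json());
Console.WriteLine(counter.Pairs); // 2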

Just those two things meant I felt confident letting other devs in on the secret. It doesn’t take a language wonk to extend a language or interpreter. I know this because my pair-programming partner at work extended the language and interpreter I’d been working on with literally no reading on the subject; she just looked at the code, saw what it was doing, and extended it, in the time it took me to attend a meeting. Part of that is that she’s a very smart dev; the other part is that it’s an intuitive system to play with. If you understand the Visitor pattern, and if regular expressions don’t make you cry, you’ll pick it up.

Anyway, I can’t recommend it enough. Go try it for yourself.


Canto34-cs; a lexing and recursive-descent parsing toolkit for .NET

Introducing Canto34-cs – another OSS project I probably won’t maintain!

Alright, folks. I’ve just uploaded another library which has served as the base of language projects I’ve run over the years. It’s called Canto34-cs, and it’s available on both GitHub, where you can get the source under the very permissive MIT License, and on NuGet, where it’s currently on an Alpha 1 release. I’ve used it in a lot of places, so it’s pretty useful despite the version.

This blog post takes you round the basics. By the time you get to the end, you should be able to write a program to recognise simple languages.

Whenever you write a language, say a DSL, you need to write a lexer and a parser. Lexers are pretty much identical the world over, and basically just need to recognise tokens like ‘for’, ‘$foo’, and ‘int’. You do this by inheriting `Canto34.LexerBase` and making calls in the constructor, like this;

public class SExpressionLexer : LexerBase
{
    public const int OP = 1;
    public const int CL = 2;
    public const int ATOM = 3;

    public SExpressionLexer(string input) : base(input)
    {
        this.AddLiteral(OP, "(");
        this.AddLiteral(CL, ")");
        this.AddPattern(ATOM, "[a-z]+", "atom");
        this.StandardTokens.AddWhitespace();
    }
}

This is a lexer which recognises three token types: an open paren, a close paren, and an ‘atom’ token comprising strings like ‘a’, ‘abc’, or ‘abbbbbbbc’. Now you can split the contents of your file into tokens like this;

var lexer = new SExpressionLexer(input);
var tokens = lexer.Tokens().ToList();

So now you have a list of Token objects, and you can start doing the real work of recognising what’s going on inside the file. In our case, we’re going to recognise Lisp-style s-expressions; that means either an individual item;

foo

or a list;

(a b c)

or a list of lists;

(def foo (lambda (x y) (+ x y)))

How’s that achieved? A program that recognises patterns in a sequence of tokens is a parser. In Canto34, you inherit from ParserBase;

public class SExpressionParser : ParserBase
{
    internal dynamic SExpression()
    {
        if (LA1.Is(SExpressionLexer.OP))
        {
            Match(SExpressionLexer.OP);
        }
        else
        {
            var atom = Match(SExpressionLexer.ATOM).Content;
            return atom;
        }

        var array = new List<dynamic>();
        while (!EOF && !LA1.Is(SExpressionLexer.CL))
        {
            array.Add(SExpression());
        }

        Match(SExpressionLexer.CL);
        return array;
    }
}
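Wiring the two together looks something like this. One loud assumption: I’m guessing ParserBase takes its tokens through an Initialize method, as the JavaScript version does; check the README for the exact call;

// only lower-case atoms and parens, to match the ATOM pattern above
var lexer = new SExpressionLexer("(def foo (lambda (x y) (add x y)))");
var tokens = lexer.Tokens().ToList();

var parser = new SExpressionParser();
parser.Initialize(tokens); // assumed API; see the Canto34 README
var result = parser.SExpression(); // nested lists of atom strings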

Looking at the parser, you’ll see it has one method, SExpression(), which recognises an s-expression. There are two basic ideas here. The first is looking at the next token: `LA1` is a class-level property representing the next token (it stands for “look ahead 1 token”). By looking at the next token in the stream, you can perform the appropriate logic; “if the next token is an open bracket, start a new list”, or “if the next token is `for`, start looking for a for-loop.”

The second is matching a token. The `Match(tokenType)` method looks at the next token; if it isn’t the expected type, an exception is thrown, indicating a syntax error. If it is, the method returns the token and advances one token through the file.

So let’s say you’re trying to match expressions like

x = 3;
y = 10;

The pattern is;

assignment:
ID, EQUALS, NUMBER, SEMICOLON;

And your parser will have code like this;

public void Assignment()
{
    var nameToken = Match(CalcLexer.ID);
    Match(CalcLexer.EQUALS);
    var numberToken = Match(CalcLexer.NUMBER);
    Match(CalcLexer.SEMI);
    Console.WriteLine("Recognised {0} = {1}", nameToken.Content, numberToken.Content);
}

See how every call to Match() is advancing through the file? As we keep matching tokens, we’ll move from the start of the file to the end, reading and interpreting the file as we go.

The last piece of the puzzle is calling parsing methods from inside other parsing methods. This is where the real power of this kind of parser comes from. Let’s take that assignment method and make it more powerful. Let’s say we want to handle both straight assignments and addition expressions, like this;

x = 4;
y = 3 + 4;
y = 1 + 2 + 3 + 4;

We first need to recognise these open-ended maths expressions like “1 + 2 + 3 + 4”. We can define the language as;

assignment:
ID, EQUALS, additionexpression, SEMICOLON;

And our `additionexpression` is a recursively-defined expression;

additionexpression:
NUMBER [PLUS additionexpression];

See how that works? Either it’s a number, or it’s a number followed by ‘+’ and another addition expression.

So we define a parsing method that reflects that recursive definition, like this;

public int AdditionExpression()
{
    // get the number
    var numberToken = Match(CalcLexer.NUMBER);
    var number = int.Parse(numberToken.Content, CultureInfo.InvariantCulture);

    // look for a plus; recurse.
    if (!EOF && LA1.Is(CalcLexer.PLUS))
    {
        Match(CalcLexer.PLUS);
        number += AdditionExpression();
    }
    return number;
}

See how it matches the definition? Read a number, look for a plus; if you see one, read another number, look for another plus…

So now we can finish the job by calling this method in the original Assignment method;

public void Assignment()
{
    var nameToken = Match(CalcLexer.ID);
    Match(CalcLexer.EQUALS);
    var number = AdditionExpression();
    Match(CalcLexer.SEMI);
    Console.WriteLine("Recognised {0} = {1}", nameToken.Content, number);
}

And now we can recognise everything we expect;

x = 12; // "Recognised x = 12"
x = 12 + 34; // "Recognised x = 46"
x = 12 + 34 + 56; // "Recognised x = 102"

So, we’re coming to the end of this intro to Canto34. Imagine the call stack when we are parsing that final expression. It’s going to look like this;

CalcParser.Assignment(); // x =
CalcParser.AdditionExpression(); // 12 +
CalcParser.AdditionExpression(); // 34 +
CalcParser.AdditionExpression(); // 56

This shape that the call stack makes is what gives this style of parser its name: a *recursive-descent* parser.