Devweek Day 1

I’m at Devweek this week. Yesterday was a workshop on SharePoint by Sahil Malik, an excellent speaker who took us on a whirlwind tour of the new version. That’s going to be the subject of a longer post; there was a massive amount of info and a lot of things to think about, so that might not appear for a few days.

Today’s talks are all about making sure we’re up to speed on some modern patterns and technologies. There are four sessions:

9:30 Technical Keynote: There’s a storm coming…

11:30 SQL Azure overview – how to develop and manage it.

14:00 The new world of HTML5 and CSS3

16:00 Light-weight architectures for ‘everyone’ with ASP.NET Web API.

Creating Node packages in NPM

npm is the package manager for Node.js, the server-side JavaScript thingy. And it’s lovely. On a day-to-day basis, doing C#, I use NuGet in Visual Studio. While NuGet is a welcome addition to Visual Studio, npm seems to make it much easier to create and reuse packages. 

For instance, I’ve just pushed my parsing library, canto34, up to this location on npm, and the process was remarkably easy. The steps are –

1. create a user on the npm website.

2. Link your computer to it with npm command line;

npm adduser

 

3. Create a single file (package.json) which describes the package, its dependencies, and its location in git

4. Publish the package using npm again;

npm publish

And that’s it — package deployed. 
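To give a feel for step 3, here’s a rough sketch of the sort of thing that goes into package.json, written as a JavaScript object and printed as JSON. The field names (name, version, main, repository, dependencies) are npm’s own; every value is a made-up placeholder, not canto34’s real manifest.

```javascript
// Hypothetical manifest: the field names are npm's, the values are invented for illustration.
var manifest = {
    name: "my-parser",                 // hypothetical package name
    version: "0.0.1",
    description: "A tiny example library",
    main: "index.js",                  // the file loaded by require("my-parser")
    repository: {
        type: "git",
        url: "https://example.com/me/my-parser.git" // placeholder URL
    },
    dependencies: {}                   // other npm packages this one needs
};

// This text is what you would save as package.json in the package's root directory.
var text = JSON.stringify(manifest, null, 2);
console.log(text);
```

Save the printed text as package.json next to your code and npm publish has what it needs.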

Now, that’s pretty sweet — anyone in the world can install the package using this command line;

npm install canto34

And get going with the library. But what about me? I want to continue to develop canto34 here on my dev machine, and use it in another project (in my case, a little program called Mettle that’s beginning to take shape.)

So I want to make changes to canto34 as I make changes to Mettle. I don’t want to have to upload changes to canto34, and then download them in Mettle. I just want to develop both.

npm lets you do this by CDing into canto34’s directory and typing

npm link

And then CDing into Mettle’s directory and typing

npm link canto34

Magic! npm ‘installs’ canto34 on my machine using a shortcut to my development copy, so that I now have this structure on my disk;

<src>/Mettle/node_modules/canto34

And I can now develop both projects together.

The thing here isn’t that the command line support is good; on its own, that’s pretty dull. The nice thing is how the npm system seems really well designed to get you sharing code, and doesn’t punish you for doing so. You can share libraries as open source, or use them in proprietary projects, or mix the two together, and it feels like npm is your buddy. It’s a little like, I guess, how an author feels about an editor; the technical guy that gets your artistry out to the world.

All in all, a very good developer experience.

 

Introducing Canto34.js, a lexing and parsing library in JavaScript

I mentioned it in passing in my last post, but I’ve been working on a project called Canto 34 which helps you write lexers and parsers in JavaScript. Here’s the current README.md from the GitHub repo:

—–

Canto 34 is a library for building recursive-descent parsers.

When it comes to writing a parser, you get two main choices. Write your own, or use a parser generator like PEGJS or ANTLR.

I’ve never really had much success with parser generators, and it’s always seemed pretty easy to write a recursive-descent parser yourself, if you have some basic tools like a regex-based lexer, and some basic functions for matching tokens and reporting errors. Canto34.js gives you the functions you need to write a regex-based lexer and a recursive descent parser yourself.

A Simple Example

Here’s a simple example of a language which just defines name-value pairs;

foo 1, bar 5, baz 8.

We’ll see how to parse this language. First, you write a lexer which identifies the different kinds of token — names, integers, commas, and periods;

var lexer = new canto34.Lexer();

// add a token for whitespace
lexer.addTokenType({ 
    name: "ws",       // give it a name
    regexp: /[ \t]+/, // match spaces and tabs
    ignore: true      // don't return this token in the result
});

// add a token type for names, defined as strings of lower-case characters
lexer.addTokenType({ name: "name", regexp: /^[a-z]+/ });

// bring in some predefined types for commas, periods, and integers.
var types = canto34.StandardTokenTypes;
lexer.addTokenType(types.comma());
lexer.addTokenType(types.period());
lexer.addTokenType(types.integer());

And here’s how you use it;

var tokens = lexer.tokenize("foo 1, bar 2.");

which returns a set of tokens;

[
    { content: "foo", type: "name",    line: 1, character: 1 },
    { content: 1,     type: "integer", line: 1, character: 5 },
    { content: ",",   type: "comma",   line: 1, character: 6 },
    { content: "bar", type: "name",    line: 1, character: 8 },
    { content: 2,     type: "integer", line: 1, character: 12 },
    { content: ".",   type: "period",  line: 1, character: 13 }
]

Now you feed these tokens into a parser. Here’s a parser for our language;

var parser = new canto34.Parser();
parser.listOfNameValuePairs = function() {
    this.result = [];
    this.nameValuePair();
    while (!this.eof() && this.la1("comma")) {
        this.match("comma");
        this.nameValuePair();
    }
    this.match("period");
};

parser.nameValuePair = function() {
    var name = this.match("name").content;
    var value = this.match("integer").content;
    this.result.push({ name:name, value: value });
};

And it’s used like this;

var tokens = lexer.tokenize("foo 1, bar 2, baz 3.");
parser.initialize(tokens);
parser.listOfNameValuePairs();

parser.result now contains the value;

[
    {name:"foo", value:1},
    {name:"bar", value:2},
    {name:"baz", value:3},
]

And we’re done! That’s a basic lexer and parser, written in thirty lines of code.
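If you’re curious what a regex-based lexer and a recursive-descent parser look like with no library at all, here’s a self-contained sketch for the same little language. To be clear, this is not how canto34 is implemented; the names tokenTypes, tokenize and parsePairs are invented for this example, and it only handles single-line input.

```javascript
// Token types are tried in order; each regex is anchored to the start of the remaining input.
var tokenTypes = [
    { name: "ws",      regexp: /^[ \t]+/, ignore: true },
    { name: "name",    regexp: /^[a-z]+/ },
    { name: "integer", regexp: /^[0-9]+/, convert: function (s) { return parseInt(s, 10); } },
    { name: "comma",   regexp: /^,/ },
    { name: "period",  regexp: /^\./ }
];

// Lexer: repeatedly match one token at the current position and move past it.
function tokenize(input) {
    var tokens = [];
    var pos = 0;
    while (pos < input.length) {
        var rest = input.slice(pos);
        var matched = false;
        for (var i = 0; i < tokenTypes.length && !matched; i++) {
            var type = tokenTypes[i];
            var m = type.regexp.exec(rest);
            if (m) {
                if (!type.ignore) {
                    tokens.push({
                        content: type.convert ? type.convert(m[0]) : m[0],
                        type: type.name,
                        character: pos + 1 // 1-based column
                    });
                }
                pos += m[0].length;
                matched = true;
            }
        }
        if (!matched) {
            throw new Error("Unexpected character '" + input.charAt(pos) + "' at " + (pos + 1));
        }
    }
    return tokens;
}

// Parser: one function per grammar rule, each consuming tokens from left to right.
function parsePairs(tokens) {
    var index = 0;
    var result = [];

    // la1 = "look ahead one token" without consuming it
    function la1(type) { return index < tokens.length && tokens[index].type === type; }

    // consume the next token, or fail with a readable message
    function match(type) {
        if (!la1(type)) {
            var found = index < tokens.length ? tokens[index].type : "end of input";
            throw new Error("Expected " + type + " but found " + found);
        }
        return tokens[index++];
    }

    function nameValuePair() {
        var name = match("name").content;
        var value = match("integer").content;
        result.push({ name: name, value: value });
    }

    nameValuePair();
    while (la1("comma")) {
        match("comma");
        nameValuePair();
    }
    match("period");
    return result;
}

var pairs = parsePairs(tokenize("foo 1, bar 2, baz 3."));
console.log(pairs);
```

The shape is the same as the canto34 version: a list of token types, a look-ahead check, and a match function that either consumes a token or reports an error.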

What’s canto34.js good for?

Canto 34 is designed for quickly writing parsers for straightforward domain-specific languages (DSLs). Please don’t use it for writing parsers for general-purpose programming languages; it’s not designed to be that fast or powerful.

Running jasmine tests using jasmine-node and autotest

I’ve just started using jasmine-node on a new project that helps people build recursive-descent parsers. Jasmine-node is a neat little tool, because it lets you write a command line like this;

jasmine-node spec/ --autotest

And it’ll start a process which monitors your tests. As you save your tests (the .spec.js files), the tests are all re-run, and this makes for a really tight development cycle. As you save, without anything other than ctrl+s, your tests run. Beautiful, really — the fact the test is running without any interference means that you don’t even think about a code-run-test cycle — it’s just code-code-code. Neat.
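For anyone who hasn’t seen one, here’s roughly what a spec file looks like. The function under test, parsePair, is a made-up example rather than anything from my project, and the file name is hypothetical. describe, it and expect are the globals Jasmine provides; I’ve wrapped the spec in a check so the file also loads under plain node.

```javascript
// spec/pairs.spec.js (hypothetical file name)

// Made-up function under test: turns "foo 1" into { name: "foo", value: 1 }.
function parsePair(text) {
    var parts = text.trim().split(/\s+/);
    return { name: parts[0], value: parseInt(parts[1], 10) };
}

// The test runner supplies describe/it/expect; skip the spec when they are absent.
if (typeof describe === "function") {
    describe("parsePair", function () {
        it("splits a name from its integer value", function () {
            expect(parsePair("foo 1")).toEqual({ name: "foo", value: 1 });
        });
    });
}
```

In a real project the function would live in its own module and the spec would require it; keeping both in one file just makes the sketch self-contained.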

Special thanks to Clare for helping me get the first few Jasmine tests written. That initial understanding is the thing that cracks open a lot of possibility.

Is mercurial dying?

Is it me, or has Mercurial well and truly lost the race against git? 

A couple of things make me think that Mercurial is no longer a particularly viable option. Firstly, I’m only really hearing about development on git, and particularly on GitHub. I regularly hear things like “I just pushed TCP support for <package X> to GitHub” but I don’t know that I’ve ever heard a similar “<package X> just got its tests fixed on BitBucket.” Maybe this is because I’m looking at a lot of JavaScript projects and not looking at Python projects, but it definitely seems that all the cool kids are on GitHub.

Second, I just did a very quick scan for git and mercurial integration for Visual Studio. Microsoft are releasing a git extension in their next service pack (Visual Studio 2012 Update 2). Mercurial integration packages like VisualHG aren’t even compatible with Visual Studio 2012.

The disappointing thing here is that GitHub’s policy on private repos is so much tighter than BitBucket’s. In BitBucket, I get unlimited free private mercurial and git repos. In GitHub I have to pay to get any. I understand why GitHub needs to charge. Of course I do. It’s just that I’ve been avoiding GitHub because I have a lot of stuff that’s not fit for public consumption but which I want under source control.

Ah, well. First-world problem.

Using Twitter Bootstrap Typeahead to select a key/value pair, not a string

Twitter Bootstrap comes with a UI widget called Typeahead — an auto-complete textbox. Problem is, it works only with strings — you type a string, it searches a list of strings, when you choose one, it tells you you selected a string.

This is fine in some cases, but it’s not much cop if you want to select, say, a primary key field.

I’ve found an approach for this that lets you select in a more traditional manner — search names, return a different, unique value. It uses the ‘highlighter’ function and a little bit of JSON. Highlighter is a bit of a misnomer — it’s actually an arbitrary function for re-writing the text that appears in the dropdown.

So when you invoke, say, $('#mysearchbox').typeahead(), you pass a set of options. Here’s what you need to do for each option;

source: Set the source option to a function returning an array of JSON-serialized name/value pairs;

return [
    JSON.stringify({ name: "one", value: 1 }),
    JSON.stringify({ name: "two", value: 2 }),
    JSON.stringify({ name: "three", value: 3 }),
    JSON.stringify({ name: "four", value: 4 }),
    JSON.stringify({ name: "five", value: 5 })
];

highlighter: Set the highlighter to return just the name, not the value;

highlighter: function (item) {
    return JSON.parse(item).name;
}

matcher: Set the matcher to return things that match the right name;

matcher: function (item) {
    // keep the expression on the same line as 'return', or JavaScript ends the statement early
    return JSON.parse(item).name.toLocaleLowerCase()
        .indexOf(this.query.toLocaleLowerCase()) != -1;
}

updater: Use the updater to catch the selected item;

updater: function (item) {
    alert(JSON.parse(item).value);
    return JSON.parse(item).name;
}

This pattern lets you search for text but return a corresponding key.
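Here’s a sketch of the four options assembled into one object. The pairs array and the selectedKey variable are names I’ve made up for the example, and I’ve passed source as a plain array instead of a function to keep it short. The functions can be exercised without a browser; the jQuery call at the end is left as a comment.

```javascript
// Hypothetical data: in a real page this would probably come from your server.
var pairs = [
    { name: "one", value: 1 },
    { name: "two", value: 2 },
    { name: "three", value: 3 }
];

var selectedKey = null; // hypothetical: wherever you want the chosen key to end up

var options = {
    // every item handed to the widget is a JSON string carrying both name and value
    source: pairs.map(function (pair) { return JSON.stringify(pair); }),

    // show only the name in the dropdown
    highlighter: function (item) {
        return JSON.parse(item).name;
    },

    // match the typed text against the name, ignoring case
    matcher: function (item) {
        var name = JSON.parse(item).name.toLocaleLowerCase();
        return name.indexOf(this.query.toLocaleLowerCase()) !== -1;
    },

    // keep the key, and return the name so the textbox shows something readable
    updater: function (item) {
        var pair = JSON.parse(item);
        selectedKey = pair.value;
        return pair.name;
    }
};

// In the browser, with jQuery and Bootstrap loaded:
// $('#mysearchbox').typeahead(options);
```

The widget calls matcher with this set to the typeahead itself, which is where this.query comes from.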

Hosting personal projects in TFS online

I’ve been using TFS at work for a while, and I like it. Turns out you can sign up for a free TFS account with Microsoft here;

http://tfs.visualstudio.com/

Which is pretty good going. I could use something else, but I’ve tried using different stacks and it’s always been more hassle than it’s worth. You can’t quite get ‘fluent’ when you’re working on two very similar stacks. (Six months coding C# during the day and VB at night, and you’ll forget how to write ‘if’ statements)

Additionally, it’s not just source control, but also the Scrum methodology, so you can do sprint planning and all that jazz.

Anyway, I think this’ll go well. Let’s see.

‘This’ in JavaScript and C#

copied from my old blog: http://www.stevecooper.org/index.php/2010/01/16/this-in-javascript-and-csharp/

I noticed something today while learning jQuery, and that’s the way the keyword this differs between C# and JavaScript. It surprised me when I saw some JavaScript that looked like;

01 $(document).ready(function() {
02   $('div').each(function() {
03     this.style.color = 'blue';
04   });
05 });

and I realised that this wouldn’t work in C#, or at least not the same way it works in JavaScript. In the JavaScript above, the this on line 03 refers to each div element that’s being iterated over.

Now consider similar C# code;

class Document {
    List<Div> divs = …;
    void Ready() {
        divs.ForEach(delegate (Div div) { this.style.color = "blue"; });
    }
}

In C#, this doesn’t refer to the div, but to the Document class.

In both pieces of code, we’re creating a function with a reference to this, but they mean different things;

  • In C#, this means the object that declares the function
  • In JS, this means the object the function is being invoked on.

To see the difference, realize that you can attach the same function to two different javascript objects, and you’ll see this referring to each one in turn. Here’s a piece of javascript to illustrate;

var func = function() { alert(this.name); };
var obj1 = { 'name': 'first object', 'func': func };
var obj2 = { 'name': 'second object', 'func': func };
obj1.func(); obj2.func();

When you run this, you get two alerts: first object and second object. But when you run this in C#

Action func = delegate() { MessageBox.Show(this.GetHashCode().ToString()); };
var obj1 = new { func = func }; 
var obj2 = new { func = func };
obj1.func(); obj2.func();

You see the same hashcode in both message boxes. It’s the hashcode of the object that contains this method.

So. Don’t confuse the meaning of this in C# and JavaScript. They are very different beasts.

Now, if you want C#’s semantics in JavaScript, you have to take account of this behaviour. With my C# head on, I was tempted to understand ‘this’ as a variable name, but it isn’t; it’s a keyword. To make it work like C#, you need to create a real variable, and use the variable in the function. Like so;

var outerThis = this; // declare a real variable
func = function() { alert(outerThis.name); }

And this will give you C# semantics in Javascript.
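Here’s a small runnable sketch that puts the two behaviours side by side. The widget and other objects are invented for the example. I’ve also included Function.prototype.bind, which is standard in ES5 and is another way to pin this down; it isn’t something the original post relied on.

```javascript
var widget = {
    name: "widget",
    makeCallbacks: function () {
        var outerThis = this; // a real variable holding the widget
        return {
            // plain function: 'this' is decided by whoever calls it
            usesThis: function () { return this.name; },
            // closure over a real variable: always sees the widget
            usesVariable: function () { return outerThis.name; },
            // bound function: 'this' is fixed to the widget when the function is created
            usesBind: function () { return this.name; }.bind(this)
        };
    }
};

var callbacks = widget.makeCallbacks();
var other = { name: "other" };

// Invoke each callback on a different object and see who 'this' turns out to be.
var a = callbacks.usesThis.call(other);     // "other"
var b = callbacks.usesVariable.call(other); // "widget"
var c = callbacks.usesBind.call(other);     // "widget"
console.log(a, b, c);
```

Only the first callback follows the object it is invoked on; the other two behave the way a C# programmer would expect.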

Statics and unit testing

I was reading http://misko.hevery.com/2008/12/15/static-methods-are-death-to-testability/ by Miško Hevery. He makes some excellent points about using statics and the effect they have on testing.

There seems to be a common misreading of the original article, where he says something like ‘if you use a static in a class, that class is harder to unit test’ and people hear ‘statics are hard to unit test’. This needs some addressing. This post seeks to explain why statics compromise testability with an example.

So, consider the development of a dictionary class with string keys. At some point you will want to hash the keys and assign the values to bins;

# pseudocode
class Dictionary:
    Add(key, item):
        hash = String.GetHashCode(key)
        bucketIndex = hash % maxBuckets
        ...
        this.buckets[bucketIndex].Add(item)

Now, a hash function (String.GetHashCode) is a perfect candidate for a static; it is a pure function with no state to worry about. It’s also easy to test. There are no worries about whether the static is testable.

But what about our dictionary class? Before I release this on the world, I want to convince myself that I’ve got the logic right for what happens when we get a collision, that is, when two different keys generate the same hash code.

With the static sitting there in the Add function, I’m out of luck. How can I guarantee a collision in a test? I’d need to know two keys which generate the same hash code, and write a test like this;

    key1 = "2394820934802934820348"
    key2 = "lgjeibrieovmdofivevrij"
    dictionary = new Dictionary()
    dictionary.Add(key1, "A")
    dictionary.Add(key2, "B")

And here the dependency on the static becomes clear; in order to write a test for the dictionary, I need to know the implementation details of the static to write my test. I’ve made unit testing harder — remember, the unit is the dictionary.

So to make it testable, I need to introduce a new idea into my dictionary: a swappable object with a GetHashCode instance method;

# pseudocode
class Dictionary:
    Constructor(hasher):
        this.hasher = hasher
    Add(key, item):
        hash = this.hasher.GetHashCode(key)
        bucketIndex = hash % maxBuckets
        ...
        this.buckets[bucketIndex].Add(item)

Now my test becomes trivial;

    class AlwaysCollidingHasher inherits Hasher
       GetHashCode(key):
           return 0 # this hasher always causes a hash collision

    dictionary = new Dictionary(new AlwaysCollidingHasher())
    dictionary.Add("A", "A")
    dictionary.Add("B", "B")

Now I have a proper unit test; my dictionary can now be tested reliably for this dangerous condition, and it’s no longer relying on a particular implementation detail of another class. It’s a true unit test.
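To make the pseudocode concrete, here’s one way it might look in runnable JavaScript. It’s a sketch under a few assumptions of my own: buckets are plain arrays, each entry stores its key alongside the item so lookups can tell colliding keys apart, and I’ve added a get method so the test has something to check.

```javascript
// The hasher is injected, so a test can swap in whatever behaviour it needs.
function Dictionary(hasher, maxBuckets) {
    this.hasher = hasher;
    this.maxBuckets = maxBuckets || 16; // assumed default bucket count
    this.buckets = [];
}

Dictionary.prototype.bucketFor = function (key) {
    var index = this.hasher.getHashCode(key) % this.maxBuckets;
    if (!this.buckets[index]) {
        this.buckets[index] = [];
    }
    return this.buckets[index];
};

Dictionary.prototype.add = function (key, item) {
    var bucket = this.bucketFor(key);
    for (var i = 0; i < bucket.length; i++) {
        if (bucket[i].key === key) { bucket[i].item = item; return; } // overwrite existing key
    }
    bucket.push({ key: key, item: item }); // colliding keys share the bucket
};

Dictionary.prototype.get = function (key) {
    var bucket = this.bucketFor(key);
    for (var i = 0; i < bucket.length; i++) {
        if (bucket[i].key === key) { return bucket[i].item; }
    }
    return undefined;
};

// Test double: every key hashes to 0, so every add collides.
var alwaysCollidingHasher = { getHashCode: function (key) { return 0; } };

var dictionary = new Dictionary(alwaysCollidingHasher);
dictionary.add("A", "first");
dictionary.add("B", "second");

// Both values must survive even though they landed in the same bucket.
console.log(dictionary.get("A"), dictionary.get("B"), dictionary.buckets[0].length);
```

The test needs no knowledge of any real hash function; the collision is guaranteed by the hasher I handed in.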

So to reiterate; the point isn’t that statics are hard to test. The point is that sometimes, things that use statics are hard to test.

Welcome

Hi,

I’m Steve, and I’m starting a new programming blog. It’s early days yet, but my plan is to write about .Net development, focussing on how to write good, object-oriented, clean, testable code. We’ll see how things pan out…