Calculating expected finger table entries for virtual ring distributed systems like Chord and Cassandra

Here’s a block of code I wrote to print out finger tables for Chord-like distributed hash tables. Just update the ‘nodes’ variable and the m-value to adjust to any system. Runs happily in Node.js 4.4.0.

// calculating finger tables for Chord-like distributed hash table.

// the size of the ring, as 2^m;
var m=9, ringSize = Math.pow(2,m);

// the nodes we know about
var nodes = [1, 12, 123, 234, 345, 456, 501]; 

// just make sure it's sorted
nodes = nodes.sort( (a,b) => a-b );

console.log('the finger tables for a ', ringSize,'-virtual-node system');
console.log('========');
console.log('nodes: ', nodes);

// calculate the finger table for each node
nodes.forEach(n => {

 var fingerTable = [];
 var i = 1; // i = 2^k for k = 0 .. m-1
 for (var k = 0; k < m; k++) {
   var targetId = (n + i) % ringSize;
   // the finger entry is the successor of targetId: the first node at or
   // after targetId, wrapping round to the smallest node if necessary
   var ftEntry = nodes.filter(id => id >= targetId).concat(nodes)[0];
   fingerTable.push(ftEntry);
   i = i << 1;
 }
 
 console.log(n + ':', fingerTable);
});
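With the node list above and m=9 (a 512-position ring), the entry printed for node 1 should look something like this;

1: [ 12, 12, 12, 12, 123, 123, 123, 234, 345 ]

That is, node 1’s fingers target ids 2, 3, 5, 9, 17, 33, 65, 129 and 257, and each entry is the first node at or after that id.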

Fun With ORM Includes!

This is a cautionary tale about how an ORM (in this case, Entity Framework) can screw your performance if you don’t pay attention to the queries it generates. In particular, the use of recursive includes.

Here’s a simplified version of the original problem;

var person = await DataContext.Set<Person>()
 .Include(p => p.Groups)
 .Include(p => p.Manager)
 .Where(p => p.Id == personId)
 .SingleOrDefaultAsync();

So here’s the intent, and the bug. The code above is supposed to say ‘load the person with the specific ID, and include his/her groups and manager’.

And here’s the problem. EF sees that you want to load Person.Manager. Person.Manager is a person, so we’ll need to include the manager’s groups. But ah! Person.Manager has a Manager, which has a Manager, which has a Manager. EF goes batshit at this point, and generates over 90 joins;

person join person join person join person join person ....

And that query goes off, loads far too much data, and takes far too much time; on the order of seconds. Now this is a query that should load in small numbers of milliseconds.

The fix I put in was to divide this into two, much faster queries. Something more like this;

// first query: load the person and their groups, but not the manager
var person = await DataContext.Set<Person>()
    .Include(p => p.Groups)
    .Where(p => p.Id == personId)
    .SingleOrDefaultAsync();

// second query: load the manager (and the manager's groups) separately
if (person != null && person.ManagerId.HasValue) {
    person.Manager = await DataContext.Set<Person>()
        .Include(p => p.Groups)
        .Where(p => p.Id == person.ManagerId.Value)
        .FirstOrDefaultAsync();
}

So we hit the database twice, but with much cheaper queries. This may not be perfect (I’d like to get everything in one shot) but I couldn’t think of a better approach.

Here’s the takeaway, though. It’s almost never right to use a recursive include (like Person.Manager) unless you want to load complete branches or ancestor paths. When mixed with other includes (like Person.Groups) it leads to an explosion in the complexity of the query. So;

  • Measure the performance of any queries you write. Use the free Stackify Prefix or some other tool to make sure your queries are small-numbers-of-milliseconds fast. If not, question whether you could make them faster.
  • Use Stackify Prefix to examine the generated SQL. Sure, EF can be a bit verbose, but any time it generates more than a page of SQL, question whether the query is as simple as it could be.
  • Don’t ask for it if you don’t need it. Don’t Include() data just in case.
  • Two very cheap queries beat one awful one. EF’s only so clever, and it doesn’t handle lots of includes well, so don’t be afraid of two efficient queries when EF goes mental.
  • On the other hand, don’t take that as permission to write an N+1 bug. This is where you load a ‘master’ object, then iterate through the children, loading each in turn. This will kill your server, and it’s the most common database performance bug in enterprise software. There’s a sketch of the pattern just below.
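To make the N+1 shape concrete, here’s a minimal sketch of the anti-pattern and the batched alternative. The Order/OrderLine types are hypothetical, purely for illustration;

// N+1: one query for the parents, then one query per parent
var orders = await DataContext.Set<Order>().ToListAsync();
foreach (var order in orders)
{
    // a separate database round trip fires for every single order
    order.Lines = await DataContext.Set<OrderLine>()
        .Where(l => l.OrderId == order.Id)
        .ToListAsync();
}

// better: ask for the children up front, in a single round trip
var ordersWithLines = await DataContext.Set<Order>()
    .Include(o => o.Lines)
    .ToListAsync();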

 

Simulating slow POST requests with Fiddler

Sometimes, your dev machine is too damned fast, and you need to slow down how something works in order to investigate a potential bug. This post shows how to do that with a custom rule for Fiddler.

We recently had to make sure that a button on a web page worked like this: the user clicks the button; the button is disabled; the work completes; the button is re-enabled.

However, on a fast developer machine, the whole process can be far too quick. What you need to do is simulate the server taking a long time, so that you can visually confirm the delay.

Turns out you can use Fiddler to do that. Its custom rules are written in a strange JavaScript variant, JScript.NET. (Telerik: I’m pretty sure we wouldn’t mind it being plain old JavaScript, if that’s ok!)

Here’s what you need to do;

Start Fiddler.

In the “Rules” menu, choose “Customise Rules…”

About line 65, paste this fragment;

public static RulesOption("Delay &POSTs by 2s", "Per&formance")
var m_DelayPostResponses: boolean = false;

There will be similar pairs of lines thereabouts; just put it with the similar code.

What this does is register a menu option at Rules > Performance > Delay POSTs by 2s. It’ll appear when you save the file.

Then, at the top of the OnBeforeRequest(Session) method around line 162, paste this;

if (m_DelayPostResponses && oSession.HTTPMethodIs("POST")) {
  // active wait loop
  var msecs = 2000;
  var start = new Date().getTime();
  while(new Date().getTime() - start < msecs) { 
  }
}

This introduces a 2-second wait loop for POSTs, if you’ve turned the option on in the Rules > Performance menu.

So once you save this file, Fiddler automatically reloads it and the toggle menu item appears. Use it whenever you want to check that your code copes with slow POST requests.

Micro-optimisation; ‘starts with single-character string’ vs str[0]

I’m writing a string validation routine with very stringent performance requirements. One part is a short-cut that kicks in if we might be looking at a JSON object or JSON array. I’d written

var mightBeJson = value.StartsWith("{") || value.StartsWith("[");

as a short-cut. Turned out that, since this is almost the first thing that happens in the code, it was a massive, terrible hit on performance: StartsWith with a string argument does a culture-aware comparison by default, which is surprisingly expensive. Switching to treating the string as a char array makes a massive difference;

 var mightBeJson = value[0] == '[' || value[0] == '{';

Night and day difference!

Takeaway — never compare single-character strings like “X” when you can compare chars like ‘X’ directly (or, if you must use StartsWith, at least pass StringComparison.Ordinal).
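If you want to check the difference on your own machine, here’s a rough sketch of the comparison; it’s a quick Stopwatch loop rather than a proper benchmark, and the test string and iteration count are purely illustrative. Note the Length guard, since value[0] throws on an empty string;

using System;
using System.Diagnostics;

class StartsWithVsIndexer
{
    static void Main()
    {
        var value = "{ \"name\": \"alice\" }";
        const int iterations = 10000000;
        var mightBeJson = false;

        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            // culture-aware comparison against a one-character string
            mightBeJson = value.StartsWith("{") || value.StartsWith("[");
        }
        Console.WriteLine("StartsWith: {0}ms ({1})", sw.ElapsedMilliseconds, mightBeJson);

        sw.Restart();
        for (var i = 0; i < iterations; i++)
        {
            // direct char comparison, guarded against empty strings
            mightBeJson = value.Length > 0 && (value[0] == '{' || value[0] == '[');
        }
        Console.WriteLine("Indexing:   {0}ms ({1})", sw.ElapsedMilliseconds, mightBeJson);
    }
}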

Setting up log4net to write to trace.axd

I’ve just been looking at logging info from MVC sites through to ASP.NET’s trace.axd handler. Here’s what you need to do.

1) Install log4net from NuGet;

Install-Package log4net

2) Tell log4net to read its config from XML – specifically, from your web.config file;

// Global.asax.cs (requires 'using log4net.Config;' at the top of the file)
 protected void Application_Start()
 {
   ...
   XmlConfigurator.Configure();
   ...
 }

3) Update web.config with three sections – one to register the log4net element, one for the log4net config itself, and one for the ‘trace’ element that turns on ASP.NET tracing;

web.config, part 1 — register the config section;

<configuration>
  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net" />
  </configSections>
  ...
</configuration>

web.config, part 2 — add the log4net element under <configuration>;

<log4net debug="true">
  <!-- writes to normal .net trace, eg DBGVIEW.exe, VS Output window -->
  <appender name="TraceAppender" type="log4net.Appender.TraceAppender">
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date [%thread] %-5level - %message%newline" />
    </layout>
  </appender>
 
  <!-- writes to ASP.NET Tracing, eg Trace.axd -->
  <appender name="AspNetTraceAppender" type="log4net.Appender.AspNetTraceAppender" >
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date [%thread] %-5level %logger [%property{NDC}] - %message%newline" />
    </layout>
  </appender>
 
  <!-- writes errors to the event log -->
  <appender name="EventLogAppender" type="log4net.Appender.EventLogAppender">
    <applicationName value="AI Track Record" />
    <threshold value="ERROR" />
    <layout type="log4net.Layout.PatternLayout">
     <conversionPattern value="%date [%thread] %-5level %logger [%property{NDC}] - %message%newline" />
    </layout>
  </appender>
 
  <root>
    <level value="ALL" />
    <appender-ref ref="AspNetTraceAppender" />
    <appender-ref ref="EventLogAppender" />
    <appender-ref ref="TraceAppender" />
  </root>
 
</log4net>

web.config, part 3 – enable trace element in <configuration> / <system.web>

 
<configuration>
  <system.web>
    <trace writeToDiagnosticsTrace="true" enabled="true" pageOutput="false" requestLimit="250" mostRecent="true" />
  </system.web>
</configuration>

4) Finally, to actually use the log, you need to create a logger;

// maybe add this to each class, or to be fancy, inject using Ninject;
private static readonly ILog log = LogManager.GetLogger(typeof(MyClassName));

and write to it using methods like Info, Debug, etc;

log.Debug("Here's my excellent message");

5) Check Trace.axd! You’ll see your messages written to the ‘Trace Information’ section.

Summary

Things to note — each appender has its own format, in the <conversionPattern> element, and its own ‘seriousness’, in the <threshold> element. So you can set up fine-grained control, like logging absolutely everything to a file and only errors to the event log, or writing sparse information to trace.axd while writing verbose messages to a database table.

.Net Gotcha: Decimals track their number of decimal places

tl;dr: System.Decimal values in .NET track their number of decimal places, so 12.0 is not stored identically to 12.00000. This will cause you trouble if you expect their string representations to be the same.

The other day I was calculating a key signature for some records. A key signature is a way to combine a compound primary key of any type and create a representative string value;

// the record has the primary key { 42, 99 }
signature == "42|99"

The idea being that you can compare records using simple string comparison. Simplifying comparison means that operations like sorting, duplicate detection, or JOIN-like operations can now be implemented by working on sequences of strings.

The difficulty is this — how do you reduce arbitrary value types to strings? A straightforward approach is just to call Convert.ToString on the value, and that was my first approach;

string CalculateSignature1(object[] keyValues) {
    var sb = new StringBuilder();
    foreach(var value in keyValues)
    {
        if (value == null) {
            sb.Append("<null>");
        } else {
            sb.Append(Convert.ToString(value));
        }
        sb.Append("|");
    }
    if (keyValues.Length > 0) { sb.Length -= 1; }
    return sb.ToString();
}

Which works like so;

CalculateSignature1(new object[] { 1, "hello", null, 99 }) == "1|hello|<null>|99";
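And once every record reduces to a string, something like duplicate detection between two record sets becomes plain set membership on signatures. A rough sketch, where sourceKeys and targetKeys are hypothetical stand-ins for the key values of real records;

// hypothetical key values extracted from two record sets, in key order
var sourceKeys = new List<object[]> { new object[] { 42, 99 }, new object[] { 7, 12 } };
var targetKeys = new List<object[]> { new object[] { 42, 99 }, new object[] { 8, 12 } };

// index the source records by signature, then probe with each target record
var sourceSignatures = new HashSet<string>(sourceKeys.Select(CalculateSignature1));
var matches = targetKeys.Where(k => sourceSignatures.Contains(CalculateSignature1(k)));
// matches now contains only the { 42, 99 } key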

This is reasonable and works fairly well until you come to Decimals.

Decimals cause you problems because they track their number of decimal places. By that I mean that these two values are numerically equal, but convert to different strings;

Decimal a = 12.0m;   // Convert.ToString(a) returns "12.0"
Decimal b = 12.000m; // Convert.ToString(b) returns "12.000"

So you can see that Convert.ToString is returning different representations.

This was a beast to track down, because the expression a==b returns true. So tests, debuggers, etc seem to show them as identical values. The values, types, etc all tell you it’s the same value.

What you need to do is use String.Format or StringBuilder.AppendFormat to reduce to a common format — here’s the updated version;

string CalculateSignature2(object[] keyValues) {
    var sb = new StringBuilder();
    foreach(var value in keyValues)
    {
        if (value == null) {
            sb.Append("<null>");
        } else if (value is Decimal) {
            // decimals store their accuracy, so that 2.0M and 2.000M are not the same number. Here we
            // toString in such a way that numbers like 2.0M and 2.000M give the same representation in the
            // signature. To do otherwise is to calculate different signatures from different values, and
            // therefore 'miss' an appropriate join between source and target records. 
            var d = (decimal)value;
            sb.AppendFormat("{0:0000000000000000.0000000000000000}", d);
        } else {
            sb.Append(Convert.ToString(value));
        }
        sb.Append("|");
    }
    if (keyValues.Length > 0) { sb.Length -= 1; }
    return sb.ToString();
}

So watch out for that — it can sneak up on you!
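One extra hedge that wasn’t needed in the original case: AppendFormat uses the current culture’s number formatting, so signatures computed on machines with different regional settings could end up with different decimal separators. If that matters to you, pin the format to the invariant culture;

// requires 'using System.Globalization;'
// pin the formatting to the invariant culture so '.' is always the separator
sb.AppendFormat(CultureInfo.InvariantCulture,
    "{0:0000000000000000.0000000000000000}", d);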

—–

decimal-gotchas.csx — scriptcs code;

void AssertEqual(string a, string b) {
	Console.Write(a == b ? "PASS: " : "FAIL: "); 
	Console.Write(a);
	Console.Write(" == ");
	Console.WriteLine(b);
}

void AssertNotEqual(string a, string b) {
	Console.Write(a != b ? "PASS: " : "FAIL: "); 
	Console.Write(a);
	Console.Write(" != ");
	Console.WriteLine(b);
}


string CalculateSignature1(object[] keyValues) {
	var sb = new StringBuilder();
	foreach(var value in keyValues)
	{
		if (value == null) {
			sb.Append("<null>");
		} else {
			sb.Append(Convert.ToString(value));
		}
	    sb.Append("|");
	}
	if (keyValues.Length > 0) { sb.Length -= 1; }
	return sb.ToString();
}

string CalculateSignature2(object[] keyValues) {
	var sb = new StringBuilder();
	foreach(var value in keyValues)
	{
		if (value == null) {
			sb.Append("<null>");
		} else if (value is Decimal) {
			// decimals store their accuracy, so that 2.0M and 2.000M are not the same number. Here we
            // toString in such a way that numbers like 2.0M and 2.000M give the same representation in the
            // signature. To do otherwise is to calculate different signatures from different values, and
            // therefore 'miss' an appropriate join between source and target records. 
            var d = (decimal)value;
            sb.AppendFormat("{0:0000000000000000.0000000000000000}", d);
	    } else {
			sb.Append(Convert.ToString(value));
		}
	    sb.Append("|");
	}
	if (keyValues.Length > 0) { sb.Length -= 1; }
	return sb.ToString();
}

AssertEqual(CalculateSignature1(new object[] { 1, "hello", null, 99 }), "1|hello|<null>|99");

Decimal a = 12.0m;
Decimal b = 12.000m;

Console.WriteLine(a == b);

AssertNotEqual(CalculateSignature1(new object[] { a }), CalculateSignature1(new object[] { b }));
AssertEqual(CalculateSignature2(new object[] { a }), CalculateSignature2(new object[] { b }));

Syntax Hack – single-character block comment toggles

tl;dr: Use //*/ to terminate block comments, then use /* and //* to toggle the block in and out of code.

So this is a small trick but it’s really useful — a syntax hack at the same kind of level as a keyboard shortcut.

Most C-syntax languages use line comments and block comments like so;


// line comment

/*
block comment
*/

Block comments can mask larger spans of code, but they tend to be ‘fiddly’ to put in and take out. This is a pain when you have a block you’re developing and you want to comment it in and out regularly;


/*
... big code block here
... which goes on for a while
... and you want to toggle it
... in and out regularly.
*/

So if you change your terminating symbol to //*/, an interesting thing happens: inside a block comment it still terminates the comment (the */ does the work), but outside one it’s just a line comment. This means you can toggle code in and out by adding or removing a single extra slash at the start of the block comment;


/*
code here commented out
//*/

but


//*
code here is now active
//*/

So there it is. A single-character toggle of a block comment.

TypeWalker – keeps your C# DTOs in sync with TypeScript

TypeWalker

(Now available on github and nuget)

With TypeScript, Microsoft introduced a nice, strongly-typed variant of JavaScript. It’s been written to be nicely agnostic about the rest of your stack — it’ll run for Node.js server-side apps just as easily as it’ll integrate with your C# MVC projects. However, for complex C# apps, it’s really useful to keep your TypeScript and your C# in lockstep.

For instance, let’s say you have an MVC controller returning a list of person records;

[HttpGet]
public ActionResult GetPeople() 
{
    PersonDto[] people = personRepo.GetAll();
    return Json(people, JsonRequestBehavior.AllowGet);
}

So far so good. You probably consume it in JavaScript like so;

$.getJSON('/People/GetPeople', function(data) {
    // data will look like;
    //    [
    //        { "firstName": "alice", "id": 1 },
    //        { "firstName": "bob", "id": 2 }
    //    ]
});

But in TypeScript we want to strongly type it, so we want to do something more like;

$.getJSON('/People/GetPeople', function(data: PersonDto[]) {
    ...
});

where you can now see the PersonDto[] type annotation. This gives you strong typing throughout the function. So somehow I need to write a TypeScript interface;

interface PersonDto {
    firstName: string;
    id: number;
}

This is fine, but you’ve already got a PersonDto.cs file somewhere in your C# project, and there’s no benefit in coding both by hand: you risk violating DRY, and you risk the two drifting out of date over time.

Also, I like Knockout.js, and especially the mapping plugin, which converts POJOs into bindable objects. But these have different interfaces from the original object — a name property becomes a name() getter function and a name(value) setter function. This is fairly predictable, so to get TypeScript support you would need to add a third description of the class;

module KnockoutVersion {
    interface PersonDto {
        firstName(): string;
        firstName(value:string): void;
        id(): number;
        id(value:number): void;
    }
}

So that’s three descriptions of the same basic type. Urgh!

TypeWalker is a project which generates your TypeScript definitions from your C# types. This spares you the effort of keeping things in sync, and gives you a stronger guarantee against the mistakes that creep in when people add or remove properties on a class or otherwise change the shape of a type.

At the moment it operates from the command line, and you should be able to use it right now by calling it manually. I’m currently working on a NuGet package which will extend a web application project and build your TypeScript types automatically at build time.

Command Line Operation

TypeWalker 
    /configFile=c:\src\mysite\AssembliesToOutput.txt 
    /language=KnockoutMapping 
    /knockoutPrefix=KOGen 
    /outputFile=c:\src\mysite\scripts\typings\mysite.d.ts

configFile is a text file containing lines formatted like so;

MyClassLibrary::My.Namespace.To.Export

So, pairs of an assembly name (without the .dll extension) and a namespace within that assembly to export.

language is currently one of KnockoutMapping or TypeScript. You can call the app twice if you want both. Over time, I’ll make the languages pluggable, but right now they’re fixed.

knockoutPrefix is a namespace prefix for the Knockout-generated output. This lets you generate MyModule.Person for your POJOs and KO.MyModule.Person for your Knockout versions.

outputFile is just the path where the output should be written, usually a .d.ts TypeScript definition file.

NuGet/MSBuild Integration

There’s rudimentary NuGet support;

Install-Package TypeWalker -Pre

Run this in your web app. This will install the binary and modify the .csproj file with the appropriate MSBuild targets. You now need to save the project, close and re-open the solution.

Next, you need to write a file containing the assembly names and namespaces to export — see above under the configFile documentation. Add this file to the project, then edit the properties of the file. Change the Build Action to GenerateTypeScriptResources.

Save, possibly close and re-open again, then start building. It’ll generate your output, switching the extension to .d.ts. For example;

Scripts/TypeWalker.txt
Scripts/TypeWalker.d.ts

JavaScript is a filthpig, Part £$!”@£@!:!: function names

I’ve been reading Reg Braithwaite’s JavaScript Allonge, which is great, but in reading it I discovered one of the ways JS is batshit crazy.

We’re all used to writing and calling named functions;

function iAmAvailable() { 
 console.log("iAmAvailable has been called");
}
iAmAvailable();

Which gives the nice predictable result;

iAmAvailable has been called

So far so good. And we know we can put a function expression into a variable;

var iPointToAFunction = function() {
 console.log("iPointToAFunction has been called");
}
iPointToAFunction();

And this gives what you would expect, too;

iPointToAFunction has been called

So far, no surprises. But what if you try both together? A function with a name assigned to a variable?

var iPointToAFunction = function butIAmUnavailable() {
 console.log("iPointToAFunction/butIAmUnavailable has been called");
}
iPointToAFunction();
butIAmUnavailable();

This gives the surprising result;

iPointToAFunction/butIAmUnavailable has been called
ReferenceError: butIAmUnavailable is not defined
    at repl:1:1
    at REPLServer.defaultEval (repl.js:132:27)
    at bound (domain.js:254:14)
    at REPLServer.runBound [as eval] (domain.js:267:12)
    at REPLServer.<anonymous> (repl.js:279:12)
    at REPLServer.emit (events.js:107:17)
    at REPLServer.Interface._onLine (readline.js:214:10)
    at REPLServer.Interface._line (readline.js:553:8)
    at REPLServer.Interface._ttyWrite (readline.js:830:14)
    at ReadStream.onkeypress (readline.js:109:10)

Which is really inconsistent. What’s actually going on is that this is a named function expression, not a function declaration: a function declaration creates a variable of the same name in the enclosing scope, but the name of a named function expression is only in scope inside the function’s own body (handy for recursion), not outside it.

Still, it feels to me that the grammar

function x() { ... }

should either always create a variable called x, or never create it, but not create it ‘only sometimes’, depending on whether it appears as a declaration or as part of an expression.

The first few weeks of Sublime Text package development, and what I learned.

Alright, this one is a bit of nostalgia and a bit of a brag.

When Sublime Text came out, I jumped on it because, well, I wanted TextMate and I didn’t have a Mac, and I wanted something with the power and programmability of Emacs, but Emacs Lisp is wonky and kinda evil and too different from Scheme to be enjoyable. Sublime Text was native Windows, and Python-powered, and Python is a great choice for short, readable scripts.

Sublime Text was very young when I first found it. It shipped with Python scripts to do certain basic operations, like re-flowing text, but the only person who’d ever written one was Jon Skinner, the developer of Sublime Text. Early on, I wrote a feature request asking for plugins, and Jon was keen and supportive.

What was surprising to me was how quickly things can explode if you’re willing to just ask for them and fiddle with some basic code. I wrote the feature request for plugins on 20 Mar 2008. Jon lent a hand almost immediately. Within 11 days, I had;

This helped turn some enthusiastic developers and their scattered code into a proper community. The package repository eventually ended up with 40 packages, and the documentation became pretty good. These have both been superseded, but that’s good — I think what I established was more of a cheap prototype, and other people have gone on from that, but I’m happy to have taken the first steps on several fronts.

So, I took a few, disparate lessons from this;

Almost everything is free. I got wiki software, source control, libraries, and such, for nothing. I suppose we all know that there’s lots of free stuff on the net, but because it comes in bits, I sometimes forget that you can get a proper thing – running, deployed software available at a real domain – just by grabbing the pieces and combining them. And this was 2008. It’s even better now than it was then. If you have an MSDN account and aren’t using your Azure credits, you need to!

Allow Minute Programs. A Python plugin can be three lines long, which makes it easy to dip your toe in. If people can develop something good before having to deal with ‘big-software’ features like namespacing, package definitions and so on, they’ll develop more of them.

People improve on first versions. I think people are great at commenting on, improving, and altering existing content. By establishing an initial version – even a rubbish one – you give people something to alter. A buggy plugin, a wiki with two short pages – people will correct bad things more quickly than they’ll establish new things. Everything I did has been superseded, which is great, because it’s all miles better than what I did. Whatever pearls now exist, I’m quite happy to have been some of the grit in the oyster.

Work in public. Asking questions, developing, then showing the results creates a feedback loop: people like what you’re doing, you feel good, you share, things get better, they feel good, and so on. Nice. Most of my open source projects probably come out of that ‘see what I did!’ feeling.

Small-project devs work hard for strangers who write good feature requests. I’ve found this over and over. You communicate your need clearly, and a possible approach or solution, and you give the developer a route to go down, a kind of ‘permission’ or justification. Tom Kerrigan, the developer of t-chess, is another great example of someone who works closely with his customers.