Leeds Code Dojo - Digit Recogniser with KNN

Last Wednesday, I went to my first code dojo. Held in Callcredit’s office, this dojo planned to build Skynet and occupy the city of Leeds with cyborgs. What is the one obvious thing that every war machine has? A digit recogniser, of course! I can’t imagine what cyborgs would do without being able to read digits. So, instead of making a cyborg in its entirety, we ended up making a crude and simple digit recogniser as our first step.

Coding in pairs - Leeds Code Dojo (photos by Grant Crofton, taken from Meetup.com's dojo page)

The technique we used is called k-nearest neighbours (KNN): to classify an item, find the k known items with the most similar features (the nearest in distance). I paired with Phil to implement the recogniser in Objective-C. Since both of us are machine learning newbies, we simply compared the amount of whitespace in each image. This simplistic approach yields an accuracy of 18.75% (which is better than expected!). Below is the snippet for calculating distance using whitespace comparison.
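The original Objective-C snippet is not reproduced in this text. As a rough illustration only, here is a minimal Python sketch of the whitespace-comparison idea. It assumes each image is a flat list of greyscale pixel values in which 0 means blank; the function names and the pixel format are assumptions for the example, not the code we wrote at the dojo.

```python
def count_whitespace(pixels):
    """Count blank pixels (value 0) in a flattened greyscale image."""
    return sum(1 for p in pixels if p == 0)


def whitespace_distance(image_a, image_b):
    """Distance = absolute difference between the two blank-pixel counts."""
    return abs(count_whitespace(image_a) - count_whitespace(image_b))


# Two tiny made-up 2x2 "images", flattened row by row.
a = [0, 0, 255, 0]     # three blank pixels
b = [0, 128, 255, 64]  # one blank pixel
print(whitespace_distance(a, b))  # prints 2
```

Because this measure ignores where the ink actually is, quite different digits can end up with the same count, which is probably why it scores so low.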

I went on to improve the distance calculator using different techniques and tweaks. Here are the results for each technique, with its accuracy and running time (the machine: MacBook 2010, 2.3 GHz Core i5, running Xcode 6.1 GM on OS X 10.9.5):

Distance calculation                 Accuracy (%)   Time (s)
Whitespace comparison                       18.75     125.62
Sums of pixel differences                   88.75      90.38
Euclidean distance (power=2, k=1)           90.25     106.87
Euclidean distance (power=2, k=3)           86.75     109.14
Euclidean distance (power=3, k=1)           90.25     111.04
Euclidean distance (power=3, k=3)           87.50     115.97

Adding neighbours seemed to lower the accuracy. Even after adding a mechanism for weighting the neighbours (a bigger weight for nearer neighbours), a bigger value of k still lowers the final accuracy. The table shows that Euclidean distance with a power of two and k=1 ties for the best accuracy (90.25%) and is faster than the power-of-three variant, although the sum of pixel differences is quicker still at a slightly lower accuracy (88.75%). Below is the snippet for calculating distance using Euclidean distance (power=2, k=1).
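The Objective-C snippet itself is missing from this text, so what follows is only a hedged Python sketch of the approach, not the dojo code. It assumes the training data is a list of (label, pixels) pairs; `minkowski_distance`, `classify` and the `weighted` flag are hypothetical names. Strictly speaking, only power=2 is the Euclidean distance; other powers give the more general Minkowski distance. The weighting scheme (1 / distance) is one common choice and may differ from the one I actually used.

```python
from collections import defaultdict


def minkowski_distance(a, b, power=2):
    """Minkowski distance of order `power`; power=2 is the Euclidean distance."""
    return sum(abs(x - y) ** power for x, y in zip(a, b)) ** (1.0 / power)


def classify(sample, training, k=1, power=2, weighted=False):
    """Predict a label for `sample` from its k nearest training examples.

    `training` is a list of (label, pixels) pairs. With weighted=True each
    neighbour votes with weight 1 / (distance + small epsilon), so nearer
    neighbours count for more; otherwise every neighbour gets one vote.
    """
    nearest = sorted(
        (minkowski_distance(sample, pixels, power), label)
        for label, pixels in training
    )[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    # On a tie, the label that was added first (the nearer neighbour) wins.
    return max(votes, key=votes.get)


# Made-up one-pixel "images", just to show the effect of weighting.
training = [(7, [0]), (9, [10]), (9, [11])]
sample = [1]
print(classify(sample, training, k=1))                 # prints 7
print(classify(sample, training, k=3))                 # prints 9
print(classify(sample, training, k=3, weighted=True))  # prints 7
```

In this toy data the two far-away 9s outvote the single close 7 when k=3, unless the votes are weighted by distance.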

I really enjoyed my first dojo experience. I met new people, got introduced to new languages, ate free pizza, and learned new techniques. Code dojo made me realise that regardless of our ethnicity, race, text editor, indentation style and programming language, deep down all of us developers have one thing in common. We all want to make better cyborgs :)

Feel free to check the full code at Leeds Code Dojo’s repo.

Big thanks to Grant and the Callcredit folks for organising this dojo!

Fusing talempong and gamelan with the help of an acoustic drum kit and a custom-built Max/MSP patch. This performance is part of my ‘Applied Performance’ module in the Music Technology programme at the School of Music, University of Leeds.

DHH is a very opinionated man with bold things to say. He has useful insights, but sometimes his delivery might be a bit too blunt or even harsh for some people.

Two main points that I took from this talk:

  • Don’t structure your classes solely for testability
  • Strive for code clarity, read more code and write more code

Great ‘provocative’ talk from DHH. Worth the watch.