51 Elliot: 2013

Thursday, December 5, 2013

Node.JS Module Patterns using simple examples

Slides for a recent talk at Ottawa.JS on "Node.JS Module Patterns using simple examples" are available. The slides have been updated to include a brief intro to CommonJS, examples for exporting named and anonymous functions, objects and prototypes, and an explanation of "exports" vs. "module.exports".

http://darrenderidder.github.io/talks/ModulePatterns/#/

Saturday, September 28, 2013

JavaScript Efficiencies

I'm working on an article about patterns for structuring Express.JS apps, which is taking too long, so I decided to write this instead: Here are a few tips and tricks for JavaScript programming that I like.

Comment switches

Comment switches let you comment out an entire block of code, or switch between two alternative blocks of code, by changing a single character. This can be useful when prototyping.

//*
console.log("Hello!\n");
/*/
console.log("Goodbye!\n");
// */

Removing the first slash '/' toggles between these two print statements. See the original post for more examples of comment switches.
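As a small illustration of the "off" position, here is the same switch with the first slash removed. It is wrapped in a hypothetical function (activeBranch is not from the original post) only so that the active branch can be checked:

```javascript
// Hypothetical helper, used only to show which block the switch leaves active.
function activeBranch() {
  var message;
  /*
  message = "Hello!";
  /*/
  message = "Goodbye!";
  // */
  return message;
}

console.log(activeBranch()); // prints: Goodbye!
```

With the leading slash restored (`//*`), the first assignment runs and the second one is commented out instead.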

Iterate by Counting Down

You can iterate n times concisely, like this:

var n = 1000;
while (n--) { ... }
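A minimal sketch of what the loop body actually sees, assuming n starts as a non-negative integer (with a negative or fractional n the counter never lands on 0, so the loop would not stop in practice). The function name countdownValues is made up for this example:

```javascript
// Collects the values the loop body sees when counting down from n.
function countdownValues(n) {
  var seen = [];
  while (n--) {
    // n has already been decremented here, so it runs from n - 1 down to 0.
    seen.push(n);
  }
  return seen;
}

console.log(countdownValues(3)); // prints: [ 2, 1, 0 ]
```

The body runs exactly n times, which is handy when the order of iteration does not matter.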

Defaulting Arguments

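This is a handy way to provide a default value for missing arguments in a JavaScript function. Note that the default is applied to any falsy value (undefined, null, 0, "", false), not just to undefined.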

function foo(bar) {
  // bar is already declared as a parameter, so no "var" is needed
  bar = bar || "Some default";
  ...
}
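One thing to watch with this idiom is a legitimate falsy argument such as 0. The sketch below compares the || form with an explicit check for undefined; both function names and the default of 3 are invented for the example:

```javascript
// The || idiom: every falsy argument is replaced by the default.
function retries(count) {
  count = count || 3;
  return count;
}

// Explicit check: only a missing (undefined) argument gets the default.
function retriesStrict(count) {
  if (count === undefined) {
    count = 3;
  }
  return count;
}

console.log(retries());        // prints: 3
console.log(retries(0));       // prints: 3 (the 0 was overridden)
console.log(retriesStrict(0)); // prints: 0
```

The short form is fine when falsy values are never meaningful for the argument, which is the common case for strings and objects.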

Saturday, August 10, 2013

Kenka Matsuri II

Tuesday, August 6, 2013

Testing Express.JS REST APIs with Mocha

My post on Testing REST APIs with RESTClient and Selenium IDE has been getting some attention, but you might also want an automated testing solution to be part of your build process.

This post will cover testing REST APIs built with Express.JS, using Mocha as the test framework. Express and Mocha are both by the same author and work well together. This technique can also be used to test Express apps in general.

In a nutshell, we'll create a test suite in Mocha that does the following:

instantiates and runs our REST API server (an Express app)
sends HTTP (or HTTPS) requests to the REST API
checks the responses for appropriate values

So, let's say you've written an Express app that works as a REST API server, and now you want to automatically test it over HTTP. Assuming you have Mocha installed, and are somewhat familiar with how to use it, building a suite to test your Express app over HTTP is relatively simple.

Our example test suite is a file located at test/app_test.js. The first thing you'll need to do is include the Express app you want to test. Mine is simply called app.js, so this is what the start of my test suite looks like:

Before we can test our API server app we need to start it. We can create a test suite called "app" for this: before anything else it starts the app, and then it checks to make sure the app is running. Here's what it looks like:
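The snippet embedded in the original post is not reproduced here, so what follows is only a rough sketch of the same idea, not the author's code. To keep it runnable with plain node it uses a bare http server as a stand-in for the Express app and leaves Mocha out; the function name checkServerIsRunning, the JSON body and the use of port 0 are all assumptions.

```javascript
var http = require('http');

// Stand-in for the Express app under test (hypothetical). In the real suite
// this would be the app loaded from app.js.
var app = http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ status: 'ok' }));
});

// Starts the server, sends one GET request, and passes the result to done.
function checkServerIsRunning(done) {
  // Port 0 lets the OS choose a free port (an assumption for this sketch).
  app.listen(0, '127.0.0.1', function () {
    var options = {
      host: '127.0.0.1',
      port: app.address().port,
      path: '/',
      method: 'GET',
      agent: false // one-off connection, so the server can shut down cleanly
    };
    var req = http.request(options, function (res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        app.close();
        done(null, res.statusCode, body);
      });
    });
    req.on('error', function (err) {
      app.close();
      done(err);
    });
    req.end();
  });
}
```

Calling checkServerIsRunning with a callback of the form function (err, statusCode, body) gives you the values to assert on. In a Mocha suite the listen call would typically live in a before hook and the request in an it block, with done passed in by Mocha.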

There are a couple of pieces missing from the test case above. We're using Node's built-in http library, so we need to include that at the top of our file as well:

And, there's a function defaultGetOptions, which is just a little helper utility to format our request:
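A helper along these lines might look like the sketch below. Only the names defaultGetOptions and sessionCookie come from the post; the host, port and Accept header are assumptions.

```javascript
// Declared at the top of the test file; stays null until a login sets it.
var sessionCookie = null;

// Builds the options object for http.request() for a GET to the given path.
// Host and port are assumptions for this sketch.
function defaultGetOptions(path) {
  var options = {
    host: 'localhost',
    port: 3000,
    path: path,
    method: 'GET',
    headers: { 'Accept': 'application/json' }
  };
  // Send the session cookie along once we have one.
  if (sessionCookie) {
    options.headers['Cookie'] = sessionCookie;
  }
  return options;
}
```

The returned object can be passed straight to http.request(options, callback).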

Lastly, you can see that there's a variable called sessionCookie. This is how we maintain a session with our REST API, in case your API requires an authenticated session. You can simply declare this variable at the top of the file:

And set the cookie value whenever you get a response from the API server. Here is an example of setting the sessionCookie value when logging in:
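As a rough sketch (not the original snippet), capturing the cookie could be done with a small helper that reads the set-cookie response header, which Node exposes as an array of strings. The helper name saveSessionCookie is invented for this example.

```javascript
// Declared at the top of the test file.
var sessionCookie = null;

// Stores the first cookie from a response so later requests can send it back.
function saveSessionCookie(res) {
  var cookies = res.headers['set-cookie'];
  if (cookies && cookies.length > 0) {
    // Keep only the "name=value" part and drop attributes such as Path.
    sessionCookie = cookies[0].split(';')[0];
  }
  return sessionCookie;
}
```

You would call saveSessionCookie(res) inside the response callback of the login request, before making any requests that need the session.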

As you can see, I'm using a few extra variables or functions - there are some test-user parameters, and a function to format the headers for a POST request. Having seen the example for a GET request, you should have no problem creating similar functions for POST, PUT, and DELETE (check out the Node docs for HTTP).
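For POST requests, a similar helper might look like this. It is a sketch under assumptions: the name defaultPostOptions, the host and port, and the choice of a JSON body are not from the post.

```javascript
// Builds http.request() options for a POST that sends a JSON string as body.
function defaultPostOptions(path, body, cookie) {
  var options = {
    host: 'localhost',
    port: 3000,
    path: path,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Length in bytes, which can differ from body.length for non-ASCII text.
      'Content-Length': Buffer.byteLength(body)
    }
  };
  if (cookie) {
    options.headers['Cookie'] = cookie;
  }
  return options;
}
```

After creating the request with http.request(options, callback), write the same body string with req.write(body) and finish with req.end().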

Finally, to make this all work you might need to modify your Express app to make sure that it does not start up automatically on the default port, if it is being included as part of a test suite. You do this by checking for module.parent.
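The check itself is small. Here is one way to sketch it, written as a function (listenIfRunDirectly, a made-up name) so that it can be tried without a real app; the port is an assumption. Newer Node releases mark module.parent as deprecated and document require.main === module as the way to detect a directly run file, so treat this as an illustration of the idea.

```javascript
// Starts the app only when the file that owns currentModule was run directly
// (node app.js). When that file was loaded with require(), for example by a
// test suite, module.parent is set and the app is left for the caller to start.
function listenIfRunDirectly(app, currentModule, port) {
  if (!currentModule.parent) {
    app.listen(port);
    return true;
  }
  return false;
}

// At the bottom of app.js (port 3000 is an assumption):
//   listenIfRunDirectly(app, module, 3000);
//   module.exports = app;
```

Most apps simply inline the if statement at the bottom of app.js rather than wrapping it in a function.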

Above, you can see that the app will only start actively listening if it is being run directly, but if it has been included in a test suite like ours, it won't automatically start up. This lets us start the app on a port of our choosing from within our test suite, like we did in the example above.

Hope that helps...

Wednesday, July 24, 2013

Machine Learning in JavaScript

I did a presentation at Ottawa JavaScript on machine learning, which covered a lot of the material in my two recent posts on Bayesian classifiers. This was new to a lot of the audience, so I made slides to step through a very simple example.

Using the "box of chocolates" analogy, the slides demonstrate how to predict if a chocolate contains nuts, depending on its colour and shape.

You can view the slides here:

http://darrenderidder.github.io/talks/MachineLearning

Thursday, July 4, 2013

Parable of the Banana Leaf

Charles and Ray Eames were brilliant designers. For them, design wasn't about style, it was a philosophy and a way of living and thinking. This is from a lecture by Charles Eames.

There's sort of a parable I'd like to . . . In India . . . I guess it's a parable: In India, sort of the lowest, the poorest, the, those, those without and the lowest in caste, eat very often--particularly in southern India--they eat off of a banana leaf. And those a little bit up the scale, eat off of a sort of a un . . . a low-fired ceramic dish. And a little bit higher, why, they have a glaze on--a thing they call a "tali"--they use a banana leaf and then the ceramic as a tali upon which they put all the food. And there get to be some fairly elegant glazed talis, but it graduates to--if you're up the scale a little bit more--why, a brass tali, and a bell-bronze tali is absolutely marvelous, it has a sort of a ring to it. And then things get to be a little questionable. There are things like silver-plated talis and there are solid silver talis and I suppose some nut has had a gold tali that he's eaten off of, but I've never seen one. But you can go beyond that and the guys that have not only means, but a certain amount of knowledge and understanding, go the next step and they eat off of a banana leaf. And I think that in these times when we fall back and regroup, that somehow or other, the banana leaf parable sort of got to get working there, because I'm not prepared to say that the banana leaf that one eats off of is the same as the other eats off of, but it's that process that has happened within the man that changes the banana leaf. And as we attack these problems--and I hope and I expect that the total amount of energy used in this world is going to go from high to medium to a little bit lower--the banana leaf idea might have a great part in it.

Thursday, June 20, 2013

BBPlayer - A Simple HTML5 Audio Player

BBPlayer is a minimalist HTML5 audio player for playlists that I made for a project recently. If you're looking for the best HTML5 audio player there are now many available; BBPlayer provides a simple alternative with a clean design that can be easily styled using CSS.

BBPlayer uses the HTML5 audio element and allows you to add multiple audio source tracks to easily create a playlist.

Visit the bbplayer demo page or get the code from github.

Sunday, February 10, 2013

The Christmas Dream

Friday, January 25, 2013

Improving Bayesian Filters

Last time we looked at a Simple Introduction to Naive Bayesian Filters (or classifiers), and saw how they work using the example of a box of chocolates. It turns out there's a simple trick you can use to make Bayesian filters more effective. I haven't seen it applied before, but it seems like an obvious technique in certain situations, and I found that it improves the efficacy of filtering by up to 10% in my test cases.

Bayesian filters are used for things like spam filters, where they look for certain words and phrases and rate the "spamminess" of emails. The presence of certain words tips off the spam filter and sends spam to your junk folder. When you're filtering spam, you don't really care about the non-spammy words that an email contains, and you don't care about words that are missing. There are over 600,000 words in the English language. Most of them will not be in an email. You simply don't care about the words that aren't there, if you're filtering email messages for spam. Applications like spam filtering care specifically about the presence of tokens (words) with a high probability of spamminess.

There are other use-cases, though, where you might care not only about tokens that are present, but also about tokens that are missing. An elephant, for example, always has a trunk. Big, fat and gray you may be, but no trunk? Not an elephant. To make use of this information, we calculate the probability of a token not being present, which is just the inverse of the probability of the token being present (one minus that probability).

Going back to the box of chocolates example, suppose that every chocolate with nuts has fluted edges, always. No fluted edges, no nuts. In this case, we don't just care about the characteristics that are present (wrapper, no wrapper, etc). We care about the characteristics that are missing, too (like no fluted edges). This technique works best when the total number of characteristics (aka the "vocabulary") is small enough to consider the presence and the absence of all possible tokens -- a few hundred or maybe a few thousand words.
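To make the idea concrete, here is a minimal sketch of scoring with and without the absent characteristics. It is not the dclassify API or its implementation; the probabilities are invented for illustration, and 0.99 is used for fluted edges instead of 1 so that the inverse does not become zero.

```javascript
// Hypothetical chance that each characteristic is present, per label.
var presence = {
  nuts:   { fluted: 0.99, dark: 0.8, wrapped: 0.2 },
  noNuts: { fluted: 0.3,  dark: 0.3, wrapped: 0.25 }
};
var vocabulary = ['fluted', 'dark', 'wrapped'];

// Multiplies p for each observed characteristic and, when applyInverse is
// set, (1 - p) for each characteristic in the vocabulary that is missing.
function score(label, observed, applyInverse) {
  var result = 1;
  vocabulary.forEach(function (token) {
    var p = presence[label][token];
    if (observed.indexOf(token) !== -1) {
      result *= p;
    } else if (applyInverse) {
      result *= (1 - p);
    }
  });
  return result;
}

function classify(observed, applyInverse) {
  var nuts = score('nuts', observed, applyInverse);
  var noNuts = score('noNuts', observed, applyInverse);
  return nuts > noNuts ? 'nuts' : 'no nuts';
}

// A dark chocolate with no fluted edges and no wrapper:
console.log(classify(['dark'], false)); // prints: nuts
console.log(classify(['dark'], true));  // prints: no nuts
```

Looking only at what is present, "dark" points to nuts. Once the missing fluted edges are counted as evidence, the label flips.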

The dclassify module for node is a simple implementation of a Bayesian filter that uses this "apply inverse" trick. In testing, using the apply-inverse option has improved results by 5 to 10% over conventional Bayesian filtering, when working with a vocabulary of approximately 20,000 unique tokens and a training data set of around 4000 documents.

Monday, January 21, 2013

A Simple Introduction to Naive Bayesian Filters

In the field of data analysis and machine learning algorithms, naive Bayesian filters remain popular since they're relatively easy to implement yet surprisingly effective. Most introductions to Bayesian filtering dive straight into mathematics, but Bayesian filters are simple enough that you can intuitively see how they work by stepping through an example.

Let's say we've got a box of chocolates, and some of them have nuts inside, and we don't like nuts. We'd like to be able to tell which chocolates have nuts in them. The chocolates are all different shapes, sizes and colors; some are wrapped and some aren't. The only way to tell if a chocolate contains nuts is to take a bite and see. By sampling the box of chocolates and keeping track of the shape, size, color and wrapping of all the ones we tried, we could come up with a pretty good idea of which ones are most likely to have nuts.

You could make a list of characteristics: size, color, shape, and wrapping. Then start eating chocolates. Every time you find one with nuts, put an "X" beside each of the matching characteristics (large, round, dark, etc). Every time you get a chocolate without nuts, put an "O" beside its matching characteristics. By the time you get through half the box, the list might look something like this:

Small:       OOOOO
Medium:      XXOOO
Large:       XXXOO
---
Round:       XXXOO
Square:      XXOOO
Long:        OOOOO
---
Dark:        XXXXX
Light:       OOOOO
White:       OOOOO
---
Wrapping:    XOOO
No Wrapping: XXXXOOOOOOO

The X's tell you where the nuts were. You can see that large, round, dark chocolates with no wrapping tend to contain nuts. For each of the characteristics, we can count up the X's and come up with a score for how often they contained nuts. The scores would look something like this:

Small:       0/5 (0%)
Medium:      2/5 (40%)
Large:       3/5 (60%)
---
Round:       3/5 (60%)
Square:      2/5 (40%)
Long:        0/5 (0%)
---
Dark:        5/5 (100%)
Light:       0/5 (0%)
White:       0/5 (0%)
---
Wrapping:    1/4 (25%)
No Wrapping: 4/11 (~36%)

You can see that none of the small chocolates had nuts in them. At least, not the ones we tasted. But that doesn't necessarily mean small chocolates never contain nuts. Look at the dark chocolates, for example. All the dark chocolates had nuts. So, how about a small, dark chocolate? What are the chances it has nuts? 0%? 100%? Or somewhere in between? What if it's round? What if it had a wrapper?

Looking at our chart for small, round, dark chocolates with a wrapper, we see 0%, 60%, 100%, and 25%. How do we take this set of probabilities and turn it into a final score? As you might guess, when dealing with probabilities, we just multiply them together. The "0" value presents a problem, though. No matter how high the other values are, if there's a zero anywhere in the list, when we multiply it out, the whole result just goes straight to zero. So we fudge the numbers a bit. Instead of zero, we use a really small positive number (e.g. 0.01).
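Here is that multiplication as a small sketch. The function name combine and its floor parameter are made up for this example; the 0.01 substitute is the value suggested above.

```javascript
// Multiplies a list of probabilities, using a small floor in place of zeros
// so that a single 0 does not wipe out the whole product.
function combine(probabilities, floor) {
  if (floor === undefined) {
    floor = 0.01;
  }
  return probabilities.reduce(function (product, p) {
    return product * (p === 0 ? floor : p);
  }, 1);
}

// Small (0%), round (60%), dark (100%), wrapped (25%):
var score = combine([0, 0.6, 1, 0.25]);
console.log(score.toFixed(4)); // prints: 0.0015
```

Without the floor the result would be exactly 0, no matter what the other three values were.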

You've probably noticed that the more probabilities you multiply together, the smaller the final result becomes. This leads to weird-looking results. Even if a chocolate has all the hallmarks of nuts (large, round, dark and unwrapped) the final score comes out to only 0.13 (0.6 * 0.6 * 1 * 0.36 = 0.13). That doesn't sound right at all! Intuitively, we're almost certain it contains nuts. It ought to have a really high probability, right? Are we really saying it's only got a 13% chance of containing nuts? Not really. If we calculate a final score for this same chocolate not containing nuts (by multiplying the inverses) it's only 0.1% (one tenth of one percent)! We get this value by looking up the probabilities for nuts and subtracting from 1, to get the probability of not having nuts.

Here's our table of probabilities again, this time with an extra column for "No Nuts".

             P(Nuts)     P(No Nuts)
Small:       0/5 (0%)    5/5 (100%)
Medium:      2/5 (40%)   3/5 (60%)
Large:       3/5 (60%)   2/5 (40%)
---
Round:       3/5 (60%)   2/5 (40%)
Square:      2/5 (40%)   3/5 (60%)
Long:        0/5 (0%)    5/5 (100%)
---
Dark:        5/5 (100%)  0/5 (0%)
Light:       0/5 (0%)    5/5 (100%)
White:       0/5 (0%)    5/5 (100%)
---
Wrapping:    1/4 (25%)   3/4 (75%)
No Wrapping: 4/11 (36%)  7/11 (64%)

The probabilities for "no nuts" (with 0.01 standing in for the zero on Dark) give us 0.4 * 0.4 * 0.01 * 0.64 = 0.001, or 0.1%. Comparing 13% to 0.1%, it suddenly seems a lot more likely that this chocolate contains nuts than not.

Instead of looking at a final value like 13% or 0.1%, we could ask "how much more likely is it that this particular chocolate contains nuts, than not?" In this case, the chocolate is roughly 130 times more likely to contain nuts than not (13% divided by 0.1%).

At the end of the day, what we really want is a way to pick any chocolate and automatically label it "nuts" or "no nuts". If we get this correct most of the time, we'll be happy. The procedure is simple: pick a chocolate, look at its characteristics, and calculate a value for "nuts" vs. "no nuts". Whichever value is higher, that's how we label it. That, basically, is a naive Bayesian binary classifier.
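Putting the whole procedure together, here is a minimal sketch that uses the tally from the tables above. It follows the simplified steps described in this post (per-characteristic nut rates multiplied together, with 0.01 standing in for zero) rather than a textbook formulation with prior probabilities. The names tally, scores and classify are invented for the example, and "wrapped" and "unwrapped" stand for the Wrapping and No Wrapping rows.

```javascript
// [chocolates with nuts, chocolates tasted] for each characteristic.
var tally = {
  small: [0, 5], medium: [2, 5], large: [3, 5],
  round: [3, 5], square: [2, 5], long: [0, 5],
  dark: [5, 5], light: [0, 5], white: [0, 5],
  wrapped: [1, 4], unwrapped: [4, 11]
};
var FLOOR = 0.01; // used in place of zero, as described in the post

// Multiplies P(nuts) and P(no nuts) across the given characteristics.
function scores(characteristics) {
  var nuts = 1;
  var noNuts = 1;
  characteristics.forEach(function (name) {
    var pNuts = tally[name][0] / tally[name][1];
    nuts *= Math.max(pNuts, FLOOR);
    noNuts *= Math.max(1 - pNuts, FLOOR);
  });
  return { nuts: nuts, noNuts: noNuts };
}

// Labels a chocolate with whichever score is higher.
function classify(characteristics) {
  var s = scores(characteristics);
  return s.nuts > s.noNuts ? 'nuts' : 'no nuts';
}

console.log(classify(['large', 'round', 'dark', 'unwrapped'])); // prints: nuts
console.log(classify(['small', 'square', 'light', 'wrapped'])); // prints: no nuts
```

For the large, round, dark, unwrapped chocolate the ratio of the two scores comes to roughly 129 with the unrounded fractions, in line with the rounded figure of 130 used earlier.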

Kenka Matsuri

Participants in the Kenka Matsuri (Fight Festival) carrying a shrine, Himeji, Japan