Wednesday, November 12, 2014

Three Simple Rules for Escaping Callback Hell

A lot of newcomers to Node.JS complain about "callback hell" and the "pyramid of doom" when they're getting started with the callback-driven continuation passing style.  It's confusing, and a lot of people reach for an async / flow-control module right away.  Many people have settled on using Promises, a solution that brings some unfortunate problems along with it (performance, error-hiding anti-patterns, and illusory behavior, for example).

I prefer using some simple best practices for working with callbacks to keep my code clean and organized. These techniques don't require adding any extra modules to your code base, won't slow your program down, don't introduce error-hiding anti-patterns, and don't convey a false impression of synchronous execution. Best of all, they result in code that is actually more readable and concise, and once you see how simple they are, you might want to use them, too.

Here they are:
  1. use named functions for callbacks
  2. nest functions when you need to capture (enclose) variable scope
  3. use return when invoking the callback

The Pyramid of Doom

Here's a contrived example that uses typical node.js callbacks with (err, result) arguments. It's a mess of nested functions: the so-called Pyramid of Doom. It keeps indenting, layer upon smothering layer, until it unwinds in a great cascading spasm of parentheses, braces and semicolons.
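A reconstruction of the kind of code the post means is below. The helper functions (getUser, deliverGreeting, logReceipt) are hypothetical stand-ins following the standard (err, result) callback convention; they invoke their callbacks synchronously only to keep the sketch self-contained and runnable.

```javascript
// Hypothetical helpers with (err, result) callbacks. Real ones would do
// async I/O; these call back synchronously just to keep the sketch runnable.
function getUser(id, callback) {
  callback(null, { id: id, name: 'World' });
}
function deliverGreeting(greeting, callback) {
  callback(null, 'receipt-42');
}
function logReceipt(receipt, callback) {
  callback(null, 'logged ' + receipt);
}

// The Pyramid of Doom: every step nests another anonymous function.
function greet(userId, callback) {
  getUser(userId, function (err, user) {
    if (err) {
      callback(err);
    } else {
      var greeting = 'Hello, ' + user.name + '!';
      deliverGreeting(greeting, function (err, receipt) {
        if (err) {
          callback(err);
        } else {
          logReceipt(receipt, function (err, result) {
            if (err) {
              callback(err);
            } else {
              callback(null, { greeting: greeting, result: result });
            }
          });
        }
      });
    }
  });
}

greet(1, function (err, result) {
  if (err) {
    console.error(err);
  } else {
    console.log(result.greeting); // prints "Hello, World!"
  }
});
```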

Named Callbacks

The Pyramid of Doom is often shown as a reason to use Promises, but most async libraries -- including and especially Promises -- don't really solve this nesting problem.  We don't end up with deeply nested code like this because something is wrong with JavaScript. We get it because people write bad, messy code.  Named callbacks solve this problem, very simply. Andrew Kelley wrote about this on his blog a while ago ("JavaScript Callbacks are Pretty Okay"). It's a great post with some simple ways of taming "callback hell" that get skipped over by a lot of node newcomers.

Here's the above example re-written using named callback functions. Instead of a Russian doll of anonymous functions, every function that takes a callback is passed the name of the callback function to use. The callback function is defined immediately afterwards, greatly improving readability.
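A sketch along those lines follows. The helpers (getUser, deliverGreeting, logReceipt) are hypothetical and call back synchronously only to keep the example self-contained; the names of the callbacks mirror the ones discussed below.

```javascript
// Hypothetical helpers with (err, result) callbacks; they call back
// synchronously just to keep the sketch self-contained.
function getUser(id, callback) {
  callback(null, { id: id, name: 'World' });
}
function deliverGreeting(greeting, callback) {
  callback(null, 'receipt-42');
}
function logReceipt(receipt, callback) {
  callback(null, 'logged ' + receipt);
}

function greet(userId, callback) {
  getUser(userId, getGreeting);

  // Named callback, defined right after the call that uses it.
  function getGreeting(err, user) {
    var greeting;
    if (err) {
      callback(err);
    } else {
      greeting = 'Hello, ' + user.name + '!';
      deliverGreeting(greeting, sendGreeting);
    }

    // These two are still nested inside getGreeting so that they can
    // see its `greeting` variable via the closure.
    function sendGreeting(err, receipt) {
      if (err) {
        callback(err);
      } else {
        logReceipt(receipt, showResult);
      }
    }

    function showResult(err, result) {
      if (err) {
        callback(err);
      } else {
        callback(null, { greeting: greeting, result: result });
      }
    }
  }
}

greet(1, function (err, result) {
  if (err) {
    console.error(err);
  } else {
    console.log(result.greeting); // prints "Hello, World!"
  }
});
```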

Nest Only for Scope

We can do even better. Notice that two functions, sendGreeting and showResult, are still nested inside of the getGreeting function. Nested "inner" functions create a closure that combines the callback function's own local variable scope with the variable scope of the function it's nested inside. These nested callbacks can access variables from the enclosing function. In our example, both sendGreeting and showResult use variables that were created earlier in the getGreeting function. They can access these variables from getGreeting because they're nested inside getGreeting and thus enclose its variable scope.

A lot of times this is totally unnecessary. You only need to nest functions if you need to refer to variables in the scope of the caller from within the callback function. Otherwise, simply put named functions on the same level as the caller. In our example, variables can be shared by moving them to the top-level scope of the greet function. Then, we can put all our named functions on the same level. No more nesting and indentation!
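A sketch of this flattened layout follows, again with hypothetical helpers (getUser, deliverGreeting, logReceipt) that call back synchronously just to keep it self-contained.

```javascript
// Hypothetical helpers with (err, result) callbacks; they call back
// synchronously just to keep the sketch self-contained.
function getUser(id, callback) {
  callback(null, { id: id, name: 'World' });
}
function deliverGreeting(greeting, callback) {
  callback(null, 'receipt-42');
}
function logReceipt(receipt, callback) {
  callback(null, 'logged ' + receipt);
}

function greet(userId, callback) {
  var greeting; // moved up so every named callback below can share it

  getUser(userId, getGreeting);

  // All callbacks now sit at the same level: no more deep nesting.
  function getGreeting(err, user) {
    if (err) {
      callback(err);
    } else {
      greeting = 'Hello, ' + user.name + '!';
      deliverGreeting(greeting, sendGreeting);
    }
  }

  function sendGreeting(err, receipt) {
    if (err) {
      callback(err);
    } else {
      logReceipt(receipt, showResult);
    }
  }

  function showResult(err, result) {
    if (err) {
      callback(err);
    } else {
      callback(null, { greeting: greeting, result: result });
    }
  }
}

greet(1, function (err, result) {
  if (err) {
    console.error(err);
  } else {
    console.log(result.greeting); // prints "Hello, World!"
  }
});
```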

Return When Invoking a Callback

The last point to improve readability is more of a stylistic preference, but if you make a habit of always returning from an error-handling clause, you can further minimize your code. In direct-style programming, where function calls are meant to return a value, common wisdom says that returning from an if clause like this is bad practice that can lead to errors. With continuation-passing style, however, explicitly returning when you invoke the callback ensures that you don't accidentally execute additional code in the calling function after the callback has been invoked. For that reason, many node developers consider it best practice. In trivial functions, it can improve readability by eliminating the else clause, and it is used by a number of popular JavaScript modules.

I find a pragmatic approach is to return from error-handling clauses and other conditional if/else clauses, but sometimes leave off the explicit return on the last line of the function, in the interest of less code and better readability. Here's the updated example:
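The following sketch applies the return style, with the same kind of hypothetical, synchronously-calling helpers (getUser, deliverGreeting, logReceipt) used to keep it self-contained.

```javascript
// Hypothetical helpers with (err, result) callbacks; they call back
// synchronously just to keep the sketch self-contained.
function getUser(id, callback) {
  callback(null, { id: id, name: 'World' });
}
function deliverGreeting(greeting, callback) {
  callback(null, 'receipt-42');
}
function logReceipt(receipt, callback) {
  callback(null, 'logged ' + receipt);
}

function greet(userId, callback) {
  var greeting; // shared by the named callbacks below

  getUser(userId, getGreeting);

  // Returning when invoking the callback eliminates the else clauses
  // and guarantees nothing runs after the callback fires.
  function getGreeting(err, user) {
    if (err) return callback(err);
    greeting = 'Hello, ' + user.name + '!';
    deliverGreeting(greeting, sendGreeting);
  }

  function sendGreeting(err, receipt) {
    if (err) return callback(err);
    logReceipt(receipt, showResult);
  }

  function showResult(err, result) {
    if (err) return callback(err);
    callback(null, { greeting: greeting, result: result });
  }
}

greet(1, function (err, result) {
  if (err) return console.error(err);
  console.log(result.greeting); // prints "Hello, World!"
});
```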

Compare this example with the Pyramid of Doom at the beginning of the post. I think you'll agree that these simple rules result in cleaner, more readable code and provide a great escape from the Callback Hell we started out with.

Good luck and have fun!

Monday, October 13, 2014

How Wolves Change Rivers

A beautifully filmed short video from Yellowstone National Park that reminds us of the importance of wildlife for the health of the whole planet:

How Wolves Change Rivers from Sustainable Man on Vimeo.

Wednesday, September 17, 2014

doxli - a help utility for node modules on command line

Quite often I fire up the node REPL and pull in some modules I've written to use on the command line. Unfortunately I often forget the exact way to call the various functions in those modules (there are a lot) and end up doing something like foo.dosomething.toString() to see the source code and recall the function signature.

In the interest of making code as "self-documenting" as possible, I wrote a small utility that uses dox to provide help for modules on the command line. It adds a help() function to a module's exported methods so you can get the dox / jsdoc comments for the function on the command line.

So now calling a method's help() function will return the description, parameters, examples and so on for the method, based on the documentation in the comments.

It's still a bit of a work in progress, but it works nicely - provided you actually document your modules with jsdoc-style comments.
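The underlying idea can be sketched in a few lines. This is not doxli's actual implementation (the real module parses jsdoc comments with dox); addHelp and the sample module here are hypothetical, just to show the decoration pattern.

```javascript
// Sketch of the idea behind doxli: attach a help() method to each of a
// module's exported functions. The real module extracts jsdoc comments
// via dox; here we pass the docs in as a plain object for illustration.
function addHelp(mod, docs) {
  Object.keys(mod).forEach(function (name) {
    if (typeof mod[name] === 'function') {
      mod[name].help = function () {
        return docs[name] || 'No documentation found for ' + name;
      };
    }
  });
  return mod;
}

// Hypothetical module with one documented method:
var mymod = addHelp(
  { add: function (a, b) { return a + b; } },
  { add: 'add(a, b) - returns the sum of a and b' }
);

console.log(mymod.add.help()); // prints the doc string for add()
```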

All the info is here:

Sunday, September 7, 2014

REST API Best Practices 4: Collections, Resources and Identifiers

Other articles in this series:
  1. REST API Best Practices: A REST Cheat Sheet
  2. REST API Best Practices: HTTP and CRUD
  3. REST API Best Practices: Partial Updates - PATCH vs. PUT
RESTful APIs center around resources that are grouped into collections. A classic example is browsing through the directory listings and files on a website like a CentOS download mirror. When you browse the directory listing, you can click through a series of folders to download files. The folders are collections of CentOS resource files.

In REST, collections and resources are accessed via HTTP URIs in a similar way:

members/ -- a collection of members
members/1 -- a resource representing member #1
members/2 -- a resource representing member #2

It may help to think of a REST collection as a directory folder containing files, although it's highly unlikely that the member data is stored as literal JSON files on the server. The member data most likely comes from a database, but from the perspective of a REST API, it looks similar to a directory called "members" that contains a bunch of files for download.

Naming collections

In case it's not obvious already, collection names should be nouns. There's been some debate over whether collection names should be plural (members/1) or singular (member/1), but the plural form seems to be most widely used, so use it for naming collections.

Getting a collection

Getting a collection, like "members", may return
  1. the entire list of resources as a list of links, 
  2. partial representations of each resource, or 
  3. full representations of all the resources in the collection. 
Our classic example of browsing online directories and files uses approach #1, returning a list of links to the files. The list is formatted in HTML, so you can click on the hyperlink to access a particular file.

Approach #2, returning a partial representation (i.e. first name, last name) of each resource in the collection, is a more pragmatic way of returning enough information for the end user to make a selection and request further details, especially if the collection can contain a lot of resources. Actually, the directory listings on a download mirror display more than just the hyperlink: they include additional metadata like the last-modified timestamp and file size as well. This is helpful for the end user who's looking for an up-to-date file and wants to know how long it will take to download. It's a good example of returning just enough information about the resources for the end user to be able to make a selection.

With approach #3, if a collection is small, you may want to return the full representation of all the resources in the collection as a big array. For large collections, however, it isn't practical. GitHub is the only RESTful API example I've seen that actually returns a full representation of all resources when you fetch the collection. I wouldn't consider #3 a "best practice" or recommend it for most use cases, but if you know the collection and its resources will stay small, it might be more effective to fetch the whole collection at once like this.

The best practice for fetching a collection of resources, in my opinion, is #2: return a partial representation of the resources in a collection with just enough information to facilitate the selection process, and be sure to include the URL (href) of each resource where it can be downloaded from.

Only when a collection is guaranteed to be small, and you need to reduce the performance impact of making multiple queries, should you consider bending the rules with approach #3 to return all the resources in one fell swoop.

Here's a practical example of fetching the collection of members using approach #2.


GET /members
Host: localhost:8080


HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

    "id": 1,
    "href": "/members/1",
    "firstname": "john",
    "lastname": "doe"
    "id": 2,
    "href": "/members/2",
    "firstname": "jane",
    "lastname": "doe"

In this example, some minimal information is returned about each of the members: first and last name, id, and the "href" URL where the full representation of the member resource can be downloaded.

Getting a resource

Getting a specific resource should return the full representation of that resource, from the URL that contains the collection name and the ID of the specific resource you want.

Resource IDs

RESTful resources have one or more identifiers: a numerical ID, a title, and so on. Common practice is for every resource to have a numeric ID that is used to reference the resource, although there are some notable exceptions to the rule.

Resources themselves should contain their numerical ID; the current best practice is for this to exist within the resource simply as an attribute labelled "id". Every resource should contain an "id"; avoid using more complicated names for resource identifiers like "memberID" or "accountNumber" and just stick with "id". If you need additional identifiers on a resource, go ahead and add them, but always have an "id" that acts as the primary way to retrieve the resource. So, if a member has "id" : 1, it should be fairly obvious that you can fetch his details at the URL "members/1".
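The id-to-URL convention can be sketched in a couple of lines; resourceUrl and the sample member object here are hypothetical, just to illustrate the mapping.

```javascript
// A minimal sketch of the convention described above: every resource
// carries an "id", and its URL is the collection name plus that id.
function resourceUrl(collection, resource) {
  return '/' + collection + '/' + resource.id;
}

var member = { id: 1, firstname: 'john', lastname: 'doe' };
member.href = resourceUrl('members', member);
console.log(member.href); // prints "/members/1"
```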

An example of fetching a member resource would be:


GET /members/1
Host: localhost:8080


HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

  "id": 1,
  "href": "/members/1",
  "firstname": "john",
  "lastname": "doe",
  "active": true,
  "lastLoggedIn": "Tue Sep 16 2014 08:37:42 GMT-400 (EDT)",
  "foo": "bar",
  "fizz": "buzz",
  "qux": "doo"

Beyond simple collections

Most of the examples you see online are fairly simple, but practical data models are often much more complex. Resources frequently contain sub-collections and relationships with other resources. API design in this area seems to be done in a mostly ad-hoc manner, but there are some practical considerations and trade-offs when designing APIs for more complex data models, which I'll cover in the next post.

Thursday, August 21, 2014

Defensive Shift - Turning the Tables on Surveillance

Like many people lately, I've been pondering the implications of pervasive surveillance, "big data" analysis, state-sponsored security exploits, and the role of technology in government. For one thing, my work involves a lot of the same technology: deep packet inspection, data analysis, machine learning and even writing experimental malware. However, instead of building tools that enable pervasive government surveillance, I've built a product that tells mobile smartphone users if their device, or a laptop connected to it, has been infected with malware, been commandeered into a botnet, or come under attack from a malicious website, and so on.  I'm happy to be working on applying some of this technology in a way that actually benefits regular people. It feels much more on the "good side" of technology than on the bad side we've been hearing so much about lately.

Surveillance of course has been in the news a lot lately, so we're all familiar with the massive betrayal of democratic principles by governments, under the guise of hunting the bogeyman. It's good that people are having conversations about reforming it, but don't expect the Titanic to turn around suddenly. There's far too much money and too many careers on the line to just shut down the leviathan of pervasive surveillance overnight. It will take time, and a new generation of more secure networking technologies.

Big data has also been in the news in some interesting ways: big data analysis has been changing the way baseball is played! CBC's David Common presents the story [1]:

Not everyone is happy with the "defensive shift" - the process of repositioning outfield players based on batting stats that tell coaches how likely a batter is to hit left or right, short or long.  Longtime fans feel it takes away from the human element of the game and turns it into more of a science experiment.

I tend to agree.  And to be honest, until now deep traffic inspection, big data analysis, surveillance, and definitely state-sponsored hacking, have quite justifiably earned a reputation as, well, repugnant to any freedom-loving, democracy-living, brain-having person. Nevertheless, as powerful as big data analytics, machine learning, and network traffic analysis are, and as much as they have been woefully abused by our own governments, I don't think we've yet begun to see the potential for good that these technologies could have, particularly if they are applied in reverse to the way they're being used now.

Right now we're in a position where a few privileged, state-sponsored bad actors are abusing their position of trust and authority to turn the lens of surveillance and data analysis upon ordinary people, foreign business competitors[2], jilted lovers [3], etc.  The sea change that will, I think, eventually come is when the lens of technology slowly turns with relentless inevitability onto the government itself, and we have the people observing and monitoring and analyzing the effectiveness of our elected officials and public servants and their organizations.

How do we begin to turn the tables on surveillance?

Secure Protocols

As I see it, this "defensive shift" will happen due to several factors. First, because the best and brightest engineers - the ones who design the inner workings of the Internet and write the open-source software used for secure computing - are on the whole smart enough to know that pervasive surveillance is an attack and a design flaw [4], are calling for it to be fixed in future versions of Internet protocols [5], and are already working on fixing some of the known exploits [6].

One of the simplest remedial actions available right now for pervasive surveillance attacks is HTTPS, with initiatives like HTTPS Now[9] showing which web sites follow good security practices, and tools like HTTPS Everywhere[10], a plugin for your web browser that helps you connect to websites securely. There is still work to be done in this area, as man-in-the-middle attacks and compromised cryptographic keys are widespread at this point - a problem for which perfect forward secrecy[11] needs to become ubiquitous. We should expect future generations of networking protocols to be based on these security best practices.

Some people say that creating a system that is totally secure against all kinds of surveillance, including lawful intercept, will only give bad people more opportunity to plan and carry out their dirty deeds.  But this turns out not to be true when you look at the actual data of how much information has been collected, how much it all costs, and how effective it's actually been.  It yields practically nothing useful and is almost always a "close the barn door, the horse is out!" scenario. This, coming from an engineer who actually works in the area of network-based threat analysis, by the way.

Open Data

Second, the open data movement. It's not just you and I who are producing data-trails as we mobe and surf and twit around the Interwebs.  There's a lot of data locked up in government systems, too.  If you live in a democracy, who owns that data? We do. It's ours. More and more of it is being made available online, in formats that can be used for computerized data analysis.  Sites like the Center for Responsive Politics' Open Secrets Database [8], for example, shed a light on money in politics, showing who's lobbying for what, how much money they're giving, and who's accepting the bribes, er, donations.

One nascent experiment in the area of government open data analysis is AnalyzeThe.US, a site that lets you play with a variety of public data sources to see correlations. Warning - it's possible for anyone to "prove" just about anything with enough graphs and hand-waving. For real meaningful analysis, having some background in mathematics and statistics is a definite plus, but the tool is still super fun and provides a glimpse of where things could be going in the future with open government.


Automation

Third, automation. There's still a long way to go in this area, but even the slowness and inefficiency of government will eventually give way to the relentless march of technology as more and more systems that have traditionally been mired in bureaucratic red tape become networked and automated, all producing data for analytics. Filling in paper forms for hours on end will eventually be as absurd for the government to require as it would be for buying a book from Amazon.

With further automation and data access, the ability to monitor, analyze and even take remedial action on bureaucratic inefficiencies should be in the hands of ordinary people, turning the current model of Big Brother surveillance on its head. Algorithms will be able to measure the effectiveness of our public services and national infrastructures, do statistical analysis, provide deep insight and make recommendations. The business of running a government, which today seems to be a mix of guesswork, political ideology and public relations management, will start to become less of a religion and more of a science, backed up with real data. It won't be a technocracy - but it will be leveraging technology to effectively crowd-source government.  Which is what democracy is all about, after all.

[7] AnalyzeThe.US

Thursday, August 14, 2014

Repackaging node modules for local install with npm

If you need to install an npm package for nodejs from local files, because you can't or would prefer not to download everything from the repo, or you don't have a network connection at all, then you can't just get an npm package tarball and do `npm install <tarball>`, because it will immediately try to download all its dependencies from the repo.

There are some existing tools and resources you can try:

  • npmbox -
  • bundle.js gist -
  • relevant npm issue -

I found all of these a bit overwrought for my taste. So if you prefer a simple DIY approach, you can simply edit the module's package.json file, copy all of its dependencies over to the "bundledDependencies" array, and then run npm pack to build a new tarball that includes all the dependencies bundled inside.

Using `forever` as an example:
  1. make a directory and run `npm init; npm install forever` inside of it
  2. cd into the node_modules/forever directory
  3. edit the package.json file
  4. look for the dependencies property
  5. add a bundledDependencies property that's an array
  6. copy the names of all the dependency modules into the bundledDependencies array
  7. save the package.json file
  8. now run `npm pack`. It will produce a forever-<version>.tgz file that has all its dependencies bundled in.
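As a sketch, the edited package.json might look something like this (the version number and dependency list here are illustrative, not forever's exact ones; the key point is that bundledDependencies mirrors the names in dependencies):

```json
{
  "name": "forever",
  "version": "0.11.1",
  "dependencies": {
    "colors": "~0.6.2",
    "nconf": "~0.6.9",
    "optimist": "~0.6.0"
  },
  "bundledDependencies": [
    "colors",
    "nconf",
    "optimist"
  ]
}
```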

Thursday, May 29, 2014

JavaScript's Final Frontier - MIDI

JavaScript has had an amazing last few years. Node.JS has taken server-side development by storm. First person shooter games are being built using HTML and JavaScript in the browser. Natural language processing and machine learning are being implemented in minimalist JavaScript libraries. It would seem like there's no area in which JavaScript isn't set to blow away preconceptions about what it can't do and become a major player.

There is, however, one area in which JavaScript - or more accurately the web stack and the engines that implement it - has only made a few tentative forays.  For me this represents a final frontier; the one area where JavaScript has yet to show that it can compete with native applications. That frontier is MIDI.

I know what you're probably thinking. Cheesy video game soundtracks on your SoundBlaster sound card. Web pages with blink tags and bad music tracks on autoplay. They represent one use case where MIDI was applied outside of its original intent. MIDI was made for connecting electronic musical instruments, and it is still very much alive and well. From lighting control systems to professional recording studios to GarageBand, MIDI is a key component of arts performance and production. MIDI connects sequencers, hardware, software synthesizers and drum machines to create the music many people listen to every day. The specification, though aging, shows no signs of going away anytime soon. It's simple and effective and well crafted.

It had to be. Of all applications, music could be the most demanding. That's because in most applications, even realtime ones, the exact timing of event processing is flexible within certain limits. Interactive web applications can tolerate latency on their network connections. 3D video games can scale down their frames per second and still provide a decent user experience. At 30 frames per second, the illusion of continuous motion is approximated. The human ear, on the other hand, is capable of detecting delays as small as 6 milliseconds. For a musician, latency of 20ms between striking a key and hearing a sound would be a show-stopper. Accurate timing is essential for music performance and production.

There's been a lot of interest and some amazing demos of Web Audio API functionality.  The Web MIDI API, on the other hand, hasn't gotten much support.  Support for Web MIDI has landed in Chrome Canary, but that's it for now.  A few people have begun to look at the possibility of adding support for it in Firefox.  Until the Web MIDI API is widely supported, interested people will have to make do with the JazzSoft midi plugin and Chris Wilson's Web MIDI API shim.
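The basic shape of the API can be sketched even without broad browser support. navigator.requestMIDIAccess is the real entry point in browsers that implement Web MIDI; listMidiInputs here is just a hypothetical helper for illustration.

```javascript
// Sketch of the Web MIDI API's basic shape. navigator.requestMIDIAccess
// is the entry point in supporting browsers (Chrome Canary at the time of
// writing); listMidiInputs is a hypothetical helper that collects the
// names of the connected MIDI input devices.
function listMidiInputs(midiAccess) {
  var names = [];
  midiAccess.inputs.forEach(function (input) {
    names.push(input.name);
  });
  return names;
}

// Only attempt MIDI access where the API actually exists.
if (typeof navigator !== 'undefined' && navigator.requestMIDIAccess) {
  navigator.requestMIDIAccess().then(function (access) {
    console.log('MIDI inputs: ' + listMidiInputs(access).join(', '));
  }, function (err) {
    console.error('MIDI access denied: ' + err);
  });
}
```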

I remain hopeful that support for this API will grow, because it will open up doors for some truly great new creative and artistic initiatives.