Sunny.js

Node.js provides a great environment for cloud applications and development. Node's asynchronous design and strong HTTP/REST support provide the building blocks for architecting non-blocking, scalable applications in the cloud.

There are already a number of good Node clients for cloud datastores. For Amazon Web Services S3, there are knox, node-sissy, node-s3, and Node.js-Amazon-S3; for Rackspace Cloud Files, there is node-cloudfiles.

After reviewing these libraries, I found that each was missing one or more features I would want in an ideal cloud datastore client:

  • Event-based interface.
  • "One-shot" requests wherever possible.
  • Able to create / delete arbitrary containers (buckets).
  • Configurable request headers, cloud options and metadata.
  • SSL support.

Additionally, it would be nice to have a library that worked across multiple clouds like jclouds for Java and Libcloud for Python.

With these goals in mind, I put together a basic multi-cloud datastore client called Sunny.js. Sunny initially supports Amazon Web Services' Simple Storage Service (S3) and Google Storage for Developers (version 1). Sunny hits all the points above, and also has:

  • A really well-documented API and user guide. (At least I think so!)
  • Full unit tests for all cloud operations and providers.

The following sections cover the basics of getting started with Sunny, diving a little deeper into each topic as we go.

Getting Started

First, install sunny from npm:

$ npm install sunny

Once Sunny is installed, we need to set up some configuration information for our Node program. We can provide cloud account and option information to Sunny either via the process environment or a JavaScript object / file. For simplicity, we'll just export some environment variables (see the user guide for more on configuration and using files).

$ export SUNNY_PROVIDER=("aws"|"google")
$ export SUNNY_ACCOUNT="ACCOUNT_NAME"
$ export SUNNY_SECRET_KEY="ACCOUNT_SECRET_KEY"
$ export SUNNY_SSL=("true"|"false")
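
For example, a concrete AWS setup over SSL might look like the following (the account name and secret key values here are placeholders):

$ export SUNNY_PROVIDER="aws"
$ export SUNNY_ACCOUNT="MY_ACCOUNT_NAME"
$ export SUNNY_SECRET_KEY="MY_SECRET_KEY"
$ export SUNNY_SSL="true"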

From here, we can create our Node program, and get a live cloud datastore connection as follows:

// Build a configuration from the SUNNY_* environment variables and
// grab a live connection object.
var sunny = require("sunny"),
  conn = sunny.Configuration.fromEnv().connection;

Cloud Operations

Now that we have our cloud connection object (conn), we can perform all of our cloud operations on containers and blobs. For those used to Amazon Web Services parlance, a Sunny container is equivalent to an S3 "bucket", and a Sunny blob is an S3 "key".

Sunny cloud operations are asynchronous and event-based. Cloud methods return either a request or a stream object; you set listeners and callbacks on that object, then call end() to start the underlying real cloud network operation.

Requests

Sunny request objects are not true Node request objects, but they approximate a subset of the events of a Node HTTP request object. The most common request events are:

  • "error": The underlying real cloud operation failed.
  • "end": The operation finished and we have results.
  • "data": A GET request returned a data chunk.

Let's perform a basic asynchronous operation to get a container named "sunnyjs" from our cloud store.

var request = conn.getContainer("sunnyjs", { validate: true });
request.on('error', function (err) {
  console.log("We received error: %s", err);
});
request.on('end', function (results, meta) {
  var container = results.container;
  console.log("Retrieved container \"%s\".", container.name);
});
request.end();

which gives us the output:

Retrieved container "sunnyjs".

Breaking the code down, we call getContainer with the container name and the option validate: true. The validate option means we actually perform a cloud request to check that the container exists before continuing. This is a good check to start with before launching into other code. However, Sunny also lets you skip validation and defer any network operation until the first blob request, which is faster; a sketch of the non-validating form follows.
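
By contrast, the non-validating form is just the same call without the option. (A sketch; my assumption here is that "end" still fires without any cloud round trip, so check the getContainer documentation for the exact behavior.)

var request = conn.getContainer("sunnyjs");
request.on('end', function (results, meta) {
  // No network request was made; results.container is ready for blob calls.
  var container = results.container;
});
request.end();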

The getContainer method returns a request object, on which we set our listeners for "error" and "end" (when we have results). Finally, the call to request.end() initiates the actual network operation.

Our "end" callback takes a results parameter which contains a container method for further use with blob methods, and a meta object with information from the actual cloud call (metadata, HTTP headers, etc.) See the getContainer documentation for more options, results and uses.

The other request-based Sunny methods work in a similar fashion, and include the following:

  • Connection: Get a list of containers. Get/create/delete a single container.
  • Container: Get/create/delete a single container. Get a list of blobs. Head/delete a single blob. Get/put blob to/from a file.
  • Blob: Head/delete a single blob. Get/put blob to/from a file.

See the API for full method details.

Streams

Sunny returns stream objects for data-based cloud operations (PUT or GET), which are real implementations of Node streams. A GET blob method returns a Readable Stream object, and a PUT blob method returns a Writable Stream object.

Let's take the container object we received from our successful getContainer request above, and perform a PUT blob operation with a simple string. (Note: in real code, this would go inside the "end" callback of the get container request.) Data can be written with any number of write() calls and a single end() call, which starts the actual data transfer and ignores any subsequent writes.

var writeStream = container.putBlob("my-blob.txt");
writeStream.on('error', function (err) {
  console.log("We received error: %s", err);
});
writeStream.on('end', function (results, meta) {
  var blob = results.blob;
  console.log("Put blob \"%s\".", blob.name);
});
writeStream.write("My data string.");
writeStream.end();

which produces:

Put blob "my-blob.txt".

Now, in a later operation (say, within the putBlob "end" callback, after we know the blob was written), we can retrieve the data:

var readStream = container.getBlob("my-blob.txt");
readStream.on('error', function (err) {
  console.log("We received error: %s", err);
});
readStream.on('data', function (chunk) {
  console.log("Data chunk: \"%s\"", chunk);
});
readStream.on('end', function (results, meta) {
  var blob = results.blob;
  console.log("Get blob \"%s\".", blob.name);
});
readStream.end();

which gives us:

Data chunk: "My data string."
Get blob "my-blob.txt".

Note that we're getting the data in raw chunks from the "data" event, which is somewhat tedious to cobble together by hand (as strings if encoded, or as Buffer objects otherwise); a quick sketch of that manual assembly follows below. Given that we have full read / write stream implementations, Sunny really shines with the availability of pipe()'ing data. Essentially, you can simply connect a Sunny read / write stream to another read / write stream, call pipe(), and let Node and Sunny take care of the rest.
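
First, the manual assembly we would like to avoid (a minimal sketch, assuming string-encoded data and the readStream from the GET example above):

var chunks = [];
readStream.on('data', function (chunk) {
  chunks.push(chunk);
});
readStream.on('end', function (results, meta) {
  // All chunks have arrived; join them into the full blob contents.
  console.log("Full blob contents: %s", chunks.join(""));
});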

Turning to pipes, Sunny provides helper methods for getting / putting blobs to and from files. Looking at the source code, the real gist of getting a blob to a local file is the following:

var fs = require("fs");

var getStream = container.getBlob("name", /* ... */);
var writeStream = fs.createWriteStream(filePath, /* ... */);

// Start the pipe and call 'end()' to kick off the cloud request.
getStream.pipe(writeStream);
getStream.end();

There's a little more to it, but it is mostly this easy. Take a look at the source code for the blob convenience methods putFromFile and getToFile to see a full implementation that binds together file and cloud streams.
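
Going the other direction, the gist of putting a local file to a blob should be roughly the mirror image (my own sketch, not the putFromFile source; note that Node's pipe() calls end() on the destination stream when the source finishes):

var fs = require("fs");

var readStream = fs.createReadStream(filePath, /* ... */);
var putStream = container.putBlob("name", /* ... */);

// pipe() forwards the file's data chunks and ends the cloud write
// stream (starting the actual transfer) when the file read completes.
readStream.pipe(putStream);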

Node streams available for use with Sunny get / put pipe() calls include:

  • HTTP/HTTPS requests and responses. See my sunny-proxy project for a simple web server that proxies web requests to cloud blobs using Sunny blob get streams piped to HTTP responses (a bare-bones sketch of the idea follows this list).
  • File reads and writes -- fs.ReadStream, fs.WriteStream
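
For instance, a bare-bones version of the proxy idea might look like the following (my own sketch, not sunny-proxy's actual code; it assumes the container object from earlier and serves a single fixed blob):

var http = require("http");

http.createServer(function (req, res) {
  var getStream = container.getBlob("my-blob.txt");
  getStream.on('error', function (err) {
    // statusCode, if any, comes from the wrapped cloud error
    // (see "Error Handling" below).
    res.writeHead(err.statusCode || 500);
    res.end("Error: " + err.message);
  });
  // Pipe the cloud read stream straight into the HTTP response.
  getStream.pipe(res);
  getStream.end();
}).listen(8080);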

Cloud Headers and Metadata

Sunny abstracts away common cloud provider headers and metadata. For example, AWS S3 has a metadata header prefix of "x-amz-meta-", while Google Storage has an analogous prefix of "x-goog-meta-".

Sunny cloud operations return a meta object that looks like:

{
  headers: {
    /* HTTP headers. */
  },
  cloudHeaders: {
    /* Cloud-specific HTTP headers (e.g., "x-amz-"). */
  },
  metadata: {
    /* Metadata headers (e.g., "x-amz-meta-"). */
  }
}

with cloud-specific prefixes stripped out. Moreover, you can set cloud headers in the same manner by passing any of the fields above ("headers", "cloudHeaders", "metadata") in the options object of a cloud request. In this manner, Sunny makes it easy to utilize metadata operations as part of your application, while remaining agnostic to your actual cloud provider. See the Sunny user guide for more information.
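
For example, we might set custom metadata when putting a blob, then read it back on a later GET (a sketch; the metadata key and value here are made up, and the exact options each method accepts are covered in the user guide):

// On the PUT: "my-key" is sent as "x-amz-meta-my-key" on AWS, or
// "x-goog-meta-my-key" on Google Storage.
var writeStream = container.putBlob("my-blob.txt", {
  metadata: { "my-key": "my-value" }
});

// ... and on a later GET of the same blob, the prefix is stripped off:
readStream.on('end', function (results, meta) {
  console.log("Blob metadata: %j", meta.metadata);  // { "my-key": "my-value" }
});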

Error Handling

For programmatic ease, and to better abstract across cloud providers, Sunny wraps cloud-based errors in a custom class that adds some additional useful attributes. The CloudError class has the following members:

  • message: The error message, usually in XML (inherited from Error).
  • statusCode: HTTP status code, if any.

and error type methods:

  • isNotFound: Container / blob does not exist.
  • isInvalidName: Invalid name requested for container / blob.
  • isNotEmpty: Container cannot be deleted if blobs still exist within it.
  • isAlreadyOwnedByYou: Creating a container that is already owned by you.
  • isNotOwner: AWS and Google Storage have a global namespace, so if someone else owns the container name requested, your operations will fail.

Ultimately, Sunny "error" event listeners receive CloudError objects instead of raw Error objects on cloud method errors, making it much easier to react programmatically. This is especially useful because Sunny normalizes the different error codes and XML that cloud providers return for the same type of error.
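
For example, an "error" listener can branch on the error type methods above (a sketch reusing our earlier getContainer request):

request.on('error', function (err) {
  if (err.isNotFound()) {
    console.log("Container does not exist yet.");
  } else if (err.isNotOwner()) {
    console.log("Someone else owns this container name.");
  } else {
    console.log("Cloud error (HTTP %s): %s", err.statusCode, err.message);
  }
});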

Putting it all Together, with a Little Help from Async.js

Now that we've seen how to create a Sunny connection, get a container and perform operations on blobs within the container, let's put all of our code together in a single script. (Note: I've collapsed the error handling into a single utility function.)

var sunny = require("sunny"),
  conn = sunny.Configuration.fromEnv().connection,
  errFn = function (err) { console.log(err); };

// Op 1: GET container.
var request = conn.getContainer("sunnyjs", { validate: true });
request.on('error', errFn);
request.on('end', function (results, meta) {
  var container = results.container;
  console.log("Retrieved container \"%s\".", container.name);

  // Op 2: PUT blob.
  var writeStream = container.putBlob("my-blob.txt");
  writeStream.on('error', errFn);
  writeStream.on('end', function (results, meta) {
    var blob = results.blob;
    console.log("Put blob \"%s\".", blob.name);

    // Op 3: GET blob.
    var readStream = container.getBlob("my-blob.txt");
    readStream.on('error', errFn);
    readStream.on('data', function (chunk) {
      console.log("Data chunk: \"%s\"", chunk);
    });
    readStream.on('end', function (results, meta) {
      var blob = results.blob;
      console.log("Get blob \"%s\".", blob.name);
    });
    readStream.end();
  });
  writeStream.write("My data string.");
  writeStream.end();
});
request.end();

Not too shabby! However, you'll note that we have three serial cloud operations, each of which has to complete before the next starts, so we end up repeatedly nesting callbacks in "end" event listeners. Even with a mere three operations, the nesting makes the code hard to understand at first blush, and an indentation nightmare. Wouldn't it be better if we could code our operations to match the general outline of what we're trying to do? That is:

  1. Get a container.
  2. Once we have a container, put data to a blob.
  3. Once we have put data to the blob, get the data back.

Fortunately, there are many asynchronous helper libraries for Node that help in just this situation. I like Async.js, which, per its project page, "provides around 20 functions that include the usual 'functional' suspects (map, reduce, filter, forEach...) as well as some common patterns for asynchronous control flow (parallel, series, waterfall...)."

In our simple script with three cloud operations, we just need async.series to serialize our operations for easier control flow. I am skipping over a lot of Async.js details, such as invoking each function's callback when it finishes (with an error or null), and passing state across the serialized functions via outer variables, but you can probably intuit most of how things are working here:

var async = require("async"),
  sunny = require("sunny"),
  conn = sunny.Configuration.fromEnv().connection,
  container,
  blob;

async.series([
  // Op 1: GET container.
  function (callback) {
    var request = conn.getContainer("sunnyjs", { validate: true });
    request.on('error', callback);
    request.on('end', function (results, meta) {
      container = results.container;
      console.log("Retrieved container \"%s\".", container.name);
      callback(null);  // Signal function "done" to async.series.
    });
    request.end();
  },

  // Op 2: PUT blob.
  function (callback) {
    var writeStream = container.putBlob("my-blob.txt");
    writeStream.on('error', callback);
    writeStream.on('end', function (results, meta) {
      blob = results.blob;
      console.log("Put blob \"%s\".", blob.name);
      callback(null);  // Signal function "done" to async.series.
    });
    writeStream.write("My data string.");
    writeStream.end();
  },

  // Op 3: GET blob.
  function (callback) {
    var readStream = container.getBlob("my-blob.txt");
    readStream.on('error', callback);
    readStream.on('data', function (chunk) {
      console.log("Data chunk: \"%s\"", chunk);
    });
    readStream.on('end', function (results, meta) {
      blob = results.blob;
      console.log("Get blob \"%s\".", blob.name);
      callback(null);  // Signal function "done" to async.series.
    });
    readStream.end();
  }
], function (err, result) {
  if (err) {
    console.log(err);
  } else {
    console.log("async.series is done!");
  }
});

which gives us our expected output:

Retrieved container "sunnyjs".
Put blob "my-blob.txt".
Data chunk: "My data string."
Get blob "my-blob.txt".
async.series is done!

Our code now appears in a serialized, contained order that matches the conceptual execution of the actual cloud operations, instead of a lot of callback nesting.

For more on developing with Async.js, please see the excellent Async.js Readme / tutorial. It is an investment well worth your time. Sunny internally uses Async.js extensively for unit tests. Explore the source tests directory for a lot of good use cases (parallel and serial).

Have Fun in the Sun

That wraps up our introduction to Sunny.js. Hopefully the library provides a useful interface for your cloud programming needs.

The project is still very much in an early development state. See the Sunny "to do" list to get a better idea of what features and fixes are coming down the pipeline. The big ticket items I would really like to see added are:

  • More cloud providers, notably Rackspace Cloud Files and OpenStack Storage.
  • A basic retry function wrapper to handle expected, occasional cloud operation failures and throttling.
  • Advanced cloud operation support for ACLs, policies, etc.

Help, bug reports, and pull requests for the project are most welcome. Please review the developer's guide, and contact me with any questions or comments.