Sunny.js Presentation and Demo Server

20 Oct 2011

sunny node.js heroku cloud amazon aws s3 google storage meetup

Sunny.js Presentation

I gave a lightning talk at last night’s Node.js Meetup introducing Sunny.js, a multi-cloud library for Node.js. For more background on Sunny, see my previous blog post to get started with the library.

Instead of the usual PowerPoint deck, I used Prezi, which I have been wanting to try out for some time. Here is the Sunny.js prezi I gave at the meetup:

Although the Prezi took some time to put together (especially as it was my first time using the service), I was very happy with the appearance and flow of the presentation.

Simple Cloud/Web Proxy Server

As part of my presentation, I demoed a simple proxy service using Sunny and Node. I took the v0.0.1 HTML documentation for Sunny and uploaded all of it to an Amazon S3 bucket, being careful to preserve the paths of the original documentation files when naming S3 keys. (Actually, this wasn’t hard at all – I just used the reliable Cyberduck S3 client.)

I then wrote a really simple 40-line web server. It takes credentials from the process environment, checks that a specified container exists, and then translates web request GET paths to blob names in the container I set up. The result is that I could serve my entire documentation site straight from my Amazon S3 bucket!
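To make the path-to-blob translation concrete, here is a minimal sketch of the idea. It is not the sunny-proxy source and does not use the Sunny API: a plain `Map` stands in for the cloud container, and the names `pathToBlobName` and `createHandler`, along with the `index.html` fallback, are assumptions made for illustration.

```javascript
// Hypothetical sketch: map GET request paths to blob names in a container.
// `store` is a plain Map standing in for a cloud container (NOT the Sunny API).
const { URL } = require('url');

// Translate a request URL such as "/api/index.html?x=1" into a blob name.
function pathToBlobName(requestUrl, indexName = 'index.html') {
  // Parse against a dummy base so query strings and fragments are dropped.
  const pathname = new URL(requestUrl, 'http://localhost').pathname;
  // Blob keys were named after the original relative paths, so strip the
  // leading slash(es).
  let name = decodeURIComponent(pathname).replace(/^\/+/, '');
  // Directory-style requests fall back to an index document (an assumption).
  if (name === '' || name.endsWith('/')) {
    name += indexName;
  }
  return name;
}

// Build a request handler that serves blobs from the stand-in store.
function createHandler(store) {
  return function handler(req, res) {
    if (req.method !== 'GET') {
      res.writeHead(405, { Allow: 'GET' });
      return res.end();
    }
    const name = pathToBlobName(req.url);
    if (!store.has(name)) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      return res.end('Not found: ' + name);
    }
    res.writeHead(200);
    return res.end(store.get(name));
  };
}
```

The handler can be passed to `http.createServer(createHandler(store)).listen(port)` to try it locally. In the real proxy, the lookup in `store` would be replaced by a blob request against the cloud container.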

The source for the project, sunny-proxy, is available on GitHub, and is fairly straightforward to set up for yourself using the README instructions.

Sunny.js, a Cloud Library for Node.js

16 Oct 2011

sunny node.js cloud amazon aws s3 google storage

Sunny.js

Node.js provides a great environment for cloud applications and development. Node’s asynchronous design and great HTTP/REST support provide the building blocks for architecting non-blocking and scalable applications in the cloud.

There are a lot of great Node clients out there for cloud datastores. Amazon Web Services S3 has knox, node-sissy, node-s3, and Node.js-Amazon-S3, and Rackspace Cloud Files has node-cloudfiles.

After reviewing these libraries, I found a few features variously lacking that I would like in an ideal cloud datastore client:

Event-based interface.
“One-shot” requests wherever possible.
Able to create / delete arbitrary containers (buckets).
Configurable request headers, cloud options and metadata.
SSL support.

Additionally, it would be nice to have a library that worked across multiple clouds like jclouds for Java and Libcloud for Python.

With these goals in mind, I put together a basic multi-cloud datastore client called Sunny.js. Sunny initially supports Amazon Web Services’ Simple Storage Service (S3) and Google Storage for Developers (version 1). Sunny hits all the points above, and also has:

A really well-documented API and user guide. (At least I think so!)
Full unit tests for all cloud operations and providers.

Here are some basic resources to get started with Sunny, which we’ll dive into a little deeper in the subsequent sections of this post:

Sunny.js Documentation: Guides and API documentation.
- User Guide: Guide for programmers using Sunny in their applications and libraries.
- Developer’s Guide: Guide for developing, extending, and hacking on the Sunny.js source code.
- API: Full API.
Sunny.js GitHub Page: Source repository and issue tracker.
Sunny.js NPM Page: NPM page and history.

Pivot Faceting (Decision Trees) in Solr 1.4.

20 Sep 2011

search solr facet pivot hacks

Solr faceting breaks down searches for terms, phrases, and fields in the Solr index into aggregated counts by matched fields or queries. Facets are a great way to “preview” further searches, as well as a powerful aggregation tool in their own right.

Before Solr 4.0, facets were only available at one level, meaning something like “counts for field ‘foo’” for a given query. Solr 4.0 introduced pivot facets (also called decision trees) which enable facet queries to return “counts for field ‘foo’ for each different field ‘bar’” – a multi-level facet across separate Solr fields.
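As a rough illustration of what a pivot request looks like, the sketch below builds a query string with the `facet.pivot` parameter and flattens a nested pivot result into rows. The helper names are hypothetical, and the flattening assumes each pivot entry has the nested `{ field, value, count, pivot }` layout.

```javascript
// Sketch: request a two-level pivot facet and flatten the nested result.
// "bar" and "foo" are the placeholder field names from the text above.

// Build the query string; facet.pivot takes a comma-separated field list.
function pivotQuery(q, fields) {
  return new URLSearchParams({
    q,
    rows: '0', // only the facet counts are needed, not the documents
    facet: 'true',
    'facet.pivot': fields.join(','),
  }).toString();
}

// Flatten nested pivot entries into [outerValue, innerValue, count] rows,
// assuming each entry looks like { field, value, count, pivot: [...] }.
function flattenPivot(entries) {
  const rows = [];
  for (const outer of entries) {
    for (const inner of outer.pivot || []) {
      rows.push([outer.value, inner.value, inner.count]);
    }
  }
  return rows;
}
```

For “counts for field ‘foo’ for each different field ‘bar’”, the outer field goes first: `pivotQuery('*:*', ['bar', 'foo'])`.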

Decision trees come up a lot, and at work, we need results along multiple axes – typically in our case “field/query by year” for a time series. However, we use Solr 1.4.1 and are unlikely to migrate to Solr 4.0 any time soon. Our existing approach was to simply query for the top “n” fields for a first query, then perform a second-level facet query by year for each field result. So, for the top 20 results, we would perform 1 + 20 queries – clearly not optimal when we’re trying to get this done in the context of a blocking HTTP request in our underlying web application.
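The 1 + n pattern can be sketched as follows. This is an illustration rather than our production code: the helper names and the field names (`author`, `year`) are hypothetical. It relies only on the standard `facet`, `facet.field`, `facet.limit`, and `fq` parameters.

```javascript
// Sketch of the "1 + n" approach: one top-level facet query, then one
// follow-up facet query per top value. All field names are hypothetical.

// Build the query string for a single-level facet on `field`.
function facetQuery(q, field, limit, filter) {
  const params = new URLSearchParams({
    q,
    rows: '0', // only the facet counts are needed, not the documents
    facet: 'true',
    'facet.field': field,
    'facet.limit': String(limit), // -1 asks Solr for all values
  });
  if (filter) {
    params.append('fq', filter);
  }
  return params.toString();
}

// Given the top values from the first query, build the n follow-up queries,
// each restricted to one value and faceted on the second-level field.
function secondLevelQueries(q, field, values, pivotField) {
  return values.map((value) => {
    // Escape quotes and backslashes so the value is a valid quoted phrase.
    const escaped = value.replace(/(["\\])/g, '\\$1');
    return facetQuery(q, pivotField, -1, `${field}:"${escaped}"`);
  });
}
```

With 20 top values, `facetQuery('*:*', 'author', 20)` plus the 20 strings returned by `secondLevelQueries` add up to the 21 round trips described above.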

Hoping to get something better than our 1 + n separate queries approach, I began researching the somewhat more obscure facet features present in Solr 1.4.1. And after some investigation, experimentation and a good amount of hackery, I was able to come up with a “faux” pivot facet scheme that mostly approximates true pivot faceting using Solr 1.4.1.

We’ll start by examining some real pivot facets in Solr 4.0, then look at the components and full technique for simulated pivot facets in Solr 1.4.1.

Pivot Faceting in Solr 4.0

Pivot facets were added to Solr in SOLR-792. A good introductory article is available on the Solr.pl site. To see the basic operation in action, let’s just use the “example” setup that comes with the Solr 4.0 distribution (located at “solr_4.0_path/solr/example”).

Loose Bits Thoughts on distributed systems, cloud computing, and the intersection of law and technology.