Better Data Slinging with Node.js Readable/Writable Streams and Pipes
Node.js Streams
Node.js provides an extensible and fast platform for web servers, proxies, and middle-tier services. Node.js applications often transform data from one source or format (e.g., a database or cloud store) to another (e.g., an HTML or JSON page).
Most folks are familiar with the callback-style of hooking together various JavaScript data components in a Node.js program. However, an often overlooked and very powerful data binding abstraction for Node.js is found in the stream class.
Streams are an abstract interface for data objects in Node.js which can be readable and/or writable. They can be hooked from one to another in a similar style to Unix pipes – in fact, the stream operation we’ll mostly focus on here is the not-coincidentally-named pipe(). Some potential advantages of stream pipes over other binding styles include:
- Often much less code for the actual binding (we can just call pipe()).
- Streams can handle pausing / resuming of data flows. Implementing classes, however, have to implement this logic internally if supported.
- Don’t have to set specific callbacks or listeners for intermediate data events – just pipe() the stream and forget it.
- Avoid buffering by processing data and re-emitting it directly to another stream (unless all of the data is required in one chunk).
- Compatible with the many Node.js core modules that already implement streams, including HTTP, HTTPS, and file and process I/O.
For a brief example, we can create a download client to retrieve a web page and write it to a file as follows (ignoring error handling):
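Here is a minimal sketch of what such a client might look like (the target URL is just a placeholder; the output filename matches the “out.txt” file referenced below):

```js
var http = require("http");
var fs = require("fs");

// Fetch a page (placeholder URL) and stream the response straight into a file.
http.get("http://example.com/", function (response) {
  // Writable stream for the output file.
  var outStream = fs.createWriteStream("out.txt");

  // response is readable, outStream is writable: just pipe them together.
  response.pipe(outStream);
});
```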
Notice that we didn’t set any explicit “data” listeners or buffer any of the data, even if it came in as chunks. We simply pipe()’ed it with our two stream objects: response and outStream. The output of response is hooked to the input of outStream and we’re done.
More importantly (as we’ll get to below), we can add many more pipe() calls to do other transformations / data-slinging inline to our chained call. Ultimately, it just takes a little bit of glue code to hook together data flows in a terse and efficient manner with streams.
The Stream Interfaces
So how do we do this for our own classes and objects?
The Node.js streams documentation offers the full rundown of how to implement the interfaces, but we’ll give an abbreviated version here to get going. All of the code examples discussed in this post are available as a GitHub gist.
Readable Streams
Readable streams must emit “data” events whenever they have data to be read and an “end” event when the data stream is finished. The implementing constructor should also set this.readable = true. The interface additionally provides a lot more implementer flexibility for things like pausing and resuming a stream, as well as resource management and cleanup.
Writable Streams
Writable streams must implement the write() method to accept new data chunks into the stream and the end() method to instruct the stream that the data is finished. The implementing constructor should also set this.writable = true.
Putting it Together
Let’s take a look at a simple example of a custom-implemented readable and writable stream that just passes through data – input is emitted unchanged as output.
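A minimal sketch along those lines, following the interface requirements described above, might look as follows (the PassThroughStream class name and module layout here are my own, for illustration):

```js
var Stream = require("stream").Stream;
var util = require("util");

// A stream that is both readable and writable and re-emits anything written to it.
function PassThroughStream() {
  Stream.call(this);
  this.readable = true;
  this.writable = true;
}

util.inherits(PassThroughStream, Stream);

// Writable side: accept a chunk and immediately re-emit it as a "data" event.
PassThroughStream.prototype.write = function (data) {
  this.emit("data", data);
  return true;
};

// Writable side: handle any final chunk, then signal that the data is finished.
PassThroughStream.prototype.end = function (data) {
  if (data) {
    this.write(data);
  }
  this.emit("end");
};

module.exports = PassThroughStream;
```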
This is essentially the bare minimum for a readable and writable stream class. Not too much work! And for more complicated streams, we can simply augment the write()/“data” and end()/“end” handling to do whatever data transformations we want.
We can now take the web scraping example from above and add the pass-through stream in the middle with the same effect – we still get a file written to output.
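Reusing the hypothetical download client and PassThroughStream sketches from above, that could look something like:

```js
var http = require("http");
var fs = require("fs");
// Hypothetical module path for the PassThroughStream sketch above.
var PassThroughStream = require("./pass-through");

http.get("http://example.com/", function (response) {
  var outStream = fs.createWriteStream("out.txt");

  // Insert the pass-through stream between the response and the file.
  response
    .pipe(new PassThroughStream())
    .pipe(outStream);
});
```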
Checking the output file (“out.txt”), we can see the download results are the same as our original example.
And in fact, we could even reuse the pass-through stream multiple times to illustrate repeated pipe() data flows (although there’s absolutely no practical sense to the following):
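For instance, again with the hypothetical classes from the sketches above:

```js
var http = require("http");
var fs = require("fs");
// Hypothetical module path for the PassThroughStream sketch above.
var PassThroughStream = require("./pass-through");

http.get("http://example.com/", function (response) {
  var outStream = fs.createWriteStream("out.txt");

  // Pointlessly pass the data through several identical streams before writing it.
  response
    .pipe(new PassThroughStream())
    .pipe(new PassThroughStream())
    .pipe(new PassThroughStream())
    .pipe(outStream);
});
```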
So that’s the basics. Let’s look at creating something a tad more useful.
Let’s Upper-case Some Data!
The UpperCaseStream class takes a data source (in string or Buffer format) and converts string data into upper case letters. Not ultimately that useful or extensible, but it’s definitely a data transformation that can illustrate the ease of creation and use of a custom stream.
We mostly take our simple pass-through stream above and add a custom _transform helper method to transform the data in either a write() or end() call, then re-emit the upper-cased data in a “data” event.
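A sketch of UpperCaseStream consistent with that description (the original gist may differ in its details) could be:

```js
var Stream = require("stream").Stream;
var util = require("util");

// A readable/writable stream that upper-cases string data passing through it.
function UpperCaseStream() {
  Stream.call(this);
  this.readable = true;
  this.writable = true;
}

util.inherits(UpperCaseStream, Stream);

// Helper: convert a chunk (string or Buffer) to upper case and re-emit it.
UpperCaseStream.prototype._transform = function (data) {
  data = Buffer.isBuffer(data) ? data.toString() : data;
  this.emit("data", data.toUpperCase());
};

// Writable side: transform each incoming chunk.
UpperCaseStream.prototype.write = function (data) {
  this._transform(data);
  return true;
};

// Writable side: transform any final chunk, then signal the end of data.
UpperCaseStream.prototype.end = function (data) {
  if (data) {
    this._transform(data);
  }
  this.emit("end");
};

module.exports = UpperCaseStream;
```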
To test things out, we can take an input file (“input.txt”), read it in, upper case all text, then write it out to “out.txt” using three streams.
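One way to wire that up, assuming the UpperCaseStream sketch above lives in a hypothetical ./upper-case module:

```js
var fs = require("fs");
// Hypothetical module path for the UpperCaseStream sketch above.
var UpperCaseStream = require("./upper-case");

// Read "input.txt", upper-case the data, and write the result to "out.txt".
fs.createReadStream("input.txt")
  .pipe(new UpperCaseStream())
  .pipe(fs.createWriteStream("out.txt"));
```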
The resulting output file is now uppercased! All in all, not that amazing, but considering the ease of our implementation, other (more useful) read/write stream applications could include:
- XML to JSON conversion.
- Unzipping zipped data.
- Image resizing.
- … any other transformations you’d like to use with your existing streams.
Conclusion
Streams provide a great means of binding together lots of data in a sane and manageable way. Beyond the core library documentation, there are a lot of great stream introductions for further reference: