After gestating for more than a year on GitHub, the Streams project has now been adopted by the WHATWG in an effort to standardize a web streaming API. The project is led by Domenic Denicola, who started the work on Promises, now part of the upcoming ECMAScript 6.
The purpose of the API is to map to low-level I/O primitives for “creating, composing, and consuming streams of data.” These are meant to be raw streams that can serve as a foundation for higher-level APIs, such as file I/O, sockets, multimedia, or inter-process communication. The main motivation behind streams is that one should be able to access and obtain data on the web as needed, without having to load it entirely into memory first.
The standard proposes three types of streams: Readable, Writable, and Transform. While the use cases for readable and writable streams are obvious, transform streams could be used for encrypting/decrypting data on the fly, compressing/decompressing images, or applying filters to videos.
Streams operate on chunks, the basic unit of data that a stream manipulates. A chunk may contain binary or text data, and a stream can combine chunks of different types of data. Whether a stream should be allowed to contain object data is still being debated.
Streams are wrappers around underlying sources. There are two types of such sources for readable streams: push sources, which push data to the consumer and offer a mechanism for pausing and resuming the flow, and pull sources, which provide data synchronously or asynchronously in response to consumer requests. Streams provide a unified interface on top of both types of sources.
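The idea of one interface over both kinds of sources can be illustrated with a minimal synchronous sketch. It is not the API defined by the Streams Standard: the names SketchReadable, PullSource, PushSource, and DemoPushSource, as well as the queue limit of 2, are assumptions made up for this example.

```typescript
// A chunk is the basic unit of data; strings keep the sketch short.
type Chunk = string;

// Hypothetical pull source: hands out data only when asked.
interface PullSource {
  pull(): Chunk | undefined; // undefined means "nothing available"
}

// Hypothetical push source: emits data on its own schedule and can be paused.
interface PushSource {
  onData: ((chunk: Chunk) => void) | null;
  pause(): void;
  resume(): void;
}

// Illustrative wrapper that exposes the same read() over either kind of source.
class SketchReadable {
  private queue: Chunk[] = [];
  private pullSource: PullSource | null = null;
  private pushSource: PushSource | null = null;

  static fromPull(source: PullSource): SketchReadable {
    const stream = new SketchReadable();
    stream.pullSource = source;
    return stream;
  }

  static fromPush(source: PushSource, limit = 2): SketchReadable {
    const stream = new SketchReadable();
    stream.pushSource = source;
    source.onData = (chunk) => {
      stream.queue.push(chunk);
      // Ask the source to stop once enough chunks are waiting.
      if (stream.queue.length >= limit) source.pause();
    };
    return stream;
  }

  read(): Chunk | undefined {
    if (this.pullSource) return this.pullSource.pull();
    const chunk = this.queue.shift();
    // Reading frees room in the queue, so the source may push again.
    if (this.pushSource) this.pushSource.resume();
    return chunk;
  }
}

// Pull source backed by an array.
const pending: Chunk[] = ["a", "b"];
const pulled = SketchReadable.fromPull({ pull: () => pending.shift() });

// Demo push source; emit() reports whether the chunk was delivered.
class DemoPushSource implements PushSource {
  onData: ((chunk: Chunk) => void) | null = null;
  paused = false;
  pause(): void { this.paused = true; }
  resume(): void { this.paused = false; }
  emit(chunk: Chunk): boolean {
    if (this.paused || !this.onData) return false;
    this.onData(chunk);
    return true;
  }
}

const source = new DemoPushSource();
const pushed = SketchReadable.fromPush(source);
source.emit("x");
source.emit("y"); // the queue reaches the limit, so the source is paused
const deliveredWhilePaused = source.emit("z"); // false: a real source would hold the data back

const fromPull = [pulled.read(), pulled.read(), pulled.read()];
const fromPush = [pushed.read(), pushed.read(), pushed.read()];
console.log(fromPull, fromPush);
```

The consumer calls read() in both cases and never needs to know whether the data was pushed or pulled; that uniformity is the point of the wrapper.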
A producer sends data to a writable stream, which is a wrapper around an underlying sink. The stream queues successive writes and passes them to the sink one at a time.
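The queuing behavior described above can be sketched in a few lines. This is an illustration under assumed names (SketchWritable, UnderlyingSink, and a callback-style done signal), not the interface specified by the standard.

```typescript
type Chunk = string;

// Hypothetical underlying sink: accepts one chunk and calls done() when finished.
interface UnderlyingSink {
  write(chunk: Chunk, done: () => void): void;
}

// Illustrative writable wrapper: queues writes and feeds the sink one at a time.
class SketchWritable {
  private queue: Chunk[] = [];
  private busy = false;
  private sink: UnderlyingSink;

  constructor(sink: UnderlyingSink) {
    this.sink = sink;
  }

  write(chunk: Chunk): void {
    this.queue.push(chunk);
    this.drain();
  }

  get queuedCount(): number {
    return this.queue.length;
  }

  private drain(): void {
    if (this.busy) return; // a write is still in flight
    const next = this.queue.shift();
    if (next === undefined) return; // nothing left to write
    this.busy = true;
    this.sink.write(next, () => {
      this.busy = false;
      this.drain(); // move on to the next queued chunk
    });
  }
}

// Demo sink that finishes a write only when completeOne() is called.
const written: Chunk[] = [];
const pendingDone: Array<() => void> = [];
const slowSink: UnderlyingSink = {
  write(chunk, done) {
    written.push(chunk);
    pendingDone.push(done);
  },
};
function completeOne(): void {
  const done = pendingDone.shift();
  if (done) done();
}

const writable = new SketchWritable(slowSink);
writable.write("one");
writable.write("two");
writable.write("three");

// Only the first chunk has reached the sink; the other two wait in the queue.
const reachedSinkEarly = written.length;
const waitingEarly = writable.queuedCount;

completeOne();
completeOne();
completeOne();
console.log(reachedSinkEarly, waitingEarly, written);
```

The producer can call write() as often as it likes; the wrapper alone decides when the sink sees the next chunk.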
Transform streams work by piping a readable stream to a writable one. Multiple transform streams can be piped together, forming a pipe chain. The pipe uses a signal-based backpressure mechanism to let every stream in the chain know when one of them is overloaded. Each stream buffers its data, keeping chunks in a queue until they can be transferred to the consumer or the underlying sink. The backpressure mechanism is based on these queues and a queuing strategy. One example of such a strategy is to generate backpressure whenever a stream has three or more chunks in its queue.
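The example strategy of signaling backpressure at three queued chunks can be sketched as follows. The names CountStrategy, QueuedStage, and pump are hypothetical and chosen for this illustration; the standard's own queuing strategy interface may look different.

```typescript
type Chunk = string;

// Hypothetical queuing strategy: apply backpressure at a fixed chunk count.
class CountStrategy {
  highWaterMark: number;

  constructor(highWaterMark: number) {
    this.highWaterMark = highWaterMark;
  }

  shouldApplyBackpressure(queueSize: number): boolean {
    return queueSize >= this.highWaterMark;
  }
}

// One stage of a pipe chain: a queue governed by a strategy.
class QueuedStage {
  private queue: Chunk[] = [];
  private strategy: CountStrategy;

  constructor(strategy: CountStrategy) {
    this.strategy = strategy;
  }

  // Returns true while the producer may keep sending, false to signal backpressure.
  enqueue(chunk: Chunk): boolean {
    this.queue.push(chunk);
    return !this.strategy.shouldApplyBackpressure(this.queue.length);
  }

  dequeue(): Chunk | undefined {
    return this.queue.shift();
  }

  get size(): number {
    return this.queue.length;
  }
}

// Move chunks from a producer into a stage until the stage signals backpressure.
function pump(input: Chunk[], stage: QueuedStage): number {
  let sent = 0;
  while (input.length > 0) {
    const chunk = input.shift() as Chunk;
    sent++;
    if (!stage.enqueue(chunk)) break; // stop sending: the stage is overloaded
  }
  return sent;
}

const input: Chunk[] = ["c1", "c2", "c3", "c4", "c5"];
const stage = new QueuedStage(new CountStrategy(3));

const firstBurst = pump(input, stage); // stops once three chunks are queued
stage.dequeue(); // the consumer takes one chunk, freeing a slot
const secondBurst = pump(input, stage); // one more chunk fits before backpressure returns
console.log(firstBurst, secondBurst, stage.size, input);
```

In a longer chain, each stage would forward the same signal upstream, so a slow consumer at the end eventually slows the original producer.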
The standard comes with a reference implementation and tests written in ECMAScript 6 and transpiled to ECMAScript 5 using Traceur. Early work on adding Streams support to Chromium is under way.
We have talked to Denicola to find out more details about the project.
InfoQ: I understand that streams are useful, but they are already everywhere on the web. Why a new streaming API?
DD: Streams already underlie much of the web, but unfortunately they're not exposed to developers at all. For example, when you do an XMLHttpRequest, there is no way to get the response data without it all being stored in memory at once. This means that even though the browser might be using streaming techniques for implementing e.g. streaming <video>, there's no way that developers could do that themselves using XMLHttpRequest. The idea of the Streams Standard is to provide a common JavaScript data type that specifications can start returning to expose the underlying streams they are probably already using, but not yet letting developers access.
InfoQ: Is this intended to be included in a future ECMAScript version?
DD: That's an interesting question, and in fact we're going to talk about that at the next TC39 meeting [1]. I'm torn on this. On the one hand, I/O streams are a generally useful abstraction, and the spec is written in an environment-agnostic way. Various languages with large standard libraries, like C++, Java, and C#, do have streams built in. On the other hand, the ECMAScript standard library has historically been very small, focused on language-level primitives and not concerning itself with things like I/O. So adding something like streams would be unprecedented and a bit disruptive.
My current thinking is that it makes the most sense for the spec to stand alone and not be incorporated into the JavaScript language spec, but I can envision implementations existing in many JS environments, not just browsers. For example, the ECMA-402 spec on internationalization [2] is entirely standalone, but is implemented across many JS environments. The WHATWG URL Standard [3] was developed after Node.js's URL-parsing libraries, but I am hopeful that Node's implementation can be updated to match the URL Standard over time. And even something like the Fetch Standard [4] could in theory be implemented in many different JS environments, as a standard way of performing HTTP requests.
[1]: https://github.com/tc39/agendas/blob/master/2014/11.md
[2]: http://www.ecma-international.org/publications/standards/Ecma-402.htm
[3]: http://url.spec.whatwg.org/
[4]: https://fetch.spec.whatwg.org/
InfoQ: There is early work on Streams for Blink. Are you aware of anyone else who is interested in it?
DD: Yes, definitely! I've been in active talks with both Microsoft and Mozilla, who are very interested in having streams in their browsers. It's still somewhat early days, but there's a lot of support.