New York Times Lab Introduces Visual Stream Processing Tool
The New York Times R&D Lab has released streamtools, a general-purpose, graphical tool for dealing with streams of data.
Mike Dewar, the streamtools project lead at the NY Times Lab, describes the motivation as follows:
Over the last 20 years we have invested heavily in tools that deal with tabulated data, from Excel, MySQL, and MATLAB to Hadoop, R, and Python+Numpy. These tools, when faced with a stream of never-ending data, fall short and diminish our creative potential. In response to this shortfall we have created streamtools.
- A block performs some operation on each message it receives, and that operation is defined by the block's type.
- Each block has zero or more rules which define that block's behaviour.
- Each block has a set of named routes that can receive data, emit data, or respond to queries.
- Blocks can be connected via routes, using connections.
- A collection of connected blocks is called a pattern, and it is possible to export and import whole patterns from a running instance of streamtools via a JSON-formatted descriptor document.
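To make the vocabulary above concrete, here is a minimal Go sketch of how blocks, rules, connections, and patterns could be modeled and round-tripped through JSON. It is an illustration only: the type names, field names, and JSON keys (`Block`, `Connection`, `Pattern`, `to_route`, and so on) are assumptions made for this example and are not taken from the streamtools source or its actual descriptor format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block is a hypothetical descriptor for one block in a pattern.
// Type selects the operation; Rule holds that block's settings.
type Block struct {
	ID   string                 `json:"id"`
	Type string                 `json:"type"`
	Rule map[string]interface{} `json:"rule,omitempty"`
}

// Connection is a hypothetical descriptor linking one block's output
// to a named route on another block.
type Connection struct {
	ID      string `json:"id"`
	FromID  string `json:"from"`
	ToID    string `json:"to"`
	ToRoute string `json:"to_route"`
}

// Pattern is a collection of connected blocks.
type Pattern struct {
	Blocks      []Block      `json:"blocks"`
	Connections []Connection `json:"connections"`
}

// ExportPattern serializes a pattern to an indented JSON document.
func ExportPattern(p Pattern) ([]byte, error) {
	return json.MarshalIndent(p, "", "  ")
}

// ImportPattern parses a JSON document back into a pattern.
func ImportPattern(data []byte) (Pattern, error) {
	var p Pattern
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	// The block types and rule keys below are made up for illustration.
	p := Pattern{
		Blocks: []Block{
			{ID: "source", Type: "ticker", Rule: map[string]interface{}{"Interval": "1s"}},
			{ID: "sink", Type: "tolog"},
		},
		Connections: []Connection{
			{ID: "c1", FromID: "source", ToID: "sink", ToRoute: "in"},
		},
	}

	data, err := ExportPattern(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	restored, err := ImportPattern(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(restored.Blocks), "blocks,", len(restored.Connections), "connections")
}
```

Keeping the whole pattern in one plain data structure is what makes export and import cheap in a sketch like this: a running system only needs to walk its blocks and connections to produce the document, and replay the document to rebuild them.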
We chose Go because it lets us write code that's very close to the idiom we're trying to present to the user. So every block is its own goroutine, every connection is a pair of channels. It makes for a very straightforward abstraction, which we hope will let people understand the systems they're building. This also means that writing new blocks is really simple, which we hope will encourage the community to make the blocks they find useful. Go also allows us to write safe, performant code, which is great for our day-to-day work at The Times.
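The mapping described in the quote, with one goroutine per block and channels for connections, can be sketched in a few lines of Go. This is a simplified illustration rather than streamtools code: `Message`, `BlockFunc`, `StartBlock`, and `RunPipeline` are hypothetical names, and each connection is modeled here as a single one-way channel instead of the pair of channels mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a stand-in for whatever flows through the stream.
type Message int

// BlockFunc is the operation a block applies to each message.
// It returns the outgoing message and whether to emit it at all.
type BlockFunc func(Message) (Message, bool)

// StartBlock runs one block as its own goroutine. The block reads from
// in, applies op to every message, and writes emitted results to out.
// When in is closed, the block closes out and exits.
func StartBlock(wg *sync.WaitGroup, op BlockFunc, in <-chan Message, out chan<- Message) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(out)
		for m := range in {
			if result, emit := op(m); emit {
				out <- result
			}
		}
	}()
}

// RunPipeline wires two blocks together (double, then keep values > 4)
// and collects everything that reaches the end of the chain.
func RunPipeline(inputs []Message) []Message {
	var wg sync.WaitGroup

	// Each channel plays the role of one connection between blocks.
	src := make(chan Message)
	mid := make(chan Message)
	sink := make(chan Message)

	double := func(m Message) (Message, bool) { return m * 2, true }
	keepBig := func(m Message) (Message, bool) { return m, m > 4 }

	StartBlock(&wg, double, src, mid)
	StartBlock(&wg, keepBig, mid, sink)

	// Feed the first block, then signal end of input.
	go func() {
		for _, m := range inputs {
			src <- m
		}
		close(src)
	}()

	var results []Message
	for m := range sink {
		results = append(results, m)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(RunPipeline([]Message{1, 2, 3, 4})) // prints [6 8]
}
```

A real stream never ends, so a long-running block would not normally wait for its input channel to close; the finite input here only exists so the example terminates and its output can be checked.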