Teradata Announces New Software for Real-Time Analysis of Internet of Things Data
At its 2015 Partners User Group Conference, Teradata announced two new software capabilities for real-time ingestion and analysis of massive streams of Internet of Things (IoT) data. The Teradata Listener software enables "listening" to multiple, diverse IoT data streams in real time, then propagating the data to multiple analytic platforms. The new Teradata Aster Analytics on Hadoop software provides scalable analysis of massive IoT data streams using Teradata Aster Analytics.
The Listener software is built from a combination of open source software (including Kafka, Cassandra, Elasticsearch, and Mesos) and custom Teradata software based on Docker, microservices, and RESTful APIs. Listener users configure the incoming data streams, internal data pipelines, and output data destinations using a GUI. Monitoring dashboards provide a complete and transparent picture of the ongoing processing. The Listener software also includes RESTful APIs that enable development of custom monitors, reports, and analyses. While full documentation of the APIs is not available for the current beta version of Listener, the Listener Curl Script blog post provides an indication of how the APIs will be accessed.
While the new Teradata platform is similar in some ways to Elastic's ELK stack, the two differ in the specific problems they address. Listener embeds Elasticsearch, provides data transport pipeline capabilities like those of Logstash, and includes Kibana-like data flow monitoring and visualization components. The problem the new Teradata platform solves that the ELK stack does not address is the parallelization of mathematical algorithms that require all of the data to be visible to the algorithm at once in order to produce a correct final result. This is the innovation that Teradata Aster Analytics on Hadoop brings: big data scalability that can be applied to analyzing massive amounts of incoming IoT data using any conceivable algorithm.
Traditionally, analytics tools have not been designed to run in a distributed environment like Hadoop, because many of the analyses the tools provide must produce an answer that represents the entire input data set. If the data and analyses are spread across multiple servers, each running a separate copy of the analytics software, multiple results will be returned and, depending on the type of analysis, there may be no mathematically valid means of merging them into a single correct result. While statistical methods may be available to estimate a correct value from an aggregation of computations performed on subsets of the data, the actual correct value for the entire input data set cannot be computed from those partial results alone. If the actual correct value is needed, rather than an estimate within a window of error, the analytics cannot be parallelized using traditional methods.
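The merging problem can be illustrated with a small example. The following Python sketch is not Teradata or Aster code; the function names and the three sample partitions are made up for illustration. It assumes each list stands for the data held by one server, and compares a statistic whose partial results merge exactly (a sum) with one whose partial results do not (a median).

```python
from statistics import median

def global_median(partitions):
    """Exact median: every value must be visible to the computation at once."""
    all_values = [value for part in partitions for value in part]
    return median(all_values)

def median_of_partition_medians(partitions):
    """Naive merge: each server reports its local median, then the
    coordinator takes the median of those local results."""
    return median(median(part) for part in partitions)

def merged_sum(partitions):
    """A sum merges exactly: the sum of the partial sums is the total sum."""
    return sum(sum(part) for part in partitions)

# Hypothetical data: one list per server.
partitions = [
    [1, 2, 3],
    [4, 5, 100],
    [6, 7, 8, 9, 10],
]

print("exact median:        ", global_median(partitions))
print("merged local medians:", median_of_partition_medians(partitions))
print("merged sum:          ", merged_sum(partitions))
```

With this data the merged local medians do not match the exact median, while the merged sum matches the sum over all values. Partial medians alone are therefore not enough to recover the exact answer, and that is the class of analysis the paragraph above describes.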
Aster Analytics on Hadoop solves this problem by integrating the Aster Analytics software directly into Hadoop. Aster processing engines (called "vWorkers") are provisioned and managed by the Hadoop YARN data operating system. Because Aster runs as a native process in Hadoop, it can access data across an entire Hadoop Distributed File System (HDFS), which avoids the problem of multiple or incorrect analytic results. At the same time, running Aster Analytics within Hadoop addresses the problem of analytics not being scalable.
A beta version of Teradata Listener is currently available and the company plans to release the production version in the first quarter of 2016. It has scheduled Teradata Aster Analytics on Hadoop for release in the second quarter of 2016. You can run the software in your own data center or on a cloud platform.