Precog: Big Data Analytics as a Service
Precog has recently announced a Big Data warehousing and analysis service that handles data capture, storage, transformation, analysis and visualization, along with the infrastructure it all runs on. At the same time, the service exposes RESTful APIs at each stage, so developers and data scientists can still control the entire process.
Precog captures input data from a variety of sources including SQL databases, Amazon S3, Hadoop, MongoDB, client-side web applications, and back-end servers. A RESTful API enables developers to capture data from external sources such as Twitter or Facebook, or from CSV files or mobile devices. The data is then stored in a custom database called PrecogDB, and can be enriched with various attributes – demographics, sentiment, location and others.
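The article does not document Precog's ingestion endpoints. The following is only a minimal sketch of the general pattern: CSV rows are turned into JSON events and wrapped in an HTTP POST. The base URL, path layout and `X-Api-Key` header are hypothetical placeholders, and the request is built but never sent.

```python
import csv
import io
import json
import urllib.request

# Hypothetical endpoint layout; NOT Precog's documented API.
INGEST_URL = "https://api.example.com/ingest/v1/fs/{account}/{path}"


def csv_to_events(csv_text):
    """Turn CSV text into a list of dicts, one event per row (header row gives the keys)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def build_ingest_request(account, path, events, api_key):
    """Build (but do not send) a POST carrying newline-delimited JSON events."""
    body = "\n".join(json.dumps(e, sort_keys=True) for e in events).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL.format(account=account, path=path),
        data=body,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},  # hypothetical auth header
        method="POST",
    )


# Usage: two CSV rows become two JSON events in a single request body.
events = csv_to_events("user,action\nalice,click\nbob,purchase\n")
req = build_ingest_request("acme", "web/events", events, "secret")
```

Sending it would be a single `urllib.request.urlopen(req)` call against a real endpoint; keeping construction separate from sending makes the payload easy to inspect.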
Precog runs the entire process on a combination of cloud providers, Amazon EC2 and SoftLayer, to increase resilience and uptime.
In an interview for InfoQ, John A. De Goes, CEO and Founder of Precog, explained that the “architecture [of the system] has some similarities to the architecture of analytical databases, including column-oriented storage, but differs in supporting fully heterogeneous and denormalized data, and in supporting Quirrel, the ‘R for big data’ language that lets you easily perform much more advanced calculations than you can with an analytical RDBMS.”
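One common way to keep heterogeneous, denormalized records in a column-oriented layout is to "shred" each record into dotted paths and store every path as its own column, with a marker for rows that lack it. The Python sketch below illustrates that general idea only; it is an assumption for illustration, not PrecogDB's actual storage format, and it uses `None` to mean "absent" and leaves lists unflattened.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted paths, e.g. {"geo": {"city": "X"}} -> {"geo.city": "X"}."""
    out = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value  # scalars and lists are stored as-is
    return out


def to_columns(records):
    """Store each dotted path as its own column; rows missing that path get None."""
    flat = [flatten(r) for r in records]
    paths = sorted({p for row in flat for p in row})
    return {p: [row.get(p) for row in flat] for p in paths}


# Two events with different shapes end up in one set of columns.
records = [
    {"user": "alice", "action": "click"},
    {"user": "bob", "amount": 19.99, "geo": {"city": "Boulder"}},
]
cols = to_columns(records)

# An aggregate touches only the column it needs, which is the point of a columnar layout.
revenue = sum(v for v in cols["amount"] if v is not None)
```

A production store would also track a type per column and compress each column, but the shape of the data, one array per path, is what lets analytical queries skip irrelevant fields.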
At the heart of the platform is PrecogDB, a columnar database written in Scala and running on the JVM, optimized for data capture and analysis. PrecogDB stores “measured data, such as clicks, purchases, measurements, tweets, and other kinds of activities, which collectively form a journal of historical activity,” according to De Goes, who added: “Precog cannot yet store huge blobs of unstructured data, as is required for applications in bioinformatics and other areas, but this capability is in the roadmap.”
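The "journal of historical activity" De Goes describes can be pictured as an append-only log: events are added in time order and never updated, so a time-range query reduces to a binary search over the timestamps. This is a minimal in-memory Python sketch of that data structure under those assumptions; the class and method names are hypothetical, not part of PrecogDB.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventJournal:
    """Hypothetical append-only log of timestamped events; past entries are never modified."""
    timestamps: List[float] = field(default_factory=list)
    events: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, ts: float, event: Dict[str, Any]) -> None:
        # Require non-decreasing timestamps so the log stays sorted for range scans.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("events must be appended in time order")
        self.timestamps.append(ts)
        self.events.append(event)

    def between(self, start: float, end: float) -> List[Dict[str, Any]]:
        """Return events with start <= ts < end, found by binary search on the sorted log."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return self.events[lo:hi]


journal = EventJournal()
journal.append(100, {"type": "click"})
journal.append(200, {"type": "purchase"})
journal.append(300, {"type": "tweet"})
```

Because nothing is overwritten, the full history stays available for later analysis, which is the property that distinguishes an activity journal from a mutable table of current state.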
Regarding Quirrel, the statistical query language implemented by Precog, De Goes said: “In many respects, Quirrel is similar to the R programming language. Like R, Quirrel is designed for advanced analytics and statistics. Unlike R, Quirrel is not a Turing complete language, and it is purely declarative, which makes it possible to efficiently distribute Quirrel queries across massive clusters of machines (this also makes Quirrel much easier to learn than R).”
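Quirrel syntax is not shown here. Instead, the Python sketch below illustrates the general technique that makes declarative statistics easy to distribute: each machine computes a small partial aggregate over its own shard, and the partials are merged associatively. It uses Chan et al.'s pairwise update for count, mean and variance; this is a standard textbook method chosen for illustration, not a description of Precog's engine.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Mergeable partial aggregate: count, mean, and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(values):
        """Summarize one shard by folding single-value partials together."""
        acc = Moments()
        for v in values:
            acc = acc.merge(Moments(1, float(v), 0.0))
        return acc

    def merge(self, other):
        """Combine two partials (Chan et al. pairwise update); order of merging does not matter."""
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self):
        """Population variance."""
        return self.m2 / self.n


# Each "machine" summarizes its own shard; only the tiny partials travel over the network.
shards = [[1, 2, 3], [4, 5], [6]]
total = reduce(Moments.merge, (Moments.of(s) for s in shards))
```

The pairwise update is preferred over accumulating a raw sum of squares because it avoids catastrophic cancellation when values are large relative to their spread.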
PrecogDB has “built-in routines for performing common analytical and statistical computations,” and a “granular, capability-based security model, which allows PrecogDB to be accessed by REST API directly from mobile devices and web applications.”
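A capability-based model hands each client a token that names exactly what it may do, for example "append under this path", so a token embedded in a mobile app cannot be used to read other data. The Python sketch below shows one simple way to implement the idea with an HMAC-signed grant list. The token format, secret and path scheme are all hypothetical; the article does not describe how Precog encodes its capabilities, and a real system would add expiry and revocation.

```python
import base64
import hashlib
import hmac
import json

SERVER_SECRET = b"hypothetical-server-secret"  # placeholder only; never hard-code real secrets


def issue_capability(grants):
    """grants maps a path prefix to permissions, e.g. {"/acme/web/": ["append"]}."""
    payload = json.dumps(grants, sort_keys=True).encode()
    sig = hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def is_allowed(token, path, permission):
    """Verify the signature, then check that some granted prefix covers path with permission."""
    try:
        encoded, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # malformed token (binascii.Error is a ValueError subclass)
        return False
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # tampered or forged grant list
    grants = json.loads(payload)
    return any(path.startswith(prefix) and permission in perms
               for prefix, perms in grants.items())


# A web or mobile client gets a token that can only append events under its own prefix.
token = issue_capability({"/acme/web/": ["append"]})
```

Because the server only needs the secret to check a token, it can validate requests coming directly from browsers or phones without a per-user session lookup.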