Splice Machine version 1.0 supports analytic window functions and integration with the Hadoop ecosystem. The Splice Machine team recently released their Hadoop-based RDBMS, a data management solution that can be used for transactional workloads on Hadoop.
Earlier this year Google announced Cloud Dataflow, a service and SDK for processing large amounts of data in batch or in real time. Google has now open sourced the Dataflow Java SDK, enabling developers to see how it works and possibly to use the SDK for services running on-premises or in other clouds.
LinkedIn recently open sourced Cubert, its high-performance computation engine for complex big data analytics. Cubert is a framework written with analysts and data scientists in mind. Developed entirely in Java and exposed through a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in reporting.
An agile view of Big Data, in which data is treated as a real-time stream, offers a new perspective on how data is managed. Using an agile data infrastructure, organizations can tackle Big Data challenges with greater ease, flexibility and performance. A white paper by CodeFutures describes this agile view of Big Data.
At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk (also summarized in a blog post) on Gobblin, LinkedIn's unified data ingestion system for its internal and external data sources.
MapR Technologies, a provider of an Apache Hadoop distribution, has open sourced its MapR-DB NoSQL database for unlimited production use. MapR-DB is a wide-column NoSQL database with native Hadoop integration and support for strong consistency and ACID transactions.
GridGain's In-Memory Data Fabric entered the Apache Incubator last October under the name Apache Ignite. The company donated its flagship in-memory computing platform to the Apache Software Foundation with the intention of attracting external developers and growing a viable community around its core technology.
At the Strata + Hadoop World conference in Barcelona last week, Rod Smith, Vice President of the IBM Emerging Internet Technologies organization, presented an internal product, developed through the group's consulting work with clients, that integrates data sources and data analysis.
At the recent GOTO conference in Berlin, Mahout committer Sebastian Schelter outlined recent advances in Mahout's ongoing effort to create a scalable foundation for data analysis that is as easy to use as R or Python.
The second day of the Web Summit in Dublin, Ireland, concluded yesterday. Here is a look at what happened and what was new on that day of the event.
Web Summit, one of the largest technology conferences in Europe, opened today. Well-known figures from the technology and business world, including Peter Thiel, Drew Houston and Anna Patterson, are expected to speak.
In its first Forrester Wave: NoSQL Key-Value Databases report, released in Q3 2014, Forrester evaluated the most popular NoSQL database offerings.
Version 2.0 of the Splunk C# SDK makes heavy use of modern C# features. Every major operation, from login onwards, is available through asynchronous methods, and for more advanced uses such as sampling, Reactive Extensions come into play.
Splunk’s user conference has drawn to a close. Across three days and more than 160 sessions, ranging from security and operations to business intelligence and even the Internet of Things, one central theme kept recurring: the key to Big Data is machine learning.
A common theme at the Splunk user conference is the idea that users are the greatest threat. Even in a well-regulated enterprise where no one has more privileges than their job requires, a typical user has more than enough access to steal massive amounts of data or cause widespread problems. Fortscale seeks to address this issue by using the data that you are already collecting.