AI, ML & Data Engineering Content on InfoQ

  • Google Open Sources Cloud Dataflow Java SDK

    Earlier this year Google announced Cloud Dataflow, a service and SDK for processing large amounts of data in batch or in real time. Google has now open sourced the Dataflow Java SDK, letting developers see how it works and potentially use the SDK for services running on-premises or in other clouds.

  • LinkedIn Open Sources Cubert With an Eye To Big Data Analytics

    LinkedIn recently open sourced Cubert, its high-performance computation engine for complex big data analytics. Cubert is a framework written with analysts and data scientists in mind. Developed entirely in Java and exposed through a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in reporting.

  • Agile View of Big Data

    An agile view of Big Data, in which data is treated as a real-time stream, offers a new perspective on how data is managed. With an agile data infrastructure, organizations can tackle Big Data challenges with greater ease, flexibility and performance. A white paper by codeFutures describes this agile view of Big Data.

  • Gobblin, LinkedIn's Unified Data Ingestion Platform

    At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk on Gobblin (also summarized in a blog post), LinkedIn's unified data ingestion system for its internal and external data sources.

  • MapR-DB NoSQL Database Integrated into MapR Community Edition for Unlimited Production Use

    MapR Technologies, a provider of an Apache Hadoop distribution, has made its MapR-DB NoSQL database available in the free MapR Community Edition for unlimited production use. MapR-DB is a wide-column NoSQL database with native Hadoop integration and support for strong consistency and ACID transactions.

  • GridGain Becomes Apache Ignite

    GridGain's In-Memory Data Fabric entered the Apache Incubator last October under the name Apache Ignite. The company donated its flagship in-memory computing platform to the Apache Software Foundation with the aim of attracting external developers and building a viable community around its core technology.

  • Google Uses Machine Learning to Simplify CAPTCHA

    Google has announced a new CAPTCHA API that provides a "No CAPTCHA" experience for most users.

  • IBM, Databricks, GraphLab Present Notebooks as Unified Interfaces for Building Prediction Apps

    At the StrataHadoop conference in Barcelona last week, Rod Smith, Vice President of IBM's Emerging Internet Technologies organization, presented an internal product, developed through IBM's consulting work with clients, that integrates data sources and data analysis.

  • Spark Sets New Record in Sort Performance

    Databricks recently announced a new record in the Daytona GraySort contest using the Spark processing engine. Daytona GraySort is a third-party benchmark that measures how fast a system can sort 100 terabytes of data. In its official run, Databricks achieved a throughput of 4.27 TB/min on a cluster of 206 machines.

  • Mahout to Get Self-Optimizing Matrix Algebra Interface with Pluggable Backends for Spark and Flink

    At the recent GOTO conference in Berlin, Mahout committer Sebastian Schelter outlined recent advances in Mahout's ongoing effort to create a scalable foundation for data analysis that is as easy to use as R or Python.

  • Lovefield: An SQL-like Query Engine by Google

    Lovefield is a JavaScript library providing an SQL-like query engine to web developers who want the benefits of a relational database.

  • Web Summit 2014 Day Two Review

    The second day of the Web Summit in Dublin, Ireland, concluded yesterday. We review what happened and what was new at the event that day.

  • Web Summit 2014 Day One Review

    Web Summit, one of the largest technology conferences in Europe, opened today. Well-known figures from the technology and business worlds, including Peter Thiel, Drew Houston and Anna Patterson, are expected to speak.

  • Basho Announcements at RICON Conference

    The RICON conference in Las Vegas last week brought together work from industry and academia in a venue dedicated to sharing the latest innovations in distributed systems tools, technologies and concepts. By hosting the conference, Basho Technologies positions itself as a thought leader in a challenging field.

  • Microsoft Expands Azure Machine Learning and Real Time Analytics Offering

    Microsoft recently announced new machine learning capabilities for the Microsoft Azure platform. Developers can also create their own web services and publish them to the Azure Marketplace. Microsoft also announced the availability of Apache Storm on Azure, and within the past few weeks it has announced Azure Stream Analytics, Data Factory and Event Hubs. In this article we explore these announcements in more detail.