Intel open-sources BigDL, a distributed deep learning library that runs on Apache Spark. It leverages existing Spark clusters to run deep learning computations and simplifies loading data from large datasets stored in Hadoop.
Kuzzle is a document back-end that can run on-premises or in the cloud. The company behind the platform recently announced an enterprise version of its solution at CES 2017.
Yelp has open-sourced the latest component of its data pipeline initiative, a Python-based data pipeline client library.
Instacart is an online service that delivers groceries in under an hour. Customers order items on the website or through the mobile app, and Instacart’s shoppers go to local stores, purchase the items, and deliver them to the customer. InfoQ interviewed Mathieu Ripert, data scientist at Instacart, to find out how machine learning is leveraged to guarantee a better customer experience.
Stack Overflow recently announced that its dataset is available through Google’s BigQuery. Using regular SQL statements, developers can query the full set of Stack Overflow data, including posts, votes, tags, and badges. In this article we explore datasets that are available through Google's BigQuery platform.
The latest version of the graph NoSQL database Neo4j introduces causal clustering and a new security architecture. The Neo4j team recently released version 3.1 of the graph database. Other new features include database kernel improvements and a Schema Viewer.
“Fast and Provably Good Seedings for k-Means” by Olivier Bachem et al. was presented at the 2016 Neural Information Processing Systems (NIPS) conference. It describes AFK-MC2, an alternative method for generating initial seedings for the k-Means clustering algorithm that is several orders of magnitude faster than the state-of-the-art method, k-Means++.
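To give a rough idea of how such a seeding can work, here is a minimal Python sketch of the Markov chain sampling scheme generally described for AFK-MC2. It is an illustration under stated assumptions, not the authors' reference implementation: the function name `afkmc2_seeding`, its parameters, and the helper `sq_dist` are hypothetical, and points are assumed to be tuples of floats compared by squared Euclidean distance.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def afkmc2_seeding(points, k, chain_length, rng=None):
    """Pick k initial centers with a Markov chain, in the spirit of AFK-MC2.

    Hypothetical sketch: names and signature are illustrative only.
    """
    rng = rng or random.Random()
    n = len(points)
    indices = list(range(n))

    # The first center is sampled uniformly at random.
    centers = [points[rng.randrange(n)]]

    # Proposal distribution q: half proportional to the squared distance
    # to the first center, half uniform (so every point has q > 0).
    d1 = [sq_dist(p, centers[0]) for p in points]
    total = sum(d1)
    if total == 0:  # all points coincide with the first center
        q = [1.0 / n] * n
    else:
        q = [0.5 * d / total + 0.5 / n for d in d1]
    cum_q = list(itertools.accumulate(q))  # lets choices() sample by bisection

    def dist_to_centers(i):
        # Squared distance from point i to its nearest chosen center,
        # computed only for the points the chain actually visits.
        return min(sq_dist(points[i], c) for c in centers)

    for _ in range(k - 1):
        x = rng.choices(indices, cum_weights=cum_q)[0]
        dx = dist_to_centers(x)
        for _ in range(chain_length - 1):
            y = rng.choices(indices, cum_weights=cum_q)[0]
            dy = dist_to_centers(y)
            # Metropolis-Hastings step: accept y with probability
            # min(1, (dy * q[x]) / (dx * q[y])).
            if dx == 0 or (dy * q[x]) / (dx * q[y]) > rng.random():
                x, dx = y, dy
        centers.append(points[x])
    return centers
```

A call such as `afkmc2_seeding(points, k=2, chain_length=20, rng=random.Random(0))` returns `k` points drawn from the input. Only the initial pass that builds `q` touches every point; each later center costs work proportional to the chain length rather than to the dataset size, which is where the speed-up over k-Means++ is meant to come from.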
Speedment released version 3.0.1 of their stream object-relational mapping Java toolkit and runtime application, featuring a new declarative Java 8 stream API, an improved user interface, and better code generation. InfoQ spoke to Per-Åke Minborg, co-founder and CTO of Speedment, about this latest release.
Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components to build distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, main contributor of the project, to find out more about StormCrawler and how it compares to other similar technologies.
A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor performant enough to support their graph processing workloads.
Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
At their annual re:Invent conference in Las Vegas, AWS unleashed a flurry of announcements about upcoming cloud services. Amazon outlined over two dozen new capabilities coming to the public cloud, including directly querying data in S3 object storage, building code as part of deployment pipelines, provisioning cheap virtual private servers, and moving data in bulk, ETL-style.
The cloud, infrastructure as code, federated architectures with APIs, and anti-fragile systems: these are technologies for developing software systems that are rapidly coming into focus, claimed Mary Poppendieck. Systems are moving towards the cloud, and APIs are replacing central shared databases and enabling the internet of things. We need to develop anti-fragile systems which embrace failure.
Couchbase 4.6 Developer Preview features full-text search improvements, cross-data-center replication with globally ordered conflict resolution, and connectors for two real-time analytics technologies: Spark 2.0 and Kafka.
Realm has launched an open source object database for Node.js, allowing mobile developers to create and send pre-populated Realms to clients.