AI, ML & Data Engineering Content on InfoQ
-
Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing
A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor performant to support their graph processing workloads.
-
Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow
Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
-
Large Scale Experimentation at Spotify
When you want to scale the number of A/B tests to run many experiments at the same time, you need to adapt your processes and platform, and it might also impact your culture. Doing product research with controlled experiments helps to confront your ideas about how customers will use your product with reality, and to check whether those ideas actually impact user behaviour.
-
Amazon Adds Finer Granularity of Control to Their Voice Recognition API
Amazon’s Alexa Voice Service API, the NLP (natural language processing) API that powers Amazon Echo, has a new update that allows developers to use Alexa to turn any device into a “smart” device through the API’s voice recognition features.
-
AWS re:Invent Recap
At their annual re:Invent conference in Las Vegas, AWS unleashed a flurry of announcements about upcoming cloud services. Amazon outlined over two dozen new capabilities coming to the public cloud, including directly querying data in S3 object storage, building code as part of deployment pipelines, provisioning cheap virtual private servers, and moving data in bulk, ETL-style.
-
Amazon Announces MXNet as Deep Learning Framework of Choice at AWS
Amazon's Werner Vogels announces MXNet as the deep learning toolkit of choice for internal adoption, and extends AWS's commitment to the development of the open-source MXNet ecosystem.
-
Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka
Couchbase 4.6 Developer Preview features full text search improvements, cross data center replication with globally-ordered conflict resolution and connectors for real-time analytics technologies: one for Spark 2.0 and the other for Kafka.
-
Tracks Announced & Registrations off to a Fast Start: QCon London 2017 (March 6-10, 2017) Update
QCon London 2017, the 11th annual practitioner-driven conference designed for team leads, architects and influencers driving innovation in their teams, hosts more than 125 speakers across 18 concurrent tracks over three days. Track topics have been finalized and published. Ticket sales for this year’s conference are off to a fast start - register before December 17, 2016 and save £360.
-
Spark Summit EU Highlights: TensorFlow, Structured Streaming and GPU Hardware Acceleration
Apache Spark integration with deep learning library TensorFlow, online learning using Structured Streaming and GPU hardware acceleration were the highlights of Spark Summit EU 2016 held last week in Brussels.
-
New and Interesting Changes on ThoughtWorks Radar
As usual, the ThoughtWorks Technology Radar covers four areas (Languages & Frameworks, Platforms, Techniques, Tools), with each item given one of four recommendations: Adopt, Trial, Assess, Hold. This article lists only what is new and noteworthy in the respective areas.
-
QCon SF Keynote: The History and Future of Wearable Computing and Virtual Experience
Amber Case gave the opening keynote talk at QCon San Francisco. She spoke about the history and current state of virtual reality interfaces, the challenges faced by augmented reality and how these can be overcome as people become more comfortable with the advances in technology.
-
JVMs Across the Data Center and Twitter's JDK
The Twitter Sponsored Solutions track at QConSF2016 features an engineering talk on JVMs Across the Data Center and unveils an in-house OpenJDK fork, the Twitter-JDK, noting that it may be open-sourced or released to the broader public.
-
Google Details Allo Recommendation Graph Processing Algorithm
Google details a graph streaming algorithm for constant runtime over large graphs of varying complexity space and predictor outputs.
-
Microsoft Releases Data Science Tools for Interactive Data Exploration and Modeling
Microsoft recently released two new data science tools for interactive data exploration, modeling and reporting. These tools can be reused by data science teams for data-specific tasks in their projects. The goal is to ensure consistency and completeness of data science tasks across different projects in the organization.
-
Microservices and Stream Processing Architecture at Zalando Using Apache Flink
Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.