AI, ML & Data Engineering Content on InfoQ
-
AFK-MC² Algorithm Speeds up k-Means Clustering Algorithm Seeding
“Fast and Provably Good Seedings for k-Means” by Olivier Bachem et al. was presented at the 2016 Neural Information Processing Systems (NIPS) conference and describes AFK-MC², an alternative method for generating initial seedings for the k-Means clustering algorithm that is several orders of magnitude faster than the state-of-the-art method, k-Means++.
-
Amazon Preview FPGA Enabled EC2 Instances
Amongst the flurry of announcements at re:Invent 2016 was the launch of a developer preview for a new F1 instance type. The F1 comes with one to eight high-end Xilinx Field Programmable Gate Arrays (FPGAs) to provide programmable hardware. The FPGAs are likely to be used for risk management, simulation, search and machine learning applications.
-
Julien Nioche on StormCrawler, Open-Source Crawler Pipelines Backed by Apache Storm
Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components for building distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, the main contributor to the project, to find out more about StormCrawler and how it compares to similar technologies.
-
Facebook Builds an Efficient Neural Network Model over a Billion Words
Using neural networks for sequence prediction is a well-known computer science problem with a vast array of applications in speech recognition, machine translation, language modeling and other fields. Facebook AI Research scientists designed adaptive softmax, an approximation algorithm tailored for GPUs which can be used to efficiently train neural networks over vocabularies of a billion words and beyond.
-
Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing
A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor performant enough to support their graph processing workloads.
-
Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow
Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
-
Large Scale Experimentation at Spotify
When you want to scale the number of A/B tests to run many experiments at the same time, you need to adapt your processes and platform, and it might also impact your culture. Doing product research with controlled experiments helps you test your ideas about how customers will use your product against reality, and check whether those ideas actually impact user behaviour.
-
Amazon Adds Finer Granularity of Control to Their Voice Recognition API
Amazon’s Alexa Voice Service API, the NLP (natural language processing) API that powers Amazon Echo, has a new update that allows developers to use Alexa to turn any device into a “smart” device through the API’s voice recognition features.
-
AWS re:Invent Recap
At their annual re:Invent conference in Las Vegas, AWS unleashed a flurry of announcements about upcoming cloud services. Amazon outlined over two dozen new capabilities coming to the public cloud, including directly querying data in S3 object storage, building code as part of deployment pipelines, provisioning cheap virtual private servers, and moving data in bulk, ETL-style.
-
Amazon Announces MXNet as Deep Learning Framework of Choice at AWS
Amazon's Werner Vogels announces MXNet as the deep learning toolkit of choice for internal adoption, and extends AWS's commitment to open-source MXNet ecosystem development.
-
Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka
Couchbase 4.6 Developer Preview features full text search improvements, cross data center replication with globally-ordered conflict resolution and connectors for real-time analytics technologies: one for Spark 2.0 and the other for Kafka.
-
Tracks Announced & Registrations off to a Fast Start: QCon London 2017 (March 6-10, 2017) Update
QCon London 2017, the 11th annual practitioner-driven conference designed for team leads, architects and influencers driving innovation in their teams, hosts more than 125 speakers across 18 concurrent tracks over three days. Track topics have been finalized and published. Ticket sales for this year’s conference are off to a fast start: register before December 17, 2016 and save £360.
-
Spark Summit EU Highlights: TensorFlow, Structured Streaming and GPU Hardware Acceleration
Apache Spark integration with deep learning library TensorFlow, online learning using Structured Streaming and GPU hardware acceleration were the highlights of Spark Summit EU 2016 held last week in Brussels.
-
New and Interesting Changes on ThoughtWorks Radar
As usual, the ThoughtWorks Technology Radar covers four areas (Languages & Frameworks, Platforms, Techniques, Tools), with each item given one of four recommendations: Adopt, Trial, Assess or Hold. This article lists only what is new and noteworthy in the respective areas.
-
QCon SF Keynote: The History and Future of Wearable Computing and Virtual Experience
Amber Case gave the opening keynote talk at QCon San Francisco. She spoke about the history and current state of virtual reality interfaces, the challenges faced by augmented reality and how these can be overcome as people become more comfortable with the advances in technology.