AI, ML & Data Engineering Content on InfoQ
-
Facebook Open-Sources RoBERTa: an Improved Natural Language Processing Model
Facebook AI open-sourced a new deep-learning natural-language processing (NLP) model, the Robustly Optimized BERT Pretraining Approach (RoBERTa). Based on Google's BERT pre-training model, RoBERTa includes additional pre-training improvements that achieve state-of-the-art results on several benchmarks, using only unlabeled text from the world-wide web, with minimal fine-tuning and no data augmentation.
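As a hedged usage sketch (following fairseq's published torch.hub examples, which require fairseq's dependencies to be installed and may change between releases), the pre-trained weights can be loaded and used to extract sentence features:

```python
import torch

# Download a pre-trained RoBERTa model via torch.hub (fairseq release).
# Model names such as 'roberta.base' and 'roberta.large' follow fairseq's examples.
roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
roberta.eval()  # disable dropout for deterministic inference

# Encode a sentence with RoBERTa's BPE tokenizer and extract features.
tokens = roberta.encode('RoBERTa builds on BERT pre-training.')
features = roberta.extract_features(tokens)  # shape: (1, num_tokens, hidden_size)
print(features.shape)
```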
-
Google Releases Cloud Dataproc for Kubernetes in Alpha
Google Cloud Dataproc is a managed data and analytics processing service based on open-source tools such as Apache Hadoop and Apache Spark. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which gives customers a more efficient way to process data across platforms.
-
WebExpo 2019: Make Healthcare Affordable and Accessible Using Tech and AI
Anna Zawilska, lead user researcher at Babylon Health, recently presented at WebExpo 2019 in Prague the lessons learned from Babylon Health's experience delivering remote healthcare through a combination of technology and Artificial Intelligence (AI). In the process, Babylon Health came to adjust three key assumptions underpinning its product development.
-
Google Researches Use of Concept Vectors for Image Search
Google recently released research on SMILY (Similar Medical Images Like Yours), a tool for searching for similar medical images. The tool uses embeddings for image-based search and allows users to influence the search through interactive refinement of concepts.
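The article does not include code; the following is a minimal, generic sketch of embedding-based search with user-adjustable concept weighting. All names and the random data are hypothetical, and this is not SMILY's implementation:

```python
import numpy as np

def cosine_similarity(query, matrix):
    # Cosine similarity between one query vector and each row of a matrix.
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ query

# Hypothetical pre-computed embeddings for an image corpus (n_images x d).
corpus_embeddings = np.random.rand(1000, 128)
query_embedding = np.random.rand(128)

# Rank corpus images by similarity to the query image.
scores = cosine_similarity(query_embedding, corpus_embeddings)

# Interactive refinement: boost images that also align with a user-selected
# "concept" vector (e.g., a learned direction for a visual attribute).
concept_vector = np.random.rand(128)
concept_weight = 0.5  # set by the user, e.g. via a slider
scores += concept_weight * cosine_similarity(concept_vector, corpus_embeddings)

top_k = np.argsort(scores)[::-1][:10]  # indices of the 10 best matches
```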
-
Facebook, Microsoft, and Partners Announce Deepfake Detection Challenge
Facebook, Microsoft, the Partnership on AI, and researchers from several universities have created the Deepfake Detection Challenge (DDC), a contest to produce AI that can detect misleading images and video that have been created by AI. The challenge includes several grants and awards for the teams that create the best AI solution, using the DDC's dataset of real and fake videos.
-
Amazon-Certified Syntiant Neural Decision Processors (NDP) Aim to Bring Alexa to Low-Power Devices
Syntiant NDPs are custom-built chips specialized to run TensorFlow neural networks and can be integrated into many kinds of voice- and audio-enabled devices, now including Alexa-enabled devices.
-
Denis Magda on Continuous Deep Learning with Apache Ignite
At the recent ApacheCon North America, Denis Magda spoke on continuous machine learning with Apache Ignite, an in-memory data grid. Ignite simplifies the machine-learning pipeline by performing training and hosting models in the same cluster that stores the data, and can perform "online" training to incrementally improve models when new data is available.
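Ignite's ML APIs are Java-based and not covered here; as a generic, hedged illustration of the "online" training idea, the sketch below uses scikit-learn's partial_fit as a stand-in (not Ignite's API) to update a model incrementally as new batches of data arrive:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Incremental ("online") training: the model is updated batch by batch as
# new data arrives, instead of being retrained on the full dataset each time.
model = SGDClassifier()
classes = np.array([0, 1])  # all classes must be declared on the first call

def new_batches():
    # Stand-in for a stream of fresh records landing in a data store.
    for _ in range(10):
        X = np.random.rand(100, 5)
        y = (X.sum(axis=1) > 2.5).astype(int)
        yield X, y

for X_batch, y_batch in new_batches():
    model.partial_fit(X_batch, y_batch, classes=classes)
```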
-
Jagadish Venkatraman on LinkedIn's Journey to Samza 1.0
At the recent ApacheCon North America, Jagadish Venkatraman spoke about how LinkedIn developed Apache Samza 1.0 to handle stream processing at scale. He described LinkedIn's use cases involving trillions of events and petabytes of data, then highlighted the features added for the 1.0 release, including stateful processing, high-level APIs, and a flexible deployment model.
-
ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes
At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators to provide control planes for running Apache software in a Kubernetes cluster.
-
Exploring the Motivations for the Inaugural Open Core Summit: Q&A with Founder Joseph Jacks
The inaugural Open Core Summit will run September 19-20 in San Francisco. InfoQ recently sat down with the founder of the event, Joseph Jacks, to explore his motivations for running it and to discuss why he believes new open-source businesses should focus on creating the maximum possible value while aiming to capture only a small portion of that value.
-
Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads
In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to replace traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may see increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization, and intelligent defaults.
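As a hedged sketch, assuming the Cloud Storage connector is already installed on the cluster and using a hypothetical bucket name, a PySpark job can read and write gs:// paths in place of hdfs:// paths:

```python
from pyspark.sql import SparkSession

# Assumes the Cloud Storage connector is on the cluster's classpath,
# so gs:// paths can be used wherever hdfs:// paths were used before.
spark = SparkSession.builder.appName("gcs-example").getOrCreate()

# Hypothetical bucket and paths, for illustration only.
df = spark.read.parquet("gs://my-bucket/events/2019/")
df.filter(df["country"] == "US") \
  .write.mode("overwrite") \
  .parquet("gs://my-bucket/output/us-events/")
```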
-
Waymo Shares Autonomous Vehicle Dataset for Machine Learning
Waymo, the self-driving technology company, released a dataset containing sensor data collected by their autonomous vehicles during more than five hours of driving. The set contains high-resolution data from lidar and camera sensors collected in several urban and suburban environments in a wide variety of driving conditions and includes labels for vehicles, pedestrians, cyclists, and signage.
-
Introducing KiloGram, a New Technique for AI Detection of Malware
A team of researchers recently presented their paper on KiloGram, a new algorithm for managing large n-grams in files to improve machine-learning detection of malware. The new algorithm is 60x faster than previous methods and can handle n-grams for n=1024 or higher. The large values of n have additional applications in interpretable malware analysis and signature generation.
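The paper's algorithm is not reproduced here; the sketch below is a naive, simplified illustration of counting hashed byte n-grams (without the techniques that make KiloGram fast at scale), using a hypothetical file name:

```python
import hashlib
from collections import Counter

def hashed_ngrams(path, n=1024):
    """Count hashed byte n-grams of length n from a file.

    A simplified illustration only: KiloGram's contribution is making this
    tractable at scale, which this naive sliding window is not.
    """
    counts = Counter()
    with open(path, "rb") as f:
        data = f.read()
    for i in range(len(data) - n + 1):
        window = data[i:i + n]
        # Hash the n-gram so we store a fixed-size key instead of 1 KB of bytes.
        digest = hashlib.blake2b(window, digest_size=8).hexdigest()
        counts[digest] += 1
    return counts

# Example: the 10 most frequent 1024-byte n-grams in a suspect binary.
# top = hashed_ngrams("sample.exe").most_common(10)
```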
-
How Artificial Intelligence Impacts Designing Products
Artificial intelligence is changing the way we interact with technology; eliminating unnecessary interfaces makes interaction with machines more humane, argued Agnieszka Walorska at ACE conference 2019. Expectations around customer experience have changed, and one factor becoming more and more important to this change is machine learning.
-
New Technique Speeds up Deep-Learning Inference on TensorFlow by 2x
Researchers at North Carolina State University recently presented a paper at the International Conference on Supercomputing (ICS) on their new technique, "deep reuse" (DR), that can speed up inference time for deep-learning neural networks running on TensorFlow by up to 2x, with almost no loss of accuracy.
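The authors' TensorFlow implementation is not shown here; as a toy sketch of the reuse idea under simplifying assumptions (cluster similar input rows offline and compute each cluster's product only once), a dense layer could be approximated like this:

```python
import numpy as np
from sklearn.cluster import KMeans

def dense_with_reuse(X, W, n_clusters=64):
    """Approximate X @ W by clustering similar rows of X and computing the
    product once per cluster centroid, then reusing it for all members.

    A toy illustration of the reuse idea only, not the paper's technique,
    which detects similar neuron vectors on the fly inside TensorFlow.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    reused = km.cluster_centers_ @ W   # one matrix product per cluster
    return reused[km.labels_]          # broadcast each result to its cluster

# Quick check against the exact computation on random data.
X = np.random.rand(5000, 256).astype(np.float32)
W = np.random.rand(256, 128).astype(np.float32)
print(np.abs(dense_with_reuse(X, W) - X @ W).mean())  # small when rows cluster well
```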