Big Data Content on InfoQ
-
Confluent Releases KSQL, a Distributed Streaming SQL Engine for Apache Kafka
Confluent released KSQL, an interactive, distributed streaming SQL engine for Apache Kafka. KSQL supports stream processing operations such as aggregations, joins, windowing, and sessionization on Apache Kafka topics. Confluent announced the open source streaming SQL engine at the recent Kafka Summit conference.
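KSQL expresses these operations in SQL. The Python sketch that follows is not KSQL syntax or its implementation. It is only a hypothetical, in-memory illustration of what a tumbling-window aggregation and session windowing compute over a stream of events. The `(timestamp, key)` event format and the function names are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (key, window start) using fixed, non-overlapping windows.

    events: iterable of (timestamp, key) pairs with integer timestamps
    (a hypothetical input format chosen for illustration).
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Each timestamp belongs to exactly one window of length window_size.
        window_start = (ts // window_size) * window_size
        counts[(key, window_start)] += 1
    return dict(counts)


def sessionize(events, gap):
    """Group each key's events into sessions split by more than `gap` of inactivity.

    Returns {key: [[ts, ...], ...]} with timestamps sorted within each session.
    """
    by_key = defaultdict(list)
    for ts, key in events:
        by_key[key].append(ts)
    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        key_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - key_sessions[-1][-1] > gap:
                key_sessions.append([ts])  # gap exceeded: start a new session
            else:
                key_sessions[-1].append(ts)  # still within the current session
        sessions[key] = key_sessions
    return sessions


if __name__ == "__main__":
    clicks = [(1, "alice"), (3, "bob"), (4, "alice"), (12, "alice"), (14, "bob")]
    print(tumbling_window_counts(clicks, window_size=10))
    print(sessionize(clicks, gap=5))
```

With a window size of 10, alice's clicks at 1 and 4 fall into the window starting at 0, while her click at 12 starts a new one. With a gap of 5, the same clicks form two sessions, `[1, 4]` and `[12]`. A real streaming engine performs this incrementally and handles late or out-of-order events; the sketch ignores both.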
-
Q&A with Andrew Brust of Datameer Regarding Big Data's Role in AI
Rags Srinivas talks to Datameer's Andrew Brust about the larger role of Big Data in AI and how it's operationalized with SmartAI.
-
Researchers Improve State of the Art in Image Recognition Using Data Set with 300 Million Images
Researchers improved state-of-the-art results on several benchmarks using models trained on a data set of 300 million images instead of the roughly 1 million normally used. To test the effect of more training data, Google created an internal dataset of 300 million images, which were labelled automatically and therefore noisily. The conclusion is that more training data does help.
-
Netflix Announces Genie 3
Netflix announced major revisions and new functionality in Genie 3, the latest version of its distributed big data workflow management tool. Genie 3 supports scalable, configuration-driven data processing executables and task pipelines.
-
Scalable Chatbot Architecture with eBay ShopBot Shopping Assistant
Robert Enyedi, a software engineer at eBay, spoke at the QCon New York 2017 conference about ShopBot, eBay's personal shopping assistant application. ShopBot, launched in late 2016 as a Facebook Messenger bot, uses AI components and eBay user data to offer shopping options in a conversational style.
-
Enhancing Google Maps with Deep Learning and Street View
Google's Ground Truth team recently announced a new deep learning model that automatically extracts information from geo-located images to improve Google Maps. This neural network model achieved higher accuracy than previous approaches on the challenging French Street Name Signs (FSNS) dataset.
-
Facebook Publishes New Neural Machine Translation Algorithm
Facebook’s Artificial Intelligence Research team published results on a new approach to neural machine translation (NMT). Their algorithm scores higher than any other system on three established machine translation tasks.
-
Developing Virtual Assistant Apps with Amazon Lex and Polly Deep Learning Technologies
Greg Bulmash from Amazon spoke at the OSCON 2017 conference last week about developing your own virtual assistant applications with Amazon Lex and Amazon Polly.
-
Apache Metron Graduates to Top-Level Project
Hortonworks and the Apache Software Foundation (ASF) announced that Metron, a real-time big data security platform, has graduated to a top-level project at the ASF.
-
Confluent Cloud, Apache Kafka as a Service in AWS
Apache Kafka is a distributed, fault-tolerant publish-subscribe messaging solution, originally developed at LinkedIn and later open-sourced. Confluent, founded by former members of LinkedIn's Kafka development team, has announced Confluent Cloud, a fully hosted and managed Apache Kafka as a Service on AWS. We also take a look at the findings of Confluent's second annual Streaming Data report.
-
Data Preparation Pipelines: Strategy, Options and Tools
Data preparation is an important part of data processing and analytics use cases. Business analysts and data scientists are often said to spend about 80% of their time gathering and preparing data rather than analyzing it or developing machine learning models. Kelly Stirman spoke last week at the Enterprise Data World 2017 conference about data preparation best practices.
-
Google Announces Cloud Machine Learning API Updates
Google recently announced updates to its Cloud Machine Learning APIs at the Google Cloud Next conference. The updates cover APIs for vision, video intelligence, speech, natural language, translation, and job search.
-
Using Deep Learning Technologies, IBM Reaches a New Milestone in Speech Recognition
IBM's research team recently announced a new industry record word error rate of 5.5% on the SWITCHBOARD conversational speech corpus. This brings speech recognition closer to the estimated human word error rate of 5.1%. The team used deep learning technologies and acoustic models to reach this milestone.
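The 5.5% and 5.1% figures cited for IBM are word error rates (WER): the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python function that follows is a minimal sketch of that conventional definition using word-level edit distance. It is not IBM's evaluation pipeline, and real benchmark scoring also normalizes text before alignment; the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as the word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Here the hypothesis has one substitution ("sat" to "sit") and one deletion (the second "the") against a six-word reference, so the WER is 2/6, about 0.333. A 5.5% WER means roughly 5.5 such errors per 100 reference words.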
-
Netflix Demonstrates Big Data Analytics Infrastructure
At QCon San Francisco, engineers at Netflix discussed their big data strategy and analytics infrastructure. This included an overview of the scale of their data, their S3-based data warehouse, and Genie, their federated big data orchestration system.
-
Apache Ranger Graduates to Top-Level Project
Apache Ranger, a security management framework for the Apache Hadoop ecosystem, has graduated to a top-level project. Ranger serves as a centralized component for defining and administering security policies that are enforced across supported Hadoop components such as Apache HBase, Hadoop (HDFS and YARN), Apache Hive, Apache Kafka, and Apache Solr, among others.