Big Data Content on InfoQ
-
Distributed Messaging Framework Apache Pulsar 2.0 Supports Schema Registry and Topic Compaction
The latest version of the open-source distributed pub-sub messaging framework Apache Pulsar enables companies to move “beyond batch” by acting on data in motion. Streamlio recently announced the availability of the Apache Pulsar 2.0 streaming messaging solution. The new version supports Pulsar Functions, Schema Registry, and Topic Compaction.
-
eBay's Accelerator Data Processing Framework Provides Parallel Execution and Live Recommendations
eBay's Accelerator data processing framework provides parallel execution and automatic organization of source code, input data, and results. It can be used for data analysis and algorithm development, as well as in a live recommendation system.
-
PayPal's Gimel Analytics Platform Provides Unified Data API and GSQL
Romit Mehta and Deepak Chandramouli from PayPal spoke at the recent QCon.ai conference about the Gimel data analytics platform and how it can be used to commoditize data access. InfoQ spoke with Mehta and Chandramouli about the data platform and its support in the areas of security,
-
Chile’s Energy Regulator to Adopt Blockchain
PV magazine, a publication focused on reporting on photovoltaics (solar power generation), has announced that Chile's energy regulator is set to adopt blockchain in March 2018. The regulator plans to use blockchain technology to transparently record market prices, marginal costs, fuel prices, and compliance documentation.
-
Oral Arguments before Supreme Court in Microsoft Cloud Computing Case Focus on Legal Issues
On February 27, 2018, the Supreme Court of the United States heard oral arguments in the Microsoft cloud computing case. A ruling against Microsoft could require companies based in the United States to hand over to law enforcement data stored on foreign servers. U.S.-based organizations might then be unable to provide cloud computing services to foreign countries.
-
Managing and Operating Kafka Clusters in Kubernetes
Nenad Bogojevic, platform solutions architect at Amadeus, spoke at the KubeCon + CloudNativeCon North America 2017 conference on how to run and manage Kafka clusters in a Kubernetes environment. He talked about provisioning Kafka clusters and configuring them using Kubernetes custom resources or ConfigMaps.
-
Modern Big Data Pipelines over Kubernetes
Container management technologies like Kubernetes make it possible to implement modern big data pipelines. Eliran Bivas, senior big data architect at Iguazio, spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about big data pipelines and how Kubernetes can help develop them.
-
TensorFlow Lite Supports On-Device Conversational Modeling
TensorFlow Lite, the lightweight version of the open-source deep learning framework TensorFlow, supports on-device conversational modeling to plug conversational intelligence features into chat applications. The TensorFlow team recently announced the release of TensorFlow Lite, which can be used on mobile and embedded devices.
-
Leslie Miley on Bias in Big Data/ML and AI - QCon San Francisco
At QCon San Francisco, Leslie Miley gave a keynote talk in which he explained how inherent bias in data sets has affected everything from the 2016 Presidential race to criminal sentencing in the United States.
-
Confluent Releases KSQL, a Distributed Streaming SQL Engine for Apache Kafka
Confluent released KSQL, an interactive, distributed streaming SQL engine for Apache Kafka. KSQL supports stream processing operations like aggregations, joins, windowing, and sessionization on topics in Apache Kafka. Confluent announced the open-source streaming SQL engine at the recent Kafka Summit conference.
-
Q&A with Andrew Brust of Datameer Regarding Big Data's Role in AI
Rags Srinivas talks to Datameer's Andrew Brust about the larger role of Big Data in AI and how it's operationalized with SmartAI.
-
Researchers Improve State of the Art in Image Recognition Using Data Set with 300 Million Images
Researchers improved the state-of-the-art results on several benchmarks with models trained on a generated data set of 300 million images instead of the 1 million normally used. To test what happens with more training data, Google created an internal data set of 300 million images and labelled the data automatically, in a noisy way. The conclusion is that more training data does indeed help.
-
Netflix Announces Genie 3
Netflix announced major revisions and new functionality in Genie 3, its Big Data distributed workflow management tool. In its newest version, Genie supports scalable, config-driven data processing executables and task pipelines.
-
Scalable Chatbot Architecture with eBay ShopBot Shopping Assistant
Robert Enyedi, software engineer at eBay, spoke at the QCon New York 2017 conference about the ShopBot personal shopping assistant application. ShopBot, launched in late 2016 as a Facebook Messenger bot, leverages AI components and eBay user data to provide shopping options in a conversational style.
-
Enhancing Google Maps with Deep Learning and Street View
Google's Ground Truth team recently announced a new deep learning model for the automatic extraction of information from geo-located image files to improve Google Maps. This neural network model achieved higher accuracy in processing the challenging French Street Name Signs (FSNS) dataset.