Big Data Content on InfoQ
-
ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes
At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators that provide control planes for running Apache software in a Kubernetes cluster.
-
Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads
In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute Google Cloud Storage for their traditional HDFS. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization, and intelligent defaults.
-
An Introduction to Structured Data at Etsy
Etsy recently published a blog post detailing how they store and manage structured data. The Etsy team makes extensive use of taxonomies and stores the structured data in JSON files.
-
Amazon Releases AWS Lake Formation to General Availability
Recently, Amazon announced the general availability (GA) of AWS Lake Formation, a fully managed service that makes it much easier for customers to build, secure, and manage data lakes.
-
Data Engineering in Badoo: Handling 20 Billion Events Per Day
Badoo is a dating social network that currently handles billions of events per day, explains Vladimir Kazanov, data platform engineering lead. At Skills Matter, he talked through some of the challenges of operating at this scale and the tooling Badoo uses to process and report on this data.
-
The First AI to Beat Pros in 6-Player Poker, Developed by Facebook and Carnegie Mellon
Facebook AI Research’s Noam Brown and Carnegie Mellon professor Tuomas Sandholm recently announced Pluribus, the first artificial intelligence program able to beat human professionals in six-player hold'em poker. Over the years, computers have progressively improved, beating humans at checkers, chess, Go, and the Jeopardy! TV show. Poker poses additional challenges around information asymmetry and bluffing.
-
Microsoft Announces Public Preview of Azure Data Share
Microsoft has announced the public preview of Azure Data Share, which provides capabilities to share data with users within an organization as well as with other organizations. Microsoft positions the newly announced service as a big data tool, though it’s also possible to share individual files.
-
Amazon Personalize Is Now Generally Available, Bringing ML to Customers
Following its initial announcement at AWS re:Invent last November, Amazon Personalize is now generally available to all AWS customers. With this service, developers can add custom machine learning models to their applications, including ones for personalized product recommendations, search results, and direct marketing, even if they don’t have much machine learning experience.
-
Los Angeles CTO Roundtable about AI and Data
The recent "Leaders in Data CTO Roundtable" in Los Angeles included discussions about an artificial intelligence (AI) framework/platform for business, data in the next five years, data software stacks, and acquiring data talent.
-
Sign In with Apple Touts Single Sign-On without Sharing Your Data
At the recent WWDC 2019, Apple announced its own Single Sign-On (SSO) service, dubbed Sign in with Apple. Deemed "Apple's most significant new innovation" by Time, Sign in with Apple promises not to share any personal user data, including email addresses.
-
Introducing Interoperable Blockchain Identity Solutions with Hyperledger Aries
In a recent blog post, the Hyperledger project announced its 13th project, Hyperledger Aries, which provides an interoperable identity management toolkit for creating, transmitting, and storing verifiable digital certificates. Using this toolkit, organizations can support secure, interoperable peer-to-peer messaging across different distributed ledger technologies (DLT).
-
Expo: Real Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs
The WalmartLabs engineering team developed a real-time A/B testing tool called Expo that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process the incoming data and stores the metrics in KairosDB.
-
Databricks Open Sources Delta Lake to Make Data Lakes More Reliable
Databricks recently announced that it is open sourcing Delta Lake, its proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company founded by the creators of Apache Spark, and Delta Lake is already in use at several companies, such as McAfee and Upwork. Delta Lake addresses the heterogeneous data problem that data lakes often have...
-
Microsoft Releases High-Performance C# and F# Support for Apache Spark
Microsoft announced the release of .NET for Apache Spark, adding new high-performance C# and F# bindings to the big-data computation engine.
-
A Framework for High-Value Big Data
Asha Saxena recently spoke at the Enterprise Data World 2019 Conference about the value that big data analytics initiatives bring to organizations. Saxena proposed a big data framework that can help with organizational maturity and internal competencies.