Big Data Content on InfoQ

News

Architecture & Design

An Introduction to Structured Data at Etsy

Etsy recently published a blog post detailing how it stores and manages structured data. The Etsy team makes extensive use of taxonomies and stores the structured data in JSON files.

Alex Giamas
on Aug 31, 2019

Cloud

Amazon Releases AWS Lake Formation to General Availability

Recently, Amazon announced the general availability (GA) of AWS Lake Formation, a fully managed service that makes it much easier for customers to build, secure, and manage data lakes.

Steef-Jan Wiggers
on Aug 13, 2019

AI, ML & Data Engineering

The First AI to Beat Pros in 6-Player Poker, Developed by Facebook and Carnegie Mellon

Noam Brown of Facebook AI Research and Carnegie Mellon professor Tuomas Sandholm recently announced Pluribus, the first artificial intelligence program able to beat professional human players in 6-player hold 'em poker. Over the past years, computers have progressively improved, beating humans at checkers, chess, Go, and the TV quiz show Jeopardy!. Poker poses additional challenges around information asymmetry and bluffing.

Alex Giamas
on Jul 31, 2019

Cloud

Microsoft Announces Public Preview of Azure Data Share

Microsoft has announced the public preview of Azure Data Share, which lets customers share data with users in their own organization as well as with other organizations. Microsoft positions the newly announced service as a big data tool, though it can also be used to share individual files.

Eldert Grootenboer
on Jul 23, 2019

Cloud

Amazon Personalize Is Now Generally Available, Bringing ML to Customers

Following its initial announcement at AWS re:Invent last November, Amazon Personalize is now generally available to all AWS customers. With this service, developers can add custom machine learning models to their applications, including models for personalized product recommendations, search results, and direct marketing, even if they have little machine learning experience.

Steef-Jan Wiggers
on Jun 25, 2019

AI, ML & Data Engineering

Los Angeles CTO Roundtable about AI and Data

The recent "Leaders in Data CTO Roundtable" in Los Angeles included discussions about an artificial intelligence (AI) framework/platform for business, data in the next five years, data software stacks, and acquiring data talent.

Aslan Brooke
on Jun 17, 2019

AI, ML & Data Engineering

Databricks Open Sources Delta Lake to Make Data Lakes More Reliable

Databricks recently announced that it is open sourcing Delta Lake, its proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company founded by the creators of Apache Spark, and Delta Lake is already in use at several companies, such as McAfee and Upwork. Delta Lake addresses the heterogeneous data problem that data lakes often have...

Alex Giamas
on May 20, 2019

AI, ML & Data Engineering

A Framework for High-Value Big Data

Asha Saxena recently spoke at the Enterprise Data World 2019 Conference about the value that big data analytics initiatives bring to organizations. Saxena proposed a big data framework that can help with organizational maturity and internal competencies.

Srini Penchikala
on Apr 03, 2019

Cloud

Microsoft Announces New Azure Analytics Services ADLS, ADX and More

Microsoft has announced the general availability of two new Azure analytics services: Azure Data Lake Storage Gen2 (ADLS) and Azure Data Explorer (ADX). Microsoft also announced the preview of Azure Data Factory Mapping Data Flow.

Steef-Jan Wiggers
on Feb 17, 2019

Cloud

Microsoft Announces the General Availability of Azure Data Box Disk

In a recent blog post, Microsoft announced the general availability of Azure Data Box Disk, an SSD-based solution for offline data transfer to Azure. Microsoft also announced the public preview of Azure Data Box Blob Storage, a feature that allows customers to copy data to Blob Storage on a Data Box.

Steef-Jan Wiggers
on Jan 25, 2019

Culture & Methods

Q&A with Christoph Windheuser on AI Applications in the Industry

Increased hardware power and huge amounts of data are making existing machine learning approaches like pattern recognition, natural language processing, and reinforcement learning possible. Artificial Intelligence is impacting the development process; it’s increasing the complexity of things like version control, CI/CD and testing.

Ben Linders
on Dec 15, 2018

Cloud

Amazon Announces Managed Streaming for Kafka in Public Preview

At the recent AWS re:Invent 2018 event, Amazon announced a new fully managed service that makes it easy for customers to build and run applications that use Apache Kafka to process streaming data. This new service is called Amazon Managed Streaming for Kafka, Amazon MSK for short, and is now in public preview.

Steef-Jan Wiggers
on Dec 06, 2018

Cloud

Google Cloud Announces Transfer Appliance in Beta for Cloud Data Migrations in the EU

Google announced that Transfer Appliance, a high-capacity server that lets customers move large amounts of data to Google Cloud Platform (GCP) quickly and securely, is available in beta in the European Union (EU). Google will handle Transfer Appliance data transfers within GCP in the EU, so the data will not leave the EU.

Steef-Jan Wiggers
on Nov 20, 2018

AI, ML & Data Engineering

The Evolution of Uber’s 100+ Petabyte Big Data Platform

Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, a standard transfer format, and a custom library for incremental updates are the key components of the platform.

Hrishikesh Barua
on Nov 10, 2018

AI, ML & Data Engineering

Data Lakes and Modern Data Architecture in Clinical Research and Healthcare

Dr. Prakriteswar Santikary, chief data officer at ERT, spoke at the Data Architecture Summit 2018 conference last month about the data lake architecture his team developed at their clinical research organization. He discussed the data platform, deployed in the cloud to streamline data collection, aggregation, clinical reporting, and analytics, which uses concepts like serverless computing and data services.

Srini Penchikala
on Nov 08, 2018
