Apache Iceberg Content on InfoQ
News
-
How Netflix Powers Audience Insights at Trillion-Row Scale
In a recent blog post, Netflix engineers described how they scaled Muse, the company’s internal application for data-driven creative insights, to handle trillion-row datasets.
-
Amazon S3 Adds Sort and Z-Order Compaction to Improve Apache Iceberg Query Performance
AWS has recently announced that Amazon S3 now supports sort and z-order compaction for Apache Iceberg tables. The new features reduce scan times and engine costs, and are available for both S3 Tables and traditional S3 buckets using AWS Glue Data Catalog optimization.
-
HTAP: the Rise and Fall of Unified Database Systems?
A recent article by Zhou Sun sparked a debate in the data community about the future of HTAP systems. Hybrid transaction/analytical processing was meant to help integrate historical and online data at scale, supporting more flexible query methods and reducing business complexity.
-
AWS Introduces S3 Tables Bucket: Is S3 Becoming a Data Lakehouse?
AWS has recently announced S3 Tables Bucket, managed Apache Iceberg tables optimized for analytics workloads. According to the cloud provider, the new option delivers up to 3x faster query performance and up to 10x higher transaction rates for Apache Iceberg tables compared to standard S3 storage.
-
Amazon S3 Introduces Metadata Feature for Improved Data Management and Querying in Preview
Amazon Web Services (AWS) has launched S3 Metadata, enhancing data management for S3 users. This new capability enables near real-time querying and analysis of S3 data via organized metadata updates. By adopting Apache Iceberg, it ensures interoperability and scalability, allowing businesses to efficiently leverage their data for analytics and AI applications.
-
From Aurora DSQL to Amazon Nova: Highlights of re:Invent 2024
The 2024 edition of re:Invent has just ended in Las Vegas. As anticipated, AI was a key focus of the conference, with Amazon Nova and a new version of SageMaker among the most significant highlights. However, the announcement that generated the most excitement in the community was the preview of Amazon Aurora DSQL, a serverless, distributed SQL database with active-active high availability.
-
QCon San Francisco 2024 Day 2: Shift-Left, GenAI, Engineering Productivity, Languages/Paradigms
The 18th annual QCon San Francisco conference was held at the Hyatt Regency San Francisco in San Francisco, California. This five-day event, organized by C4Media, consists of three days of presentations and two days of workshops. Day Two, held on November 19th, 2024, included a keynote address by Lizzie Matusov and presentations from four conference tracks.
-
Netflix Uses Metaflow to Manage Hundreds of AI/ML Applications at Scale
Netflix recently described how its Machine Learning Platform (MLP) team provides an ecosystem around Metaflow, an open-source machine learning infrastructure framework. With various integrations built for Metaflow, Netflix now has hundreds of Metaflow projects maintained by multiple engineering teams.
-
Netflix Creates Incremental Processing Solution Using Maestro and Apache Iceberg
Netflix created a new solution for incremental processing in its data platform. The incremental approach reduces the cost of computing resources and execution time significantly as it avoids processing complete datasets. The company used its Maestro workflow engine and Apache Iceberg to improve data freshness and accuracy and plans to provide managed backfill capabilities.