Big Data Content on InfoQ
-
Grammarly Replaces Its In-House Data Lake with Databricks Platform Using Medallion Architecture
Grammarly adopted the medallion architecture while migrating from its in-house data lake, which stored Parquet files in AWS S3, to a Delta Lake lakehouse. The company created a new event store for over 6,000 event types from 40 internal and external clients and, in the process, improved data quality and reduced data-delivery time by 94%.
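For readers unfamiliar with the pattern, below is a minimal sketch of the medallion (bronze/silver/gold) layers on Delta Lake with PySpark; the bucket paths, event schema, and quality rules are placeholders, not Grammarly's actual pipeline.

```python
# Minimal medallion-architecture sketch on Delta Lake with PySpark.
# Bucket paths, the event schema, and the quality rules are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("medallion-sketch")
    # Assumes the delta-spark package is available on the classpath.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Bronze: land raw events as-is.
raw = spark.read.json("s3://example-bucket/raw-events/")
raw.write.format("delta").mode("append").save("s3://example-bucket/bronze/events")

# Silver: validated, de-duplicated records with typed columns.
bronze = spark.read.format("delta").load("s3://example-bucket/bronze/events")
silver = (
    bronze.dropDuplicates(["event_id"])
    .filter(F.col("event_type").isNotNull())
    .withColumn("event_ts", F.to_timestamp("event_ts"))
)
silver.write.format("delta").mode("overwrite").save("s3://example-bucket/silver/events")

# Gold: business-level aggregates ready for analytics and reporting.
gold = silver.groupBy("event_type").agg(F.count("*").alias("event_count"))
gold.write.format("delta").mode("overwrite").save("s3://example-bucket/gold/event_counts")
```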
-
How LinkedIn Serves over 4.8 Million Member Profiles per Second
LinkedIn introduced Couchbase as a centralized caching tier to scale member-profile reads, whose traffic had outgrown the existing database cluster. The new solution achieved a hit rate of over 99%, reduced tail latencies by more than 60%, and cut costs by 10% annually.
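As a rough illustration of the read path through a centralized cache in front of a source-of-truth store (a generic cache-aside sketch, not LinkedIn's actual Couchbase integration), the `cache` and `profile_db` clients and the TTL below are hypothetical:

```python
# Generic cache-aside read path: check the cache, fall back to the source of
# truth on a miss, then populate the cache with a TTL. The `cache` and
# `profile_db` clients are hypothetical stand-ins, not LinkedIn's APIs.
import json

CACHE_TTL_SECONDS = 3600  # illustrative TTL


def get_member_profile(member_id: str, cache, profile_db) -> dict:
    key = f"profile:{member_id}"

    cached = cache.get(key)  # cache hit: serve directly from the caching tier
    if cached is not None:
        return json.loads(cached)

    profile = profile_db.fetch_profile(member_id)  # miss: read the source of truth
    cache.set(key, json.dumps(profile), ttl=CACHE_TTL_SECONDS)  # warm the cache
    return profile
```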
-
Discord Migrates Trillions of Messages from Cassandra to ScyllaDB
Discord has migrated trillions of message records from Apache Cassandra to ScyllaDB, shrinking its largest cluster from 177 Cassandra nodes to 72 ScyllaDB nodes and lowering tail latencies for reads and writes. The improved database stability and performance have unlocked new product use cases.
-
Adopting Artificial Intelligence: Things Leaders Need to Know
Artificial intelligence (AI) can help companies identify new opportunities and products, and stay ahead of the competition. Senior software managers should understand the basics of how this new technology works, why agility is important in developing AI products, and how to hire or train people for new roles.
-
AWS Introduces Athena Provisioned Capacity
AWS recently announced Provisioned Capacity for Athena, a new feature that allows users to run SQL queries on fully managed compute capacity for a fixed price and with no long-term commitments.
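A minimal sketch of reserving Athena capacity from Python, assuming the boto3 Athena client's capacity-reservation calls; the reservation name and DPU count are illustrative:

```python
# Sketch: reserving Athena provisioned capacity with boto3. The reservation
# name and DPU count are illustrative; check the Athena documentation for
# current minimums and pricing.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Create a capacity reservation measured in Data Processing Units (DPUs).
athena.create_capacity_reservation(
    Name="analytics-team-capacity",
    TargetDpus=24,
)

# Inspect the reservation's status and allocated DPUs.
reservation = athena.get_capacity_reservation(Name="analytics-team-capacity")
print(reservation["CapacityReservation"]["Status"])
```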
-
Apache Linkis Graduated to Apache Top-Level Project
Apache Linkis is a computation middleware that acts as a layer between upper-level applications and underlying engines, such as Apache Spark, Apache Hive, and Apache Flink. It started as an Apache Incubator project in 2021 and graduated to a Top-Level Project in January 2023.
-
Apache Druid 25.0 Delivers Multi-Stage Query Engine and Kubernetes Task Management
Apache Druid is a high-performance, real-time datastore, and its latest release, version 25.0, provides many improvements and enhancements. The main new features are a production-ready multi-stage query (MSQ) task engine for SQL-based ingestion and the ability to launch and manage tasks with Kubernetes, eliminating the need for MiddleManagers...
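As a sketch of what SQL-based ingestion with the MSQ task engine looks like, the snippet below submits an ingestion query to Druid's SQL task endpoint; the router URL, datasource, input file, and schema are illustrative:

```python
# Sketch: submitting a SQL-based ingestion job to Druid's multi-stage query
# (MSQ) task engine via the SQL task endpoint. The router URL, datasource,
# input file, and schema are illustrative.
import requests

DRUID_ROUTER = "http://localhost:8888"  # assumed local Druid router

ingest_sql = """
INSERT INTO example_events
SELECT
  TIME_PARSE("timestamp") AS __time,
  "user",
  "event_type"
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/events.json"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"},
      {"name": "user", "type": "string"},
      {"name": "event_type", "type": "string"}]'
  )
)
PARTITIONED BY DAY
"""

resp = requests.post(f"{DRUID_ROUTER}/druid/v2/sql/task", json={"query": ingest_sql})
print(resp.json())  # returns a taskId that can be used to track the ingestion
```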
-
How Twitter Automated Its Data Quality Check Process
Twitter engineering recently shared a blog post on how it architected and developed a data quality automation platform. Twitter ingests and creates thousands of datasets for different data products and applications, so the natural next step was to ensure the quality of that data by adding automated checks on top of it. In this news post, we explore this architecture in more detail.
-
Uber Freight Near-Real-Time Analytics Architecture
Uber Freight is the Uber platform dedicated to connecting shippers with carriers. Because providing reliable service to shippers is crucial for Uber Freight, the team developed the Carrier Scorecard, which tracks metrics such as on-time pickup/delivery, tracking automation, and late cancellations.
-
Snap's Way to Design an Ads Ranking Service Using Deep Learning
Snap engineering recently published a blog post on how it designed its ads ranking and targeting service using deep learning. Showing ads to users is the main way social network platforms monetize, and Snap's ad ranking system is designed to target the right user at the right time, providing an excellent user experience while preserving user privacy and security.
-
Azure Data Explorer Supports Native Ingestion from Amazon S3
Microsoft recently announced the ability to natively ingest data from Amazon S3 into Azure Data Explorer (ADX). The new feature simplifies multi-cloud data analytics deployments, bringing data from Amazon S3 to Azure without relying on custom ETL pipelines.
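A hedged sketch of what such an ingestion could look like from Python, assuming the azure-kusto-data client and ADX's ingest-from-storage command; the cluster URI, database, table, bucket, and credential suffix are placeholders, and the exact S3 URI syntax should be checked against the ADX documentation:

```python
# Sketch: issuing ADX's ingest-from-storage command against an S3 object with
# the azure-kusto-data client. The cluster URI, database, table, bucket, and
# credential suffix are placeholders; verify the exact S3 URI syntax in the
# Azure Data Explorer documentation.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.westeurope.kusto.windows.net"  # assumed cluster URI
)
client = KustoClient(kcsb)

ingest_command = (
    ".ingest into table Events "
    "('https://example-bucket.s3.us-east-1.amazonaws.com/events.csv"
    ";AwsCredentials=<ACCESS_KEY_ID>,<SECRET_ACCESS_KEY>') "
    "with (format='csv')"
)

# Control commands such as .ingest are sent through execute_mgmt.
response = client.execute_mgmt("MyDatabase", ingest_command)
print(response.primary_results[0])
```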
-
Next Generation of Data Movement and Processing Platform at Netflix
Netflix engineering recently published a tech-blog post on how it used data mesh architecture and principles to build the next generation of its data movement and processing platform, unlocking more business use cases and opportunities. Data mesh is a paradigm shift in data management that enables users to easily import and use data without transporting it to a centralized location like a data lake.
-
Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery
Google recently announced the general availability of Bigtable federated queries with BigQuery, allowing customers to query data residing in Bigtable through BigQuery more quickly, without moving or copying the data. The feature is available in all Google Cloud regions with increased federated-query concurrency limits, closing the long-standing gap between operational data and analytics.
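As an illustration of the zero-ETL flow, the sketch below defines a BigQuery external table over a Bigtable table and queries it in place; the project, dataset, instance, table, and column-family names are assumptions, not Google's example:

```python
# Sketch: a BigQuery external table defined over a Bigtable table, queried in
# place. Project, dataset, instance, table, and column-family names are
# assumptions; the `analytics` dataset is presumed to exist already.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS analytics.bigtable_events
OPTIONS (
  format = 'CLOUD_BIGTABLE',
  uris = ['https://googleapis.com/bigtable/projects/my-project/instances/my-instance/tables/events'],
  bigtable_options = '{"readRowkeyAsString": true, "columnFamilies": [{"familyId": "stats", "onlyReadLatest": true, "type": "STRING"}]}'
)
"""
client.query(ddl).result()

# Analytical SQL now runs directly against the operational Bigtable data.
for row in client.query("SELECT rowkey FROM analytics.bigtable_events LIMIT 10").result():
    print(row.rowkey)
```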
-
Amazon Redshift Serverless Generally Available to Automatically Scale Data Warehouse
Amazon recently announced the general availability of Redshift Serverless, an elastic option to scale data warehouse capacity. The new service allows data analysts, developers and data scientists to run and scale analytics without provisioning and managing data warehouse clusters.
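A minimal sketch of getting started from Python, assuming the boto3 redshift-serverless and redshift-data clients; the namespace, workgroup, database, and base capacity are illustrative:

```python
# Sketch: creating a Redshift Serverless namespace/workgroup pair with boto3
# and running a query through the Redshift Data API. Names, database, and the
# base capacity (in Redshift Processing Units) are illustrative.
import boto3

serverless = boto3.client("redshift-serverless", region_name="us-east-1")

serverless.create_namespace(namespaceName="analytics-ns", dbName="dev")
serverless.create_workgroup(
    workgroupName="analytics-wg",
    namespaceName="analytics-ns",
    baseCapacity=32,  # base RPUs; capacity scales automatically with load
)

# Queries go through the Data API, with no cluster to provision or manage.
data_api = boto3.client("redshift-data", region_name="us-east-1")
stmt = data_api.execute_statement(
    WorkgroupName="analytics-wg",
    Database="dev",
    Sql="SELECT current_date",
)
print(stmt["Id"])  # statement id to poll for results
```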
-
Shopify’s Practical Guidelines from Running Airflow for ML and Data Workflows at Scale
In a company blog post, Shopify engineering shared its experience scaling and optimizing Apache Airflow for running ML and data workflows, offering practical solutions to the challenges it faced, such as slow file access, insufficient control over DAGs, irregular levels of traffic, resource contention among workloads, and more.
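As one concrete example of taming resource contention among workloads, the sketch below routes a heavy task through a shared Airflow pool with a bounded slot budget; the DAG id, pool name, and command are illustrative, not Shopify's configuration:

```python
# Sketch: limiting resource contention among Airflow workloads by routing a
# heavy task through a shared pool with a bounded slot budget. The DAG id,
# pool name, and command are illustrative; the pool itself must be created
# separately (e.g. via the Airflow UI or CLI).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="heavy_training_job",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    train = BashOperator(
        task_id="train_model",
        bash_command="python train.py",
        pool="ml_heavy_pool",  # tasks sharing this pool compete for its slots
        pool_slots=2,          # this task occupies two slots while running
    )
```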