Columnar Databases Content on InfoQ
-
Time Series Database QuestDB 8.0 Improves SQL Performance and Adds ZFS Compression
Version 8 of QuestDB, an open-source time series database designed for high-performance and efficient handling of time series data, has been released. This release makes a new VARCHAR data type the default (over STRING), aiming for better compression and performance, and adds a 50% improvement in SQL query performance and data compression via system-level ZFS.
-
Reddit Migrates Media Metadata from S3 and Other Systems into AWS Aurora Postgres
Reddit consolidated its media metadata storage into a new architecture using AWS Aurora Postgres. Previously, the company sourced media metadata from various systems, including directly from AWS S3. The new solution simplifies media metadata retrieval and handles 100k+ requests per second with latency below 5ms (p90).
-
Expedia Speeds up Flights Search with Micro Frontends and GraphQL Optimizations
Expedia made flight search faster by up to 52% (page usable time) by applying a range of optimizations to its web and mobile applications. To support these improvements, the company improved the observability of its applications. The Expedia Flights web application has been migrated to Micro Frontend Architecture (MFA) to allow flexibility, reusability, and better optimization.
-
ClickHouse Keeper: Efficient Apache ZooKeeper Alternative Created with C++ and Raft
The ClickHouse project team created an in-house replacement for Apache ZooKeeper because it needed a more efficient implementation that would also address some of ZooKeeper's shortcomings. ClickHouse Keeper is now an essential part of the ClickHouse project and a cornerstone of this open-source analytical database, but it can also be used independently for many distributed coordination use cases.
-
Managing 238 Million Memberships of Netflix: Surabhi Diwan at QCon San Francisco
During the first day of QCon San Francisco 2023, Surabhi Diwan, a senior software engineer at Netflix, presented on managing 238 million memberships at Netflix. The talk was part of the “Architectures You’ve Always Wondered About” track. Diwan works on the backend of membership engineering, which is critical for both signups and streaming at Netflix.
-
Yelp Rebuilds Corrupted Cassandra Cluster Using Its Data Streaming Architecture
Yelp created a solution to sanitize data from a corrupted Apache Cassandra cluster using its data streaming architecture. The team explored many potential options to address the data corruption issue but ultimately had to move the data into a new cluster, removing corrupted records in the process.
-
Azure Brings Vertical Scaling, Monitor Alerts and More for Apache Cassandra Managed Instance
Microsoft has recently released new features for Azure Managed Instance for Apache Cassandra, including an upgrade of the Apache Cassandra version to 4.0 GA, Azure Monitor alerts and insights, the ability to deallocate cluster resources to reduce costs, vertical scaling and more.
-
Discord Migrates Trillions of Messages from Cassandra to ScyllaDB
Discord has migrated trillions of message records from Apache Cassandra to ScyllaDB, reducing the size of the largest cluster from 177 Cassandra nodes to 72 ScyllaDB nodes and reducing tail latencies for reads and writes. The move has unlocked new product use cases because of the improved database stability and performance.
-
Netflix Built a Scalable Annotation Service Using Cassandra, Elasticsearch and Iceberg
Netflix recently described how it built Marken, a scalable annotation service using Cassandra, Elasticsearch and Iceberg. Marken allows storing and querying annotations, or tags, on arbitrary entities. Users define versioned schemas for their annotations, which include out-of-the-box support for temporal and spatial objects.
-
InfluxData Releases Its New Database Engine in InfluxDB Cloud
InfluxData has released the new version of its database engine, called Influx IOx, into general availability. It is now available in InfluxDB Cloud.
-
Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery
Google recently announced the general availability of Bigtable federated queries with BigQuery, which let customers query data residing in Bigtable directly from BigQuery, and do so faster. Queries run without moving or copying the data, are available in all Google Cloud regions, and come with increased federated query concurrency limits, closing the longstanding gap between operational data and analytics.
-
Google Introduces Autoscaling for Cloud Bigtable for Optimizing Costs
Cloud Bigtable is a fully managed, scalable NoSQL database service for large operational and analytical workloads on the Google Cloud Platform (GCP). The public cloud provider recently announced the general availability of Bigtable Autoscaling, which automatically adds or removes capacity in response to changing application demand, allowing cost optimizations.
-
Google Cloud Improves SLA for Bigtable and Adds New Security Features
Google Cloud has recently raised the availability SLA for Bigtable instances up to 99.999%, matching the SLA for Firestore and Cloud Spanner. The data storage service also introduced two new security features for enterprise workloads: customer-managed encryption keys (CMEK) and data access audit logs.
-
Google Provides a Peek into the Architecture of Colossus - Its Storage Foundation
In a recent post, Google provided a glimpse into the architecture of Colossus. Colossus underpins Google's scalable storage system, which serves both its Google Cloud offerings and Google's own globally available services such as YouTube, Google Drive, and Gmail. Colossus is composed of five separate components: the client library, curators, metadata database, file servers, and custodians.
-
Microsoft Announces Azure Managed Instance for Apache Cassandra
At this year’s Ignite conference, Microsoft announced the public preview of Azure Managed Instance for Apache Cassandra, a NoSQL database product for managing Cassandra-based workloads in the Azure cloud.