Distributed Data Content on InfoQ
-
Uber's CacheFront: Powering 40M Reads per Second with Significantly Reduced Latency
Uber developed CacheFront, a caching solution for Docstore, its in-house distributed database. CacheFront serves over 40M reads per second from online storage and delivers substantial performance improvements, including a 75% reduction in P75 latency and a reduction of over 67% in P99.9 latency, improving both the efficiency and the scalability of the system.
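The core idea of a read cache in front of a database can be illustrated with the generic cache-aside pattern. The sketch below is not Uber's code: a Python dict stands in for a cache such as Redis, another dict plays the database, and the `CacheAside` class and its method names are hypothetical. It shows the three operations such a layer needs: serve hits from the cache, load and populate on a miss, and drop an entry when the underlying row changes.

```python
from typing import Callable, Dict, Hashable, Optional


class CacheAside:
    """Hypothetical cache-aside reader: check the cache first, fall back to
    the backing store on a miss, then populate the cache with the result."""

    def __init__(self, load_from_store: Callable[[Hashable], Optional[str]]) -> None:
        self._load = load_from_store           # hypothetical database read function
        self._cache: Dict[Hashable, str] = {}  # stands in for a cache such as Redis
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[str]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:                  # do not cache missing rows
            self._cache[key] = value
        return value

    def invalidate(self, key: Hashable) -> None:
        # Called when the underlying row changes, e.g. from a change feed.
        self._cache.pop(key, None)


# Usage: a plain dict plays the role of the backing database.
db = {"user:1": "alice"}
reader = CacheAside(db.get)
first = reader.get("user:1")    # miss: read from db, then cached
second = reader.get("user:1")   # hit: served from the cache
db["user:1"] = "alicia"         # row updated in the database...
reader.invalidate("user:1")     # ...so the stale cache entry is dropped
third = reader.get("user:1")    # miss again: fresh value loaded
```

Invalidation is the hard part in practice: if a write reaches the database but the matching `invalidate` call is lost, readers of this sketch keep seeing the stale value indefinitely, which is why production caches typically also set a TTL on entries.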
-
Apache Pinot 1.0 Provides a Realtime Distributed OLAP Datastore
Apache Pinot is an open source, column-oriented distributed data store written in Java. Pinot is designed for Online Analytical Processing (OLAP), answering multi-dimensional analytical (MDA) queries with low latency.
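To make "multi-dimensional analytical query" concrete, the sketch below groups a small column-oriented table by two dimensions and sums a metric, which is roughly what a SQL query such as `SELECT country, device, SUM(views) FROM events GROUP BY country, device` computes. The table, its column names, and the `group_by_sum` helper are hypothetical; Pinot's actual engine adds indexing and distributes data and query work across servers, none of which is modeled here.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

# Column-oriented storage: one list per column rather than one record per row.
columns: Dict[str, List[Any]] = {
    "country": ["US", "US", "DE", "DE", "US"],
    "device": ["ios", "web", "ios", "ios", "ios"],
    "views": [3, 5, 2, 4, 1],
}


def group_by_sum(
    cols: Dict[str, List[Any]], dims: Sequence[str], metric: str
) -> Dict[Tuple[Any, ...], int]:
    """Sum `metric` for every distinct combination of the `dims` columns."""
    totals: Dict[Tuple[Any, ...], int] = defaultdict(int)
    # Only the referenced columns are read; other columns are never touched.
    for i, value in enumerate(cols[metric]):
        key = tuple(cols[d][i] for d in dims)
        totals[key] += value
    return dict(totals)


result = group_by_sum(columns, ["country", "device"], "views")
```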
-
Distributed Materialized Views: How Airbnb’s Riverbed Processes 2.4 Billion Daily Events
Airbnb created Riverbed, a Lambda-like data framework for producing and managing distributed materialized views. The framework supports over 50 read-heavy use cases that draw on data from multiple sources within the company's service-oriented architecture (SOA) platform. It uses Apache Kafka and Apache Spark for its online and offline components, respectively.
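A Lambda-style design maintains the same view through two paths: a streaming path that folds each event into the view as it arrives, and a batch path that periodically recomputes the view from the full event history to correct any drift. The sketch below shows both paths for a hypothetical per-key counter; plain Python lists stand in for a Kafka topic and a Spark job, and none of the names come from Riverbed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    key: str    # hypothetical entity id the view is keyed by
    delta: int  # hypothetical change to that entity's counter


View = Dict[str, int]


def apply_event(view: View, event: Event) -> None:
    """Streaming path: fold a single event into the view as it arrives."""
    view[event.key] = view.get(event.key, 0) + event.delta


def rebuild_view(history: Iterable[Event]) -> View:
    """Batch path: recompute the entire view from the full event history."""
    view: View = {}
    for event in history:
        apply_event(view, event)
    return view


# A list stands in for a Kafka topic; the loop for a streaming consumer.
history = [Event("a", 1), Event("b", 2), Event("a", 3)]
online: View = {}
for event in history:
    apply_event(online, event)

offline = rebuild_view(history)  # what a periodic batch job would produce
```

Because both paths share `apply_event`, they agree on the same input, which is the property that lets a batch-built view safely replace an incrementally built one.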
-
Distributed PostgreSQL Benchmarks: Azure Cosmos DB, CockroachDB, and YugabyteDB
Microsoft recently discussed the results of distributed PostgreSQL benchmarks, comparing transaction processing and price performance for Azure Cosmos DB for PostgreSQL, CockroachDB, and YugabyteDB. Given the databases' different implementation trade-offs, the results show higher throughput for Azure Cosmos DB but also highlight the challenges of benchmarking distributed databases.
-
Zero-Copy In-Memory Sharing of Large Distributed Data: V6d
Vineyard (v6d), a zero-copy, in-memory data manager maintained as a CNCF sandbox project, provides distributed operators that can be used to share immutable data within or across cluster nodes. V6d is of particular interest for training deep neural networks on big (sharded) datasets, such as those used for large language models and graph models.
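Zero-copy sharing on a single machine can be illustrated with Python's standard `multiprocessing.shared_memory` module (Python 3.8+): a producer writes a payload into a named segment once, and a consumer attaches to that segment by name instead of receiving a copy. This is only an analogy and not Vineyard's client API; both handles live in one process here for brevity, and the payload is made-up data.

```python
from multiprocessing import shared_memory

# Producer: place an immutable payload in a named shared-memory segment once.
payload = b"shard-0 embeddings"  # hypothetical data
producer = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    producer.buf[:len(payload)] = payload

    # Consumer: attach to the same segment by name. No bytes are copied;
    # both handles map the same underlying memory.
    consumer = shared_memory.SharedMemory(name=producer.name)
    try:
        with consumer.buf.toreadonly() as view:  # readers get a read-only view
            seen = bytes(view[:len(payload)])    # copied out here only to inspect it
            try:
                view[0] = 0
                writable = True
            except TypeError:                    # read-only memoryviews reject writes
                writable = False
    finally:
        consumer.close()  # views must be released before closing
finally:
    producer.close()
    producer.unlink()     # free the segment once no one needs it
```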
-
AWS Releases SimSpace Weaver for Real-Time Spatial Simulations
AWS recently released SimSpace Weaver, a managed option to run real-time spatial simulations across multiple EC2 instances. By distributing simulation workloads across instances, the service can handle large real-world environments, crowd simulations, and immersive interactive experiences.
-
Cloudflare D1 Provides Distributed SQLite for Cloudflare Workers
Soon to enter beta, D1 is Cloudflare's first step into the cloud-based SQL storage arena. D1 is built on top of SQLite, adding a distributed replication mechanism, batch operation support, embedded compute, automatic backups and redundancy, and more.
-
Hasura Remote Joins Implements GraphQL Data Federation
Hasura Remote Joins allows developers to use a single data graph to query several underlying data sources. Developers do not have to modify the data sources; instead, they configure the relationships between the federated data models. The unified GraphQL API, combined with Hasura's handling of authorization and caching, may provide more consistent and secure data access at scale.
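The idea behind a remote join can be sketched as follows: fetch rows from one source, collect the join keys, make one batched request to a second source, and attach the results under a field defined by a relationship declared outside both sources. Everything below is hypothetical (the in-memory sources, the `RELATIONSHIP` dictionary, and `remote_join`); it does not reproduce Hasura's configuration format or its resolver.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Two independent sources that the join leaves untouched (hypothetical data).
orders_source: List[Row] = [
    {"id": 1, "user_id": 10, "total": 25},
    {"id": 2, "user_id": 11, "total": 40},
    {"id": 3, "user_id": 10, "total": 15},
]
users_source: Dict[int, Row] = {
    10: {"id": 10, "name": "Ada"},
    11: {"id": 11, "name": "Lin"},
}


def fetch_users(ids: List[int]) -> Dict[int, Row]:
    """Batched lookup against the second source: one call for all keys."""
    return {i: users_source[i] for i in ids if i in users_source}


# The relationship lives in configuration, not in either data source.
RELATIONSHIP: Dict[str, Any] = {"field": "user", "local_key": "user_id", "fetch": fetch_users}


def remote_join(rows: List[Row], rel: Dict[str, Any]) -> List[Row]:
    """Attach related records from another source under rel["field"]."""
    keys = sorted({row[rel["local_key"]] for row in rows})  # deduplicated join keys
    related = rel["fetch"](keys)                            # single batched request
    return [{**row, rel["field"]: related.get(row[rel["local_key"]])} for row in rows]


result = remote_join(orders_source, RELATIONSHIP)
```

Batching the lookup into a single call per query, rather than one call per row, avoids the N+1 request pattern that naive federation can fall into.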
-
Microsoft Open-Sources Fluid Framework for Distributed, Scalable, Real-Time Collaborative Web Apps
Microsoft open-sources Fluid Framework, a low-level platform for distributed, real-time collaborative web applications that can potentially scale to a large number of simultaneous collaborators. Microsoft uses the Fluid Framework in Microsoft 365.
-
The Distributed Data Mesh as a Solution to Centralized Data Monoliths
Instead of building large, centralized data platforms, corporations and data architects should create distributed data meshes.
-
Mind Your State for Your State of Mind: Pat Helland at QCon SF
The features of different types of data storage should be considered when selecting how data is stored in a system. Is always reading correct data, or low latency, most important? In his keynote at this year’s QCon San Francisco, Pat Helland described trends in storage and computing, durable and session state semantics, and other aspects of storage like transactions, identity and immutability.
-
Deep-Learning Framework SINGA Graduates to Top-Level Apache Project
The Apache Software Foundation (ASF) recently announced that SINGA, a framework for distributed deep-learning, has graduated to top-level project (TLP) status, signifying the project's maturity and stability. SINGA has already been adopted by companies in several sectors, including banking and healthcare.
-
Cockroach Labs Announces CockroachCloud, a Fully-Managed Distributed SQL Database in Beta
Recently, Cockroach Labs announced the beta program of CockroachCloud, a fully-managed service for its CockroachDB distributed SQL database. With CockroachCloud, customers can provision, scale and manage a complex, highly available distributed SQL database within minutes.
-
Extending the Reach of SQL to IoT Microcontrollers, ITTIA and Cypress Release SDK
In a recent press release, ITTIA, a maker of embedded database software for Internet of Things (IoT) devices, and Cypress Semiconductor Corp. announced a collaborative IoT device and data management capability. The new capability integrates SQL into the WICED SDK and makes use of flash media on Cypress wireless microcontrollers (MCUs).
-
Introducing Interoperable Blockchain Identity Solutions with Hyperledger Aries
In a recent blog post, the Hyperledger project announced its 13th project, Hyperledger Aries, which provides an interoperable identity management toolkit for creating, transmitting and storing verifiable digital certificates. Using this toolkit, organizations can support secure, interoperable peer-to-peer messaging across different distributed ledger technologies (DLT).