Database Content on InfoQ
-
Enabling High Performance Real-time Analytics for IoT Environments
Mahish Singh discusses how to use methodologies during design, development, deployment and operation for delivery of analytics platforms which offer real-time SLAs.
-
Architecture & Algorithms Powering Search @ZocDoc
Brian D'Alessandro and Pedro Rubio talk about the patient friendly search system they have built at Zocdoc using various products from the AWS stack and custom Machine Learning pipelines.
-
Orchestrating Chaos: Applying Database Research in the Wild
Peter Alvaro describes the theoretical roots of Lineage-driven Fault Injection (LDFI) in database research, presenting early results from the field and opportunities for near- and long-term future research.
-
Managing Thousands of Data Services @Heroku
Gabriel Enslein discusses the evolution of fleet orchestration, immutable infrastructure, and security auditing for managing data services for many Salesforce customers.
-
Scaling with Apache Spark
Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.
-
Managing Data in Microservices
Randy Shoup shares patterns for managing data in microservices from Google, eBay, and Stitch Fix, discussing the need to access data only through a microservice's interface and to communicate through events.
-
Serverless Design Patterns with AWS Lambda: Big Data with Little Effort
Tim Wagner discusses Big Data on serverless, showing working examples and how to set up a CI/CD pipeline, demonstrating AWS Lambda with the Serverless Application Model (SAM).
-
Power of the Log: LSM & Append Only Data Structures
Ben Stopford talks about the beauty of sequential access and append only data structures in the context of “Log Structured Merge Trees”.
-
Applied Distributed Research in Apache Cassandra
Jonathan Ellis explains the challenges and successes Cassandra has had in creating transactions, materialized views, and a strongly consistent cluster membership within this peer-to-peer paradigm.
-
Scio: Moving Big Data to Google Cloud, a Spotify Story
Neville Li tells the story of Spotify migrating its big data infrastructure to Google Cloud, replacing Hive and Scalding with BigQuery and Scio, which helped the company iterate faster.
-
In-Memory Caching: Curb Tail Latency with Pelikan
Yao Yue introduces Pelikan - a framework to implement distributed caches such as Memcached and Redis. She discusses the system aspects that are important to the performance of such services.
-
Data Preparation for Data Science: A Field Guide
Casey Stella presents a utility written with Apache Spark to automate data preparation by discovering missing values, values with skewed distributions, and likely errors within data.