Database Content on InfoQ
-
Demystifying DynamoDB Streams
Akshat Vig and Khawaja Shams discuss DynamoDB Streams and what it takes to build an ordered, highly available, durable, performant, and scalable replicated log stream.
-
Building a Data Science Capability from Scratch
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.
-
Data Cleansing and Understanding Best Practices
Casey Stella talks about discovering missing values, values with skewed distributions and likely errors within data, as well as a novel approach to finding data interconnectedness.
-
SQL Server on Linux: Will it Perform or Not?
Slava Oks talks about SQL Server’s history and high-level architecture, then dives into the core of the I/O Manager, Memory Manager, and Scheduler. Topics include lessons learned and behind-the-scenes experiences.
-
Practical Data Synchronization Using CRDTs
Dmitry Ivanov discusses basic CRDT implementations in Scala, explaining the advantages of these data structures for solving many synchronization problems, as well as their limitations.
-
ScyllaDB: Achieving No-Compromise Performance
Avi Kivity discusses ScyllaDB and the many design decisions behind it, from the programming language and programming model through low-level details up to the advanced cache design, and more.
-
Data Science in the Cloud @StitchFix
Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, the algorithms used, keeping schemas in sync, etc.
-
Elastic Data Analytics Platform @Datadog
Doug Daniels discusses the cloud-based platform built at Datadog, how it differs from a traditional datacenter-based analytics stack, its pros and cons, and the tooling built for it.
-
Petabytes Scale Analytics Infrastructure @Netflix
Tom Gianos and Dan Weeks discuss Netflix's overall big data platform architecture, focusing on Storage and Orchestration, and how they use Parquet on AWS S3 as their data warehouse storage layer.
-
Big Data in the Real World: Technology and Use Cases
Mike Olson presents several use cases where big data is collected and analyzed to gather insights from the automotive, insurance, financial, and other sectors.
-
Using Bayesian Optimization to Tune Machine Learning Models
Scott Clark introduces Bayesian Global Optimization as an efficient way to optimize ML model parameters, explaining the underlying techniques and comparing it to other standard methods.
-
Machine Learning and End-to-End Data Analysis Processes in Spark Using Python and R
Debraj GuhaThakurta discusses ML and data analysis processes in Spark using examples written in Python and R.