Infrastructure Content on InfoQ
-
In-Memory Caching: Curb Tail Latency with Pelikan
Yao Yue introduces Pelikan, a framework to implement distributed caches such as Memcached and Redis. She discusses the system aspects that are important to the performance of such services.
-
Data Preparation for Data Science: A Field Guide
Casey Stella presents a utility written with Apache Spark that automates data preparation by discovering missing values, values with skewed distributions, and likely errors within data.
-
Challenging Perceptions of NHS IT
Edward Hiley and Dan Rathbone talk about how NHS Digital has built a highly secure and resilient system for processing patient data, applying techniques more often used in the cloud to bare metal servers.
-
AI from an Investment Perspective
The panelists discuss AI from an investment perspective, the challenges, the risks, trends, the role of Deep Learning, successful AI use cases, and more.
-
Testing Programmable Infrastructure with Ruby
Matt Long talks about some approaches to environment infrastructure testing that his team at OpenCredo has created using Ruby.
-
Causal Consistency for Large Neo4j Clusters
Jim Webber explores the new Causal clustering architecture for Neo4j and how it allows users to read their own writes straightforwardly, and explains why this is difficult to achieve in distributed systems.
-
Big Data Infrastructure @ LinkedIn
Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.
-
Real-Time Recommendations Using Spark Streaming
Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.
-
Building a Data Science Capability from Scratch
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.
-
Data Science in the Cloud @StitchFix
Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, algorithms used, keeping schema in sync, etc.
-
Petabytes Scale Analytics Infrastructure @Netflix
Tom Gianos and Dan Weeks discuss Netflix's overall big data platform architecture, focusing on Storage and Orchestration, and how they use Parquet on AWS S3 as their data warehouse storage layer.
-
Big Data in the Real World: Technology and Use Cases
Mike Olson presents several use cases where big data is collected and analyzed to gather insights from the automotive, insurance, financial, and other sectors.