AI, ML & Data Engineering Content on InfoQ
-
Consistency Models in New Generation Databases
Roger Bodamer talks about consistency models in NoSQL databases, showing how different products handle replication, multiple copies of data, consistency, failover, and high availability.
-
Webmail for Millions, Powered by Erlang
Scott Lystig Fritchie presents the architecture of a webmail system implemented in Erlang and the lessons learned building it, using UBF and Hibari (a distributed key-value store) to accommodate a large user base.
-
NoSQL at Twitter
Ryan King presents how Twitter uses NoSQL technologies (Gizzard, Cassandra, Hadoop, Redis) to cope with growing data volumes that forced it to scale out beyond what traditional SQL databases offer.
-
Enterprise NoSQL: Silver Bullet or Poison Pill?
Billy Newport explains the fundamental differences between SQL and NoSQL, pointing out that NoSQL is not suited to many use cases and that people should make informed decisions before buying into it.
-
NoSQL at Twitter
Kevin Weil presents how Twitter analyzes its data, using Scribe for logging, Pig/Hadoop for basic analysis, and HBase, Cassandra, and FlockDB for specialized analysis.
-
Large Scale Map-Reduce Data Processing at Quantcast
Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, covering the applications, the types of data processed, and the infrastructure used.
-
Yes, SQL!
Uri Cohen reviews SQL and distributed data stores, presenting how various APIs (memcached, SQL/JDBC, JPA) can be used to interact with such data stores and which jobs each is best suited for.
-
HyperGraphDB - Data Management for Complex Systems
Borislav Iordanov presents the architecture of HyperGraphDB, a special type of store based on hypergraphs – graphs with edges pointing to an arbitrary number of nodes and to other edges.
-
Abstractions at Scale – Our Experiences at Twitter
Marius Eriksen argues that leaky abstractions lead to scalability problems, while abstractions that provide narrow access to explicit resources, such as map-reduce, shared-nothing web apps, and Bigtable, scale better.
-
The Evolution of the Flickr Architecture
Mikhail Panchenko discusses how Flickr’s code base evolved over the years and the scalability problems that began to appear, presenting the improvements made and the pros and cons of the technologies used.
-
Adopting Apache Cassandra
Eben Hewitt introduces the Apache Cassandra project to those who want a quick, clear picture of what Cassandra is, its main features, its data model, and its API.
-
Machine Learning: A Love Story
Hilary Mason presents the history of machine learning, covering the most significant developments in the field, and shows how bit.ly uses it to derive statistical information about its users.