Database Content on InfoQ
-
A Taste of Random Decision Forests on Apache Spark
Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
-
Analyzing Social Networks with F#
Evelina Gabasova explains how to run a social network analysis on Twitter and how to use data science tools to find out more about followers.
-
Don’t Let Data Gravity Crush Your Infrastructure
Dave McCrory explains what Data Gravity is, how it affects performance and portability, and why these effects are amplified as data volumes grow.
-
How SoundCloud Uses Cassandra
Emily Green looks at how SoundCloud uses Cassandra, describing a couple of Cassandra instances from the point of view of the products and functionality they support.
-
Customer Insight, from Data to Information
Thore Thomassen shares from experience how to combine structured data in a DWH with unstructured data in NoSQL, and how to use parallel data warehouse appliances to boost analytical capabilities.
-
Efficient Data Storage for Analytics with Parquet 2.0
Julien Le Dem discusses the advantages of a columnar data layout, specifically the features and design choices Apache Parquet uses to achieve its goals of interoperability, space efficiency, and query efficiency.
-
GORM Inside and Out
Jeff Scott Brown introduces GORM, a super powerful ORM tool that makes ORM simple by leveraging the flexibility and expressiveness of a dynamic language like Groovy.
-
Programming and Testing a Distributed Database
Reid Draper shows how real-world distributed databases work, communicate, and are tested, trading RPC for messaging, unit tests for QuickCheck, and micro-benchmarks for multi-week stress tests.
-
Using a Graph Database for JVM Heap Analysis
James Richardson and Nat Pryce discuss some of the challenges they faced using Neo4J for interactive analysis of large data imports (80k nodes, 150k relationships) and how they overcame them.
-
Big Data in Memory
John Davies shows a Spring workflow consuming 7.4kB XML messages, binding them to 25kB Java but storing them in just 450 bytes each, which allows 10 million derivative contracts to be held in memory on a laptop.
-
Gobblin: A Framework for Solving Big Data Ingestion Problem
Lin Qiao discusses the architecture of Gobblin, LinkedIn’s framework for addressing the need for high-quality, high-velocity data ingestion.
-
Remote Access Made Easy and Fast with Haskell
Simon Marlow explains how to use Haxl to automatically batch and overlap requests for data from multiple data sources.