Machine Learning on Big Data for Personalized Internet Advertising
Michael Recce discusses how advertising works and what algorithms Quantcast uses to analyze large amounts of data in order to find out what people are interested in.
Couchbase recently published a comparison of Couchbase and CouchDB that outlines the differences and similarities between the two. The document addresses two common questions: "What is the difference between CouchDB and Couchbase?" and what happened to Membase. InfoQ caught up with James Phillips, a Couchbase founder, to discuss the comparison and the merger of the two products Membase and CouchDB.
VMware has today announced VMware vFabric Suite 5.1, adding automated deployment, enterprise open source support, and PostgreSQL capabilities, as well as an expansion to the SQLFire in-memory database.
Amazon has announced support for .NET on AWS Elastic Beanstalk and a new RDS service for SQL Server, bringing better manageability to .NET/SQL Server apps hosted on AWS.
Brian C. Dilley covers the pitfalls and strengths of using MongoDB ("a very approachable NoSQL solution") and introduces MJORM, an annotation-free MongoDB Java ORM library. This article builds on Brian's real-world, in-the-trenches experience with MongoDB and includes "gotchas" like "Don't treat MongoDB like an RDBMS...", how to "design your indexes carefully", and more.

In-memory data grids (IMDG) have been gaining a lot of attention recently because of their support for dynamic scalability and high performance for data-intensive applications. InfoQ spoke with Jags Ramnarayan, Chief Architect for GemFire products at VMware, about the architecture of in-memory data grids, their advantages compared to traditional databases, and emerging trends in this space.

With the emergence of inexpensive cloud-based storage and cost-effective ways to process large volumes and complex data, there has been a shift in the approach to data integration.
Dmitriy Setrakyan introduces GridGain, comparing it with Hadoop and outlining the cases where GridGain is a better fit, accompanied by a live demo showing how to set up a GridGain job.
Sastry Malladi discusses the performance implications of using various data formats and versioning across eBay, presenting benchmark results and concluding that JSON is the best format.

In this interview at QCon London, LinkedIn’s Sid Anand discusses the problems they face when serving high-traffic, high-volume data. Sid explains how they’re moving some use cases from Oracle to gain headroom, and lifts the hood on their open source search and data replication projects, including Kafka, Voldemort, Espresso and Databus.

Rich Hickey explains the ideas behind the Datomic database: why Datalog is used as the query language, the functional programming concepts at its core, the role of time in the DB and much more.
With Spring Data, the ever-popular Spring Framework has cultivated a new patch of ground, bringing Big Data and NoSQL technology like Neo4j to enterprise developers. This guide introduces you to Spring Data Neo4j, using the fast, powerful and scalable graph database Neo4j to enjoy the benefits of having good relationships in your data.

Java Transaction Design Strategies shows how to design an effective transaction management strategy using the transaction models provided by Java-based frameworks such as EJB and Spring. Local, programmatic, declarative, and XA models are explained; the book concludes with a set of design patterns that show how to use these models effectively.