Machine Learning on Big Data for Personalized Internet Advertising
Michael Recce discusses how advertising works and what algorithms Quantcast uses to analyze large amounts of data in order to find out what people are interested in.
Azavea, a Philadelphia-based company that provides products for geographical data, has published GeoTrellis, an open source geographic data processing engine for high-performance applications, under the GNU GPL v3 license.
Developed since 2010 by Rich Hickey and the Relevance team, Datomic offers some new approaches to database architecture. Leveraging current trends in cloud computing and storage, it provides strong transactions, a rich query API, and read scaling.
Version 2.0 of Hazelcast, a Java-based caching, clustering and data distribution solution, has recently been released. As part of this, the product is now offered in both commercial Enterprise and free open-source Community Editions.

With the emergence of inexpensive cloud-based storage and cost-effective ways to process large volumes of complex data, there has been a shift in the approach to data integration.

The Lucene geospatial extension proposed in this article is based on a two-level search: the first level is a Cartesian grid search, and the second level implements shape-specific spatial calculations.
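The two-level idea can be illustrated with a minimal sketch in plain Python. This is not the article's Lucene code: the cell size, the function names (`build_grid`, `search_radius`), and the choice of a circle as the query shape are all assumptions made for illustration. Level one selects candidate points by grid cell; level two runs an exact distance check on those candidates only.

```python
import math
from collections import defaultdict

CELL_DEG = 1.0          # hypothetical grid cell size, in degrees
KM_PER_DEG = 110.0      # deliberately low so the bounding box errs on the large side
EARTH_RADIUS_KM = 6371.0


def cell_of(lat, lon):
    """Map a coordinate to its (row, column) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_grid(points):
    """Index (name, lat, lon) tuples by grid cell."""
    grid = defaultdict(list)
    for name, lat, lon in points:
        grid[cell_of(lat, lon)].append((name, lat, lon))
    return grid


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_radius(grid, lat, lon, radius_km):
    """Return names within radius_km of (lat, lon), nearest first.

    Assumes a small radius, away from the poles and the antimeridian.
    """
    # Level 1: collect candidates from every cell touching the bounding box.
    dlat = radius_km / KM_PER_DEG
    far_lat = min(abs(lat) + dlat, 89.0)
    dlon = radius_km / (KM_PER_DEG * math.cos(math.radians(far_lat)))
    row_lo, col_lo = cell_of(lat - dlat, lon - dlon)
    row_hi, col_hi = cell_of(lat + dlat, lon + dlon)
    candidates = []
    for row in range(row_lo, row_hi + 1):
        for col in range(col_lo, col_hi + 1):
            candidates.extend(grid.get((row, col), []))

    # Level 2: exact, shape-specific check (here, a circle) on candidates only.
    hits = []
    for name, plat, plon in candidates:
        dist = haversine_km(lat, lon, plat, plon)
        if dist <= radius_km:
            hits.append((dist, name))
    return [name for _, name in sorted(hits)]
```

For example, after `grid = build_grid([("Philadelphia", 39.9526, -75.1652), ("New York", 40.7128, -74.0060)])`, calling `search_radius(grid, 39.9526, -75.1652, 150)` only computes distances for points in the handful of cells around the query, which is the reason for the coarse first pass.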
As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is choosing an appropriate OutputFormat, which lets a job produce its output data in the form best suited to those applications.
Dmitriy Setrakyan introduces GridGain, comparing it with Hadoop and outlining the cases where it is a better fit, accompanied by a live demo showing how to set up a GridGain job.
Sid Anand presents the architecture set in place at LinkedIn and the data infrastructure running Java and Scala apps on top of Oracle, Voldemort, DataBus and Kafka.

In this interview at QCon London, LinkedIn’s Sid Anand discusses the problems they face when serving high-traffic, high-volume data. Sid explains how they’re moving some use cases from Oracle to gain headroom, and lifts the hood on their open source search and data replication projects, including Kafka, Voldemort, Espresso and Databus.

Ron Bodkin of Big Data Analytics discusses early adoption of Hadoop, NoSQL and big data technologies. He discusses common patterns and explains how developers can write low-level primitives to optimize MapReduce functions. Other topics include Hive, Pig, multi-tenancy, and security.