InfoQ

Big Data Content on InfoQ


Latest featured content about Big Data

Machine Learning on Big Data for Personalized Internet Advertising

Topics
QCon San Francisco 2011,
Big Data,
Database Design,
QCon,
Data Analysis,
Database,
Conferences,
Advertising

Michael Recce discusses how advertising works and what algorithms Quantcast uses to analyze large amounts of data in order to find out what people are interested in.

News about Big Data

Azavea Announces Release of GeoTrellis under GPLv3 License

Topics
Big Data,
Database Design,
API,
Cloud Computing,
Open Source Projects,
GNU,
Open Source Project Releases,
Programming,
Database

Azavea, a Philadelphia-based company that provides products for geographic data, has released GeoTrellis, an open source geographic data processing engine for high-performance applications, under the GNU GPL v3 license.

Rich Hickey's Datomic embraces Cloud, intelligent Applications and Consistency

Topics
Java,
Big Data,
Clojure,
NoSQL,
Languages,
Database Design,
LISP,
JVM Languages,
Database,
Programming,
Dynamo DB,
RDF

In development since 2010 by Rich Hickey and the Relevance team, Datomic offers some new approaches to database architecture. Leveraging current trends in cloud computing and storage, it provides strong transactions, a rich query API and read scaling.

Hazelcast 2.0 Released with Off-Heap Storage and Distributed Backups

Topics
Distributed Cache,
Java,
Caching,
Big Data,
Languages,
Clustering & Caching,
Database Design,
Programming,
Performance & Scalability,
Infrastructure,
Database

Version 2.0 of Hazelcast, a Java-based caching, clustering and data distribution solution, has recently been released. As part of this, the product is now offered in both commercial Enterprise and free open-source Community Editions.

Articles about Big Data

Evolution in Data Integration From EII to Big Data

Topics
Big Data,
Database Design,
Big Data Infrastructure,
Enterprise Information Integration,
Infrastructure,
Enterprise Architecture,
Cloud Computing,
Database

With the emergence of inexpensive cloud-based storage and cost-effective ways to process large volumes of complex data, there has been a shift in the approach to data integration.

Implementing Lucene Spatial Support

Topics
Big Data,
HBase,
Database Design,
Columnar Databases,
Search,
Database,
Lucene

The Lucene geospatial extension proposed in this article is based on a two-level search: the first level is a Cartesian grid search, and the second level implements shape-specific spatial calculations.
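The two-level idea can be illustrated with a minimal sketch. This is not the article's Lucene code: it assumes flat Cartesian coordinates, a circle as the only query shape, and an in-memory dictionary in place of a Lucene index. The names `CELL_SIZE`, `cell_of`, `build_grid_index` and `search_circle` are hypothetical.

```python
import math
from collections import defaultdict

CELL_SIZE = 1.0  # hypothetical grid resolution, in the same units as the coordinates


def cell_of(x, y, cell_size=CELL_SIZE):
    """Map a point to the integer (column, row) of the grid cell containing it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_grid_index(points, cell_size=CELL_SIZE):
    """Index points by grid cell: {cell: [(doc_id, x, y), ...]}."""
    index = defaultdict(list)
    for doc_id, (x, y) in points.items():
        index[cell_of(x, y, cell_size)].append((doc_id, x, y))
    return index


def search_circle(index, cx, cy, radius, cell_size=CELL_SIZE):
    """Return the sorted ids of all points within `radius` of (cx, cy)."""
    # Level 1: coarse grid search. Only cells that overlap the circle's
    # bounding box can contain a match, so all other cells are skipped.
    min_col, min_row = cell_of(cx - radius, cy - radius, cell_size)
    max_col, max_row = cell_of(cx + radius, cy + radius, cell_size)
    hits = []
    for col in range(min_col, max_col + 1):
        for row in range(min_row, max_row + 1):
            for doc_id, x, y in index.get((col, row), ()):
                # Level 2: shape-specific check, here an exact distance test.
                if math.hypot(x - cx, y - cy) <= radius:
                    hits.append(doc_id)
    return sorted(hits)


points = {"a": (0.5, 0.5), "b": (1.4, 0.5), "c": (1.9, 1.9), "d": (5.0, 5.0)}
index = build_grid_index(points)
print(search_circle(index, 0.5, 0.5, 1.0))
```

In this example, point "c" falls in a candidate cell during the grid pass but is rejected by the distance test, while "d" is never examined at all. The cheap first level narrows the candidate set so that the more expensive shape calculation runs on few documents.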

Exploring Hadoop OutputFormat

Topics
Big Data,
Database Design,
Hadoop,
Database

As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is using an appropriate OutputFormat, which makes it possible to produce output data in the form best suited to other applications.

Presentations about Big Data

Grid Gain vs. Hadoop. Why Elephants Can't Fly

Topics
QCon London 2012,
Big Data,
QCon,
Data Analysis,
Database Design,
GridGain,
Database,
Hadoop,
Conferences

Dmitriy Setrakyan introduces GridGain, comparing it with Hadoop and outlining the cases where it is a better fit, accompanied by a live demo showing how to set up a GridGain job.

Data Infrastructure @ LinkedIn

Topics
Messaging,
Big Data,
QCon London 2012,
Web Services,
Operations,
NoSQL,
QCon,
Database Design,
SOA,
Enterprise Architecture,
Conferences,
Performance & Scalability,
Infrastructure,
Database,
Architecture

Sid Anand presents the architecture in place at LinkedIn and the data infrastructure running Java and Scala apps on top of Oracle, Voldemort, Databus and Kafka.

Interviews about Big Data

Big Data Architecture at LinkedIn

Topics
Neo4j,
Neo,
Cassandra,
MongoDB,
Riak,
Graph Database,
Companies,
BigTable,
Key-Value Store,
Distributed Document Oriented Database,
Big Data,
Database Design,
NoSQL,
Hadoop,
Cloud Computing,
Database,
Voldemort,
Lucene,
Dynamo DB

In this interview at QCon London, LinkedIn’s Sid Anand discusses the problems they face when serving high-traffic, high-volume data. Sid explains how they’re moving some use cases from Oracle to gain headroom, and lifts the hood on their open source search and data replication projects, including Kafka, Voldemort, Espresso and Databus.

Hadoop and NoSQL in a Big Data Environment

Topics
Big Data,
QCon San Francisco 2011,
Continuous Delivery,
NoSQL,
Data Access,
Design Pattern,
Database Design,
QCon,
Agile Techniques,
Object Oriented Design,
Design,
Patterns,
Database,
Performance & Scalability,
Agile,
Data Warehousing,
Conferences,
Design Patterns,
Data Warehouse,
MapReduce,
Data Storage

Ron Bodkin of Big Data Analytics discusses early adoption of Hadoop, NoSQL and big data technologies. He covers common patterns and explains how developers can write low-level primitives to optimize MapReduce functions. Other topics include Hive, Pig, multi-tenancy, and security.