At the recent Hadoop Summit, Jay Kreps of LinkedIn presented some informative details of how the company processes data. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large scale data computation with high volume, low latency site serving.
The Hadoop Summit of 2010 included presentations from a number of large scale users of Hadoop and related technologies. Notably, Facebook presented a keynote and detailed information about their use of Hive for analytics. Mike Schroepfer, Facebook's VP of Engineering, delivered a keynote describing the scale of their data processing with Hadoop.
The Hadoop Summit of 2010 started off with a vuvuzela blast from Blake Irving, Chief Product Officer for Yahoo. Yahoo delivered keynote addresses that outlined the scale of their use, technical directions for their contributions, and architectural patterns in how they apply the technology.
The GigaOM Structure conference a couple of weeks ago addressed many areas of cloud computing. One of the key themes of the event was the emergence of new data architectures. Throughout the panels, interviews, and presentations, many speakers identified significant upcoming changes in how data gets handled.
Adobe recently released to the community the Puppet recipes that they are using to automate Hadoop/HBase deployments. InfoQ spoke with Luke Kanies, founder of PuppetLabs, to learn more about what this means.
The team at Reductive Labs recently announced the release of version 0.25.2 of Puppet, the open source Ruby-based configuration management and automation tool for Linux and Unix servers. In this bug-fix release, 123 open tickets were closed, and the developers claim a reduced memory footprint and improvements in error reporting, threading, and lock contention (a source of reported system hangs).
Continuous deployment has recently gained buzz in the Lean-slanted "eliminate work-in-progress" movement. But while many may find this an intriguing and logically worthwhile objective, far fewer can visualize how it might actually be achieved. Ash Maurya helps to fill this gap by describing his experience with making it happen at his company.
Plura Processing is a SETI-like distributed network harnessing the power of tens of thousands of computers.
MySpace and Fusion-io recently announced they are working together to reduce datacenter operations costs. Using Fusion-io's ioDrive SSDs, MySpace replaced 150 standard load servers, and reduced their number of heavy load servers from 80 to 30. Overall a reduction of 51% in server footprint was achieved, and MySpace will replace over 1700 of their remaining 2U servers as they reach end-of-life.
FlightCaster recently open sourced Crane, a tool for distributing and remotely controlling Clojure instances, currently specialized for EC2. Incanter is a Clojure library and tool that makes R-like statistical computations easy with Clojure. Also: the build and dependency management tool Leiningen 1.0 is now available.
Microsoft has announced the opening of two new data centers, one in Dublin, Ireland, and another in Chicago, US. These data centers come in preparation for the announcement Microsoft is going to make at PDC 2009 regarding the commercial availability of Windows Azure services.
IBM announces three new ways for businesses to utilize cloud computing: standardized services on the IBM cloud, private cloud services behind the firewall (managed by the business or IBM), and Cloud burst, a way to seamlessly incorporate secure public clouds to accommodate "overflow" demand for services.
We take a look at 3 tools that will help streamline Ruby projects. Hoe 2.0.0 sets up projects and is now extensible with plugins. YARD is a documentation generator like RDoc and it's now powered by a new faster parsing strategy. Finally: Whenever takes care of defining and updating your crontab file - and it's configured with Ruby code.
GitHub now offers an installable version of the service for users who want to keep their code inside their network - and it's built on JRuby. TorqueBox is a new solution for running JRuby on Rails on JBoss, complete with integration for job queues and SIP. Also: EngineYard announced it will start providing JRuby as a hosting option in July.
In a recent podcast, James Gosling talked to Danny Coward about the significance of Garbage First, Sun's new HotSpot garbage collector, for developers of large-scale enterprise systems.