InfoQ

News

Presentation: Jinesh Varia on the Architecture of Amazon's Alexa Web Service

Posted by Abel Avram on Aug 16, 2008


In this presentation, Jinesh Varia, a Web Services Evangelist at Amazon, discusses the architecture behind one of Amazon's web services, Alexa. He explains how Amazon achieved scalability and good performance while reducing costs for the Alexa service.

Watch:  Jinesh Varia About Amazon's Alexa Web Service (43 min)

The Alexa Web Service, backed by an application known internally as GrepTheWeb, gathers various information about web sites, including traffic data, contact information, and more. The collected data is then made available to clients, who can run specialized queries against it to find specific information.

Jinesh explains that GrepTheWeb uses Hadoop, a free, open-source Java platform for running applications that process vast amounts of data. In this case, the data is stored on Amazon's Simple Storage Service (S3) and is retrieved by Hadoop clusters when a client request is processed; the clusters themselves run on Amazon's Elastic Compute Cloud (EC2). Once processing finishes, the result is returned to the customer.
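To make the processing step more concrete, here is a minimal sketch of a regex search written in the map/reduce style that Hadoop is built around. It is not GrepTheWeb's code and does not use Hadoop's API: the in-memory `docs` list is a hypothetical stand-in for pages stored on S3, and `map_phase`, `reduce_phase` and `grep_the_web` are illustrative names. Both phases run sequentially here, whereas on a Hadoop cluster the framework would distribute them across many EC2 nodes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a page fetched from storage: (url, page_text).
Document = Tuple[str, str]


def map_phase(doc: Document, pattern: "re.Pattern[str]") -> Iterable[Tuple[str, str]]:
    """Emit one (url, matched_text) pair for every regex match in a document."""
    url, text = doc
    for match in pattern.finditer(text):
        yield url, match.group(0)


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group all emitted matches by the URL they came from."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for url, hit in pairs:
        grouped[url].append(hit)
    return dict(grouped)


def grep_the_web(docs: Iterable[Document], regex: str) -> Dict[str, List[str]]:
    """Run the map phase over every document, then reduce the results."""
    pattern = re.compile(regex)
    # On Hadoop, map tasks would run in parallel and the framework would
    # shuffle the pairs to reducers; here everything runs in one process.
    mapped = (pair for doc in docs for pair in map_phase(doc, pattern))
    return reduce_phase(mapped)


if __name__ == "__main__":
    docs = [
        ("http://a.example", "contact: info@a.example"),
        ("http://b.example", "no address here"),
        ("http://c.example", "sales@c.example, help@c.example"),
    ]
    print(grep_the_web(docs, r"[\w.]+@[\w.]+"))
    # {'http://a.example': ['info@a.example'], 'http://c.example': ['sales@c.example', 'help@c.example']}
```

Pages without a match simply produce no pairs, so they never appear in the reduced output.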

The entire architecture runs in the cloud, and its internals are completely hidden from the service's customers. When a request is issued, a processing framework is set up on as many machines as are needed to process it and generate a result; once the result is produced, the whole framework is torn down. This cloud architecture makes the service highly scalable, and because it can be extended across a theoretically unlimited number of nodes, the service also performs well. Since the supporting infrastructure is created on the fly and exists only while a request is being processed, costs stay low.
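The request-scoped lifecycle described above can be sketched locally. This is a hedged illustration that assumes a thread pool as a stand-in for EC2 instances; `nodes_needed`, `run_request`, `request_cost`, `items_per_node`, `max_nodes` and `hourly_rate` are all hypothetical names, and no real EC2 API or pricing is used. The pool is sized to the request, exists only inside the `with` block, and the cost helper shows why billing only for the node-hours actually used keeps costs low.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def nodes_needed(total_items: int, items_per_node: int, max_nodes: int) -> int:
    """Size the temporary cluster to the request: at least 1, at most max_nodes."""
    return max(1, min(max_nodes, math.ceil(total_items / items_per_node)))


def run_request(items: Sequence[T], work: Callable[[T], R],
                items_per_node: int = 100, max_nodes: int = 20) -> List[R]:
    """Build a worker pool for this request only, process it, then tear it down."""
    n = nodes_needed(len(items), items_per_node, max_nodes)
    # The executor plays the role of a cluster launched for one request.
    # Leaving the 'with' block shuts every worker down, mirroring how the
    # framework disappears once the result has been produced.
    with ThreadPoolExecutor(max_workers=n) as cluster:
        return list(cluster.map(work, items))


def request_cost(nodes: int, hours: float, hourly_rate: float) -> float:
    """Pay-per-use cost: nodes are billed only for the hours they exist."""
    return nodes * hours * hourly_rate
```

For example, `run_request(list(range(5)), lambda x: x * x)` uses a single worker (five items fit on one node at 100 items per node) and returns `[0, 1, 4, 9, 16]`, while a 250-item request would get three workers. Three nodes kept for two hours amount to six node-hours, billed at whatever hourly rate applies, and nothing is billed once the pool is gone.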

One of the main features of Alexa's architecture is fault tolerance. The data is duplicated and stored in physically separate locations to avoid data loss, and Hadoop takes care of spawning and controlling as many processes as necessary to handle the large amounts of data involved.
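The two fault-tolerance ideas in the paragraph above, keeping copies of data in separate locations and re-running work that fails, can be illustrated with a toy model. This is a sketch under stated assumptions, not Amazon's or Hadoop's implementation: `ReplicatedStore`, `run_with_retries`, the zone names and the `replicas`/`attempts` parameters are hypothetical. The retry helper is only similar in spirit to how Hadoop re-executes failed tasks.

```python
import random
from typing import Callable, Dict, List, Optional, Set, TypeVar

R = TypeVar("R")


class ReplicatedStore:
    """Toy store that writes each value to `replicas` distinct zones."""

    def __init__(self, zones: List[str], replicas: int = 2) -> None:
        if replicas > len(zones):
            raise ValueError("need at least as many zones as replicas")
        self.zones = zones
        self.replicas = replicas
        self.data: Dict[str, Dict[str, bytes]] = {z: {} for z in zones}
        self.down: Set[str] = set()  # zones currently unavailable

    def put(self, key: str, value: bytes) -> None:
        # Choose distinct zones so a single site failure cannot lose every copy.
        for zone in random.sample(self.zones, self.replicas):
            self.data[zone][key] = value

    def get(self, key: str) -> bytes:
        # Read from the first healthy zone that still holds a copy.
        for zone in self.zones:
            if zone not in self.down and key in self.data[zone]:
                return self.data[zone][key]
        raise KeyError(key)


def run_with_retries(task: Callable[[], R], attempts: int = 3) -> R:
    """Re-run a failing task up to `attempts` times before giving up."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as err:  # record the failure and try again
            last_error = err
    raise RuntimeError("task failed after retries") from last_error
```

With two replicas spread over three zones, any single zone can go down and `get` still finds a surviving copy; losing data would require every zone holding that key to fail at once.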


    Slides not clear

    Aug 19, 2008 7:34 AM by A D

    Slides related to the architecture are not clear and are missing some of the components described in the presentation.


    Re: Slides to download

    Aug 21, 2008 8:08 AM by Himanshu Bafna

    Where can I download the slides?
