Presentation: Jinesh Varia About Amazon Alexa Web Service's Architecture

by Abel Avram on Aug 16, 2008

In this presentation, Jinesh Varia, a Web Services Evangelist at Amazon, talks about the architecture of Alexa, one of Amazon's web services. Jinesh explains how Amazon has achieved scalability, good performance, and reduced costs for the Alexa service.

Watch:  Jinesh Varia About Amazon's Alexa Web Service (43 min)

The Alexa Web Service, backed by an application known internally as GrepTheWeb, gathers various information about web sites, including traffic data, contact information, and more. The collected data is then made available to clients, who can run specialized queries against it to find specific information.

Jinesh explains that GrepTheWeb uses Hadoop, a free Java software platform for running applications that process vast amounts of data. In this case, the data is stored on Amazon's Simple Storage Service (S3) and is retrieved by Hadoop clusters when a client request is processed. Hadoop itself runs inside Amazon's Elastic Compute Cloud (EC2). Finally, a result is returned to the customer.
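To make the processing model concrete, here is a minimal sketch of a grep-style job split into a map phase and a reduce phase, the programming model Hadoop is built around. It runs in a single Python process on an in-memory dictionary, so it only illustrates the shape of the computation. The function names (map_grep, reduce_grep, grep_the_docs) and the sample documents are assumptions made for this example and are not taken from the presentation or from GrepTheWeb itself.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_grep(doc_id: str, text: str, pattern: "re.Pattern[str]") -> Iterator[Tuple[str, str]]:
    """Map phase: emit (doc_id, line) for every line that matches the pattern."""
    for line in text.splitlines():
        if pattern.search(line):
            yield doc_id, line


def reduce_grep(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reduce phase: group the matching lines by the document they came from."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc_id, line in pairs:
        grouped[doc_id].append(line)
    return dict(grouped)


def grep_the_docs(docs: Dict[str, str], regex: str) -> Dict[str, List[str]]:
    """Run the map phase over every document, then reduce the emitted pairs.

    In a real cluster the documents would be read from storage and the map
    calls would run on many machines; here everything happens in one process.
    """
    pattern = re.compile(regex)
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_grep(doc_id, text, pattern)
    )
    return reduce_grep(pairs)


if __name__ == "__main__":
    # Hypothetical sample input standing in for crawled pages.
    sample_docs = {
        "a.html": "hello world\ncontact: admin@example.com",
        "b.html": "nothing to see here",
    }
    print(grep_the_docs(sample_docs, r"contact"))
```

Because each map call looks at only one document, the work can be divided among any number of machines, which is what lets a platform like Hadoop spread the job across a cluster.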

The whole architecture lives in a cloud whose internals are completely hidden from the service customer. When a request is issued, an entire framework is built on as many machines as necessary to process it and generate a result, and then the whole framework disappears. This cloud architecture makes the service highly scalable: because it can be extended across a theoretically unlimited number of nodes, the service performs well. And since the supporting infrastructure is created on the fly and exists only while a request is being processed, costs are low.
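The "build it, use it, tear it down" lifecycle described above can be sketched as follows. This is a simulation only: a local thread pool stands in for the machines that would be started in the cloud, and no Amazon API is called. The names ephemeral_cluster, nodes_needed, and process_request, as well as the sizing rule of two chunks per node, are assumptions invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence


@contextmanager
def ephemeral_cluster(node_count: int) -> Iterator[ThreadPoolExecutor]:
    """Create a pool of workers for one request and release it afterwards.

    The thread pool is a stand-in for machines launched on demand.
    """
    pool = ThreadPoolExecutor(max_workers=node_count)
    try:
        yield pool
    finally:
        # The "cluster" disappears once the request is done, even on error.
        pool.shutdown(wait=True)


def nodes_needed(chunk_count: int, chunks_per_node: int = 2) -> int:
    """Hypothetical sizing rule: one node per `chunks_per_node` chunks, at least one."""
    return max(1, -(-chunk_count // chunks_per_node))  # ceiling division


def process_request(chunks: Sequence[str], work: Callable[[str], int]) -> List[int]:
    """Size a cluster for this request, fan the chunks out, and collect the results."""
    with ephemeral_cluster(nodes_needed(len(chunks))) as cluster:
        return list(cluster.map(work, chunks))


if __name__ == "__main__":
    # Count the words in each chunk as a placeholder for real processing.
    print(process_request(["a b", "c d e", "f"], lambda chunk: len(chunk.split())))
```

The point of the context manager is that resources exist only for the duration of one request, which mirrors the cost argument made in the talk: nothing is kept running while idle.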

One of the main features of Alexa's architecture is fault tolerance. The data is duplicated and stored in physically different locations to avoid data loss, and Hadoop takes care of spawning and controlling as many processes as necessary to handle the large amounts of data involved.


Slides not clear by A D

The slides related to the architecture are not clear and are missing some of the components described in the presentation.

Re: Slides to download by Himanshu Bafna

Where can I download the slides?

Re: Slides to download by Jinesh Varia
