Google 'simplifies web development' with AppEngine

At Campfire One on April 7th, 2008, Google introduced Google App Engine as a way to simplify the job of creating, running and scaling web applications, to make it 'easy.' In essence, Google App Engine allows you to build web applications locally and then deploy them on Google's infrastructure.

This is a preview release: it is not feature complete, and a quota system limits the storage, CPU and bandwidth that applications can use for free during the preview period. Once the preview period is over, that quota will remain free, but developers will be able to purchase additional resources as needed. The cost of additional resources has not yet been shared (and possibly not even established).

The quotas in the preview release included: 3 apps per developer, 500MB storage per app, and per day (rolling 24 hour) quotas of 2000 emails, 10 GB bandwidth in, 10 GB bandwidth out, 200M CPU Megacycles, 650k HTTP Requests, 2.5M datastore API calls and 160k URLFetch API calls.
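To put the daily figures in perspective, the following back-of-the-envelope sketch converts them into average sustained rates. It assumes load is spread evenly over the rolling 24-hour window and that 1 GB means 10^9 bytes; neither assumption comes from Google's announcement, and the helper names are made up for illustration.

```python
# Back-of-the-envelope view of the preview quotas quoted above.
# Assumptions (not from the article): load is spread evenly over the
# rolling 24-hour window, and 1 GB means 10**9 bytes.

SECONDS_PER_DAY = 24 * 60 * 60

DAILY_QUOTAS = {
    "http_requests": 650_000,
    "datastore_calls": 2_500_000,
    "urlfetch_calls": 160_000,
    "emails": 2_000,
}
BANDWIDTH_OUT_BYTES = 10 * 10**9


def per_second(daily_total):
    """Average sustained rate that exactly uses up a daily quota."""
    return daily_total / SECONDS_PER_DAY


def average_rates(quotas):
    """Map each quota name to its average per-second rate."""
    return {name: per_second(total) for name, total in quotas.items()}


rates = average_rates(DAILY_QUOTAS)
bytes_per_request = BANDWIDTH_OUT_BYTES / DAILY_QUOTAS["http_requests"]
datastore_calls_per_request = (
    DAILY_QUOTAS["datastore_calls"] / DAILY_QUOTAS["http_requests"]
)

for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2f} per second on average")
print(f"outbound bytes available per request: {bytes_per_request:.0f}")
print(f"datastore calls available per request: {datastore_calls_per_request:.2f}")
```

Under these assumptions the free quota works out to roughly 7.5 HTTP requests per second, about 15 KB of outbound bandwidth per request, and just under 4 datastore calls per request.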

Technology: Development Environment and APIs

The technology stack is currently based on Python, one of Google's sanctioned languages, although Google says that they 'look forward to supporting more languages in the future.' Google offers a Python runtime environment that runs in a secure sandbox which provides limited access to the underlying operating system, for the purposes of security and scale. That environment includes the standard library and can be extended through modules as long as they don't employ C:

The environment includes the Python standard library. Of course, calling a library method that violates a sandbox restriction, such as attempting to open a socket or write to a file, will not succeed. For convenience, several modules in the standard library whose core features are not supported by the runtime environment have been disabled, and code that imports them will raise an error.

Application code must be written exclusively in Python. Code with extensions written in C is not supported.

Other security limitations include outbound communication only through the supplied email and URL fetch APIs, inbound communication over HTTP and HTTPS on the standard ports, no filesystem write access and no sub-processes or code execution outside the request-response loop (e.g. background and batch processing).

In addition, Google offers APIs to access a Datastore, Google user accounts, URL fetch and email services. App Engine also includes a simplified web application framework and Django 0.96.1, although the App Engine Datastore is not relational, and can't be used with all Django APIs.

The datastore API is backed by Google's BigTable, but has a lot in common with a simple object persistence API (or an object-relational mapping framework, even though Google takes care to point out that the datastore isn't relational):

For most of you, working with the Datastore will probably take a little getting used to: as I've said, it's not SQL. That's a big difference. However, we think that after a while, the Datastore may actually grow on you, because it makes some things easier. For one thing, our datastore is schema-less, meaning it can support arbitrary new properties or columns, which you can create as you code, without having to design everything up front and create a schema. This comes back to our goal of making writing a web app as easy as possible: just start coding. Your data model can evolve along with your app.

Even though the Datastore is a departure from SQL, we still support a lot of powerful functionality that you usually expect from a traditional database. The Datastore supports efficient queries on any single property or set of properties you provide. It supports sort orderings on your query results, including sort orders on multiple properties. It supports transactions for writes, with transactional groupings that you control. It supports batch operations for fetching or creating a large number of entities. It optionally allows you to control the primary key of your entities, for more efficient queries and shorter URLs.

And, even though the Datastore is not SQL, we're providing you with a SQL-like query language, called GQL, to make it easier to formulate queries. GQL is in the spirit of jQuery and FBQL: the underlying store is not SQL, but nearly all of the queries that you'd like to do can still be accomplished.

One big feature that you may have noticed that our Datastore doesn't have, though, is joins. The reason for this is that joins are usually a source of performance problems in a distributed system, when you go beyond a single machine: it's much harder to efficiently support a join on a distributed system that spans many computers and many hard disks.
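As a rough illustration of what "schema-less" storage with property filters and sort orders looks like from the application's side, here is a toy in-memory sketch. It is not the App Engine datastore API: `put`, `query` and the `Greeting` kind are hypothetical names chosen for this example, and only equality filters and ascending sorts are modeled.

```python
# Toy in-memory model of a schema-less store. This is NOT the App Engine
# datastore API; `put` and `query` are hypothetical helpers for illustration.

store = []  # each entity is a plain dict: kind + arbitrary properties


def put(kind, **properties):
    """Save an entity; no schema is declared, so any property is accepted."""
    entity = {"kind": kind, **properties}
    store.append(entity)
    return entity


def query(kind, filters=None, order_by=()):
    """Return entities of `kind` whose properties equal `filters`,
    sorted by the property names in `order_by` (all ascending)."""
    filters = filters or {}
    matches = [
        e for e in store
        if e["kind"] == kind
        and all(k in e and e[k] == v for k, v in filters.items())
    ]
    # In this toy, entities lacking a sort property are skipped,
    # since they cannot be ordered.
    matches = [e for e in matches if all(p in e for p in order_by)]
    return sorted(matches, key=lambda e: tuple(e[p] for p in order_by))


# The data model evolves as the code does: `rating` appears only later.
put("Greeting", author="alice", content="hello", posted=1)
put("Greeting", author="bob", content="hi", posted=2)
put("Greeting", author="alice", content="again", posted=3, rating=5)

# Without joins, data needed together is stored together: the author's name
# is copied onto each greeting instead of being looked up in a second kind.
alice_posts = query("Greeting", {"author": "alice"}, order_by=("posted",))
print([g["content"] for g in alice_posts])  # ['hello', 'again']
```

In GQL, the last query would read roughly as `SELECT * FROM Greeting WHERE author = 'alice' ORDER BY posted`.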

Although the datastore API supports transactions, they have strict limits and are tied to entity groups:

Every entity belongs to an entity group, a set of one or more entities that can be manipulated in a single transaction. Entity group relationships tell App Engine to store several entities in the same part of the distributed network. A transaction sets up datastore operations for an entity group, and all of the operations are applied as a group, or not at all if the transaction fails.

When the application creates an entity, it can assign another entity as the parent of the new entity. Assigning a parent to a new entity puts the new entity in the same entity group as the parent entity.

An entity without a parent is a root entity. An entity that is a parent for another entity can also have a parent. A chain of parent entities from an entity up to the root is the path for the entity, and members of the path are the entity's ancestors. The parent of an entity is defined when the entity is created, and cannot be changed later.

Every entity with a given root entity as an ancestor is in the same entity group. All entities in a group are stored in the same datastore node. A single transaction can modify multiple entities in a single group, or add new entities to the group by making the new entity's parent an existing entity in the group.
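The parent, ancestor and root relationships described above can be sketched with a small toy model. The `Entity` class and the `same_group` and `run_in_transaction` helpers are hypothetical and exist only for this illustration; they are not the App Engine API, and the sketch checks group membership without modeling rollback on failure.

```python
# Toy model of entity groups as described above. Hypothetical classes for
# illustration only; this is not the App Engine datastore API.


class Entity:
    def __init__(self, name, parent=None):
        self.name = name
        self._parent = parent  # fixed at creation; never reassigned

    @property
    def parent(self):
        return self._parent

    def ancestors(self):
        """Entities on the path from the parent up to the root, nearest first."""
        chain = []
        node = self._parent
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

    def root(self):
        """The root entity identifies the entity group."""
        chain = self.ancestors()
        return chain[-1] if chain else self


def same_group(*entities):
    """True if every entity shares one root, i.e. one entity group."""
    return len({id(e.root()) for e in entities}) <= 1


def run_in_transaction(entities, update):
    """Apply `update` to each entity, refusing cross-group work up front.
    Atomic rollback on failure is not modeled in this sketch."""
    if not same_group(*entities):
        raise ValueError("a transaction may only touch one entity group")
    for entity in entities:
        update(entity)


blog = Entity("blog")                       # root entity
post = Entity("post-1", parent=blog)        # joins blog's entity group
comment = Entity("comment-1", parent=post)  # same group, via its ancestors
other_blog = Entity("other-blog")           # root of a separate group

print([a.name for a in comment.ancestors()])  # ['post-1', 'blog']
print(same_group(blog, post, comment))        # True
print(same_group(comment, other_blog))        # False
```

The practical consequence is that entities which must change together need a common root, and that has to be decided when they are created because the parent cannot be changed afterwards.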

Because App Engine forces you to approach development in a particular way (e.g. the Datastore on BigTable instead of a relational database), Google argues that your application will be easier to scale and can scale nearly transparently:

When a web app surges in popularity, the sudden increase in traffic can be overwhelming for applications of all sizes, from startups to large companies that find themselves rearchitecting their databases and entire systems several times a year. With automatic replication and load balancing, Google App Engine makes it easier to scale from one user to one million by taking advantage of Bigtable and other components of Google's scalable infrastructure.

The User API allows for user authentication / login via Google Account, and access to the account's nickname and email. Any further user information could be gathered directly from the user by the application and stored in the datastore.

The URL fetch API allows for retrieval of information from remote servers by fetching HTTP and HTTPS URLs. It supports GET, POST, HEAD, PUT and DELETE, which suggests it could be used to call RESTful services.

The Mail API allows for App Engine applications to send email asynchronously with retries if the mail server is unavailable.

The App Engine SDK includes a server to simulate the App Engine Python runtime environment, and:

  • reproduces the module import restrictions, and only allows handlers to import an allowed module from the standard library, the third-party libraries included in the App Engine Python environment, and modules in the application directory
  • reproduces the app caching behavior
  • emulates the App Engine datastore using local files
  • emulates Google Accounts with sign-in and sign-out pages that accept any email address
  • emulates the URL fetch service by fetching URLs directly from your computer
  • emulates the mail service using an SMTP server or Sendmail configuration of your choice
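The first item in the list, the import restriction, amounts to an allowlist check. The sketch below shows the general idea only: the module sets are assumed, `is_import_allowed` and `guarded_import` are hypothetical names, and nothing here reflects how the SDK actually implements the restriction.

```python
# Simplified sketch of an import allowlist, in the spirit of the SDK behaviour
# listed above. The module sets and helper names are hypothetical; the real
# SDK's lists and mechanism are not described in the article.
import importlib

ALLOWED_STDLIB = {"datetime", "math", "re"}  # assumed subset, for illustration
BUNDLED_THIRD_PARTY = {"django"}             # the article mentions Django


def is_import_allowed(module_name, app_modules=frozenset()):
    """Allow stdlib modules on the list, bundled libraries, and app code."""
    top_level = module_name.split(".")[0]
    return (
        top_level in ALLOWED_STDLIB
        or top_level in BUNDLED_THIRD_PARTY
        or top_level in app_modules
    )


def guarded_import(module_name, app_modules=frozenset()):
    """Import a module only if the allowlist permits it."""
    if not is_import_allowed(module_name, app_modules):
        raise ImportError(f"{module_name} is not available in the sandbox")
    return importlib.import_module(module_name)


print(guarded_import("math").sqrt(9))                  # 3.0
print(is_import_allowed("handlers.main", {"handlers"}))  # True
try:
    guarded_import("socket")  # not on the assumed allowlist
except ImportError as error:
    print(error)
```

Checking only the top-level package name keeps the sketch short; a real sandbox would also need to deal with modules that are already loaded.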

At first glance, most of the application configuration seems to be done in YAML.

Motive and Competition

Google's announcement describes the company's motive, which is to make it easier to build, deploy and scale out web applications:

Well, we built App Engine because we want more web apps to get created. What we noticed is that, today, it's pretty hard to create one: there are significant upfront challenges to deploying even the simplest of web applications. You've got a lot of tasks to do. First, you have to write the code for your app, of course.

But then, you also have to write your Apache web server configs and startup scripts, set up your SQL database, create all of its tables and hook up the passwords, set up monitoring so you can tell what's going on with your traffic and logs, decide how you'll push new versions of your code, and on, and on.

That's the technical setup challenge that we noticed. And then, once you've done all that sysadmin work, you have another challenge: you have to actually go find machines you can use somewhere, physically or from a virtual provider, to run your app somewhere. Right now, that costs money: even for the smallest app, which you use a few times a week, you have to pay a pretty big upfront fee to run that app with a traditional hosting provider.

So that's the financial or physical challenge. And then, once you've got the whole thing set up and working, and found and paid for a place to test it out, you've got another challenge: you've got to maintain it all as your app grows. Your machines crash, your configs have errors, your hard disks break, your traffic starts to grow, you have to re-shard your databases, set up more machines and on. Keeping everything going as your app grows is a hassle.

All of these hassles are what we're trying to abstract away with App Engine. They are the problems that we're trying to fix.

Others are already speculating about additional motives. Many point out potential competition with Amazon and Microsoft over the future of cloud computing and web services, often comparing App Engine to Amazon's web services EC2, S3, SQS and SimpleDB:

  • O'Reilly Radar said:

    After Amazon Web Services started doing so well we all knew it was just a matter of time (next will be Microsoft we can safely assume). Though the obvious comparison is to AWS, they aren't really the same beast. Amazon has released a set of disparate services that can be used to create a general computing platform. The services, though they work together, do not come bundled.

    App Engine on the other hand is almost literally an engine for powering web applications. It bundles together many of the features that AWS offers into a singular package: storage like S3, auto-scaling and processing power like EC2, and a datastore like SimpleDB. App Engine also offers things that are not available on AWS like a Python runtime, Google-specific APIs and perhaps most notably a free portion of the service.

  • VentureBeat: "Google App Engine readies for brawl with Amazon"

Others suggest that Microsoft is heading in this direction as well with things like Ray Ozzie's Mesh strategy and SQL Server Data Services, but may already be too late.

Looking at another angle, some suggest that this could give Google a head-start on acquisitions, a form of venture infrastructure:

  • Business Week argued that the competition between Google and Amazon misses the point, that encouraging startups to develop their applications in Google's infrastructure gives Google "not only good visibility into the kinds of applications people want and the problems it may need to overcome with them, but also a bird's-eye view into the most promising new startups it might want to acquire".
  • ZDNet added that it could save Google money on acquisitions: "imagine how much time and effort could be saved if a company purchased by Google already uses Google's technology?"
  • GigaOM says, "This type of loss-leader service gets startups in the door with Google, giving the company access to the freshest ideas and an entrepreneurial talent pool that it can tap."
  • In "How Google Can Eat Amazon's Lunch," Kevin Kelleher calls this investing:

    In the interview I speculated aloud that what Amazon was doing was a lot like what corporate VC arms like Intel Capital do — invest in startups with which they will work — or buy — later on. Only instead of using hard cash, they were using infrastructure. Very shrewd, I said.

    The executive's response was that Amazon was not doing that at all, and that it would never do that with web services. I thought but didn't say: Well, if you don't do it someone else will.

    Now some pig is saying that Google is doing it. As valued Google workers pack up their desks and launch new startups, this is the single best strategy for Google to bring them back into the fold. And it's a great way to pull the rug out from under Amazon, strategy-wise and profit-wise.

Feedback, Analysis and Resources
