Scaling Clojure Web Apps with Google AppEngine
Google's introduction of Java support for Google App Engine (GAE) added a low-maintenance hosting solution to the world. Despite the limitations of the Java support in AppEngine, it still allows the use of the majority of existing JVM libraries, and that includes many JVM languages.
Clojure is a JVM based language inspired by LISP that's already being used in production systems, e.g. FlightCaster. Clojure has many strengths: some come from its LISP ancestry, like macros for metaprogramming; others shine through their tight and seamless integration, such as the powerful concurrency mechanisms (Software Transactional Memory, Agents, etc.) or persistent data structures. (If these latter terms are unfamiliar, this interview with Clojure creator Rich Hickey will be helpful.)
But how do Clojure and GAE get along?
TheDeadline is a newly released project task management system, built using Clojure on GAE by the German company freiheit.com.
InfoQ talked to Stefan Richter, of freiheit.com, about the product, the reason for choosing Clojure - and how Clojure fares with one hand tied behind its back due to the restrictions of threading on GAE.
InfoQ: How and where do you use Clojure?
We are using Clojure for our first internet startup, "TheDeadline".
My company, freiheit.com technologies, specializes in developing large-scale internet systems as a contractor for other companies. We have been using Java for more than 10 years.
Clojure is powerful. The programs are beautiful and concise. Productivity is very high. And: It's fun! As Paul Graham once said: "Lisp's power is multiplied by the fact that your competitors don't get it". We hope we can convince new customers to choose Clojure, too.
InfoQ: Have you used Clojure on client projects yet?
No, not Clojure so far. But we've built systems in Common Lisp. One is an eCommerce system and the other is a system that does information extraction by using natural language parsing and machine learning on large data sets.
InfoQ: What's your experience with Clojure on GAE?
Clojure works out of the box with Google AppEngine (GAE). You can't use the concurrency features, but actually you don't need them. GAE is all about creating stateless applications that can handle one request in a single thread/process.
The first thing we had to find out was how to do Lisp-like interactive programming with the GAE development environment and Emacs/Slime. If you are used to this, you don't want to miss it. We published a how-to on our blog, http://www.hackers-with-attitude.com.
InfoQ: What do you like about Clojure?
For me, functional programming is all about having simple datastructures and applying powerful functional abstractions on them.
This is totally different to OOP languages, where you have complex data structures with hierarchical/networked dependencies and encapsulated functionality. It is all about managing state. At least in the way the Java platform implements OOP.
Together with this, the key features of Clojure for me are actually:
I. - immutable, very simple datastructures based on hashtables/maps
II. - the Lisp syntax with all the parentheses (code = data)
III. - the macro system.
This makes Clojure really powerful. You pass simple data to functions, and the functions return new data or new functions that you can pass on. You have small but powerful building blocks. And the language makes folding, transforming and filtering of data very easy.
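As a rough illustration of that style (not code from TheDeadline; the todo maps and their field names are invented for this example), plain immutable maps can be filtered, transformed and folded with core functions only:

```clojure
;; Hypothetical todo items as plain immutable maps (field names are illustrative).
(def todos
  [{:title "Write report"  :owner "anna" :done false :hours 3}
   {:title "Fix login bug" :owner "ben"  :done true  :hours 2}
   {:title "Plan sprint"   :owner "anna" :done false :hours 1}])

;; Filtering: keep only the items that are not done.
(def open-todos (filter (complement :done) todos))

;; Transforming: pull the title out of every open item.
(def open-titles (map :title open-todos))

;; Folding: sum the hours of the open items.
(def open-hours (reduce + (map :hours open-todos)))

;; A function that returns a new function, which can be passed on.
(defn owned-by [owner]
  (fn [todo] (= owner (:owner todo))))

(def bens-todos (filter (owned-by "ben") todos))

;; "Changing" a map returns a new map; the original stays untouched.
(def finished (assoc (first todos) :done true))
```

Here `open-titles` holds the two open titles, `open-hours` is 4, and `(first todos)` still has `:done false` after `finished` is created.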
InfoQ: There's no RDBMS on GAE. How do you access storage on GAE?
The datastore is a schema-free, distributed key-value store. Schema-free doesn't mean that you don't need a schema, but that you need to maintain it on the application side. We implemented our own DSL to declare a schema and to automatically generate all the code to access (insert, update, delete) and query the datastore.
This is one of the "powerful functional abstractions" mentioned above, and it operates on simple Clojure datastructures. You don't have to handle any datastore entity types inside the Clojure code. You just declare what you need and the macro system generates the corresponding code for you.
Using the datastore is totally different from using a relational database (RDB). You have to think differently. But the key-value model matches the Clojure datastructures very well, and much better than the object-relational mapping approach. And even when you send out data as an AJAX response, there is not much transformation needed. So it is simple and traceable from end to end.
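The interview does not show the DSL itself, so the following is only a guess at the general idea. The macro name `defentity`, the generated function names and the field names are all hypothetical, and the sketch works on in-memory maps only; it does not touch the GAE datastore API. It shows how a single declaration can be expanded by a macro into several functions:

```clojure
;; Hypothetical sketch: `defentity` is an invented name, not TheDeadline's DSL.
;; It generates plain functions over maps and never calls the datastore.
(defmacro defentity
  "Declares an entity kind with a fixed set of fields and generates
   a constructor `make-<name>` and a validator `valid-<name>?`."
  [ename & fields]
  (let [field-set (set fields)
        kind      (keyword (name ename))]
    `(do
       ;; Constructor: keep only the declared fields and tag the map with its kind.
       (defn ~(symbol (str "make-" ename)) [m#]
         (assoc (select-keys m# ~(vec fields)) :kind ~kind))
       ;; Validator: the kind must match and every other key must be declared.
       (defn ~(symbol (str "valid-" ename "?")) [m#]
         (and (= ~kind (:kind m#))
              (every? ~field-set (keys (dissoc m# :kind))))))))

;; One declaration produces both `make-todo` and `valid-todo?`.
(defentity todo :title :owner :due)

;; The undeclared :junk key is dropped by the generated constructor.
(def t (make-todo {:title "Plan sprint" :owner "anna" :junk 1}))
```

A real schema DSL would presumably generate insert, update, delete and query code as the answer describes; the point of the sketch is only that the declaration stays data-like while the macro writes the repetitive functions.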
InfoQ: What do you use for the HTML frontend?
We are using Compojure with clj-html. I was used to this approach because I used Hunchentoot (a Common Lisp Web-Server from Edi Weitz) and CL-WHO (a DSL to generate HTML from Edi, too) a lot.
For me, this is the way to write modern web-apps: Having a specific DSL to generate HTML, so that you can mix it with your application code. You are still separating the presentation from the application logic. You can create components. It is really powerful, so you don't need an extra template framework in a lisp-like language.
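To make the idea of an HTML-generating DSL concrete, here is a toy renderer written for this article. It is not the clj-html or Compojure implementation; it only mimics the nested-vector style of markup, does no escaping, and the `render`, `todo-item` and `todo-list` names are made up:

```clojure
;; Toy renderer for vector-based markup; an illustration, not a library's code.
(defn render
  "Renders a nested vector such as [:p {:class \"x\"} \"text\"] to an HTML string.
   No escaping is performed."
  [node]
  (cond
    (vector? node)
    (let [[tag & more] node
          ;; An optional attribute map may follow the tag.
          [attrs children] (if (map? (first more))
                             [(first more) (rest more)]
                             [{} more])
          attr-str (apply str (for [[k v] attrs]
                                (str " " (name k) "=\"" v "\"")))]
      (str "<" (name tag) attr-str ">"
           (apply str (map render children))
           "</" (name tag) ">"))
    ;; Sequences of nodes (e.g. the result of `map`) are rendered in order.
    (seq? node) (apply str (map render node))
    :else (str node)))

;; A "component" is just a function that returns markup as data.
(defn todo-item [{:keys [title done]}]
  [:li {:class (if done "done" "open")} title])

(defn todo-list [todos]
  [:ul (map todo-item todos)])

(def page
  (render (todo-list [{:title "Plan sprint" :done false}
                      {:title "Fix login bug" :done true}])))
```

Because the markup is ordinary data produced by ordinary functions, components compose with `map`, `if` and the rest of the language, which is the property the answer above is pointing at.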
InfoQ: Have you run into any problems with Clojure on GAE?
AppEngine works. We only had minor problems. AppEngine differs from other cloud-computing environments in that you are not managing your own array of virtual servers. You just write your app using the AppEngine SDK. Deployment is really easy and the application is automatically distributed globally to the Google data centers. When somebody tries to access your app, it is instantiated in a data center close to this user. You get as many "machines" as you want. Or as you can pay for. :)
In the beginning, dynamic languages (JRuby, Clojure etc.) on the JVM were much slower than native Java apps. It is better now, but the app "warmup" (the first instantiation) is still a bit slow. And it seems that when Google deploys a new API version, the older API version gets slower. So we update quickly to each new API version.
One thing would be nice: Data-Processing with MapReduce. It would be cool if we could use our app data in MapReduce-Tasks. But in general we are very happy with it.
InfoQ: How do the GAE restrictions impact you?
The restrictions you have are needed for large-scale systems: You should not start your own threads in a request. You have a time-limit for each request. You shouldn't expect to have a count-Function when you possibly have hundreds of millions or billions of records in your datastore etc. You need other mechanics then. So you have to do what you would have to keep in mind anyway, when writing an app for a global audience.
InfoQ: So what is your product, TheDeadline?
We have worked with our own agile software development method for almost 10 years. We are very well organized without being bureaucratic. We have lots of experience in this area and we built a complete toolchain for agile software projects for ourselves.
Now, we are rebuilding this toolchain for cloud computing and offering it to a larger audience as a service. And we only offer products that we are using in our own projects, ranging from 1-2 developers over the average of 3-5 developers and up to 20 developers.
The first service is TheDeadline, an easy-to-use todo manager that acts like a personal human assistant. The system keeps you updated on all important tasks, without annoying you. And we implemented a novel way to share todos with, and delegate them to, a larger group of users. So this first product is not only aimed at developers, but at all types of knowledge workers and small businesses. The "Fortune 5.000.000" and not the "Fortune 500". :)
The public beta of TheDeadline starts now. In the next step we will publish Mobile Clients for the iPhone, iPad and Android and a Google Apps integration.
InfoQ: You mentioned AI concepts you used for TheDeadline. What is that about?
We are using AI techniques that are also used in game AI. So what we are trying to do is to build an AI system that behaves like a really good, intelligent, autonomous NPC (non-player character). For example, we built a rule-based expert system shell in Clojure, comparable to CLIPS, LISA or OPS5. These rules are at the core of the AI behavior. Other techniques are goal-oriented action planning and Bayesian learning. We are trying to learn from user behavior, but here we are still at the beginning.
It is like trying to mimic the behavior of the best project manager or project assistant that you ever met. Our goal is to eliminate all human project management work on this planet. (Not the "project manager", but the "project management work"). Most project management stuff is about finding the right tasks and delegating them to the right people and then monitoring the progress, reminding people, re-planning things etc. A lot of this can be done by a machine that interacts in a friendly way with a human. Just put all the todos into TheDeadline. The machine then takes care of keeping everybody up to date and raises exceptions and questions before something can go wrong. So you can focus on the real work.
To see what's possible with GAE and the various Clojure technologies mentioned in this article, check out TheDeadline - it's free of charge. The tool offers a modern web UI, complete with keyboard shortcuts for most interactions. A Twitter-style, inline "@name" notation can be used to assign tasks, as well as a "#tag" notation for grouping tasks using a loose set of tags.
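As a small sketch of how such an inline notation could be picked apart (the exact syntax TheDeadline accepts is not documented here, so the patterns and the `parse-todo` function are assumptions), two regular expressions are enough to collect assignees and tags from a line of text:

```clojure
;; Hypothetical parser: assumes names and tags consist of word characters only.
(defn parse-todo
  "Returns the original text plus the @names and #tags found in it."
  [text]
  {:text      text
   ;; re-seq yields [whole-match group] vectors; `second` picks the group.
   :assignees (mapv second (re-seq #"@(\w+)" text))
   :tags      (mapv second (re-seq #"#(\w+)" text))})

(def parsed
  (parse-todo "Send invoice to @anna and @ben #billing #urgent"))
```

With this input, `(:assignees parsed)` is `["anna" "ben"]` and `(:tags parsed)` is `["billing" "urgent"]`.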
For another look at the technology behind this, see slides from a talk Stefan gave on TheDeadline earlier this year.
"JVM based language inspired by LISP"
"The clj-html library is being deprecated in favor of Hiccup. Hiccup implements essentially the same interface as clj-html, but uses a more advanced compiler/interpreter, handles certain markup edge cases better, and offers a more complete suite of helpers. The clj-html project will remain online through August 2010, but will not be updated."
And also, Hiccup looks pretty poor at this time. It doesn't do much of what's needed to power a mid-sized web application anyway. I think I can buy into a NoTemplate movement (after all, a lot are buying the NoSQL one), but I'm skeptical as to whether Hiccup or Compojure can do the job. Will have to look into this a bit more.
Clojure is fantastic though. Good luck with TheDeadline!
"Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR ). [..] Clojure is a compiled language - it compiles directly to JVM bytecode[.]".
"Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. "
So yeah - we can say it's _a_ LISP... not that that means anything.
Try executing this perfectly valid LISP snippet in Clojure:
(car (cdr '(1 2 3)))
And no cheating and defining car and cdr...
Parser based assistants
The great original example was iwantsandy.com, which shut down its services after its creator was hired by Twitter.
It offered a really great parser for people, dates, tags and more. The cool thing about this idea was to use an asynchronous messaging interface (like emails/email subjects or twitter DMs) to provide a command line interface to the service.
Remember The Milk also offers similar services.
I started to write a Sandy clone two years ago in Groovy & Grails. There the fun part was also the parser and the notification/digest services. You can find it working here: Secretari.us
Looking forward to similar services in the future. Especially as texting is ubiquitous even on mobile platforms, sending a text message (i.e. command line invocation) to a remote service doesn't require any UI fiddling but adds a clean interface usable for all (even blind people - I experienced them in text based online RPGs - MUDs and was really impressed).