In Part 2 of the interview with the FlightCaster team (Part 1 covered how FlightCaster uses Clojure), the questions focus on their use of Rails and Heroku, how they gather and integrate data from multiple sources, and how they build UIs for multiple mobile devices and integrate them into the system.
InfoQ talked to James Bracy, Jon Bracy, Jared Strate, and Jonathan Chase of FlightCaster, and Pavel Petrek of Inmite (FlightCaster's Blackberry partner).
InfoQ: How do you handle long running processes in your web frontend?
Jon: We use the Heroku Delayed Jobs add-on for real-time capture from many sources. If we can't consistently capture the data, it's gone forever, so reliability is important. Right now the DJs handle over 2 million updates a day. Each data source collection runs as its own worker queue. If a DJ goes down, Heroku brings up another and the work continues. If more workers are needed, it is as simple as running a single command. We also use DJs for batch processing, normalization, and transformation. Times, for example, get normalized on the DJs.
The only issue we have had was with the memory limit of the DJs when pulling the FlightStats data. The XML files were just too big. Really, this revealed a flaw in how we were processing the data and is forcing us to do it a better way. So instead of using a DOM parser we are moving to a SAX parser.
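To illustrate the DOM-to-SAX move Jon describes, here is a minimal sketch in Python using the standard library's xml.sax module. It is not FlightCaster's code: the element names (flights, flight, carrier, status), the id attribute, and the names FlightHandler and stream_flights are assumptions made up for this example. The point is only that a SAX handler sees one element at a time, so memory use stays roughly constant however large the feed is, whereas a DOM parser builds the whole tree in memory first.

```python
import xml.sax


class FlightHandler(xml.sax.ContentHandler):
    """Builds one small dict per <flight> element and hands it to a callback.

    The element and attribute names are hypothetical, not a real feed schema.
    """

    def __init__(self, on_flight):
        super().__init__()
        self.on_flight = on_flight  # called once per completed flight
        self._current = None        # fields of the flight being read
        self._field = None          # name of the child element being read
        self._text = []             # text chunks of that child element

    def startElement(self, name, attrs):
        if name == "flight":
            self._current = {"id": attrs.get("id")}
        elif self._current is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # SAX may deliver the text of one element in several chunks.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "flight":
            self.on_flight(self._current)
            self._current = None  # drop the record so it can be freed
        elif self._field == name:
            self._current[name] = "".join(self._text).strip()
            self._field = None


def stream_flights(source, on_flight):
    """Parse `source` (a filename or binary file object) without building a tree."""
    xml.sax.parse(source, FlightHandler(on_flight))
```

A caller would pass a function that stores or queues each record, for example stream_flights("feed.xml", save_flight), where both the file name and save_flight are placeholders.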
InfoQ: What has your experience with Heroku been in the first weeks after your launch?
Jon: Heroku has scaled on demand for the production front end, even during the peak when the New York Times, Wall Street Journal, Reuters, TechCrunch, InfoQ, and Slashdot all wrote about FlightCaster at once. During the peak times I felt totally comfortable deploying small fixes we found on the fly to Heroku; it just worked. They also have built-in caching, so after all of two minutes setting that up, I didn't worry much. We have also gotten great scalability for data capture and processing of the raw incoming data.
InfoQ: What's your experience with Rails so far?
Jared: Our site is not purely Rails; there is some JavaScript / Ajax to simplify input and improve the user experience. The delay factor boxes are not part of the algorithm; logic is applied to flight data to get these. The weather data (METAR reports) is parsed and some basic logic is applied to determine significant weather-related delays and to inform the traveler of weather conditions. Official status and estimated times of arrival and departure are analyzed to determine some obvious delays. Inbound plane data is analyzed to determine obvious delays. We do some simple thresholding math to determine how predictions are displayed. Time zones were a big issue, depending on where the user is, where the inbound plane is, the destination, etc.
InfoQ: Which parts of Rails do you use? All the ORM, REST, etc?
Jon: All of it, for different things; there isn't a part of Rails we don't use. Never repurpose a field that Rails itself uses. We started using the 'updated_at' and 'created_at' fields to store when the data itself got updated, not when we received it and put it in our database. This works if you tell Rails not to write to the fields, but you always have to remember to do that.
Jared: User input is parsed and given to the autocomplete; Rails autocomplete is awesome.
InfoQ: How does the data get from the backend to the frontend?
Jon: The Rails web server reads in the JSON produced by the Clojure / Cascading / Hadoop side of the stack, and uses it to interpret incoming real-time data and formulate the predictions. Clients receive the predictions and real-time data via the REST API. The clients are just a view; we keep the logic on the server.
InfoQ: How do you get your data? Is there one point to get the delay and other data you need?
Jon: Right now we are pulling FAA data from the source, NOAA data for weather, and FlightStats data for carriers. We are also starting to get deals with certain carriers to pull their data directly.
InfoQ: Mashups are popular - have you run into problems with integrating data from a lot of sources?
Jon: Dates and times have been our biggest problem. Time zones and daylight saving time all have to be taken into account; otherwise you could be searching for a flight today and end up with tomorrow's flight. It was often difficult to find the time zones that the airports were located in. The latitude and longitude of the airports were readily available, though, so we were able to discover what time zone each airport was in using a time zone vector map.
Some of our sources also give invalid dates and times (such as 26:00). Data such as invalid dates typically just gets thrown out, although we do get notified about the invalid data just to keep tabs on it. The amount of data flowing through our systems allows us to do this, but it really depends on the context of what you are doing.
We have a lot of issues with the airline and FAA data. The IATA, ICAO, and FAA codes for airlines and airports cause a lot of problems. Airline IATA codes, for example, are in practice always 2 characters, but the spec specifies that they can be 3 characters. Also, airlines that fly in different parts of the world can share the same IATA code. ICAO codes have been better to work with. The whole airline / airport identification is a big issue because 3 organizations are issuing 3 different identifiers, and all of the identifiers are optional. So even though we primarily use the ICAO code, we can't anticipate an ICAO code always being there.
Dealing with government data has also been a pain. The FAA, for example, throws ICAO codes in a field that is designated for the FAA code.
We also found that FlightStats was reporting an imaginary airline (Oceanic Airlines).
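The date and time problems Jon mentions can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not FlightCaster's implementation: the function name to_utc, the "YYYY-MM-DD" and "HH:MM" input formats, and the choice to return None for bad input are all made up for the example. It rejects impossible times such as 26:00 and converts an airport-local time to UTC, which shows how a late-evening local departure already falls on the next calendar day in UTC.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def to_utc(date_str: str, time_str: str, airport_tz: tzinfo) -> Optional[datetime]:
    """Convert an airport-local date and time to an aware UTC datetime.

    Returns None when the source value is not a valid date and time
    (for example "26:00"), so the caller can discard and log the record.
    The input formats are assumptions for this sketch.
    """
    try:
        naive = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    except ValueError:
        return None
    # Attach the airport's zone, then normalize to UTC for storage.
    return naive.replace(tzinfo=airport_tz).astimezone(timezone.utc)


# A fixed UTC-7 offset stands in for a US west coast airport in summer.
west_coast_summer = timezone(timedelta(hours=-7))

departure = to_utc("2009-09-01", "23:30", west_coast_summer)
print(departure)                                          # 2009-09-02 06:30:00+00:00
print(to_utc("2009-09-01", "26:00", west_coast_summer))   # None
```

The fixed offset keeps the sketch self-contained but ignores daylight saving transitions. With Python 3.9 or later, passing zoneinfo.ZoneInfo("America/Los_Angeles") as airport_tz lets the standard library apply the correct offset for each date.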
InfoQ: How complex are your mobile GUIs? Do you use any device-specific features like geolocation or the accelerometer?
James: We don't use any device-specific features at the moment. We had thought about adding a 'shake for a random flight' feature, but every application already has a 'shake for ...' feature. So it really wasn't that important, and it didn't make it into the release.
Pavel: There may be some opportunity to use the GPS, like searching for airports by current position, etc.
InfoQ: Have you considered an HTML5 solution that'd work across platforms?
James: The HTML5 solution is one that hasn't come up. May have to look into that, but we are very pleased with the native API.
InfoQ: How do the mobile apps connect to the FlightCaster backend?
James: Our backend is just a RESTful API over HTTP; all of our client apps make use of it.
InfoQ: What's your experience with iPhone vs Blackberry vs Android?
Pavel: Achieving an iPhone app look for RIM OS 4.2.1+ Blackberry users is only possible if you write everything yourself. There is a lot of extra work in developing a "pretty app" for Blackberry vs. Android. On Blackberry, there is an important balance between install package size, support for multiple resolutions, and support for multiple OS versions. We ended up with only two versions of the installation package, one for old 4.2.1 devices and one for 4.3 and above (including 5.0 touch devices), both having some extra code to fit the actual devices. Support for various configurations on Android is much easier to use. You specify a configuration simply by adding a separate configuration directory (like layout-en-finger-480x320) that overrides the default values. There are various types of data connections, carriers, and enterprise BB policies, and these all make it difficult to have a general procedure for switching between transports (WiFi, BES, BIS, direct TCP, ...) in your application, so we give the user a chance to say which transports are used. We are going to make some improvements here, so that most users will be spared the settings screen. Unfortunately, the communication layer is not as easy to do as on Android, where the connection switching is done for you automatically.
James: The GUI is relatively simple for an iPhone application. The performance isn't as great as we would like. The first view when loading the application, for example, doesn't scroll very smoothly.
We soon recognized where we went wrong. The iPhone community and documentation have been great at explaining how to make sure that your application performs well. The iPhone SDK has really been thought through. This was the first mobile application that we did. We have also done an Android application since then. Comparing the two, the iPhone has been the easiest to work with and is a real joy to use. When developing for Android I found the SDKs were decent, but the device just doesn't respond as well or as fast as the iPhone does. Also, programming in Java felt very verbose, even coming from Objective-C. Apple is known for its design, and it shows through in the GUI. Interface Builder really makes the design easier. One particular thing I found interesting when programming on the iPhone was that manually allocating memory really helped me slow down and actually understand thoroughly what was going on.
Jared: On Android, getting data to views and passing it around is a larger task than it should be. Java is very verbose; I would have thought a different language would be better suited to a mobile platform. Simple output to views is easy with the XML-based layouts. Complex views are slow and difficult to make work, especially with inconsistent data. An autocomplete text field, however, is an amazing feature and is very simple to use.
Chase: We saw improvements in the responsiveness of the iPhone GUI after we rewrote parts of the views using Apple's draw methods rather than their higher-level APIs. The sluggish rendering was due to our use of transparency, which is very taxing on the iPhone's performance. The app approval process was what we expected (about two weeks), although some major bug fixes were stuck waiting for approval, which led to 1-star reviews.