
Joseph Molnar discusses scanR


1. Hi, my name is Ryan Slobojan, and I am here with Joseph Molnar of scanR. Joe, can you give us an overview of scanR?

Sure. scanR is a service designed to take physical or digital documents and turn them into something that's usable and accessible anywhere. Some concrete examples: let's say you have a whiteboard. If you take a photo of the whiteboard and send it to the service, we will do geometry correction on it and send it back to you as a PDF. If you take some photos of a document, we'll again do things like geometry correction and color balance, then run it through OCR to pull the text out and send it back to you.

And if you send a business card, we'll return a vCard, so we take the contact information right out of it. On top of that we have integration abilities, so you can push the data to things like Plaxo or Salesforce. An important part of this is the "anywhere": from the mobile side, being able to capture is half the equation, and being able to view is the other half.

What we introduced today - and this is on the BlackBerry Storm - is an application that allows you to view a document, zoom in, zoom out, and go through the different pages without having the image locally and without using a lot of network bandwidth, since you are not downloading some massive file, say a PDF. It lets you use the service essentially anywhere. That part is all centered around the API that we've built, which we use internally and are now starting to open up to third parties.

   

2. One of the choices that you made in the development of scanR was to implement it in .NET. A lot of people in a similar situation might have chosen Java. Why did you choose .NET?

Let's talk history for a moment, from two perspectives: scanR's and my own. Java has been around for a while. I used Java from the beta back in '95 pretty heavily through 2000. I participated in the release of 1.2, and Sun had sent us initial specs for the J2EE components. The VM we tended to use was the Microsoft VM; it was actually one of the most powerful VMs at that time.

Moving forward, in 2002 we started to play with .NET, when .NET 1.0 was released, and there was just more comfort around the language. You could see that Microsoft had learned a fair amount from what had gone on on the Java side and built something fresh - when you have that history behind you, starting fresh opens up new possibilities. Between the language and simple things like the ability to do XML cleanly and easily, and the way that was integrated in, from language features to libraries, it was just a nicer system.

So we played with it more and used .NET on a fair number of different projects, so the familiarity was high. When you look at why you would choose a platform, the most important thing is to pick something you are familiar with. When you are talking about a startup, you only have so much in the way of resources available to you. You don't want to be learning something new.

I hadn't used Java in a while, and neither had most of the team; the stronger familiarity was with .NET. On top of that, it's simple things: I can install Visual Studio .NET and, within a minute, have a service running. I'm not worrying about large installations of different kinds of services and products, which in the past, when I used Java, was the case. Getting up something like WebSphere just took a long time. Even if you took Tomcat -

I wrote an article back in 2004-5 on how to get Java and Tomcat and all these things working together, so you have a cohesive environment you can actually develop in, and it was difficult. That was probably the most-read and most-indexed article I've ever written. It's just a lot of work, and with .NET it's actually a lot simpler.

I'm ignoring certain things like cost, which are separate discussions, but I think familiarity has to be one of the strongest reasons why you go with something, particularly in a startup environment. You don't want to be learning on the fly. You are going to be learning certain aspects on the fly, like how your app runs and how customers are using it, but you don't want to be fighting your environment too much. That's the main reason why .NET.

   

3. Can you give us an overall idea of what the architecture of scanR is?

Sure. I'm going to go through a little history here as well; I think it's important to understand how we evolved. The very first thing we did, back in 2004, was build a prototype service designed around business cards. Looking at what we've built since and its complexity, that was the hardest thing we could have done first.

And we were actually pretty lucky, because we managed to pick the single best handset on the market at the time. We didn't know this; it was pure happenstance. We took the best handset on the market, took a photo, and it seemed to work. I think it was a Nokia of some kind, a 1.1-1.3 megapixel camera phone, and it took amazing shots. It turned out there were no other camera phones on the market even close to its capabilities.

So I would question whether we'd have survived as a company if we had used a different camera phone for our prototype.

So we started out building something for business cards. The complexity was too high. We started to realize what the different camera phones were actually like, and we had our angel funds come in. So we said, "OK, let's build something that is going to be more universally used," and we concentrated on documents.

What we did is build a system that was essentially a set of different services, each with a particular role, communicating through a queuing system. It was basically a large-scale asynchronous system. We had no front-end UI; people interacted with the system using SMS and MMS, so the mobile side was obviously important to us.

As a startup, dealing with carriers is pretty difficult, and using MMS allowed us to get off the ground fairly quickly. We didn't have to negotiate with the carriers; all we had to worry about were things like the size of file that could be sent up to us via MMS. So that was our rev 1.0: basically this set of services.

Our inputs come in essentially as email: whether a submission arrived via email or MMS, we read it from an email stream. We had one piece that would grab the mail and a separate piece that would process it, and we had to split those because different carriers use different techniques for how they give you the images.

A lot of carriers will give you the image inside the actual email, so you just parse it out and you're fine, but some carriers use something called LightSurf. That means the images aren't actually in the email: you have to parse the email, then go out to a separate location to grab the image and bring it back. You can't have that fetch sitting right in the middle of your processing queue - it would stop you dead from a scale standpoint.

Once we actually have the image, we send it into this director service, which you can look at as a mini workflow system. It essentially decides, "from here, what do we need to do next?" So an image comes in, and we send it off to our scan imaging system, which does the image processing components. When that's complete, we potentially send it to OCR - that path actually came a little later, but there is an OCR path.

If you are dealing with business cards, it then goes on to a fairly sophisticated classification system. Then there is something that produces the results, say the PDF that's going to be generated. Then it goes out to a service that looks at any application you happen to have subscribed to as a user - I will describe what "application" means in a second - and delivers to that application; Salesforce and Plaxo are examples of that.
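To make the director idea concrete, here is a minimal sketch in C# of a director as a routing table from "step just completed" to "next queue." All the names here - the Step enum, the job types, the pipelines - are illustrative assumptions, not scanR's actual code:

    // Hypothetical sketch of the director as a mini workflow table.
    using System;
    using System.Collections.Generic;

    enum Step { Imaging, Ocr, Classification, Rendering, Delivery, Done }

    class Director
    {
        // For each kind of job, the ordered steps it must pass through.
        static readonly Dictionary<string, Step[]> Pipelines =
            new Dictionary<string, Step[]>
        {
            { "whiteboard",   new[] { Step.Imaging, Step.Rendering, Step.Delivery } },
            { "document",     new[] { Step.Imaging, Step.Ocr, Step.Rendering, Step.Delivery } },
            { "businesscard", new[] { Step.Imaging, Step.Ocr, Step.Classification,
                                      Step.Rendering, Step.Delivery } },
        };

        // A service reports a step complete; the director answers with the
        // step (i.e. the queue) the work object should be sent to next.
        public Step NextStep(string jobType, Step? completed)
        {
            Step[] pipeline = Pipelines[jobType];
            if (completed == null)
                return pipeline[0];            // fresh job: start the pipeline
            int i = Array.IndexOf(pipeline, completed.Value);
            return i == pipeline.Length - 1 ? Step.Done : pipeline[i + 1];
        }
    }

Because each hop is a queue rather than a direct call, any one slow stage (OCR, a LightSurf fetch) backs up only its own queue instead of stalling the whole system.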

So that was our first pass at a big portion of the service. I mentioned applications. One of the core aspects of our system is this notion of an application. An application can mean anything from a handset app you have running to our own internal systems. Whether we fax to someone, send to Salesforce or Plaxo, or use our own email receiver and email delivery - those are all applications. When a user signs up in our system, we automatically subscribe them to a set of applications.

Now, there are obviously different kinds of applications - like I said, some deliver, some are handset apps - and those play a role in how we do delivery and how we accept input, but it was a core, central component that we built. We extended it later, so I'll go over that next.

One of the big things that came out after we did the service was that people weren't using it as much, because SMS and MMS are hard - typing on your handset is a pain. Obviously we needed to build a front end to this thing, so that was our next big push. We built a PC web-based front end, AJAX-enabled, which would let you do what I'll call rudimentary document management. It allows you to look at and view the content, and you can remove the images and the PDFs that exist in the system. So that was the next step.

The next piece we started to work on: we had dealt with internationalization to a certain extent - you have to, when you are dealing with carriers around the world - though we hadn't done a fair amount of partnering at that point. One of the next pieces was updating the system to do a better job of internationalization. A lot of that centered around a resource system that we built. The term "resource" comes largely from .NET resource strings. We built our own resource system, and it allowed you to deal not only with different languages, but with the different partners that exist in the system.

Obviously we were dealing with carriers, and carriers want to do things their own way. For example, if we have a website, they are going to want it to look different when people come in from their own networks. So this resource system allowed you to choose the resource based not only on the language in question, but also on the partner in question. On top of that, we needed to do things conditionally, so we ended up building a resource system that was like a mini-language - we could do evaluation of parameters, for example. It's basically a weird mix between an 'if' statement and a 'switch' statement; it's hard to describe unless you see it, but it makes it easy for people to make updates: "I'm dealing with this parameter - what is its value, and how does it relate?" - and then you can return different text.
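As a sketch of what that 'if'/'switch' hybrid could look like - purely illustrative, since the real system was its own mini-language and these types are hypothetical - a resource lookup keyed by partner and language, with ordered conditional rules:

    // Illustrative sketch only; not scanR's actual resource system.
    using System.Collections.Generic;

    class ResourceRule
    {
        public string Parameter;     // e.g. "planType"
        public string EqualsValue;   // value to match; null marks the default branch
        public string Text;          // the text to return when this rule matches
    }

    class ResourceSystem
    {
        // (partner, language, resource name) -> ordered rules, most specific first.
        readonly Dictionary<string, List<ResourceRule>> entries =
            new Dictionary<string, List<ResourceRule>>();

        public void Add(string partner, string language, string name,
                        List<ResourceRule> rules)
        {
            entries[partner + "|" + language + "|" + name] = rules;
        }

        public string Lookup(string partner, string language, string name,
                             IDictionary<string, string> parameters)
        {
            List<ResourceRule> rules;
            // Fall back from the partner-specific entry to the default partner.
            if (!entries.TryGetValue(partner + "|" + language + "|" + name, out rules) &&
                !entries.TryGetValue("default|" + language + "|" + name, out rules))
                return null;

            // The 'if'/'switch' hybrid: walk the cases, return the first match.
            foreach (ResourceRule rule in rules)
            {
                if (rule.EqualsValue == null)
                    return rule.Text;                  // default branch
                string value;
                if (parameters.TryGetValue(rule.Parameter, out value) &&
                    value == rule.EqualsValue)
                    return rule.Text;                  // matched 'case'
            }
            return null;
        }
    }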

Internationalization was obviously key. Then, as we started looking at billing - we hadn't done billing yet - we started putting it in. Now you're building a system that, from an international standpoint, needs to deal with currencies and the differences between them, so we built those portions up. Our subscription model - by subscription I mean subscribing to the service, that is, how many scans you can use a month - would also change based on the partner we were dealing with, with different prices and different currencies. We have offerings in the currencies of 40 different countries. We are in more countries than that, but directly supporting them - within the EU, for example - means dealing with the different countries individually. So there is a whole billing infrastructure we put in place: not only are we dealing with differences in price, but if I am billing through the carrier, that's going to be different too.

Going through, say, KDDI versus Vodafone Germany versus AT&T versus our own billing - they are all different - so there is an entire billing infrastructure that sits on top of that. As we continued to progress, we had a rudimentary API that we had built back in the day, which was meant to live 6 months and lasted a lot longer - not for the traditional reason that it got used a lot, since we were the only users of the API at that point, but simply due to prioritization.

So we finally had the time, particularly as we started talking to more of our partners, to build a robust API. We had already designed a lot of it - this whole application infrastructure was the first stage of that, so there was some foresight about where we were trying to go. We built a RESTful API that sits on top; it's all custom, not using much of what .NET facilitates, and we can go over that later. It allows us to do everything from looking at what's inside your gallery - what you have uploaded in the past - to retrieving that content again, to viewing the content in a way similar to Google Maps, where you get a bunch of tiles back and can zoom.

It's very robust: being able to create users, being able to create the devices that communicate - it's a very capable system that we built around the API.
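A hypothetical illustration of the tile-based viewing: the client asks only for the tiles covering its viewport at the current zoom level, rather than pulling down a whole PDF. The URL pattern, host, and tile size here are assumptions, not scanR's published API:

    // Illustrative sketch: Google Maps-style tiling for document pages.
    using System.Collections.Generic;

    class TileViewer
    {
        const int TileSize = 256; // pixels per tile edge (assumed)

        // The tiles a client needs to cover its viewport at a given zoom
        // level - it never downloads the whole page image, let alone a PDF.
        public static IEnumerable<string> TilesFor(string docId, int page, int zoom,
                                                   int viewX, int viewY,
                                                   int viewWidth, int viewHeight)
        {
            for (int ty = viewY / TileSize; ty <= (viewY + viewHeight - 1) / TileSize; ty++)
                for (int tx = viewX / TileSize; tx <= (viewX + viewWidth - 1) / TileSize; tx++)
                    yield return string.Format(
                        "https://api.example.com/gallery/{0}/pages/{1}/tiles/{2}/{3}/{4}.jpg",
                        docId, page, zoom, tx, ty);
        }
    }

This is what makes viewing on a handset like the Storm cheap: panning or zooming only fetches the handful of small tiles that became visible.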

If I had a whiteboard, I could describe this a little more and draw all the different relationships, but those are probably the key architectural components in the system. In the middle you've got this hardcore message queue system doing all the processing of content - the images coming in, the data. On the periphery you've got the pieces managing input and managing the visuals, whether that's our websites - the mobile or PC website - or the API.

   

4. What are some of the challenges that you've encountered while dealing with all of these different mobile providers, these different carriers?

I mentioned one of them: on the MMS side, you get differences in how you are able to retrieve content. Sometimes there are differences in the way you receive emails. For example, it's very important for us to identify who's sending an email in. If a carrier happens to randomize the MMS email address, that makes it very difficult to understand who you are actually talking to.

That's one, but these days we don't see a lot of that. A lot of our communication occurs via our applications now. And if you are talking about Japan, they don't use MMS at all - they just plain use email, which is much nicer to deal with. Dealing with the carriers in Japan in general is a lot nicer. That's not technical, though. On the technical side, when you talk about the API, there are a lot of different things...

We have a RESTful API, but the carriers do things differently. For example, there may be URL length limits: on older handsets, you can't necessarily request a URL longer than 127 characters. On newer handsets this is a non-issue, but we are talking about a fairly long period of time for us and some of the things we've learned along the way. Carriers will also strip things from the HTTP headers, and certain HTTP verbs may not be usable.

Looking at those different constraints, we concentrate on watching our URL length, obviously, and on relying on standard GET and POST. From a RESTful standpoint it's common to use PUT and DELETE; we just don't, because you never know when they're not going to work.
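One common workaround - an assumption on my part, not something described here as scanR's approach - is to tunnel the RESTful intent through POST, for instance with an X-HTTP-Method-Override header or a form field, and unpack it server-side. A minimal ASP.NET sketch:

    // Hypothetical sketch: recover the intended verb from a POST.
    using System.Web;

    public class MethodTunnelModule : IHttpModule
    {
        public void Init(HttpApplication app)
        {
            app.BeginRequest += delegate
            {
                HttpRequest req = app.Context.Request;
                // Only GET/POST are assumed safe through carrier proxies; the
                // real intent rides in a form field or header.
                string intended = req.Form["_method"]
                               ?? req.Headers["X-HTTP-Method-Override"];
                if (req.HttpMethod == "POST" && !string.IsNullOrEmpty(intended))
                    app.Context.Items["EffectiveMethod"] = intended.ToUpperInvariant();
                else
                    app.Context.Items["EffectiveMethod"] = req.HttpMethod;
            };
        }

        public void Dispose() { }
    }

Handlers downstream would then dispatch on "EffectiveMethod" instead of the raw HTTP verb.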

You'd think you'd be safe after that, but that's not true. Different carriers will transform content.

You are talking across port 80, generally, or port 443 - HTTP/HTTPS - which means that, certainly if it's HTTP, they assume they can transform it, because "hey, it's just HTTP content" - when of course it's our app and our API, and it's XML being sent back and forth. There is a no-transform HTTP header you can use, but they don't necessarily listen to it, so in certain cases you need to get yourself whitelisted to avoid being transformed.

You also want to put in 'no caching' and things like that, so transparent proxy servers - which, as I said, are not very transparent - don't start munging your content. Sometimes you even go further: some carriers won't let you communicate on the network at all unless you're whitelisted. That's a fair amount of the difficulty from the technical side.
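As a concrete sketch of the defensive headers just mentioned (ASP.NET, hypothetical helper; whether a given carrier's proxy honors them varies, which is why whitelisting is sometimes still required):

    // Sketch: mark API responses so intermediaries (ideally) leave them alone.
    using System.Web;

    static class ProxyDefense
    {
        public static void MarkUntransformable(HttpResponse response)
        {
            // 'no-transform' asks proxies not to transcode the body (RFC 2616);
            // 'no-cache' keeps not-so-transparent proxies from serving stale XML.
            response.AppendHeader("Cache-Control", "no-cache, no-transform");
            response.AppendHeader("Pragma", "no-cache");   // for HTTP/1.0 proxies
            response.ContentType = "text/xml";             // the API payload is XML
        }
    }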

On top of that, there are the billing systems: you need to be able to integrate with different billing systems, so putting infrastructure in place that allows plug-and-play of the different billing providers is important. That was another challenge.
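A minimal sketch of what such a plug-and-play billing abstraction might look like - the interface and names are illustrative, not scanR's actual code:

    // Hypothetical pluggable billing abstraction.
    using System.Collections.Generic;

    interface IBillingProvider
    {
        string Name { get; }   // e.g. "KDDI", "VodafoneDE", "ATT", "Direct"

        // Charge one subscription period in the partner's currency.
        bool Charge(string userId, decimal amount, string currencyCode);
    }

    class BillingRouter
    {
        readonly Dictionary<string, IBillingProvider> providers =
            new Dictionary<string, IBillingProvider>();

        public void Register(IBillingProvider provider)
        {
            providers[provider.Name] = provider;
        }

        // The partner a user came in through decides who bills them and in
        // which currency; the rest of the system never knows the difference.
        public bool Bill(string userId, string partner, decimal amount, string currency)
        {
            return providers[partner].Charge(userId, amount, currency);
        }
    }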

And then, to help educate anyone looking at the mobile side: dealing with the carriers on the business side, it can be hard to get something out the door. North America is way worse than Japan, for example. In terms of ease of dealing with carriers, Japan has been by far the easiest - they're very helpful, and they do what they say they're going to do, when they say they'll do it. In the US you tend to get a fair number of third parties, so it's not actually being run by the carriers themselves. When you have an app and you want to get it out the door, you are actually going through a testing vendor, which is an entirely separate company. And it's not uncommon for these processes to change over time, so you are part-way through the process and then "oh, something has changed." It can be pretty hard dealing with the carriers from that standpoint. Europe tends to be better than the US.

   

5. With the system that you have set up, what versions of the .NET Framework are you using, and what other libraries?

Because we started in 2004, we started on .NET 1.1, and we've migrated all the way up through .NET 3.5 SP1 - the latest - but that doesn't mean we were taking advantage of the new features along the way. I would say we are largely .NET 2.0 from a feature standpoint, just because refactoring is expensive. Again, as a startup you are prioritizing your resources - money, time, people - so we didn't have a huge amount of time to say "let's go back and redo this" unless we were already touching that area. If we were going to be touching an area, then we would tend to update it. So we're on .NET 3.5, though .NET 2.0 is where most of the time has been spent.

In terms of external libraries: it's not really a library per se, but we are using Postgres as the database. We use Lucene.NET for the searching components that we provide from a UI and API standpoint. We use Adobe's PSL for generating the PDFs, and we used iText# as well prior to that; log4net for the logging components. We do multiple things on the logging side, but the main log-to-disk, if you will, is log4net. Obviously we are running on ASP.NET, and we use jQuery for some of the AJAX-related pieces. Those are our main libraries.
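For anyone unfamiliar with log4net, the basic pattern is small; this is standard log4net usage (only the class and message text here are invented):

    // Standard log4net setup and use.
    using log4net;
    using log4net.Config;

    class ServiceHost
    {
        static readonly ILog Log = LogManager.GetLogger(typeof(ServiceHost));

        static void Main()
        {
            XmlConfigurator.Configure();   // read appenders/levels from app.config
            Log.Info("service starting");
            Log.Error("scan job failed");  // goes to the configured log-to-disk appender
        }
    }

The appenders and log levels live in configuration, which is what makes the "turn this level of logging on for now" workflow described later possible without redeploying.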

   

6. What are some of the major problems that you've encountered and how did you solve them?

We can look at problems from different perspectives. Going back to being a startup: if you are an enterprise company with a fair amount of money, you can do things differently than we did. A lot of what we did was say, "We want to do these particular features and we are going to build towards that." Then you say, "Alright, this isn't really going to work long-term, but we'll do it for now and look to change it when we come back to it later." So we would identify issues that we knew we were going to hit at a later point in time.

Yes, there were problems, but they were conscious decisions not to deal with them until we were mature enough in those areas and had seen how customers used the system. What we tried to do is adapt: put something out there, look at how our customers were using it, look at how it performed - I don't necessarily mean speed, but its usability for customers - and then adapt. We would pinpoint the issues we would need to come back to. That doesn't mean we didn't have any surprises; when you're scaling a system, you've always got surprises to deal with.

The first big surprise was Postgres. We were using - we still are, actually - an object/relational mapper that I built, called Cello. I built it on my own time between 2002 and largely 2005, with a little bit in 2006. You could run it against Postgres, MySQL, Oracle, SQL Server, even Access. With the Postgres driver we had written, it turned out that Postgres performance on certain kinds of queries was fairly poor compared to other databases. Postgres has some great components, but in this particular area...

We had to get something out pretty quickly, and when we put it on the server our performance just dropped to the floor. What the hell happened? We took a look and narrowed it down pretty quickly; over the next 3-4 days it turned out the generated queries weren't right, so I just changed the code gen and we were fine. That was probably the first scale issue we hit. After that, from a scale standpoint, it really wasn't that bad. We made one big mistake that I can think of off the top of my head: when we first built the system, we were obviously dealing with a fair number of files - images coming in, with a lot of metadata around them.

Our first decision was that we were going to store all of this on disk. We were not going to use a database, because we were going to be hitting it a lot and just didn't want to deal with that. That turned out to be a mistake - not from a speed standpoint, but from a manageability standpoint. Scale is not just about performance; it's about your ability to manage what you are looking at and what you are doing.

It turned out to be pretty unnatural. We needed to make changes to the structures, and crawling over a disk is one of the slowest ways to do it - in a database it takes half a second. One of our big migrations was to shift things over to the database. So our storage structure is now entirely laid out in a database, which is absolutely fantastic, because it allows you to do a couple of things. One: it makes your migrations much simpler. It's not that you stop using the disk - obviously we keep the images on disk, since it's easier to manage them that way for a variety of reasons, and you look at caching content there as well.

It also means that if you have a catastrophic issue - on your filers or whatnot - you have the ability to recover the entire system from the database. You're managing both sides of the equation. These days CPU performance is pretty impressive, databases can cache practically everything in memory, and Postgres does an absolutely fine job of that. It's pretty amazing what you can do there, and having that performance notched up a little more by the caching obviously plays a pretty big role.

I would say that was one of the initial decisions we made that was obviously wrong, and that we went back and changed. Other than that, the problems tend to be dealing with the carriers and with failure conditions - carriers notifying us, billing providers notifying us of information. Again, from a scale standpoint, things ran fairly well.

One other thing we made a conscious decision on, which bit us a little - again, a conscious decision - was that we needed a way to update the Lucene records. You can't do multiple concurrent updates on an index - and again, we're using Lucene.NET - plus you've got read operations going on. We've certainly seen that the supposed ability to do concurrent reads and writes doesn't hold up.

You do run into issues at various points, and you do have the possibility of multiple threads writing to the Lucene index at the same time. We didn't want to have to solve that immediately, because there are multiple ways to solve it. We store our indexes per user, which is great from a management standpoint, but it means that if you need to do an update, you need to do one of two things to do it right.

You need some form of locking system in place that allows you to lock down that resource and come back to it later; or you need some form of consistent hashing mechanism, so you can basically say, "I've got this particular user trying to do something; he is going to be managed by this particular indexer service, and it makes the update."
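As an illustration of the second option - a sketch only, not necessarily what scanR shipped - a consistent hash ring that sends all of one user's index updates to the same indexer, so a per-user Lucene index never has two concurrent writers:

    // Hypothetical sketch: consistent hashing of users onto indexer services.
    using System;
    using System.Collections.Generic;

    class IndexerRing
    {
        readonly SortedDictionary<uint, string> ring = new SortedDictionary<uint, string>();

        public IndexerRing(IEnumerable<string> indexers, int virtualNodes)
        {
            // Virtual nodes smooth out the distribution across indexers.
            foreach (string indexer in indexers)
                for (int v = 0; v < virtualNodes; v++)
                    ring[Hash(indexer + "#" + v)] = indexer;
        }

        // All updates for a given user land on the same indexer service.
        public string IndexerFor(string userId)
        {
            uint h = Hash(userId);
            foreach (KeyValuePair<uint, string> node in ring)
                if (node.Key >= h) return node.Value;
            foreach (KeyValuePair<uint, string> node in ring)
                return node.Value;                  // wrap around to the first node
            throw new InvalidOperationException("empty ring");
        }

        static uint Hash(string s)
        {
            // FNV-1a; any stable hash works here.
            uint h = 2166136261;
            foreach (char c in s) { h ^= c; h *= 16777619; }
            return h;
        }
    }

The nice property over a simple modulus is that adding or removing an indexer only remaps a fraction of the users, rather than reshuffling everyone.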

That was something we made a conscious decision on, and it turned out... The reason it became a bit of an issue was that we funneled everything down to a single server at first, because we didn't want to deal with it yet: the operations were going to be fast, and it was an asynchronous operation, so arguably we didn't need to deal with it right away - our time was better spent elsewhere. But when you start getting galleries with 15,000 items in them - people uploading 15,000 documents to the system - all of a sudden your operations to update the Lucene index actually start taking some time.

If you have a lot of those going on, it quickly starts backing up the entire system. That wasn't our slowest link in the overall chain, but the service running it was doing other operations - it was our kitchen-sink service, if you will: a work object would come in and it would just execute whatever arrived. What happened is that it started backing up everything, down to plain email deliveries going out the door. Doing that differently earlier would have been a little more helpful, but if you look back and ask, "Did we do things in the wrong order?", I would argue no. You don't know what's really going to happen until you run it, but we did what we could given what we were targeting at the time, and for that I think we did a pretty good job.

   

7. You've hinted at the manageability stuff and talked a little bit about log4net. I'm curious what the monitoring story is. You've got this big system with lots of moving parts -- how do you manage that? How do you drill down on problems?

There are a couple of components to that. One is that all of the services in the system inherit from a base class, and the general structure is: something comes into the service, we log what it is, it does its work, we log when it's done, and then we continue. This base class that every single service has does a couple of things; one is self-monitoring.

It monitors the queues and understands when queue thresholds are too high. It monitors how long it's taking to execute a particular job - if something is supposed to take 2 minutes and it's taking 10, something is probably wrong. We put in various levels, high and low watermarks, and we expose a lot of that through PerfMon and through the event logging that exists within Windows Server.
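A minimal sketch of that kind of self-monitoring base class - the names and threshold logic are illustrative assumptions, not scanR's actual class:

    // Hypothetical sketch of a self-monitoring service base class.
    using System;
    using System.Diagnostics;

    abstract class MonitoredService
    {
        protected abstract string Name { get; }
        protected abstract TimeSpan ExpectedDuration { get; }
        protected abstract void Execute(object workObject);

        // Every job goes through here: log entry, time the work, log exit,
        // and raise an alarm if the job blows past its expected duration.
        public void Run(object workObject, Guid jobId)
        {
            Trace.TraceInformation("{0}: start job {1}", Name, jobId);
            Stopwatch timer = Stopwatch.StartNew();
            try
            {
                Execute(workObject);
            }
            finally
            {
                timer.Stop();
                Trace.TraceInformation("{0}: end job {1} ({2} ms)",
                                       Name, jobId, timer.ElapsedMilliseconds);
                if (timer.Elapsed > ExpectedDuration)
                    ReportToOperations(jobId, "job exceeded expected duration");
            }
        }

        // In the real system this report would travel over the queue to the
        // operations service, which pages someone via email or SMS.
        protected virtual void ReportToOperations(Guid jobId, string problem)
        {
            Trace.TraceWarning("{0}: job {1}: {2}", Name, jobId, problem);
        }
    }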

On top of that, we have what we call our operations service. Assuming your queuing system is alive - and generally it has been; we are using MSMQ, and it's generally been good - issues are reported out through the queuing system into this operations service. That service can then report issues via email or SMS or what have you, so people can be paged automatically. When it goes out via SMS, it sends a smaller summary, because you only have so many characters.

The email gives you a description of what we believe the problem is at that point in time, as well as a full call stack. At the same time, we log every single work object - our entire queuing system is built on work objects - and every single job through the system has a GUID representing it. That GUID gets reported in the operations queue and included in the email, and then, depending on where the service is, you can go back and look at the actual job.

That even allows you to take a job and resubmit it, depending on the circumstance - obviously, with certain failures you can't do that. The operations queue is also used by the websites: if you have failures coming off the websites, they automatically go out to the operations queue as well.

On top of that, we log to disk, and the log to disk is more detailed. Again, some of this is just standard log4net stuff.

You can turn it on and say, "I want this level of logging for now," and if we need to, we can go to those logs as well. Then, in terms of monitoring the web components, there are various monitoring services out there that just ping your services to make sure they're behaving. The API, for example, you can ping; to actually do something you need an account, but you can ping it and it will return a response saying "you can't do that." As long as it returns that response, you know it's basically healthy.
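A sketch of that style of probe - the URL is hypothetical; the point is that for an unauthenticated caller, a well-formed "you can't do that" (401/403) response is the healthy answer:

    // Hypothetical health-check: probe the API without credentials.
    using System.Net;

    class ApiHealthCheck
    {
        public static bool IsAlive(string url)
        {
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                request.Timeout = 10000; // ten seconds
                using (request.GetResponse()) { }
                return true;                          // unexpected, but the API answered
            }
            catch (WebException e)
            {
                // 401/403 is the *healthy* answer for an unauthenticated probe.
                HttpWebResponse response = e.Response as HttpWebResponse;
                return response != null &&
                       (response.StatusCode == HttpStatusCode.Unauthorized ||
                        response.StatusCode == HttpStatusCode.Forbidden);
            }
        }
    }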

Then you go all the way through to things like shoveling something through the system: you can automate putting something through and checking that you got the result out the end, so you know the general flow is working correctly. So there are a lot of different components in place, and it's actually pretty good. For example, we just put an update out the other day and something weird happened; within 5 minutes we knew exactly what the problem was, and it turned out to be a config issue. In that very short period of time you could identify what the problem actually was and update very quickly.

   

8. If you were to start over from scratch today writing scanR, what would you do the same and what would you do differently?

That's an interesting question, because there are two ways you can look at it: if I were to start now, or if I were to have the same information and have started back then. The reason I say there's a difference is that starting now, the tools are a lot different - we're talking four and a bit years at this point. I would do things like evaluate what .NET 3.5 has available, things like WCF - what can that actually do for me? I haven't played with it enough to really make a good evaluation, but the parts I don't particularly care for are that you put attributes on your classes to indicate your contracts - whether it's communication, data, the operations, etc.

The problem is: as you version, what does that mean? You are changing that interface directly. You have the code which represents your back end, but how you put that on the wire may be something entirely different, so how do you manage the differences between those two things? I would do a fair amount of evaluation on that. The big thing is, I would probably go back and look at the areas where some of those scale issues came into play - things like what we call the high-latency service, the one that was doing the indexing.

I probably would have done that sooner. I mentioned the file handling - I would have done that differently. The API I would have pushed harder on. Again, part of it I don't have as much control over; I talked about this a little earlier off camera, but you have different kinds of companies out there. In the valley you've got startups that are tech-focused and ones that are product-focused, and a lot of it centers around who the CEO is. The people at scanR are fantastic - everyone, from the CEO all the way down through QA and Marketing, they are all great people.

The CEO is a business-dev, product person, and that obviously influences how you do things. Given that, I think we made some of the right calls in getting items out the door, but there are things I would have done differently if we had been tech-focused: I would have looked at some of those scale and manageability issues a little sooner than we actually did. We haven't really had any major issues with things like migration of databases - we talked about this off camera before.

Migration is always interesting, and when you are in a startup you are not going to get it right the first time: you are going to learn, update your schemas, go back and migrate again. We never had a migration fail on us, which is good - we always do backups and all that good stuff, and the migrations have been fine. The impact hasn't been high. It just means some things would have been a little easier to get done, and we probably would have also built some tools to help us out sooner than we otherwise did, because there tends to be a trade-off between the feature set and your infrastructure.

The feature set is what the customers actually use, and the infrastructure is how the system actually runs. When people are non-technical, they don't understand the role the infrastructure plays; what they can see and understand - and we have great monitoring for this - is how users are using the system. So the emphasis tends to be there, and my job is to pull things over and make sure the infrastructure part gets dealt with, that we refactor nicely and so on. There are areas where I probably would have done some refactoring a lot sooner.

May 11, 2009
