InfoQ Interviews: Salesforce.com's Dave Carroll on Scalability and the History of Their APIs


1. Dave, can you tell us a bit about the Force.com platform and how it relates to Salesforce.com?

Sure. Salesforce.com is, of course, our company name, but it is also what we refer to when we talk about our cloud-based applications, such as our sales cloud, our service and support cloud, and our collaboration cloud, which is Chatter. Force.com is the underlying platform that supports those applications. Force.com is also the platform that we provide to developers, not only to extend sales and service support, but to create whole new applications in the cloud.


2. Force.com has an API which is used by those applications. Can you give us an overview of that API?

Sure. We've had a Web Services API for a little over 8 years, probably closer to 10 years. In the early days it was XML-RPC, and a couple of years after that we switched to SOAP. We actually have several APIs, all based on web services. Our core APIs allow you to move data in and out of the platform. What we call our metadata APIs allow you to modify the artifacts you use when you customize the CRM or create new applications on the platform; that is how you can do that in an IDE such as Eclipse, for instance. We've also got bulk APIs, which are used for moving really large amounts of data back and forth between the platform and whatever integration point you happen to be using.


3. Can you walk us through the initial creation in the first couple of years of the API?

Yes. When I joined Salesforce we had an XML-RPC API. It was pretty robust, but it was very difficult to use, and at that time a lot of IDEs and other tooling were popping up around SOAP and WSDL. So we made the decision to migrate from XML-RPC to a SOAP- and WSDL-based web services API. The main reason we migrated to SOAP was to make these APIs easier for developers to use. More recently, we're introducing a new REST-based API. One of the big factors in the success of Salesforce today has been the ability to integrate the data inside the sales and service support applications with other applications, like an ERP system you might have on premises, or with other clouds.


4. What was the initial feature set of the API and how has that evolved over time?

Interestingly enough, although the API has evolved, it hasn't changed much since the initial SOAP API was introduced. The features we made available are, of course, database operations for managing your data through web services, but also other things, like being able to describe the data. In other words, you can make a web service call to find out which objects you have created inside your org, what the fields on those objects are, and what the attributes of those fields are. One of the things you have to remember about our platform is that it's highly customizable, which means our web services API had to be able to handle any data schema the customer created.
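Because every org can define its own objects and fields, a generic client cannot hard-code a schema; it has to ask the service first. The sketch below imitates that describe-then-use pattern in plain Python. The FieldInfo and ObjectDescribe types, the describe_object function, and the Invoice__c schema are hypothetical stand-ins, not the platform's actual describe types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldInfo:
    """Hypothetical per-field metadata, loosely shaped like a describe result."""
    name: str
    type: str                          # "string", "double" or "boolean"
    required: bool = False
    max_length: Optional[int] = None   # only meaningful for strings


@dataclass
class ObjectDescribe:
    name: str
    fields: Dict[str, FieldInfo] = field(default_factory=dict)


# A made-up custom object, standing in for whatever schema a customer built.
_ORG_SCHEMA = {
    "Invoice__c": ObjectDescribe("Invoice__c", {
        "Name": FieldInfo("Name", "string", required=True, max_length=80),
        "Amount__c": FieldInfo("Amount__c", "double"),
        "Paid__c": FieldInfo("Paid__c", "boolean"),
    }),
}

_PY_TYPES = {"string": str, "double": (int, float), "boolean": bool}


def describe_object(name: str) -> ObjectDescribe:
    """Stand-in for a describe call: look up metadata instead of hard-coding it."""
    return _ORG_SCHEMA[name]


def validate(record: dict, desc: ObjectDescribe) -> List[str]:
    """Check a record against the described schema before sending it."""
    errors = [f"missing required field {f.name}"
              for f in desc.fields.values()
              if f.required and f.name not in record]
    for fname, value in record.items():
        info = desc.fields.get(fname)
        if info is None:
            errors.append(f"unknown field {fname}")
        elif not isinstance(value, _PY_TYPES[info.type]):
            errors.append(f"{fname} should be {info.type}")
        elif info.max_length is not None and len(value) > info.max_length:
            errors.append(f"{fname} is longer than {info.max_length}")
    return errors
```

The point of the design is that the client code never mentions Invoice__c's fields directly; a different org with a different schema would work unchanged.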

So it has to be able to manage any kind of data schema while also being highly performant. One of the main features we made sure to include early on was the ability, or the requirement, I should say, that all interactions with the database happen in batches. Whenever you create records using the API, you send up an array of objects, and when you update records you send an array of objects. The idea was to encourage developers not to make an API call for every single update, but to optimize by collecting updates and sending them together at the appropriate time.

As you know, you can have a one-element array, so we weren't forcing developers into a particular pattern, just encouraging it.
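The batch-first design can be sketched as a small client-side buffer that collects records and sends them as arrays. The BatchingClient class and its send_batch callback are hypothetical; the 200-record limit is an assumption modeled on the per-call limit of the SOAP API's create and update calls, so check the current documentation before relying on it.

```python
from typing import Callable, List

Record = dict
MAX_BATCH = 200  # assumed per-call record limit; check the current API docs


class BatchingClient:
    """Collects records and sends them as arrays instead of one call per record."""

    def __init__(self, send_batch: Callable[[List[Record]], None],
                 max_batch: int = MAX_BATCH) -> None:
        self._send_batch = send_batch  # hypothetical transport, e.g. a create() call
        self._max_batch = max_batch
        self._pending: List[Record] = []
        self.calls = 0                 # round trips actually made

    def add(self, record: Record) -> None:
        self._pending.append(record)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; a short or one-element array is still valid."""
        if self._pending:
            self._send_batch(self._pending)
            self._pending = []
            self.calls += 1
```

Queuing 450 records and then calling flush() results in three calls carrying 200, 200, and 50 records, rather than 450 separate round trips.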


5. As the platform scaled up and was used by more and more customers, how did you address scale concerns, and what were, for instance, some of the bottlenecks you ran into?

That is one of the great aspects of the API's implementation: the shape of the API that is visible to developers hasn't really changed over the years. That's one of the most important things to understand when you are building a platform that is meant to last. When customers create an integration, they want it done the first time and they want it to keep working for years and years. So one of the things we figured out early on was that we needed to version our API, so that a customer or developer using it wouldn't need to go back and revisit an integration because of some change or improvement we made to the platform.
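One common way to keep old integrations working is to have every request name the API version it was written against and to route it to a handler that preserves that version's behavior. The sketch below is a hypothetical illustration of the idea, not the platform's actual dispatcher; the version numbers and the describe_account handlers are made up.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]

# Registry of handlers keyed by (operation, version).
_HANDLERS: Dict[Tuple[str, float], Handler] = {}


def handler(operation: str, version: float):
    """Register a handler for one operation at one API version."""
    def register(fn: Handler) -> Handler:
        _HANDLERS[(operation, version)] = fn
        return fn
    return register


@handler("describe_account", 18.0)
def describe_account_v18(req: dict) -> dict:
    return {"fields": ["Name", "Phone"]}


@handler("describe_account", 20.0)
def describe_account_v20(req: dict) -> dict:
    # A newer version may add fields without changing what v18 clients see.
    return {"fields": ["Name", "Phone", "Rating"]}


def dispatch(operation: str, requested_version: float, req: dict) -> dict:
    """Use the newest handler whose version is <= the client's pinned version."""
    candidates = [v for (op, v) in _HANDLERS
                  if op == operation and v <= requested_version]
    if not candidates:
        raise ValueError(f"{operation} not available in v{requested_version}")
    return _HANDLERS[(operation, max(candidates))](req)
```

An integration pinned to version 18.0 keeps getting the two-field answer even after 20.0 ships, which is the "don't make me revisit my integration" property described above.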

As far as scaling up to a large number of users, another nice aspect of the API is that the path it takes to access data is almost exactly the same path the app's UI takes to access that data. That means that as we optimized the platform for a large number of users, the API benefited without much additional work.


6. What are some of the major architectural components of the platform?

Well, we've got a pretty complex and powerful platform at this stage. The platform consists of the infrastructure underneath it, and on top of that we've got the database, which I think is central to everything. When you create an object on the platform, the equivalent of a table in a traditional database, that also creates a user interface for it, and it is also the mechanism by which you can begin to specify workflow or approval processes associated with your business objects. So every application on the platform is, at its core, a database application.

We've also got a lot of other features or components on the platform, such as the ability to very easily include these objects in analytics and reporting. We also made sure that the objects are deeply integrated with our user and sharing model: we have a very fine-grained, field-level sharing model and a role hierarchy that you can take advantage of, and all of these pieces are closely interleaved. The thing about the API is that it honors all of these settings. So if you create a record in the user interface, attach workflow to that data, and set up specific sharing rules around it, the data behaves the same through the API as it does through the web interface.

When you authenticate with our API, you are authenticating as a user, and that authentication determines what you have access to, which is, of course, configured by the admin of that particular application.
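Because the API authenticates as a user and goes through the same data-access path as the UI, the same checks apply regardless of which channel a request arrives on. A minimal sketch of that idea follows; the User type, owner-based record visibility, and a per-user set of readable fields are hypothetical simplifications standing in for the real sharing and field-level security model, which is far richer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    user_id: str
    readable_fields: frozenset  # stand-in for field-level security
    sees_all: bool = False      # stand-in for, e.g., a role above the owner


RECORDS = [
    {"Id": "001A", "OwnerId": "u1", "Name": "Acme", "AnnualRevenue": 5_000_000},
    {"Id": "001B", "OwnerId": "u2", "Name": "Globex", "AnnualRevenue": 9_000_000},
]


def query_accounts(user: User) -> List[dict]:
    """Shared data-access path: UI and API callers both go through here."""
    visible = [r for r in RECORDS if user.sees_all or r["OwnerId"] == user.user_id]
    return [{k: v for k, v in r.items() if k in user.readable_fields}
            for r in visible]


def ui_list_view(user: User) -> List[dict]:
    return query_accounts(user)


def api_query(user: User) -> List[dict]:
    # The API grants no privileges of its own; it reuses the same checks.
    return query_accounts(user)
```

Putting the checks in one shared function, rather than in each channel, is what keeps the UI and API from drifting apart as rules change.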


7. What are some of the software components that make up the platform?

Our platform is, from a software perspective, I think pretty standard. On the database tier we're an Oracle customer, so we have an Oracle database. We selected Resin as the Java runtime, or servlet engine, mainly because it's pretty lightweight. Of course, thinking about scale from day one is very important; we've learned a lot over the years, and that has turned out to be a really good choice for us. For search we use Lucene, and of course we do all of our programming in Java. As for hardware, we are migrating towards Dell machines.

I think we use about 1,500 machines to serve our 2,000,000 customers, which is pretty amazing. So we've got the Dell machines, and then of course we are using Oracle RAC on 8-way up to 20-way machines. They have a very heavy memory footprint; we use upwards of 50 GB on these machines to support the caching we do on the system. Obviously caching is very important when you are dealing with the volume of records we are talking about here. That is the basic architecture. I think we are really stretching the capabilities of Oracle in certain ways, and I think it is a pretty obvious choice for us.

The way we handle scale beyond that is that our architecture is essentially node-based. We have 7 nodes for North America, and each node contains the full stack: the server machines, the database, and so on. When we need to scale, we can scale up inside a node if required, but there is a limit to that. The other thing we can do is add another node: when the existing nodes reach a certain level of saturation, we create a new node and start putting new customers on it. Those are the two primary ways we maintain scale.
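The "add a node once existing ones are saturated" move can be sketched as a placement routine for new customers. The Node and Region types, the per-node tenant capacity, the NA-style node names, and the "least-loaded open node first" policy are all assumptions for illustration; the interview does not say how customers are actually assigned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One full-stack node: app servers plus its own database."""
    name: str
    capacity: int                         # assumed tenant count at saturation
    tenants: List[str] = field(default_factory=list)

    @property
    def saturated(self) -> bool:
        return len(self.tenants) >= self.capacity


class Region:
    """Hypothetical registry of the nodes serving one geography."""

    def __init__(self, node_capacity: int) -> None:
        self.node_capacity = node_capacity
        self.nodes: List[Node] = []

    def add_node(self) -> Node:
        node = Node(f"NA{len(self.nodes) + 1}", self.node_capacity)
        self.nodes.append(node)
        return node

    def place_tenant(self, org_id: str) -> Node:
        # Existing customers never move; only new ones are placed here.
        open_nodes = [n for n in self.nodes if not n.saturated]
        node = (min(open_nodes, key=lambda n: len(n.tenants))
                if open_nodes else self.add_node())
        node.tenants.append(org_id)
        return node
```

With a capacity of two tenants per node, five sign-ups land on NA1, NA1, NA2, NA2, and NA3; scaling "up inside the node" would correspond to raising a node's capacity instead.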


8. Is data synchronized between the different nodes by having each one communicate with a single Oracle database?

The data is not really synchronized between the nodes. Any one customer has all of their data in one node, alongside the other customers on that node. What we do replicate, or synchronize, goes to a mirror data center: each node has a counterpart mirror in another data center, which is, of course, for disaster recovery.
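Since each customer's data lives in exactly one node, routing a request needs only an org-to-node lookup, and disaster recovery needs only a mapping from each node to its mirror. A sketch of that routing with failover follows; the org IDs, node names, mirror names, and the healthy set are hypothetical.

```python
from typing import Dict, Set

# Each org lives on exactly one primary node; nodes never share org data.
ORG_TO_NODE: Dict[str, str] = {"00D1": "NA1", "00D2": "NA2"}

# Each primary node has a mirror in a second data center (names are made up).
MIRROR_OF: Dict[str, str] = {"NA1": "NA1-DR", "NA2": "NA2-DR"}


def route(org_id: str, healthy: Set[str]) -> str:
    """Return the node that should serve this org's requests."""
    primary = ORG_TO_NODE[org_id]
    if primary in healthy:
        return primary
    mirror = MIRROR_OF[primary]
    if mirror in healthy:
        return mirror  # fail over to the replicated copy
    raise RuntimeError(f"no healthy copy of {primary}")
```

Because no request ever needs data from two nodes, there is no cross-node synchronization to manage, only primary-to-mirror replication.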


9. The Force.com platform has been a cloud offering for several years now, as one of the first cloud offerings in the platform-as-a-service model. What were some of the challenges in launching into what was basically a new space at that time?

Force.com, in my opinion, is one of the first cloud platforms that helped define that term. Platform-as-a-Service, as you know, is a definition that is up for debate, and it depends on who you ask. We define Platform-as-a-Service as something more than being able to rent hardware, for instance: as much more abstracted services than simply a place to run your machine image. So the challenge for us was to actually create the definition and to evangelize it, so that there is not this "false cloud", if you've heard that term before.

That is a term that was recently very prominent. Cloud computing involves more than just running applications in the cloud: it's the multi-tenant model, it's a whole different business process or business model. It's not big up-front licenses and buying a lot of hardware; it's pay as you go, pay for what you need, scale up and scale down as required.


10. As a provider of cloud services, what are some of the challenges you face on your side in providing that cloud to customers?

We ensure that our cloud services are adequate for our customers through the way we've architected both the multi-tenant kernel that is the underlying foundation of the platform and our data centers and data recovery strategy. As I mentioned previously, we've got a couple of different ways to scale out, and we always build in capacity limits so that we have more than enough room for a large number of users. As a developer on our platform you don't really have to think about whether to spin up another VM or how many users are coming to your site. The node you are on can handle millions of users; not concurrently, as I don't know what its concurrent capability is, but very high volume.

We've got customers on a single node with over 145,000 users, and they share that node with other companies as well. So as a developer, what you think about scale-wise is making sure that the application you are implementing on our database has an efficient data model, and that the code you've written to support it, in Apex code or Visualforce pages, is well written and optimized. Then you don't have to think about scaling up to new servers; that concept doesn't exist on the platform.


11. What are some of the tools offered to users of the cloud so they can see the current status of the different nodes?

We have a public status site, and when you go there you can see the current status of any of the nodes that are part of our infrastructure. With that openness, hopefully they are all green. But if they don't happen to be green, there are explanations of what is going on at that time, an estimated time to resolution, and so on. The idea is that we are really trying to own the responsibility of keeping these systems performant and available to you all the time, so that you don't have to worry about it.

When something does happen, though, that site is the place you can go to see immediately what is happening. We want you to know we are actually working on your behalf, both as a developer who has chosen the platform and as a customer who has chosen the software as a service.


12. As the provider of a public API and cloud service used by a large number of people, what kinds of testing do you do on releases before they are pushed into production to ensure quality and speed?

Of course, we implement a lot of best practices around testing at the code level. Our engineers probably write more tests than actual code, which doesn't sound very glamorous, but it's how they keep the system running. In addition, when we prepare for a release we test it on an environment that looks exactly like our production environment. Once we release, it's a rolling release, so we roll out to certain nodes at certain times. One aspect of the platform I didn't mention is the concept of a sandbox. A sandbox is an environment completely separate from production that allows you to take your production application and have an exact duplicate of it.

The first step of our release process, after our standard QA processes, is to release to that sandbox environment and ensure that everything functions properly there. Of course, we've got many tests, accumulated over the years, that we run to make sure we're backwards compatible with all the different versions of the API, and we also have code accumulated in the sandbox that we can run to ensure we don't break any customer customizations or any applications built on the platform.
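The release order described above (standard QA, then the sandbox environment, then production nodes in turn) can be sketched as a gated loop in which each stage must pass the accumulated regression suites before the next one starts. The stage names, the deploy and run_suites callbacks, and the stop-at-first-failure policy are assumptions, not details from the interview.

```python
from typing import Callable, Iterable, List


def rolling_release(stages: Iterable[str],
                    deploy: Callable[[str], None],
                    run_suites: Callable[[str], bool]) -> List[str]:
    """Deploy stage by stage, gating each step on the regression suites.

    Returns the stages that were deployed and passed. On the first failure the
    rollout stops, so later stages are never touched (rolling the failed stage
    back is left out of this sketch).
    """
    verified: List[str] = []
    for stage in stages:
        deploy(stage)
        if not run_suites(stage):  # e.g. API compatibility + customization tests
            break
        verified.append(stage)
    return verified
```

Listing "sandbox" as the first stage and production nodes after it reproduces the order described in the answer: a failure in the sandbox stops the release before any customer's production node is touched.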


13. How does a platform user hook into the process behind the API and execute their own custom logic during the process?

Apex code, a language we invented specifically for the platform, is the mechanism by which we provide that capability. It's a combination of Java and PL/SQL, very object-oriented, although you could argue about whether it's fully object-oriented. If you think about a traditional database, you've got stored procedures and triggers that you can write so that actions happen and logic is applied when database operations occur; we provide the same thing with Apex code. We let you inject your custom logic before or after an insert operation on the database, and either allow that operation to continue or prevent it from continuing.
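Apex triggers are written in Apex, but the control flow described here (before-insert logic that can veto a record, then the database write, then after-insert logic) can be sketched in any language. The TriggerRegistry class and the "_errors" convention below are hypothetical; in Apex, the rough equivalent of vetoing is calling addError on a record in a before trigger. This sketch lets the valid records in a batch through, which is an assumption; whether a real batch is all-or-nothing depends on how the call is made.

```python
from typing import Callable, List

Record = dict
Hook = Callable[[List[Record]], None]


class TriggerRegistry:
    """Runs before/after hooks around a simulated batch insert."""

    def __init__(self) -> None:
        self.before_insert: List[Hook] = []
        self.after_insert: List[Hook] = []
        self.table: List[Record] = []

    def insert(self, records: List[Record]) -> List[Record]:
        for hook in self.before_insert:
            hook(records)                 # may modify records or flag errors
        ok = [r for r in records if not r.get("_errors")]
        self.table.extend(ok)             # the simulated database write
        for hook in self.after_insert:
            hook(ok)                      # sees only the saved records
        return ok


def require_positive_amount(records: List[Record]) -> None:
    """Before-insert hook: veto any record whose Amount is not positive."""
    for r in records:
        if r.get("Amount", 0) <= 0:
            r.setdefault("_errors", []).append("Amount must be positive")


audit_log: List[str] = []
registry = TriggerRegistry()
registry.before_insert.append(require_positive_amount)
registry.after_insert.append(lambda rs: audit_log.extend(r["Name"] for r in rs))
```

Inserting one record with Amount 10 and one with Amount 0 saves only the first, and only the first reaches the after-insert audit hook.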

We also allow you to extend the API by creating your own custom web services, again using Apex code. You can write a method in an Apex class and do all your logic in there, perhaps adding some transactionality, because we support savepoints and the ability to roll back and commit DML statements. Once you've created that logic, it's very easy to deploy it as a web service simply by adding a keyword to the method's signature. The keyword is webservice. Once that's done and saved back up to the cloud, you automatically have an endpoint that you can access via SOAP.

You can generate a WSDL specifically for that class, and in that way you can really extend the web services API to include very complex integration logic, providing a different level of transactionality than simple database updates.
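In Apex, such a service method would be marked with the webservice keyword and could use Database.setSavepoint and Database.rollback. The sketch below shows the same shape in Python, with sqlite3 swapped in as the database because its SAVEPOINT, ROLLBACK TO, and RELEASE statements play the savepoint role. The webservice decorator, the ENDPOINTS registry, the account table, and the transfer_credit operation are hypothetical stand-ins for illustration.

```python
import sqlite3
from typing import Callable, Dict

ENDPOINTS: Dict[str, Callable] = {}


def webservice(fn: Callable) -> Callable:
    """Hypothetical stand-in for Apex's webservice keyword: expose fn by name."""
    ENDPOINTS[fn.__name__] = fn
    return fn


def make_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit BEGINs, so SAVEPOINT controls the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, credit INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    return conn


@webservice
def transfer_credit(conn: sqlite3.Connection, src: str, dst: str,
                    amount: int) -> bool:
    """Move credit between two accounts; undo both writes if src goes negative."""
    conn.execute("SAVEPOINT transfer")
    conn.execute("UPDATE account SET credit = credit - ? WHERE name = ?",
                 (amount, src))
    conn.execute("UPDATE account SET credit = credit + ? WHERE name = ?",
                 (amount, dst))
    (balance,) = conn.execute("SELECT credit FROM account WHERE name = ?",
                              (src,)).fetchone()
    if balance < 0:
        conn.execute("ROLLBACK TO transfer")  # discard both updates
        conn.execute("RELEASE transfer")
        return False
    conn.execute("RELEASE transfer")          # keep both updates
    return True
```

The two updates succeed or fail together, which is the "different level of transactionality" a single record update cannot give an integration.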


14. You've recently announced a new REST API, which is under development for the platform. What was the motivation behind that move?

Well, REST has become a very widely accepted, standard way of doing web services, I think more so than SOAP. REST APIs are very simple; in other words, they are semantic, and they really follow the spirit of the web and the whole HTTP protocol. One of the reasons we've announced and created that API is, again, to expand the number of developers who can use the platform by including those who prefer a REST API. The REST API is currently in private beta, and we expect a two-step release process: it will go from pilot to developer preview, and from there to general availability.
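The "semantic" quality of REST comes from mapping operations onto HTTP verbs and resource URLs rather than onto named SOAP operations. The sketch below builds (but does not send) such requests with Python's standard urllib. The base URL, the path layout, the record ID, and the bearer-token header are assumptions for illustration only, since the API was still in pilot at the time of this interview.

```python
import json
import urllib.request
from typing import Optional

BASE = "https://instance.example.com/services/data/v20.0"  # hypothetical base URL


def build_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a REST request; the HTTP verb itself says what the call does."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Create, read, update, and delete map onto verbs and URLs, not operation names.
create = build_request("POST", "/sobjects/Account/", "TOKEN", {"Name": "Acme"})
read = build_request("GET", "/sobjects/Account/001A", "TOKEN")
update = build_request("PATCH", "/sobjects/Account/001A", "TOKEN", {"Phone": "555"})
delete = build_request("DELETE", "/sobjects/Account/001A", "TOKEN")
```

Each request could be sent with urllib.request.urlopen once a real endpoint and token are available; the same resource URL serves read, update, and delete, and only the verb changes.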

I think that as the web evolves, our APIs need to evolve as well. This really represents Salesforce's recognition of the evolution of, and integration across, the web, and of the standards being used, as well as our wish to be part of the community of people adopting REST.


15. When do you expect the wider development community to be able to use this REST API?

The release date, as with any software under development, is always very hard to predict. We are hoping to be in developer preview in the Spring timeframe, with GA shortly after that. Having said that, we do have a private pilot that, although private, is very easy to get into. You can make a request to somebody at Salesforce, like me, and I can get you set up to begin using that API. Again, we are looking for feedback; we want to make sure we get it right. So going through a pilot and then a developer preview, and getting feedback on what we've done correctly and incorrectly, is a really important part of the process.

Jan 21, 2011