Bio Don Box is a Distinguished Engineer at Microsoft working on declarative languages and tools to simplify developing applications and services. In that role, Don is involved in creating languages, frameworks, and end-to-end experiences to help people translate their intentions and desires for software into a machine readable and executable form.
QCon is a conference that is organized by the community, for the community. The result is a high quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.
Thank you, Stefan. It's my pleasure to finally get to meet you. My name is Don Box, I work at Microsoft. I work right now as an architect in a group that owns a lot of our data machinery. The list of technologies that I'm currently overseeing, which is rather more than what I did before I joined the team, is: our XML stack; our data access stacks - we have a variety of ways of getting access to databases; a lot of our modeling technologies, including the M language and a tool we built called Quadrant; the way the database shows up in Visual Studio, which my team owns; and the SQL Server Modeling Services, which is the repository. We actually built a repository which we are shipping in a future version of SQL Server. Basically, a lot of data plumbing - that's what I get to work on and I have a lot of fun doing it. One more thing - my team is also doing work in getting the HTML storage stuff going.
2. When I quote something that you said, which I do often, I always put a simple line saying "Don Box, co-inventor of SOAP", so you go back a long way with all of that SOAP and web services kind of thing. Can you tell us a little bit about your history with that? How did you get into that?
The way I got into the web services stuff was because a friend of mine who worked at Microsoft called me up and said "Don, we want to go make programmatic access work over the Internet. You know a lot about some stuff we've built, so come up and help." Basically, I flew up, Dave Winer of RSS fame flew up, and we sat down with Bob Atkinson and Peter DeYoung, as well. We just sat around for about a day and wrote some XML down and said "OK, how are we going to do stuff?" and we started coding stuff up. That was how we started with SOAP. It was pretty impressive. I remember we got stuff going really quickly.
I think I stayed out there for a week and Dave stayed for a day. He had Frontier - which I think is the name of his thing. We all kind of agreed "Here is what the XML is going to look like that we are going to exchange." He went back home to the Bay Area - I don't know exactly where he lived - and we started sending messages back and forth and it was just really fun. We said "OK, great! There are some axioms: there is HTTP and there is XML, and we want to go send messages and get responses back, we want to build a request-reply protocol." That's how we started.
What I learnt from XML - I've been around the XML space for a while, I didn't work on the standard, but I've certainly written my share of XML values and code that processes those values. The biggest takeaway I've got is it's text in name only as far as most of the world is concerned. It is a virtue that you can use a text editor to edit it, but the vast majority of people don't view it as a format they actually want to interact with. The project I work on now, or one of the projects I work on now, we started by leading people to write stuff down.
A lot of the things I've been working on lately have been about how we get information out of people's heads and into code and into stores, text files or databases or whatever. We started the project saying "Great! We're going to have some XML file format that we'll use to capture certain information we need in various parts and stages of our processing", and people didn't want to use any XML. Our own engineers didn't like it. We went through the standard arc. We have an XML file format - great! - nobody wants to use it, so, what do we have? We have a design surface. We didn't put a lot of people on it, so the design experience was not so good, either. That was for me the realization that there is a limit to what people can do with XML. It seems to be reasonably good for data interchange, it seems to be reasonably good for things that need to be mildly edited. But if you give most developers a blank text editor and say "OK, start typing in this XML document, you know what it's supposed to do", most people don't want to do it. For many years of my life, a while back, I really thought that it was going to take off, that XML was going to be the way people wrote down lots and lots of information, and it didn't pan out quite that way.
I learned different lessons from XML Schema than I did from XML. With XML Schema there is so much value in what you leave out. In the XML Schema working group you had people from lots of different constituencies. OO folks - I was on the schema working group, and during that era I would definitely have considered myself an OO guy. So, I brought in a certain sensibility and a certain set of expectations of what we wanted to achieve. We had document folks, who were working from the SGML, troff lineage, who had a different set of expectations about what they were trying to achieve with the language. And then we had the database folks, the relational folks, who had yet another thing they brought to the mix. It was kind of like the object-relational plus document impedance mismatch trifecta. We wound up with something that tried to satisfy all the masters and it was a pretty big piece of technology, which means it's hard to implement and it has some characteristics that are pretty tough for people to get their heads around.
In retrospect, I wish we would have kicked out features, I wish we would have gone for a much simpler spec. It might have been much better. I look at what James Clark did with RELAX NG and I think it's pretty good - it's a great example. You had XML Schema, which was designed by a committee. Everyone brought in their proposals and their we-do-feature-work and had these little groups of "Let's go solve how we're going to do element substitution groups" or whatever. But having a design center - and James is an awesome designer, I've worked with him now and he's great - RELAX NG is just really nice.
Unfortunately it happened in a period of time when XML Schema was already largely out the door, but I think what I learned from XML Schema is that leaving things out is always a good thing. Also, having a small design brain doing the work is better than starting a standards group and saying "OK, great! Let's start typing!", which is effectively the way XML Schema worked. Yes, there were input documents, Microsoft submitted an input document, Commerce One submitted an input document, there were four - as I recall - input documents that were different schema languages in the XML space, but I would like to not do it that way again.
Before we go up, I learnt things from XML. XML has a lot of value, XML does make the world go round, just like many other technologies do. If it disappeared from the world tomorrow, the trains would stop running somehow. The nice thing about doing software projects is you do work and you take some lessons and you hopefully apply them on the next one, but I still use XML, so I don't want people to get confused.
The higher up in the stack we go, Stefan, the less you like it.
7. Probably true, yes. What do you think about SOAP and WSDL these days? If you look back at what you expected from it and what you thought you would achieve with that, do you think that it has fulfilled the goals that you set?
It certainly fulfilled some of the goals. I definitely think it's getting the world into a place where we think about doing message exchanges, where we think about data interchange, which is really the goal for SOAP. I worked on SOAP, I didn't work on WSDL, I was doing something else when we did WSDL. The goal we had for SOAP was to enable people to have programs talk to one another over Internet protocols, using data exchange. What happened was, just by virtue of the way we built the stack out and by bringing in things like WSDL, we built something that can be made to work if you have programmer intervention.
If I put a SOAP endpoint out on the Internet, I can describe it in WSDL form, or I could write a text document that told you how to format the SOAP messages, even though no one would ever read it, because the SOAP - WSDL coupling is so strong in people's heads, even if it may not be so strong in the actual technology. Ultimately, where we got to with SOAP was yes, you can write programs against it, but a programmer has to go do something with it.
To that end, we achieved our goal. I do look at what happened with AtomPub and RSS and Atom itself and the ability to interact with something without actually having a programmer write code against it was the thing we weren't going for. Now, obviously, in retrospect, the human factor of having someone being able to just evaluate an expression, which is a URL in this case, and have it do something on the server and bring me back something, it's such a compelling visceral experience.
The thing we didn't do with SOAP and WSDL, which is again hindsight, is we never had a REPL experience. The thing about REST-based systems - modulo authentication, which is always where the rub comes in - is that if I'm in a non-authenticated world, my REPL experience is actually, at least for read-only operations, killer. Absolutely killer! We didn't have that killer experience for SOAP. I've got to say the human factors are such a huge aspect of software. That's one of the reasons why I think REST has taken off so well.
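To make the contrast concrete, here is a minimal Python sketch of the two interaction styles described above. It only builds the requests and never sends them. The endpoint `http://example.com/quotes`, the `GetQuote` operation and the `symbol` parameter are hypothetical names chosen for illustration; only the SOAP 1.1 envelope namespace is taken from the published spec.

```python
import urllib.request
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_call(endpoint, operation, params):
    """Build (but do not send) a POST request carrying a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)  # hypothetical operation element
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return urllib.request.Request(
        endpoint,
        data=ET.tostring(envelope, encoding="utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{operation}"',
        },
        method="POST",
    )


def build_rest_read(base_url, resource):
    """Build (but do not send) a GET request: the URL is the whole expression."""
    return urllib.request.Request(f"{base_url}/{resource}", method="GET")


# Hypothetical endpoint and operation, for illustration only.
soap = build_soap_call("http://example.com/quotes", "GetQuote", {"symbol": "MSFT"})
rest = build_rest_read("http://example.com/quotes", "MSFT")
print(soap.get_method(), soap.full_url, len(soap.data), "bytes of XML")
print(rest.get_method(), rest.full_url)
```

The read-only REST call can be pasted into a browser address bar as is, which is the REPL-like quality being described; the SOAP call needs code, or at least a tool, to assemble the envelope first.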
8. For the record, I'd like to point out that you've actually used the word "REST" first, not I, so our viewers will forgive me if I ask what is your opinion about this REST stuff? What do you actually think about that?
The web is axiomatic. You deny the existence of the web and how the web works at your peril. When we did SOAP originally, many of us - myself especially - were newbies to the web technology. I remember vividly being on the Microsoft campus just as a non-Microsoft guy working on SOAP. I remember reading the HTTP spec and saying "I need to make sure I do the right thing relative to the spec." And because our goal was "I send you a message, you return me a message and I'm silent about what you are going to do with it", I read the spec and I tried to be a good citizen, as did everybody else, and we said "Well, the only HTTP method that makes sense for doing this is POST because its semantics are free enough, are loose enough, that is, it doesn't have to be idempotent and it doesn't have to be side effect-free; we can use HTTP POST and be conformant with the spec."
That's the way we built the system. Obviously, over the years, being pummeled by guys like Mark Baker over and over again, I finally got it through my skull - Oh, great! Actually I had several epiphanies about how the web works, not directly from Mark, but certainly Mark's incessant nature has just been a force for something good. He really frustrated me early on, but Mark Baker, thank God you exist! The world is a better place. It's really awesome how effective he's been, holding on to the idea, saying "Here is what I think" and sometimes I disagree with him and sometimes I may argue with some of the relative import, but the basic things he's been saying are pretty good.
It's astonishing how much the ideas of HTTP and web architecture have taken hold inside of Microsoft. You can't throw a stone down the hallway without hitting someone with a REST endpoint or at least a basic HTTP GET endpoint. People have accepted it as something we are going to go do and something we are doing, and what's interesting is we are able to take our messaging stack, which is WCF, which we started building as a very XML-centric messaging system - I mentioned this to someone else, someone tweeted it recently.
What we did with WCF, which is different from the stack - there is WS-* and SOAP and WSDL all those things - Great! - those things exist -, there is a software artifact, there is a set of DLLs we built as part of our platform called WCF. As we were doing WCF, we were doing the protocol work and watching the world as we were all understanding how web protocols work and how web architecture works.
We were able, through the product cycle, to at least make sure the plumbing made it possible to build REST-oriented services with it. Then, once we got V1 out the door, we were able to do work post-V1, both with the very simple WebGet/WebInvoke attributes, which allow me to do simple HTTP-based services driven by URI templates, and with the thing we originally code-named Astoria, which was again built on WCF and gave me the ability to write full resource-oriented endpoints, all the way to supporting AtomPub out of the box.
The nice thing is we have a low level piece of machinery called WCF, which gives us things like throttling and dispatch and activation, and we're able to support both classic Enterprise messaging scenarios with SOAP and WSDL and all the things that we did on that angle, which is great if you've got a WebSphere transaction monitor and you want to actually do two-phase commit with it within your Enterprise. It's not an insane thing to do within your Enterprise and we need to make that work, so our transaction manager, DTC, supports WS-Transaction, so you can interoperate with DB2 or whatever, but the same underlying messaging machinery also supports the REST model.
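The idea of binding a handler to an HTTP method plus a URI template can be sketched in a few lines of Python. This is not the WCF API; `Router`, `compile_template` and the `/quotes/{symbol}` templates are hypothetical names used only to illustrate the general technique of template-based dispatch.

```python
import re


def compile_template(template):
    """Turn '/quotes/{symbol}' into a regex with one named group per variable."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal text between variables
        else:
            pattern += f"(?P<{part}>[^/]+)"  # a variable matches one path segment
    return re.compile(f"^{pattern}$")


class Router:
    """Hypothetical dispatcher: (HTTP method, URI template) -> handler."""

    def __init__(self):
        self.routes = []

    def route(self, method, template):
        def register(handler):
            self.routes.append((method, compile_template(template), handler))
            return handler
        return register

    def dispatch(self, method, path):
        for route_method, pattern, handler in self.routes:
            match = pattern.match(path)
            if route_method == method and match:
                # Template variables become keyword arguments of the handler.
                return handler(**match.groupdict())
        return None  # no route matched


router = Router()


@router.route("GET", "/quotes/{symbol}")
def get_quote(symbol):
    return {"symbol": symbol}


@router.route("POST", "/quotes/{symbol}/orders")
def place_order(symbol):
    return {"ordered": symbol}


print(router.dispatch("GET", "/quotes/MSFT"))
```

The design point is that the URL shape lives next to the handler as data, so the same dispatch machinery can serve any number of resource layouts.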
One of the interesting things about Microsoft that took me a couple of years to understand was there is no place for religion in products. We build products for lots of different people and if someone wants to do REST, I want them to love our stuff. If someone wants to do SOAP and WSDL, I want them to love our stuff. Basically, it's our job to allow our customers to do what they want to go do with their software and the easier we can make that, the better.
10. Let's move on a little bit to the more recent stuff that you have been working on. You gave a great talk on M just today. Can you go a little bit into detail about what Microsoft is doing in that space?
M is a language I've been working on for a couple of years now. M is a language for data, so we basically built a language to do data processing, things like schematization, transformation, query. The things I showed off today, because this is QCon and a lot of people like fancy stuff at QCon, there was a DSL track and someone was giving an M talk and they asked me to come to do a cameo in it and I showed how M and DSLs work.
I showed how to write text processing in the M language. The idea is pretty simple. I write a set of pattern based rules that do transformation from text to arbitrary values. So, I wrote a grammar over some mini language. I think the language I used was Twitter feeds and I did a variety of transformations on it, then we got a value, brought into C#, did some dotting through it and did stuff with it and that was it.
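The general shape of "pattern-based rules that turn text into values" can be approximated in Python with regular expressions. This is a stand-in written for illustration, not M syntax and not the grammar from the talk; the rule names and the sample status line are assumptions.

```python
import re

# Ordered rules: the first pattern that matches a whole token wins.
RULES = [
    ("mentions", re.compile(r"@(\w+)")),
    ("hashtags", re.compile(r"#(\w+)")),
    ("words", re.compile(r"(\S+)")),
]


def parse_status(text):
    """Transform a status line into a structured value, one list per rule."""
    value = {name: [] for name, _ in RULES}
    for token in text.split():
        for name, rule in RULES:
            match = rule.fullmatch(token)
            if match:
                value[name].append(match.group(1))
                break
    return value


# Hypothetical input, for illustration only.
status = parse_status("@stilkov great talk on #M at #qcon")
print(status["mentions"], status["hashtags"])
```

Once the text has become an ordinary value, the host language can "dot through it" like any other data, which is the step described above for C#.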
Here is what we're doing. We put M under the OSP, which is the Open Specification Promise, which is an IP license thing. I'm not a lawyer, so I'll let you go read it. We did it to encourage other people to implement the language. We built the language for us, but we know when someone starts writing stuff down in a text file, that's an intimate relationship. You are making a very deep commitment when you open your text editor. We wanted to make sure that, if you wrote stuff down in M, then you could take that stuff you're writing down and if someone else wanted to implement a different runtime they could do it without worrying about us.
We've built a parallel universe where we've got a data model. XML has got the Infoset; we've got the M data model, which is an abstract data model we rarely talk about, but there is one underneath there. We have a surface syntax for writing down values in that, a lot like JSON: just as JSON linearizes certain JavaScript values, we have a way to linearize M values. We have a functional expression language; XML has XPath and XQuery and XSLT.
We have Query and Expression language. And we have the ability to write down types and in our case, we write down structural types, which is much closer to what you can do in RELAX NG, as opposed to kind of the more nominal stipulated typing world of XML Schema. From the 10,000 ft. view, you can say "Yes, they kind of do the same thing", but they are different enough that I think there is some interesting stuff you can do with them.
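The difference between structural and nominal typing can be shown with a small Python sketch. The `Person` and `Employee` classes, the `PERSON_SHAPE` description and the `conforms` helper are hypothetical; they illustrate the distinction in general terms and say nothing about how M, RELAX NG or XML Schema are implemented.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


@dataclass
class Employee:  # same fields as Person, but a different declared name
    name: str
    age: int


# Structural description: only the required fields and their types matter.
PERSON_SHAPE = {"name": str, "age": int}


def conforms(value, shape):
    """Structural check: does the value have every field with the right type?"""
    fields = value if isinstance(value, dict) else vars(value)
    return all(isinstance(fields.get(name), kind) for name, kind in shape.items())


worker = Employee("Ada", 36)

# Nominal check: an Employee is not a Person, whatever it contains.
print(isinstance(worker, Person))

# Structural check: anything with the right shape conforms.
print(conforms(worker, PERSON_SHAPE))
print(conforms({"name": "Ada", "age": 36}, PERSON_SHAPE))
```

Under the structural view a value never has to declare which type it belongs to, so the same value can be checked against several independent descriptions.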
13. Going back to the original thought that you expressed that you initially believed programmers would be happy in writing down their data in XML. Now you obviously think something a little different that they would actually like to write it down in any way they would want to express it and then build a language to interpret that stuff. Is that the right interpretation?
It's a little bit stronger than I would say. The way I think about it and the way I thought about it for a while is people want to write stuff down, and a very common way for people to write stuff down is in text. What they are really doing is just writing down information. I want to extract the value from that information that they are writing in those text documents.
So we have this language, which lets me extract that out and then do further processing on it. I tend not to think of it as I'm writing a language, I use a little bit more blue collar terminology, I'm just writing the format. Yes, it's a DSL, and I think if Martin Fowler were in the room, he would say "Yes, M is a way to write external DSLs". Great! I just look at it as there is some data and I want to extract that data out and then do more things with it.
It turns out sometimes I actually want to put this information in the database and query it later on, so we've built two backends so far that we've got in the bits - one is the one that builds the GLR parsers, one is the one that builds the databases based on the M specifications you write. There is a new one, which just slipped in for our most recent build, although it's not at the same level of quality, so I wouldn't encourage people to go use it, but it's an indicator of where we're going. We have this thing called the Entity Data Model inside of Microsoft, which is the thing that drives our ORM; it actually drives part of our REST stack as well - we use it for a lot of things inside the company.
One of the things we did in the past 12 months is we built a backend that generates the model definitions that are used by EDM so that I can actually write my schemas in M and get validation off of text, I can get storage schemas for my relational database, I can get the specification that my ORM needs, if I actually want to go do ORM against my database.
This is my very first QCon. I now have a job that doesn't get me out of the house much. I almost never travel and I was finally able to make it to QCon. I have to say it's been a really exciting and a really fun event. For years I've been watching QCon over the Internet, primarily by reading Stefan Tilkov's blog, which is why I was so happy to see you. I think it's been a pretty awesome conference. I'd like to thank the people at InfoQ for asking me to come speak, it's been really nice.
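The "write the schema once, generate several artifacts" idea can be sketched in Python as follows. The `PEOPLE` schema, the type mapping and the three generator functions are all hypothetical and are not M, EDM or any Microsoft tooling; they only show one definition feeding a validator, a storage schema and a mapping description.

```python
# Hypothetical mapping from Python types to SQL column types.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL"}

# A single schema definition, written once.
PEOPLE = {"name": "People", "fields": {"Name": str, "Age": int}}


def to_sql_ddl(schema):
    """Backend 1: a storage schema for a relational database."""
    columns = ", ".join(
        f"{field} {SQL_TYPES[kind]} NOT NULL"
        for field, kind in schema["fields"].items()
    )
    return f"CREATE TABLE {schema['name']} ({columns});"


def to_validator(schema):
    """Backend 2: a function that validates one parsed record."""
    def validate(record):
        return set(record) == set(schema["fields"]) and all(
            isinstance(record[field], kind)
            for field, kind in schema["fields"].items()
        )
    return validate


def to_mapping(schema):
    """Backend 3: a stand-in for the description an ORM would consume."""
    return {"table": schema["name"], "columns": list(schema["fields"])}


validate_person = to_validator(PEOPLE)
print(to_sql_ddl(PEOPLE))
print(validate_person({"Name": "Ada", "Age": 36}))
print(to_mapping(PEOPLE))
```

Because every artifact is derived from the same definition, a change to the schema propagates to validation, storage and mapping together instead of being edited in three places.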