Bio: Stuart Charlton is the Chief Software Architect for Elastra, a provider of Cloud Computing software infrastructure. Stuart specializes in systems architecture, RESTful web architecture, and data warehousing, and is an avid student of lean and agile approaches to business processes and product development.
QCon is a conference that is organized by the community, for the community. The result is a high-quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics, presented by the leaders in our community. QCon is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers.
So, the work that I'm doing at Elastra involves the Semantic Web, and I often say that the way to use the Semantic Web is like Tabasco sauce - a little goes a long way. Unless you're a spice nut - then you can use a lot if you really want to, but not everybody will understand. The idea is very simple. We have a web of documents right now, where I can share quite a lot of information with my friends and my colleagues, and I can conduct e-commerce and whatnot.
The move to the Semantic Web is the idea of a web of things. The web can be used to represent everything from my car to my house, to my data center, to anything. Effectively, it is a distributed object system. I actually did a tutorial at OOPSLA a year ago with Mark Baker about the web as a distributed object system, talking about this very idea: we think of it as a document web, but it really is an abstract interface, and the REST architectural style describes that abstract interface, why it works, and why the architectural constraints on the web led to such success. So the Semantic Web really is, "I want to overlay a logic on this web. I want to be able to have, effectively, an open-world relational database for the web." There is nothing really too crazy about that.
I mean, it really is about having data that's machine-readable. I want to be able to have a web page and annotate it with some markup -- it's called RDFa, which is the W3C's version of microformats -- and it allows me to say: if I have a data model, I can keep it in the web page and not lose the fidelity of the data model. If you think about how you build web applications today, very often with Struts or with Spring MVC or Rails or whatnot, we have this relational data model, and then we generate web pages out of it and we lose the fidelity of where it came from.
Well, all the Semantic Web says is, "OK, we'll keep that fidelity in the web page, so you can get it later" - I can take the web page and throw it at an RDFa processor, and it can actually get the data back in its original form. Now, the big difference with the Semantic Web is that it's open-world in the sense that anybody can say anything about anything - there is no referential integrity. You can still do it, but you need different technology to do it. Everyone worries that the idea of the Semantic Web is some large uber-ontology or one-world order, and that's actually wrong.
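To make that round-trip concrete, here is a minimal sketch in Python of what an RDFa processor conceptually does: recovering data triples from annotated markup. The page, the `example.org` URIs, and the property names are invented for illustration, and the regex only handles this flat fragment - a real RDFa processor parses the full DOM and handles prefixes, datatypes, and nesting.

```python
import re

# An HTML fragment annotated with RDFa-style attributes. The subject URI
# and the property names are made up for this example.
page = """
<div about="http://example.org/people/mary">
  <span property="http://xmlns.com/foaf/0.1/name">Mary</span>
  <span property="http://example.org/vocab/owns">a lamb</span>
</div>
"""

def extract_triples(html):
    """Naively recover (subject, predicate, object) triples from the markup.

    Only works for this flat, single-subject example; a real processor
    walks the parsed document tree.
    """
    subject = re.search(r'about="([^"]+)"', html).group(1)
    return [(subject, prop, value)
            for prop, value in re.findall(r'property="([^"]+)">([^<]+)<', html)]

for triple in extract_triples(page):
    print(triple)
```

The point is simply that the data model survives inside the page: the generated HTML can be handed back to a processor and the original subject-predicate-object structure falls out.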
The Semantic Web is really this idea that everybody has different terminologies - when I say tank, I might mean a water tank or a tank in the military sense, a weapon. When I say storm, I might mean "I'm storming the beachfront" or I might mean something in the clouds. Being able to distinguish between those things, I think, is something that computers are not very good at. So what you do is apply a logical framework to it - and there is 30 years of research on how to do that in a fairly efficient way - and then you distribute that out on the web. Now we have a really good way of having machine-readable information that's much easier to integrate than what we do now, which is effectively neurosurgery, where we're splicing stuff and transforming it and service-bussing it and all that.
I'm not saying that entirely goes away - syntax discrepancies you just have to deal with, with transformation. But having a semantic layer on top of that enables quite a lot more fluidity in joining data in ways that weren't thought possible, and it's open-world, so you can create things that are contradictory. It really is a matter of asking, when you are reading something, "OK, what interpretation am I using at this very moment?" This is going to take time for people to wrap their heads around, but I really do think that the power of the relational database was providing that logical structure on information, and I think we'll see a resurgence of that in the coming decade with the Semantic Web.
I guess the challenge is that the practitioners of the Semantic Web tend to be focused more on knowledge representation - classic artificial intelligence-type stuff - as opposed to practical matters of enterprise IT or integration or whatnot. And that's starting to change. You are starting to see a re-emphasis of the 'web' in the Semantic Web, whereas the past several years have been much more about the 'semantic' part. The 'semantic' part is not new - that's been around for a while, though there are still quite a few theoretical problems to deal with - but the 'web' part is the new thing, and that's where I think a lot of the practical emphasis should be.
I think that the Semantic Web is the next step after REST. If people start adopting RESTful web architecture as a way of building their systems, they start to think, "Well, how do I organize my data in this model?" There are a variety of sort of ad-hoc ways of doing that today - there are Atom feeds and microformats and whatnot, and they are quite useful. The Semantic Web is a well-thought-through framework for providing a fairly grounded logic to all that. There is a lot of resistance, just like there was with relational databases. If you think back to the '70s, there was a huge movement of network databases and hierarchical databases, and the move to apply logic to data was quite resisted, though the business benefits eventually caught on and it took off and became very popular.
I think you are going to see a similar thing here: when you are focused purely on imperative-style development, you want to be able to control things, whereas logic is all about declarative-style development - I declare what I want so I don't have to spell out all the pieces. So it's very productive to use, but it's not necessarily how a lot of developers think. Eventually you start realizing, "Well, this actually saves me a lot of time and heartache," and then you start going, "OK, maybe it's a new tool to have in the toolbox." And that's really all it is - a tool in the toolbox. I think the tools have to be improved, and there are quite a few really good open source ones out there.
I think the whole movement towards linked data - which is the pragmatic, practical version of RDF and the Semantic Web that's starting to come out of Tim Berners-Lee and the W3C - is very promising, and there is a lot to be said over the next year or two in seeing how people adopt that. There's also the Semantic Web query language, SPARQL. So I think there is some low-hanging fruit there that's very practical, and you might start seeing some adoption.
First of all, there actually are a few good guides out there on this - YouTube has an introduction to the Semantic Web and RDFa, which is great. The real thing to start with, I would say, is RDFa. RDFa is a very easy way of annotating your web pages with semantics. It gives you the ability to say what object a piece of markup is about and what properties relate to it - fields in a table, fields in a form, that sort of deal. Microformats are also a great way of starting to get into the Semantic Web - they're sort of the lower-case 's' semantic web, in that they have the same idea and intent, just done in a bit more of an ad-hoc way, taking existing formats like iCalendar and turning them into an overlay on top of HTML. So my recommendation is, first of all, look at RDFa and annotating your web pages with that.
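For illustration, an RDFa-annotated fragment might look like the following. The `about` and `property` attributes are RDFa; the `ex:` vocabulary, the URIs, and all the values are hypothetical.

```html
<!-- A hypothetical event listing annotated with RDFa.
     The vocabulary and data are invented for illustration. -->
<div xmlns:ex="http://example.org/vocab#"
     about="http://example.org/events/semweb-talk">
  <h2 property="ex:title">Introduction to the Semantic Web</h2>
  <span property="ex:speaker">Stuart Charlton</span>
</div>
```

The page still renders as ordinary HTML, but an RDFa processor can pull the title and speaker back out as triples about the event's URI.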
The second thing is, look at SPARQL, which is the query language of the Semantic Web. Start playing with it. There are quite a few interesting libraries out there that allow you to use SPARQL against a database, but you can also query the web with it: you can give it a link, and it will actually spider all the links underneath it and come back with a result set based on what you asked for. It's still early days, but I think there is a lot of benefit to both of those technologies, and they are very exciting.
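For a sense of its shape, a SPARQL query reads much like SQL over triple patterns. This hypothetical example uses the real FOAF vocabulary, but the graph URI and data are made up:

```sparql
# Hypothetical: find the names of people Mary knows.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friend ?name
FROM <http://example.org/people/mary>
WHERE {
  <http://example.org/people/mary#me> foaf:knows ?friend .
  ?friend foaf:name ?name .
}
```

Each line in the WHERE clause is a subject-predicate-object pattern; variables like `?friend` are bound wherever the patterns match the data.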
4. One of the things that comes to mind when you mention SPARQL, you said that it goes out and spiders a lot of content. So, is a Semantic Web query something that would generally go on in the background and an index would be created, or would it be something that you would do realtime, like for instance when you go to Google and you punch in a piece of text and you search for that?
Well, this actually leads to the history of how things have evolved. I come from the enterprise - I'm not part of the Semantic Web community, really. I just sort of came upon it, and I quite like it now. If you look at the history of the technology, it started with a focus on using web technology, but they didn't even really use the web - they used these databases called triple stores. A triple store is just like a relational database, but it stores the atom of the Semantic Web, which is called a triple: a subject, a predicate, and an object.
So it's sort of like "Mary has a lamb." Everything boils down into that triadic relation. These get stored in databases - regular relational databases, or the proprietary object databases that got created - called triple stores. In the past there were APIs to query those things, and SPARQL was only just ratified this year - I think it was January [ed: 2008] that it was finally recommended by the W3C. You are now actually seeing a lot of those triple stores get SPARQL engines on top of them, so you can query these databases with it.
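A toy sketch in Python of what a triple store does - holding subject-predicate-object triples and answering pattern queries - might look like this. Real triple stores index all three positions and expose SPARQL; every name and term below is illustrative.

```python
# A toy in-memory triple store. None acts as a wildcard in queries,
# loosely analogous to a variable in a SPARQL triple pattern.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        """Store one (subject, predicate, object) triple."""
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return every triple consistent with the non-None terms."""
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]

store = TripleStore()
store.add("ex:Mary", "ex:has", "ex:Lamb")
store.add("ex:Mary", "ex:livesIn", "ex:Farmhouse")
store.add("ex:Lamb", "ex:color", "white")

# "What do we know about Mary?" - fix the subject, leave the rest open.
for triple in store.match(subject="ex:Mary"):
    print(triple)
```

The interesting property is that any position can be the query variable: fix the predicate to ask "what things have a color?", or fix the object to ask "who owns the lamb?" - the same flat storage answers all of them.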
Again, the URIs are just identifiers - they don't necessarily mean you are going out on the web. I can look in my cache, I can look in my database, or I might go out onto the public web and look at the linked data that's out there. So when you need very quick response times, you are going to be dealing with databases; but when you are just trying to get information off the live web and you can afford to wait a couple of seconds, then doing it on the live web makes sense too, just like what we are doing with mashups today. It really is a technology to make mashups a lot less neurosurgery-focused and a lot easier to actually do.