JavaOne Semantic Web Panel
In this JavaOne panel session, the speakers shared their experiences and opinions on the current state of semantic web technologies. Before the discussion began, each speaker gave a short introduction to how they are working with semantic technologies:
Lou Tucker, VP at Radar Networks, started out with a demonstration of his company's service. At twine.com, people are the most important elements, and the service provides semantic connections between users in the form of a discussion forum. Information is shared as RDF rather than HTML, and tags take advantage of this, providing semantic tagging. Under the covers, Java code accesses an RDF triple store to persist information.
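Twine's internals weren't shown in detail, but the core idea of Java code talking to a triple store can be sketched with a minimal in-memory store. This is not Twine's (or any product's) actual implementation; the class, method names, and `twine:`/`foaf:` identifiers below are illustrative assumptions, and a real system would use a library such as Jena or an AllegroGraph client instead.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal in-memory triple store: subject-predicate-object rows with
// wildcard matching, the basic pattern behind RDF persistence in Java.
public class TripleStore {
    private final List<String[]> triples = new ArrayList<>();

    public void add(String subject, String predicate, String object) {
        triples.add(new String[] { subject, predicate, object });
    }

    // null acts as a wildcard, mirroring a SPARQL-style triple pattern
    public List<String[]> match(String subject, String predicate, String object) {
        List<String[]> results = new ArrayList<>();
        for (String[] t : triples) {
            if ((subject == null || t[0].equals(subject))
                    && (predicate == null || t[1].equals(predicate))
                    && (object == null || t[2].equals(object))) {
                results.add(t);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TripleStore store = new TripleStore();
        store.add("twine:alice", "foaf:knows", "twine:bob");
        store.add("twine:bob", "foaf:knows", "twine:carol");
        // pattern query: who does alice know?
        System.out.println(store.match("twine:alice", "foaf:knows", null).get(0)[2]);
    }
}
```

The point of the sketch is the query-by-pattern interface: semantic connections between users fall out of matching triples rather than joining tables.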
Next was Jans Aasman, CEO of Franz Inc. Franz Inc. is the creator of AllegroGraph, a high-performance triple store that allows the use of SNA (social network analysis), DB lookup, RDF reasoning, and temporal and spatial semantics in a single query.
Brian Sletten, a partner at Zepheira, takes a different approach as a consultant rather than a product provider. Zepheira believes data should have a public and a private facet, as well as a human-readable form and a machine-readable (meta-data) form. By using common tags and naming conventions, transformations can be created that allow data to become rich semantic data.
Last was Dean Allemang, Chief Scientist at TopQuadrant, a company that provides a suite of semantic web tools and technologies. The problems Dean is helping to solve are "how to build flexible data stores" (which are not based on traditional relational databases) and "how to manage information even when it's inside your own enterprise". He sees the semantic web as a tool to allow data inheritance and better aligning forms with data.
With the introductions over, Henry Story, semantic web evangelist at Sun, led the panel discussion. The first topic was how the semantic web relates to Java and Java developers. The general agreement was that Java provides a mechanism for accessing the structure of the data, and it is the data that is more important.
Many programmers see objects or classes and expect nothing new - but they are wrong. Database programmers, by contrast, arrive to find the kind of data management they have been dreaming of.
You have to start thinking about where the data comes from. It's no longer coming from the same place.
The Java platform provides an easier way to access the information, but Java developers have to realize that it's the data that's now important.
GRDDL (Gleaning Resource Descriptions from Dialects of Languages) makes accessing data easier. Tools provide a common structure for different underlying data structures and formats.
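GRDDL's mechanics can be sketched in miniature with the XSLT support built into the JDK: a stylesheet linked from (or associated with) a document transforms its XHTML into RDF/XML. The page, stylesheet, and `http://example.org/panel` URI below are invented for illustration; a real GRDDL client would discover the transformation from the document itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// GRDDL in miniature: an XSLT stylesheet gleans RDF/XML out of an XHTML page.
public class GrddlSketch {
    // Hypothetical XHTML document; a real GRDDL client would fetch this.
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>JavaOne Semantic Web Panel</title></head><body/></html>";

    // Hypothetical stylesheet: maps the page title to a dc:title property.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:xh=\"http://www.w3.org/1999/xhtml\""
        + " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
        + "<xsl:template match=\"/xh:html\">"
        + "<rdf:RDF><rdf:Description rdf:about=\"http://example.org/panel\">"
        + "<dc:title><xsl:value-of select=\"xh:head/xh:title\"/></dc:title>"
        + "</rdf:Description></rdf:RDF>"
        + "</xsl:template></xsl:stylesheet>";

    public static String extractRdf() throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(XHTML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractRdf());
    }
}
```

The output is RDF/XML in which the page title has become a Dublin Core `dc:title` statement - the "gleaning" step that lets generic RDF tools consume ordinary XHTML.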
GRDDL is a W3C recommendation, and enables users to get RDF out of XML and XHTML documents via XSLT.
Question: "Is the semantic web going to change taxonomies that are already being created, e.g. XBRL?"
On the web there is no single agreement on anything, and the web infrastructure will handle this. The semantic web will provide the same handling.
Presenter Question: "There are about 600 tools now, how can the community help us?"
A standardization of APIs would help a lot.
There is a disconnect between thinking of the semantic web as data vs. objects.
It's still too early for standards; we need more time to understand the problems.
Question: "How do you create the triples from existing text?"
It's difficult, especially in free text, but it's easier when you create the data and know the context.
One solution is DBpedia, a community effort to extract structured information from Wikipedia.
Most of the structured data is going to come directly from the database and not text.
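Going from a database row to triples is largely mechanical, which is why the panelists expect most structured data to come from there. A minimal sketch, assuming an invented `http://example.org/` namespace and a column-name-to-predicate convention (real mappings need datatypes, escaping, and proper IRIs):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: turning one relational row into N-Triples, the kind of direct
// database-to-RDF mapping expected to supply most structured semantic data.
public class RowToTriples {
    // Base URI and column->predicate scheme are illustrative assumptions.
    public static String toNTriples(String table, String id, Map<String, String> row) {
        String subject = "<http://example.org/" + table + "/" + id + ">";
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> col : row.entrySet()) {
            sb.append(subject)
              .append(" <http://example.org/schema/").append(col.getKey()).append("> ")
              .append("\"").append(col.getValue()).append("\" .\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "Jans Aasman");
        row.put("company", "Franz Inc.");
        System.out.print(toNTriples("person", "42", row));
    }
}
```

Each row becomes a subject; each column becomes a predicate. The context the table schema provides is exactly what free text lacks, which is why this direction is so much easier.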
Semantic markup and HTML markup are easy to do. With Solvent (a tool from MIT) you can select DOM elements from web pages and provide the mapping to RDF. We need to attack the problem from multiple angles.
This is a chicken-and-egg problem - as soon as people start supplying data, applications that utilize the data will become available.
The semantic web is able to grow virally, due to public data.
There is also a social problem - not all data is going to be public.
The final question, from Henry, was "where is 'reasoning' headed?"
At the moment, academic work is done mainly in memory; larger data sets cause problems because more disk reads are required, which slows processing.
The semantic web has not come of age because of these types of problems. Instead, leverage the low-hanging fruit while the advanced parts arrive; there's a lot of value in simply making the data semantic.
The takeaway from the panel was simple: the semantic web is here now, and the tools are available to create and transform data.