Nuxeo's employee blog recently introduced fise (Furtwangen IKS Semantic Engine), an open source RESTful semantic engine to which Nuxeo has made contributions. The goal of fise is to "help bring new and trendy semantic features to CMS by giving developers a stack of reusable HTTP semantic services to build upon." fise is part of a larger effort, IKS (Interactive Knowledge Stack), which aims to enhance CMS offerings with Semantic Web capabilities.
A 'semantic engine' takes unstructured input (e.g. text files) and produces what amount to searchable indices and concordances as a means of extracting the "meaning" of that input. For example, semantic engines can typically categorize documents (e.g. by language or topic), suggest tags, or extract known entities (e.g. names, places, dates). Using this kind of classification information, the engines can also sort and link related documents and extract assertions (e.g. "company x bought company y on this date for this amount of money"). A content management system is primarily concerned with the creation, persistence, and organization of texts (multimedia texts in many cases), so integrating a semantic engine provides obvious advantages for search and for organization of content. A content management system might be designed and used primarily to keep track of documents generated and used within an enterprise, or it might be used to organize and manage all of the 'documents' (web pages) that comprise a sophisticated site. One aspect of the effort to create a "Semantic Web" is for every Web page to incorporate the kind of classification, indexing, and concordance data generated by a semantic engine.
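To make the idea of entity extraction and tag suggestion concrete, here is a deliberately naive sketch. The lookup table, the `annotate` and `suggest_tags` functions, and the sample sentence are all invented for illustration; this is not how fise or any production engine works, since those rely on trained models and large knowledge bases rather than a hand-written dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical gazetteer mapping surface forms to entity types (invented for this sketch).
KNOWN_ENTITIES = {
    "Nuxeo": "Organization",
    "Olivier Grisel": "Person",
    "Furtwangen": "Place",
}


@dataclass(frozen=True)
class Annotation:
    text: str   # the matched surface form
    kind: str   # entity type taken from the gazetteer
    start: int  # character offset of the match in the input


def annotate(document: str) -> list[Annotation]:
    """Return one annotation per known entity (first occurrence only), ordered by offset."""
    found = []
    for name, kind in KNOWN_ENTITIES.items():
        pos = document.find(name)
        if pos != -1:
            found.append(Annotation(name, kind, pos))
    return sorted(found, key=lambda a: a.start)


def suggest_tags(annotations: list[Annotation]) -> list[str]:
    """Suggest the distinct entity types as tags, in alphabetical order."""
    return sorted({a.kind for a in annotations})


if __name__ == "__main__":
    text = "Olivier Grisel of Nuxeo introduced fise, the Furtwangen IKS Semantic Engine."
    for annotation in annotate(text):
        print(annotation)
    print(suggest_tags(annotate(text)))
```

Even this toy version shows the shape of the output a CMS would consume: typed spans with offsets, which can then drive search facets, tags, or links between documents.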
Open Calais, Zemanta and Evri are examples of semantic engines, available via Web APIs, that can be used to semantically annotate web pages and sites. An ancestor of this kind of semantic engine was IZE, developed and marketed back in 1988 by a small Madison, Wisconsin company called Persoft.
The rationale for semantic annotation is summarized by Olivier Grisel (author of the Nuxeo blog post) as follows:
Linking content items to semantic entities and topics that are defined in open universal databases (such as DBpedia, freebase or the NY Times database) allows for many content driven applications like online websites or private intranets to share a common conceptual frame and improve findability and interoperability.
Publishers can leverage such technologies to build automatically updated entity hubs that aggregate resources of different types (documents, calendar events, persons, organizations, ...) that are related to a given semantic entity identified by an disambiguated universal identifiers that span all applications.
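As a rough illustration of the "entity hub" idea, the sketch below groups content items of different types under the entity URI they are linked to. The item records, their field names, and the `build_entity_hubs` function are hypothetical, and the DBpedia-style URIs are used only as example identifiers.

```python
from collections import defaultdict

# Hypothetical content items, each already linked to one or more entity URIs.
ITEMS = [
    {"id": "doc-1", "type": "document", "entities": ["http://dbpedia.org/resource/Paris"]},
    {"id": "evt-7", "type": "event", "entities": ["http://dbpedia.org/resource/Paris",
                                                  "http://dbpedia.org/resource/Berlin"]},
    {"id": "doc-2", "type": "document", "entities": ["http://dbpedia.org/resource/Berlin"]},
]


def build_entity_hubs(items):
    """Map each entity URI to the (type, id) pairs of the items that mention it."""
    hubs = defaultdict(list)
    for item in items:
        for uri in item["entities"]:
            hubs[uri].append((item["type"], item["id"]))
    return dict(hubs)


if __name__ == "__main__":
    for uri, resources in build_entity_hubs(ITEMS).items():
        print(uri, resources)
```

Because the key is a globally shared identifier rather than an application-specific tag, two independent applications that build hubs this way can merge them by URI without agreeing on anything else.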
The blog describes fise's three basic HTTP services, defined as endpoints:
fise offers three HTTP endpoints: the engines, the store and the sparql endpoint:
- the /engines endpoint allows the user to analyse English text content and sends back the results of the analysis without storing anything on the server: this is a stateless HTTP service.
- the /store endpoint does the same analysis but furthermore stores the results on the fise server: this is a stateful HTTP service. Analysis results are then available for later browsing.
- the /sparql endpoint provides machine-level access to perform complex graph queries on the enhancements extracted from content items sent to the /store endpoint.
These services can be accessed directly via "a web user interface for human beings who want to test the capabilities of the engines manually and navigate through the results using their browser. This is primarily a demo mode." "The second way to use fise is the RESTful API for machines (e.g. third party ECM applications such as Nuxeo DM and Nuxeo DAM) that will use fise as an HTTP service to enhance the content of their documents."
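The machine-facing use of the three endpoints might look roughly like the sketch below, which only builds the HTTP requests and does not send them. Only the endpoint paths come from the blog's description. The base URL, the use of POST with a plain-text body, the `Accept` header value, and the `query` parameter name for /sparql are assumptions made for illustration, not documented fise behaviour.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a fise server running locally; adjust to the real deployment.
FISE_BASE_URL = "http://localhost:8080"


def _text_post(path: str, text: str) -> Request:
    """Build a POST request carrying plain text (assumed request shape)."""
    return Request(
        FISE_BASE_URL + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/rdf+xml",  # assumed response format
        },
        method="POST",
    )


def engines_request(text: str) -> Request:
    """Stateless analysis: results are returned, nothing is kept on the server."""
    return _text_post("/engines", text)


def store_request(text: str) -> Request:
    """Stateful analysis: same analysis, but the results are stored on the server."""
    return _text_post("/store", text)


def sparql_request(query: str) -> Request:
    """Graph query over stored enhancements (assumed 'query' URL parameter)."""
    return Request(FISE_BASE_URL + "/sparql?" + urlencode({"query": query}), method="GET")


if __name__ == "__main__":
    for req in (
        engines_request("Nuxeo has contributed to fise."),
        store_request("Nuxeo has contributed to fise."),
        sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"),
    ):
        print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a running server. In this sketch the stateless and stateful variants differ only in the path, which mirrors the blog's statement that /store "does the same analysis" and additionally keeps the results.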
Organizations and individuals are discovering that they are being overwhelmed by the sheer volume of information, mostly in the form of unstructured documents, that they must deal with on an ongoing basis. This accounts for the increasing interest in content management systems and in CMS enhanced with semantic engine technology. Nuxeo is itself a provider of CMS services and has plans to integrate fise with its product line.
Right now fise is a standalone HTTP service with a basic web interface mainly used for demo purposes. To make it really useful some work is needed to integrate it with the Nuxeo platform so that Nuxeo DM, Nuxeo DAM and Nuxeo CMF users will benefit from a seamless semantic experience.
To what extent are you and your organization using CMS and what value are you finding in adding semantic annotations to your content?
Community comments
Good in theory
by peter lin,
Although the idea of integrating semantic technology is great, it appears fise uses RDF/OWL/Jena2. As history has shown, there have been dozens of attempts at commercializing the W3C semantic web and most of them have failed miserably. The RDF/OWL approach is fundamentally flawed and not usable for anything realistic. There are several papers discussing the limitations and how to fix or "work around" the issues. The IT industry needs to learn the painful lessons of why RDF/OWL failed and go back to AI for inspiration. There's been way too much NIH in the semweb world.
Re: Good in theory
by Vanni Torelli,
Another problem is the poor, if not non-existent, interoperability across knowledge bases. Ontologies don't speak to each other even if they relate to the same context/domain and contain the same terms/concepts.
Re: Good in theory
by peter lin,
The problem is worse than that. It's not feasible to create one global ontology that everyone uses. Even if someone builds one, it can't serve everyone's needs. This means everyone builds their own ontology, which can't speak to any other ontology without proper mapping.
Who is going to create those mappings and how are they going to be maintained? Digging deeper, RDF Schema and OWL aren't well suited to modeling complex business models. The whole triple approach to RDF is flawed, resulting in people extending it to quads. Triples are only good for capturing simple data, like "sky-is-blue", "water-is-wet", etc. Using triples results in poor pattern matching performance and poorly written reasoners. Many semantic web reasoners claim to implement RETE, but don't. I've studied the code closely and know they don't really implement RETE.
Re: Good in theory
by Olivier Grisel,
I would not call Reuters' Open Calais a miserable failure. The choice of RDF was mainly pragmatic and driven by the fact that no other solution makes it as easy to share and reuse globally identified entities as the Linked Open Data cloud:
www4.wiwiss.fu-berlin.de/lodcloud/
Furthermore you don't need complex ontologies for basic entities with types Person, Place and Organization. Everybody can agree upon what is the birthdate of a person, the inception date of an organization and the latitude and longitude of a place. The dbpedia ontology is more than enough for this simple use case and many members of the community are already reusing it (see the linked datasets from the aforementioned LOD cloud).
Re: Good in theory
by Christopher Churchill,
Boring technologies. Even a dog won't use this 5 years from now.
Re: Good in theory
by Peter Rajsky,
1) These technologies are used already. E.g. there is fedora-commons.org/, which is often used for cultural heritage repositories, but there are other working examples in credit risk management and other areas.
2) I think RDF/OWL were developed by really smart people, but for different use cases than the ones many people would like to use and commercialize them for, namely the development and integration of enterprise applications. There, OWL/RDF technologies have real problems, which prevent them from being used more in practice. But there are a few real benefits to describing "complex" metadata in RDF rather than in "pure" XML or some other form. E.g. you as a user don't have to understand data structures (which are defined by XML schemas) to define a query, only concepts/entities and their relationships. I think that is a great quality if you want to support "complex" and dynamic metadata. Using a "semantic" engine for a CMS is a good and practical idea in my opinion (of course it may not be descriptive enough for your use cases).
Re: Good in theory
by peter lin,
From my biased perspective, it's not RDF/OWL that makes Reuters' Calais successful. It's the data Reuters provides. They could have created their own ontology language and it probably would have been just as successful. The "magic" in Calais isn't RDF/OWL, it's the information extraction bit (www.clearforest.com/solutions.html).
There's ample evidence from the failed semWeb ventures showing that simple ontologies often fail to solve real business problems.