InfoQ

News

Nuxeo Introduces fise Semantic Engine

Posted by Dave West on Sep 02, 2010

Sections: Operations & Infrastructure, Enterprise Architecture, Development, Architecture & Design
Topics: Open Source, Architecture, REST, Semantic Web, Java
Tags: Content Management Systems

Nuxeo's employee blog recently introduced fise (Furtwangen IKS Semantic Engine), an open source RESTful semantic engine to which Nuxeo has contributed. The goal of fise is to "help bring new and trendy semantic features to CMS by giving developers a stack of reusable HTTP semantic services to build upon." fise is part of a larger effort, IKS (Interactive Knowledge Stack), which aims to enhance CMS offerings with Semantic Web capabilities.

A 'semantic engine' takes unstructured input (e.g. text files) and produces what amount to searchable indices and concordances as a means of extracting the "meaning" of that input. For example, semantic engines can typically categorize documents (e.g. by language or topic), suggest tags, or extract known entities (e.g. names, places, dates). Using this kind of classification information, the engines can also sort and link related documents and extract assertions (e.g. "company x bought company y on this date for this amount of money"). A content management system is primarily concerned with the creation, persistence, and organization of texts (multimedia texts in many cases), so integrating a semantic engine provides obvious advantages for search and for the organization of content. A content management system might be designed and used primarily to keep track of documents generated and used within an enterprise, or it might be used to organize and manage all of the 'documents' (web pages) that comprise a sophisticated site. One aspect of the effort to create a "Semantic Web" is for every Web page to incorporate the kind of classification, indexing, and concordance data generated by a semantic engine.
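As a rough illustration of the entity extraction and tag suggestion described above, the following Python sketch uses a tiny hand-written lookup table. It is a toy under stated assumptions, not how fise or any production engine works: the `GAZETTEER` entries, the stopword list, and both function names are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical gazetteer: lowercase surface form -> (entity type, canonical name).
GAZETTEER = {
    "nuxeo": ("Organization", "Nuxeo"),
    "paris": ("Place", "Paris"),
    "olivier grisel": ("Person", "Olivier Grisel"),
}

# Small illustrative stopword list; a real engine would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "at"}


def extract_entities(text):
    """Return sorted (type, name) pairs for gazetteer entries found in text."""
    lowered = text.lower()
    found = set()
    for surface, entity in GAZETTEER.items():
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(entity)
    return sorted(found)


def suggest_tags(text, limit=3):
    """Suggest the most frequent non-stopword tokens as candidate tags."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    sample = ("Olivier Grisel works at Nuxeo in Paris. "
              "Nuxeo builds content management software.")
    print(extract_entities(sample))
    print(suggest_tags(sample, limit=1))
```

Real engines replace the lookup table with statistical models and large knowledge bases, but the shape of the output (typed entities and suggested tags attached to a document) is the same idea.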

Open Calais, Zemanta and Evri are examples of semantic engines, available via Web APIs, that can be used to semantically annotate web pages and sites. An ancestor of this kind of semantic engine was IZE, developed and marketed in 1988 by Persoft, a small company in Madison, Wisconsin.

Olivier Grisel, author of the Nuxeo blog post, summarizes the rationale for semantic annotation as follows:

Linking content items to semantic entities and topics that are defined in open universal databases (such as DBpedia, freebase or the NY Times database) allows for many content driven applications like online websites or private intranets to share a common conceptual frame and improve findability and interoperability.

Publishers can leverage such technologies to build automatically updated entity hubs that aggregate resources of different types (documents, calendar events, persons, organizations, ...) that are related to a given semantic entity identified by a disambiguated universal identifier that spans all applications.

fise offers three basic HTTP services, defined as endpoints:

fise offers three HTTP endpoints: the engines, the store and the sparql endpoint:
  • the /engines endpoint allows the user to analyse English text content and sends back the results of the analysis without storing anything on the server: this is a stateless HTTP service.
  • the /store endpoint does the same analysis but also stores the results on the fise server: this is a stateful HTTP service. Analysis results are then available for later browsing.
  • the /sparql endpoint provides machine-level access for performing complex graph queries on the enhancements extracted from content items sent to the /store endpoint.
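To make the three endpoints more concrete, here is a minimal Python sketch that only builds the HTTP requests and does not send them. Everything beyond the endpoint paths is an assumption made for illustration: the base URL, the use of POST with a plain-text body for /engines and /store, and the `query` parameter on /sparql (which follows the common SPARQL protocol convention) are not taken from the fise documentation.

```python
import urllib.parse
import urllib.request

# Assumption: a fise instance running locally; host and port are illustrative.
BASE_URL = "http://localhost:8080"


def analysis_request(endpoint, text):
    """Build a request that submits text for analysis.

    Use "/engines" for stateless analysis or "/store" to also keep the
    results on the server. POST with a text/plain body is an assumption.
    """
    if endpoint not in ("/engines", "/store"):
        raise ValueError("endpoint must be '/engines' or '/store'")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def sparql_request(query):
    """Build a GET request carrying a SPARQL query in the URL (assumed shape)."""
    url = BASE_URL + "/sparql?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    req = analysis_request("/engines", "Nuxeo is based in Paris.")
    print(req.get_method(), req.full_url)
    q = sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    print(q.get_method(), q.full_url)
```

Passing one of these request objects to `urllib.request.urlopen` would actually send it, which only makes sense against a running server whose real API matches the assumptions above.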

These services can be accessed in two ways. The first is "a web user interface for human beings who want to test the capabilities of the engines manually and navigate through the results using [their] browser. This is primarily a demo mode." "The second way to use fise is the RESTful API for machines (e.g. third party ECM applications such as Nuxeo DM and Nuxeo DAM) that will use fise as an HTTP service to enhance the content of their documents."

Organizations and individuals are finding themselves overwhelmed by the sheer volume of information, mostly in the form of unstructured documents, that they must deal with on an ongoing basis. This accounts for the increasing interest in content management systems, and in CMS enhanced with semantic engine technology. Nuxeo is itself a provider of CMS services and plans to integrate fise with its product line.

Right now fise is a standalone HTTP service with a basic web interface mainly used for demo purposes. To make it really useful some work is needed to integrate it with the Nuxeo platform so that Nuxeo DM, Nuxeo DAM and Nuxeo CMF users will benefit from a seamless semantic experience.

 

To what extent are you and your organization using CMS and what value are you finding in adding semantic annotations to your content?

  1. Good in theory, by peter lin

    Although the idea of integrating semantic technology is great, it appears fise uses RDF/OWL/Jena2. As history has shown, there have been dozens of attempts at commercializing the W3C semantic web and most of them have failed miserably. The RDF/OWL approach is fundamentally flawed and not usable for anything realistic. There are several papers discussing the limitations and how to fix or "work around" the issues. The IT industry needs to learn the painful lessons of why RDF/OWL failed and go back to AI for inspiration. There's been way too much NIH in the semweb world.

  2. Re: Good in theory, by Vanni Torelli

    Another problem is the poor, if not non-existent, interoperability across knowledge bases. Ontologies don't speak to each other even if they relate to the same context/domain and contain the same terms/concepts.

  3. Re: Good in theory, by peter lin

    The problem is worse than that. It's not feasible to create one global ontology that everyone uses. Even if someone builds one, it can't serve everyone's needs. This means everyone builds their own ontology, which can't speak to any other ontology without proper mapping.

    Who is going to create those mappings, and how are they going to be maintained? Digging deeper, RDF Schema and OWL aren't well suited to modeling complex business models. The whole triple approach to RDF is flawed, resulting in people extending it to quads. Triples are only good for capturing simple data, like "sky-is-blue", "water-is-wet", etc. Using triples results in poor pattern matching performance and poorly written reasoners. Many semantic web reasoners claim to implement RETE, but don't. I've studied the code closely and know they don't really implement RETE.

  4. Re: Good in theory, by Olivier Grisel

    I would not call Reuters' Open Calais a miserable failure. The choice of RDF was mainly pragmatic, driven by the fact that no other solution makes it as easy to share and reuse globally identified entities as the Linked Open Data cloud:

    www4.wiwiss.fu-berlin.de/lodcloud/

    Furthermore, you don't need complex ontologies for basic entities with types Person, Place and Organization. Everybody can agree upon what the birthdate of a person, the inception date of an organization, and the latitude and longitude of a place are. The DBpedia ontology is more than enough for this simple use case and many members of the community are already reusing it (see the linked datasets from the aforementioned LOD cloud).

  5. Re: Good in theory, by Christopher Churchill

    Boring technologies. Even a dog won't use this 5 years later.

  6. Re: Good in theory, by Peter Rajsky

    1) These technologies are already in use. E.g. there is fedora-commons.org/, which is often used for cultural heritage repositories, and there are other working examples in credit risk management and other areas.
    2) I think RDF/OWL were developed by really smart people, but for different use cases than the ones many people would like to use and commercialize them for: not for the development and integration of enterprise applications. In that setting OWL/RDF technologies have real problems, which prevent them from being used more in practice. But there are a few real benefits to describing "complex" metadata in RDF rather than in "pure" XML or another form. E.g. you as a user don't have to understand data structures (which are defined by XML schemas) to define a query, only concepts/entities and their relationships. I think that is a great quality if you want to support "complex" and dynamic metadata. Using a "semantic" engine for CMS is a good and practical idea in my view (of course, it may not be descriptive enough for your use cases).

  7. Re: Good in theory, by peter lin

    From my biased perspective, it's not RDF/OWL that makes Reuters' Calais successful. It's the data Reuters provides. They could have created their own ontology language and it probably would have been just as successful. The "magic" in Calais isn't RDF/OWL, it's the information extraction bit: www.clearforest.com/solutions.html.

    There's ample evidence from the failed semWeb ventures showing that simple ontologies often fail to solve real business problems.
