InfoQ


Nuxeo Introduces fise Semantic Engine

Posted by Dave West on Sep 02, 2010

Sections: Operations & Infrastructure, Enterprise Architecture, Development, Architecture & Design
Topics: REST, Architecture, Semantic Web, Java, Open Source
Tags: Content Management Systems

Nuxeo's employee blog recently introduced fise (Furtwangen IKS Semantic Engine), an open source RESTful semantic engine to which Nuxeo has made contributions. The goal of fise is to "help bring new and trendy semantic features to CMS by giving developers a stack of reusable HTTP semantic services to build upon." fise is part of a larger effort, IKS (Interactive Knowledge Stack), which aims to enhance CMS offerings with Semantic Web capabilities.

A 'semantic engine' takes unstructured input (e.g. text files) and produces what amount to searchable indices and concordances as a means of extracting the "meaning" of that input. For example, semantic engines can typically categorize documents (e.g. by language or topic), suggest tags, or extract known entities (e.g. names, places, dates). Using this kind of classification information, the engines can also sort and link related documents and extract assertions (e.g. "company x bought company y on this date for this amount of money"). A content management system is primarily concerned with the creation, persistence, and organization of texts (multimedia texts in many cases), so integrating a semantic engine provides obvious advantages for search and for organization of content. A content management system might be designed and used primarily to keep track of documents generated and used within an enterprise, or it might be used to organize and manage all of the 'documents' (web pages) that comprise a sophisticated site. One aspect of the effort to create a "Semantic Web" is for every Web page to incorporate the kind of classification, indexing, and concordance data generated by a semantic engine.
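To make "extracting known entities" concrete, here is a toy Python sketch that looks up names from a small hand-written dictionary (a gazetteer). It is only an illustration of the idea: the `KNOWN_ENTITIES` table and the `extract_known_entities` function are hypothetical, and nothing here reflects how fise or any real semantic engine is implemented.

```python
import re

# Hypothetical gazetteer: entity name -> entity type.
# A real engine would draw on a large knowledge base instead.
KNOWN_ENTITIES = {
    "Nuxeo": "Organization",
    "Paris": "Place",
    "Olivier Grisel": "Person",
}


def extract_known_entities(text):
    """Return (offset, name, type) for each known entity found in text."""
    found = []
    for name, entity_type in KNOWN_ENTITIES.items():
        # Whole-word match so that a name is not found inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), name, entity_type))
    # Sort by position so results follow the order of the text.
    return sorted(found)


if __name__ == "__main__":
    sample = "Olivier Grisel works at Nuxeo in Paris."
    for offset, name, entity_type in extract_known_entities(sample):
        print(offset, name, entity_type)
```

A dictionary lookup like this cannot disambiguate (is "Paris" the city or a person?), which is one reason real engines link mentions to identifiers in shared databases.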

Open Calais, Zemanta and Evri are examples of semantic engines, available via Web APIs, that can be used to semantically annotate web pages and sites. An ancestor of this kind of semantic engine was IZE, developed and marketed in 1988 by Persoft, a small company in Madison, Wisconsin.

The rationale for semantic annotation is summarized by Olivier Grisel (author of the Nuxeo blog) as follows:

Linking content items to semantic entities and topics that are defined in open universal databases (such as DBpedia, freebase or the NY Times database) allows for many content driven applications like online websites or private intranets to share a common conceptual frame and improve findability and interoperability.

Publishers can leverage such technologies to build automatically updated entity hubs that aggregate resources of different types (documents, calendar events, persons, organizations, ...) that are related to a given semantic entity identified by a disambiguated universal identifier that spans all applications.

fise offers three basic HTTP services, defined as endpoints:

fise offers three HTTP endpoints: the engines, the store and the sparql endpoint:
  • the /engines endpoint allows the user to analyse English text content and sends back the results of the analysis without storing anything on the server: this is a stateless HTTP service.
  • the /store endpoint does the same analysis but furthermore stores the results on the fise server: this is a stateful HTTP service. Analysis results are then available for later browsing.
  • the /sparql endpoint provides machine-level access to perform complex graph queries on the enhancements extracted from content items sent to the /store endpoint.
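The three endpoints could be addressed from client code roughly as in the Python sketch below. Only the endpoint paths (/engines, /store, /sparql) come from the description above. The base URL, the use of POST with a plain-text body, and the `query` parameter on /sparql are assumptions made for illustration, and the helper function names are hypothetical; the actual fise API may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a fise server on localhost; the article gives no address.
FISE_BASE_URL = "http://localhost:8080"


def build_engines_request(text, base_url=FISE_BASE_URL):
    """Stateless analysis: the server analyses the text and stores nothing."""
    return Request(
        url=f"{base_url}/engines",
        data=text.encode("utf-8"),
        # Assumption: plain text is posted as the request body.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def build_store_request(text, base_url=FISE_BASE_URL):
    """Stateful analysis: same input, but results are kept on the server."""
    return Request(
        url=f"{base_url}/store",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def build_sparql_request(query, base_url=FISE_BASE_URL):
    """Graph query over stored enhancements.

    Assumption: the query is passed as a URL-encoded `query` parameter.
    """
    return Request(
        url=f"{base_url}/sparql?{urlencode({'query': query})}",
        method="GET",
    )


if __name__ == "__main__":
    request = build_sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
    print(request.get_method(), request.full_url)
```

The functions only build request objects, so they can be inspected without a running server. To actually send one, pass it to `urllib.request.urlopen`.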

These services can be accessed in two ways. The first is "a web user interface for human beings who want to test the capabilities of the engines manually and navigate through the results using their browser. This is primarily a demo mode." "The second way to use fise is the RESTful API for machines (e.g. third party ECM applications such as Nuxeo DM and Nuxeo DAM) that will use fise as an HTTP service to enhance the content of their documents."

Organizations and individuals are discovering that they are being overwhelmed by the sheer volume of information, mostly in the form of unstructured documents, that they must deal with on an ongoing basis. This accounts for the increasing interest in content management systems and in CMS enhanced with semantic engine technology. Nuxeo is itself a provider of CMS services and has plans to integrate fise with its product line.

Right now fise is a standalone HTTP service with a basic web interface mainly used for demo purposes. To make it really useful some work is needed to integrate it with the Nuxeo platform so that Nuxeo DM, Nuxeo DAM and Nuxeo CMF users will benefit from a seamless semantic experience.

 

To what extent are you and your organization using CMS and what value are you finding in adding semantic annotations to your content?

  1. Good in theory

    by peter lin

    Although the idea of integrating semantic technology is great, it appears fise uses RDF/OWL/Jena2. As history has shown, there have been dozens of attempts at commercializing the W3C semantic web and most of them have failed miserably. The RDF/OWL approach is fundamentally flawed and not usable for anything realistic. There are several papers discussing the limitations and how to fix or "work around" the issues. The IT industry needs to learn the painful lessons of why RDF/OWL failed and go back to AI for inspiration. There's been way too much NIH in the semweb world.

  2. Re: Good in theory

    by Vanni Torelli

    Another problem is the poor, if not non-existent, interoperability across knowledge bases. Ontologies don't speak to each other even if they relate to the same context/domain and contain the same terms/concepts.

  3. Re: Good in theory

    by peter lin

    The problem is worse than that. It's not feasible to create one global ontology that everyone uses. Even if someone builds one, it can't serve everyone's needs. This means everyone builds their own ontology, which can't speak to any other ontology without proper mapping.

    Who is going to create those mappings, and how are they going to be maintained? Digging deeper, RDF Schema and OWL aren't well suited to modeling complex business models. The whole triple approach to RDF is flawed, resulting in people extending it to quads. Triples are only good for capturing simple data, like "sky-is-blue", "water-is-wet", etc. Using triples results in poor pattern matching performance and poorly written reasoners. Many semantic web reasoners claim to implement RETE, but don't. I've studied the code closely and know they don't really implement RETE.

  4. Re: Good in theory

    by Olivier Grisel

    I would not call Reuters' Open Calais a miserable failure. The choice of RDF was mainly pragmatic and driven by the fact that no other solution makes it as easy to share and reuse globally identified entities as the Linked Open Data cloud:

    www4.wiwiss.fu-berlin.de/lodcloud/

    Furthermore, you don't need complex ontologies for basic entities with types Person, Place and Organization. Everybody can agree on what the birthdate of a person, the inception date of an organization, and the latitude and longitude of a place are. The dbpedia ontology is more than enough for this simple use case and many members of the community are already reusing it (see the linked datasets from the aforementioned LOD cloud).

  5. Re: Good in theory

    by Christopher Churchill

    Boring technologies. Even a dog won't use this 5 years from now.

  6. Re: Good in theory

    by Peter Rajsky

    1) These technologies are used already. E.g. there is fedora-commons.org/, which is often used for cultural heritage repositories, but there are other working examples in credit risk management and other areas.
    2) I think RDF/OWL were developed by really smart people, but for different use cases than the ones many people would like to use and commercialize them for: not for development and integration of enterprise applications. In that setting OWL/RDF technologies have real problems, which prevent them from being used more in practice. But there are a few real benefits to describing "complex" metadata in RDF rather than in "pure" XML or another form. E.g. you as a user don't have to understand data structures (which are defined by XML schemas) to define a query, only concepts/entities and their relationships. I think that is a great quality if you want to support "complex" and dynamic metadata. Using a "semantic" engine for a CMS is, in my opinion, a good and practical idea (of course, it may not be descriptive enough for your use cases).

  7. Re: Good in theory

    by peter lin

    My biased perspective: it's not RDF/OWL that makes Reuters' Calais successful. It's the data Reuters provides. They could have created their own ontology language and it probably would have been just as successful. The "magic" in Calais isn't RDF/OWL, it's the information extraction bit www.clearforest.com/solutions.html.

    There's ample evidence from the failed semWeb ventures showing that simple ontologies often fail to solve real business problems.
