Apache Tika 1.0 Allows Easy Text Extraction for Java
InfoQ interviewed Chris Mattmann from Apache Tika, a text extraction and detection library, on the occasion of the 1.0 release and the publication of the "Tika in Action" book.
The Apache Harmony PMC initiated a vote earlier this week to begin the process of moving the codebase into the Apache Attic and disbanding the PMC. With 18 votes for and 2 against, the Apache Harmony project will be wound up and placed in the Attic for posterity.
Jonathan Ellis gave the keynote at Cassandra SF 2011. Ellis reviewed accomplishments including better support for multi-data center deployments, optimized read performance, integrated caching, and improved client APIs, including CQL, a SQL-like query language. Looking forward, Ellis emphasized polish: efficient database repair, storage compression, optimized performance, and an expanded CQL language.
Yahoo spun out its core Hadoop team, forming a new company, Hortonworks. CEO Eric Baldeschwieler presented the company's vision of easing adoption of Hadoop and making core engineering improvements for availability, performance, and manageability. Hortonworks will sell support, training, and certification, primarily indirectly through partners.
The Apache Software Foundation announced on May 25th that it has graduated Libcloud from the Incubator to a Top-Level Project. Libcloud is a Python library that provides a vendor-neutral interface to the proprietary APIs of various cloud providers. As a Top-Level Project, Libcloud is expected to gain more awareness and support from the open-source community.
VMware's new range of products supporting its vision of enterprise cloud computing, announced at VMworld 2010, is interesting from an operations and user perspective, but the developer focus is on vFabric, the Spring platform for developing and running cloud-based applications. The goal is to provide the same convenience infrastructure for cloud applications as for Spring-based enterprise applications.
The Deltacloud and Libcloud projects were recently accepted into the Apache Incubator. Now it looks like Nuvem, another cloud-related project, may be coming soon. Although there may be overlaps with these other projects, Nuvem appears to be taking a SOA-based approach with a dependency on SCA.
Three recent announcements highlight the cloud ecosystem's shift toward openness and standards. Red Hat has moved its Deltacloud effort to the Apache Incubator, Rackspace has made its Cloud Files code open source, and the Distributed Management Task Force (DMTF) has released two documents laying out the essential functions for cloud computing and a descriptive language for them.