Bindings, Platforms, and Innovation
This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.
Tracking change and innovation in the enterprise software development community
Posted by Ryan Slobojan on Jan 24, 2008 10:00 PM
The Apache Lucene project, a high-performance full-featured text search engine library written entirely in Java, released version 2.3 today. InfoQ spoke with committer and Project Management Committee (PMC) member Grant Ingersoll to learn more about this release and the future plans for Lucene.
Ingersoll indicated that the largest change in this release is a new indexing algorithm, which uses new in-memory models to achieve large speed improvements. According to Ingersoll, simply swapping the Lucene 2.2 JAR for the Lucene 2.3 JAR yielded indexing speed-ups of 500% in several tests. Other changes include:
- Document, Field and Token instances can now be reused during indexing analysis, which both speeds up analysis and reduces the number of allocations during indexing
- The setMaxBufferedDocs method has been supplanted by the more intuitive setRAMBufferSizeMB method
In addition, 2.3 is intended to be a drop-in replacement for 2.2, with no recompilation required. A comprehensive changelog is also available.
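The move from a document-count limit to a RAM-size limit can be pictured with a small toy model. The sketch below is not Lucene code: the class FlushPolicySketch and all of its methods are hypothetical, and only the two method names setMaxBufferedDocs and setRAMBufferSizeMB come from the release described here. It assumes a buffer that is flushed either after a fixed number of documents or once the buffered bytes reach a threshold, and records the peak memory each policy allows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy in-memory indexing buffer. NOT Lucene's implementation: every name in
 * this class is hypothetical. It only mimics the idea of flushing by document
 * count (as with setMaxBufferedDocs) versus flushing by buffered size
 * (as with setRAMBufferSizeMB).
 */
final class FlushPolicySketch {
    private final int maxBufferedDocs;   // flush after this many docs; <= 0 disables
    private final long ramBufferBytes;   // flush once this many bytes are buffered; <= 0 disables
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long peakBufferedBytes = 0;
    private int flushCount = 0;

    private FlushPolicySketch(int maxBufferedDocs, long ramBufferBytes) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferBytes = ramBufferBytes;
    }

    /** Policy that flushes purely on the number of buffered documents. */
    static FlushPolicySketch byDocCount(int maxDocs) {
        return new FlushPolicySketch(maxDocs, 0);
    }

    /** Policy that flushes purely on the number of buffered bytes. */
    static FlushPolicySketch byRamBytes(long bytes) {
        return new FlushPolicySketch(0, bytes);
    }

    void addDocument(String text) {
        buffer.add(text);
        bufferedBytes += text.getBytes(StandardCharsets.UTF_8).length;
        peakBufferedBytes = Math.max(peakBufferedBytes, bufferedBytes);

        boolean docLimitHit = maxBufferedDocs > 0 && buffer.size() >= maxBufferedDocs;
        boolean ramLimitHit = ramBufferBytes > 0 && bufferedBytes >= ramBufferBytes;
        if (docLimitHit || ramLimitHit) {
            flush();
        }
    }

    /** Stands in for writing a segment to disk; here it just empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        buffer.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    int flushCount() { return flushCount; }

    long peakBufferedBytes() { return peakBufferedBytes; }

    /** Helper that builds a string of n copies of ch (works on older JDKs). */
    static String repeat(char ch, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, ch);
        return new String(chars);
    }

    public static void main(String[] args) {
        String large = repeat('x', 2000);   // a 2000-byte "document"
        FlushPolicySketch byCount = byDocCount(3);
        FlushPolicySketch byRam = byRamBytes(1024);
        for (int i = 0; i < 3; i++) {
            byCount.addDocument(large);
            byRam.addDocument(large);
        }
        System.out.println("by count: flushes=" + byCount.flushCount()
                + ", peak bytes=" + byCount.peakBufferedBytes());
        System.out.println("by RAM:   flushes=" + byRam.flushCount()
                + ", peak bytes=" + byRam.peakBufferedBytes());
    }
}
```

In this toy run the count-based buffer holds all three large documents before flushing, so its peak memory grows with document size, while the size-based buffer flushes as soon as its byte threshold is crossed. That is one plausible reading of why a RAM-based setting is described as more intuitive; the actual behavior of Lucene's buffer is documented in its changelog and Javadoc.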
Ingersoll also discussed the future plans for Lucene, saying that the next release would be 2.9. The 2.9 release will be relatively minor, with items being marked as deprecated and other clean-up being performed in preparation for Lucene 3.0. The 3.0 version will be a major release that moves the codebase to JDK 5 as the minimum supported version; the other major features of 3.0 are yet to be determined.
The Lucene community as a whole was also discussed, with Ingersoll indicating that Lucene and Solr are strongly integrated, and that Nutch, Tika and Hadoop also enjoy a fair amount of intercommunication. Ingersoll also described a new project named Mahout which he is in the process of launching:
That will be a separate project, but may be beneficial to Lucene users. There are currently some patches in JIRA for Lucene that implement ML algorithms. The goal of this project is to provide commercial quality, large scale machine learning (ML) algorithms built on Hadoop under an Apache license. I have seen a fair amount of interest already, and hope to have this project underway in the coming month.
Ingersoll said that, by creating Mahout, he hoped to "further unlock the mysteries of Google and companies like it to provide these capabilities to the masses and spur on new innovation in the space" -- for those with an interest in this new project, there are both a project plan and an incubator proposal available.
After following up with Grant after the publishing of this item, I learned two things:
Ryan Slobojan
Any releases on Lucene.Net?