Posted by Ryan Slobojan on Jan 24, 2008
The Apache Lucene project, a high-performance full-featured text search engine library written entirely in Java, released version 2.3 today. InfoQ spoke with committer and Project Management Committee (PMC) member Grant Ingersoll to learn more about this release and the future plans for Lucene.
Ingersoll indicated that the largest change in this release is a new indexing algorithm, which uses new in-memory models to achieve large speed improvements. According to Ingersoll, simply swapping the existing Lucene 2.2 JAR for a Lucene 2.3 JAR yielded indexing speed-ups of 500% in several tests. Other changes include:
- Document, Field and Token instances can now be reused during indexing analysis, which both speeds up analysis and reduces the number of allocations during indexing
- The setMaxBufferedDocs method has been supplanted by the more intuitive setRAMBufferSizeMB method

In addition, 2.3 is intended to be a drop-in replacement for 2.2, with no recompilation required. A comprehensive changelog is also available.
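To illustrate why a RAM-based flush threshold can be more intuitive than a document-count threshold, here is a minimal sketch of the two policies. It is a toy model only: it does not use Lucene's classes and is not how Lucene implements buffering. The class `FlushPolicySketch`, its methods, and the document sizes are all hypothetical names and values chosen for illustration.

```java
// Toy model only: hypothetical names, NOT Lucene code or Lucene's algorithm.
public class FlushPolicySketch {

    /** Outcome of a simulated indexing run. */
    static final class Result {
        final int flushes;     // how many times the buffer was flushed
        final long peakBytes;  // largest amount of data held in the buffer

        Result(int flushes, long peakBytes) {
            this.flushes = flushes;
            this.peakBytes = peakBytes;
        }
    }

    /** Flush whenever a fixed number of documents has been buffered. */
    static Result flushByDocCount(long[] docSizes, int maxBufferedDocs) {
        int flushes = 0;
        int docs = 0;
        long bytes = 0;
        long peak = 0;
        for (long size : docSizes) {
            docs++;
            bytes += size;
            peak = Math.max(peak, bytes);
            if (docs >= maxBufferedDocs) {
                flushes++;
                docs = 0;
                bytes = 0;
            }
        }
        return new Result(flushes, peak);
    }

    /** Flush whenever the buffered data reaches a byte budget. */
    static Result flushByRam(long[] docSizes, long ramBudgetBytes) {
        int flushes = 0;
        long bytes = 0;
        long peak = 0;
        for (long size : docSizes) {
            bytes += size;
            peak = Math.max(peak, bytes);
            if (bytes >= ramBudgetBytes) {
                flushes++;
                bytes = 0;
            }
        }
        return new Result(flushes, peak);
    }

    public static void main(String[] args) {
        // Six small documents followed by three large ones (sizes in bytes, made up).
        long[] docSizes = {10, 10, 10, 10, 10, 10, 4000, 4000, 4000};

        Result byCount = flushByDocCount(docSizes, 3);
        Result byRam = flushByRam(docSizes, 5000);

        System.out.println("by doc count: flushes=" + byCount.flushes
                + " peakBytes=" + byCount.peakBytes);
        System.out.println("by RAM budget: flushes=" + byRam.flushes
                + " peakBytes=" + byRam.peakBytes);
    }
}
```

In this toy run the document-count policy flushes small documents more often than necessary and lets memory grow with document size, whereas the byte-budget policy keeps buffered data close to the configured limit regardless of how large individual documents are.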
Ingersoll also discussed the future plans for Lucene, saying that the next release would be 2.9. The 2.9 release will be relatively minor, with items being marked as deprecated and other clean-up being performed in preparation for Lucene 3.0. The 3.0 version will be a major release which will involve moving the codebase to JDK 5 as the minimum supported version. The other major features of 3.0 are yet to be determined.
The Lucene community as a whole was also discussed, with Ingersoll indicating that Lucene and Solr are strongly integrated, and that Nutch, Tika and Hadoop also enjoy a fair amount of intercommunication. Ingersoll also described a new project named Mahout which he is in the process of launching:
That will be a separate project, but may be beneficial to Lucene users. There are currently some patches in JIRA for Lucene that implement ML algorithms. The goal of this project is to provide commercial quality, large scale machine learning (ML) algorithms built on Hadoop under an Apache license. I have seen a fair amount of interest already, and hope to have this project underway in the coming month.
Ingersoll said that, by creating Mahout, he hoped to "further unlock the mysteries of Google and companies like it to provide these capabilities to the masses and spur on new innovation in the space" -- for those with an interest in this new project, there are both a project plan and an incubator proposal available.
After following up with Grant once this item had been published, I learned two things:
Ryan Slobojan
Any releases on Lucene.Net?