Posted by Charles Humble on Sep 09, 2009
Lucid Imagination, a commercial company working with the Apache Lucene and Solr search engine libraries, has introduced a new monitoring product called LucidGaze. The product is a fully instrumented version of Lucene for developers. Performance data can be printed to a log file, stored in a round-robin database, or made available through a Java API. If the round-robin database option is used, the RRD4j library provides a standalone Swing application for reading and processing the database.
Installation is straightforward. The software is supplied as a .jar file which acts as a drop-in replacement for the Lucene .jar. To install it, a developer simply switches lucene-core.2.4.1.jar for lucene-core-gaze.2.4.1.jar on their application's classpath. Developers therefore need to make no changes to the source code of their application, and could potentially also use the product where the source for the application to be monitored is unavailable.
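Because the swap happens purely on the classpath, it can be useful to confirm which .jar a class was actually loaded from. The following is a minimal sketch using only standard JDK calls; the class name JarLocator and the two placeholder strings it returns are assumptions made for illustration and are not part of LucidGaze or Lucene.

```java
import java.security.CodeSource;

public class JarLocator {
    /**
     * Returns the location (typically a .jar URL) that the named class was
     * loaded from. Returns "(bootstrap)" for classes with no code source,
     * such as core JDK classes, and "(not on classpath)" if the class
     * cannot be found.
     */
    public static String locate(String className) {
        try {
            // Load without initializing the class; we only need its origin.
            Class<?> cls = Class.forName(className, false,
                    JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(bootstrap)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to Lucene's IndexReader; pass another class name to override.
        String name = args.length > 0
                ? args[0]
                : "org.apache.lucene.index.IndexReader";
        System.out.println(name + " loaded from " + locate(name));
    }
}
```

Run with the application's own classpath, the printed location should name the instrumented .jar once the swap has been made, and the stock Lucene .jar otherwise.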
LucidGaze offers developers a range of analytics for looking at how well searches are transformed into document retrieval operations, how effectively user input is analyzed and decomposed for processing by the index, and how text is processed and indexed. The tool uses five different monitors to collect statistics.
The overhead of running with full monitoring is considerable. In conversation, Grant Ingersoll, a member of Lucid Imagination's technical team, suggested a figure of around 10-15%. It is, however, possible to reduce the overhead by configuring which statistics are collected and whether they should be persisted.
InfoQ also talked to Ingersoll about some typical applications for LucidGaze. One he highlighted was a common developer error when working with Lucene: an apparent memory leak caused by the developer failing to close an IndexReader. LucidGaze collects data on the number of currently open IndexReaders, the total number of IndexReader#reopen() calls and how many of these resulted in a new IndexReader instance, along with the total estimated RAM that all IndexReaders in use in the JVM are consuming. These statistics could be very useful when tracking down such a leak: at its most basic, if you expect to be using two IndexReaders and you have ten in memory, then you have a leak somewhere.
A second common use case is evaluating re-indexing strategies during volume testing for a high-volume site, that is, one with lots of document creates and deletes. Lucene's index database is composed of a number of separate "segments", each stored in an individual file. When you add documents to the index, new segments may be created. You can compact the database and reduce the number of segments, thereby speeding up query times, but there is an overhead for doing so, and working out the best strategy tends to involve a lot of trial and error. LucidGaze offers statistics for the number of new index segments created, as well as the number of segment merges that have occurred and the average time they took, helping developers tune their implementation.
The tool can also be used to look at specific issues encountered during volume testing, such as isolating long-running queries that are consuming an unfair share of resources, or pinpointing specific fields or documents that are causing a processing bottleneck.
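To make the open-reader statistic described above concrete, here is a small self-contained sketch of how an unclosed reader shows up in such a count. TrackedReader and Stats are hypothetical stand-ins written for this illustration. They are neither the Lucene IndexReader class nor LucidGaze's implementation, and model only the behaviour the article describes: reopen() hands back a new instance when the index has changed, and the old instance stays open until it is explicitly closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderLeakDemo {
    /** Hypothetical counters, loosely modelled on the statistics named in the article. */
    static final class Stats {
        final AtomicInteger open = new AtomicInteger();
        final AtomicInteger reopenCalls = new AtomicInteger();
        final AtomicInteger newInstances = new AtomicInteger();
    }

    /** Hypothetical stand-in for an index reader; NOT the Lucene class. */
    static final class TrackedReader {
        private final Stats stats;
        private final long version;
        private boolean closed;

        TrackedReader(Stats stats, long version) {
            this.stats = stats;
            this.version = version;
            stats.open.incrementAndGet();
        }

        /** Returns this reader if the index is unchanged, otherwise a new one. */
        TrackedReader reopen(long currentIndexVersion) {
            stats.reopenCalls.incrementAndGet();
            if (currentIndexVersion == version) {
                return this;
            }
            stats.newInstances.incrementAndGet();
            return new TrackedReader(stats, currentIndexVersion);
        }

        void close() {
            if (!closed) {
                closed = true;
                stats.open.decrementAndGet();
            }
        }
    }

    /**
     * Simulates a series of index updates, reopening the reader after each one.
     * Returns how many readers are still open at the end.
     */
    public static int refresh(int indexUpdates, boolean closeOld) {
        Stats stats = new Stats();
        TrackedReader reader = new TrackedReader(stats, 0);
        for (long version = 1; version <= indexUpdates; version++) {
            TrackedReader fresh = reader.reopen(version);
            if (fresh != reader) {
                if (closeOld) {
                    reader.close(); // the step that is easy to forget
                }
                reader = fresh;
            }
        }
        return stats.open.get();
    }

    public static void main(String[] args) {
        System.out.println("without close: " + refresh(9, false) + " open");
        System.out.println("with close:    " + refresh(9, true) + " open");
    }
}
```

In this model, nine index updates leave ten readers open when the old instance is never closed, and one when it is. A monitor reporting the open-reader count makes exactly that kind of mismatch visible.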
The product is offered as a free, though closed-source, download from the Lucid Imagination web site. At present it supports Lucene 2.4.1 only, though Lucid Imagination have stated that they may make other versions available if there is sufficient demand.