InfoQ

.NET Spotlight on Open Source: Beagle

Posted by James Vastbinder on Aug 12, 2007 09:45 PM

Community: .NET
Topics: Open Source
Tags: Beagle, Mono
In this .NET Spotlight on Open Source, InfoQ interviewed Joe Shaw and Pierre Ostlund about Beagle.  Beagle is one of the best-known Mono applications on Linux and provides desktop search to that community.


InfoQ: What prompted the creation of Beagle?

Joe Shaw: Beagle really grew out of a project called Dashboard (http://nat.org/dashboard) that I worked on with Nat Friedman, Alex Graveley, Jim Krehl and others primarily in the summer of 2003.  The idea behind Dashboard (which predated Apple's own very different Dashboard) was that the computer knows what you're doing at any given time -- reading emails, IMing with a friend, working on a document -- so it should be able to show information relevant to what you're doing right then.  A lot of the work we were doing was people-based: if I was IMing with Nat, it would show me Nat's latest blog entries, or the last few emails I received from him.  It'd show me his email address and IM nickname and phone number.  And if he typed someone else's phone number in the window, it would look it up and provide me info there too.

In doing Dashboard we found two real deficiencies of our platform: (1) data (and metadata) was difficult to access for a variety of reasons and (2) we were losing vast amounts of metadata which established relationships between various pieces of data.

Beagle was largely created to address problem number 1.  It would index textual content and metadata so that they could be searched efficiently, and applications only had to go to a single location to find and quickly retrieve them.
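As a rough illustration of the "index once, search in one place" idea described above, here is a minimal inverted-index sketch in Python. It is not Beagle's code or API: the `TinyIndex` class, its method names, and the item ids are all hypothetical, and real desktop indexers handle ranking, stemming, and persistence that this toy ignores.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical, not Beagle's API).

    Maps each lowercase token to the set of item ids that contain it,
    whether the token came from the body text or from a metadata value.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # token -> set of item ids
        self.metadata = {}                # item id -> metadata dict

    def add(self, item_id, text, **metadata):
        self.metadata[item_id] = metadata
        # Index body text and metadata values together so either can match.
        searchable = " ".join([text, *map(str, metadata.values())])
        for token in re.findall(r"\w+", searchable.lower()):
            self.postings[token].add(item_id)

    def search(self, query):
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return []
        # An item matches only if it contains every query token.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


# Items from different sources end up in the same searchable place.
index = TinyIndex()
index.add("mail:1", "Lunch on Friday?", sender="Nat", source="email")
index.add("im:7", "see you at lunch", buddy="Nat", source="im")
index.add("file:3", "Quarterly report draft", source="file")

print(index.search("nat lunch"))  # ['im:7', 'mail:1']
```

The point of the sketch is only that an email and an IM log, which normally live in separate applications, become reachable through one query once both are indexed with their metadata.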

InfoQ: How is development of Beagle funded?

Joe Shaw: From the beginning Novell has funded at least one person full-time to develop it.  Initially it was Jon Trowbridge.  A little later, Dave Camp and I became part-time contributors.  After Dave went to work on Hula I became a full-time contributor with Jon, and when Jon left last January it was just me.  For a period Dan Winship was also working on it part-time with me.

In addition to that, Google has indirectly funded development this summer and last summer through its Summer of Code program.

But of course, Beagle's strength is that it's an open source project.  Large amounts of effort have been provided by individual contributors, and Beagle would not be possible without them.  A little over a year ago I listed all of the contributors up to that point, and they numbered over one hundred.

InfoQ: What is the current status of Beagle?

Joe Shaw: At this point Beagle is by far the most featureful, usable desktop search system on Linux today.  We support over 20 data sources (file system, email, IM logs, etc.) and over 60 data formats (MS Office, ODF, PDF, MP3, etc.) which I think is the most of any desktop search system on any operating system.

We're shipped on most Linux distributions and some of them integrate Beagle pretty deeply in the desktop experience.

As for the project itself, we're working toward a 0.3.0 release -- a major upgrade from our 0.2.x series -- which will feature faster indexing, more complete indexing of archive contents, better support for externally stored metadata like tags and annotations, etc. 

InfoQ: What is it like competing with Google Desktop and MS Desktop Search?

Joe Shaw: Well, MS Desktop Search doesn't run on Linux and Beagle doesn't run on Windows (yet), so I don't even see them as competitors.

Google Desktop just came out for Linux and although it indexes Gmail (and we don't... yet) it lacks the wide coverage Beagle has.  It doesn't index IM conversations or integrate well with mail clients other than Thunderbird.  It taxes the system while it indexes and has no integration with existing desktop applications.  Its not being open source fundamentally limits its ability to be extended to support new and existing data types and means that it'll never achieve tight integration in the Linux desktop.  Beagle's permissive open source license is a strength in this area.

Google Desktop for Linux (GDL) has some nice features: it seems to do some sort of version control and storing of cached data; it handles plain mailbox files on disk more gracefully; and it supports indexing of Gmail.  But none of these are radical features that Beagle can't implement.

InfoQ: Who is the target user of Beagle?

Joe Shaw: Beagle targets both users and developers.  For developers, we provide some really nice APIs for extending the types of data Beagle can index and then searching those indexes.  This means that developers can integrate index and search into their applications, or build entirely new user interfaces around search.

For users, the goal is simply to make it easier to find your data.  The file system is a fairly arcane metaphor that users have to deal with, and in many cases people simply ignore it.  They just dump all their files into their Documents folder.  I do this to an extent myself; everything I download goes into a special folder, things pile up over time, and then it's impossible to extract a needle from the haystack.  Then you have things like email that abstract away the storage (either on the file system or a server) but only allow you to access the mail through the email program.  Ditto for addressbook contacts or calendar events.  Until recently on Linux, there was no user-accessible (non-command line) way to access IM chat logs at all.  Web pages are cached by your browser but essentially inaccessible to you.

Beagle solves these problems by making them all readily and easily accessible through a graphical search interface.  You don't need to navigate a folder hierarchy anymore.  You don't need to go one-by-one through a list of files in a directory trying to remember what you named that document.  Your emails and IM logs and RSS feeds and web history and addressbook contacts are right there alongside files.

Of course, that's the idealistic view.  Some people, like me, are just disorganized and a tool like this helps me.  Some people are highly organized, love folders, and desktop search might be completely superfluous to them.  That's fine, it's not for everybody.  In the future, however, we might see some really innovative applications built on top of desktop search that can benefit even these organized individuals, like the Dashboard project I mentioned earlier.

InfoQ: What does the future hold for Beagle?

Joe Shaw: There is always more data to index, performance optimizations to make, etc.  That's the boring future. :)

Beyond that, we're looking at adding networked searches, so that you'll be able to run searches against several machines.  We'd like to use Zeroconf here with multicast DNS and service discovery to be able to search machines on your local network without needing any configuration.  Another potential feature is automatically determining what language a document is by doing some statistical analysis on it... we have patches floating around for that.
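The interview does not say which statistical method the Beagle patches use. One common approach to guessing a document's language is to compare character trigram frequencies against reference profiles, sketched below in Python under that assumption. The tiny `SAMPLES` texts and the function names are made up for illustration; a usable detector would need far larger reference corpora.

```python
import math
from collections import Counter

# Tiny, made-up reference texts; real profiles come from large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then runs to the river",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft dann zum fluss",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et court vers la rivière",
}


def trigram_profile(text):
    """Count overlapping character trigrams, padded so word edges are included."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def guess_language(text):
    """Return the language whose reference profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))


print(guess_language(SAMPLES["de"]))  # de
```

With profiles this small, the guess is only reliable for text that closely resembles the reference samples; the sketch shows the shape of the technique, not its accuracy.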

I'd like to see the platform on the Linux desktop expand to do del.icio.us-style tagging of any piece of data -- files, emails, web pages -- and make that data available for Beagle to index.  I'd like to see applications evolve so that they stop siloing their data and make it more available to other applications, including Beagle.  I'd like to see applications storing implicit relationships between data -- when I save an email attachment, store the relationship of that file on disk to the person who sent it to me -- and make that available to Beagle for indexing.  I'd like to see more apps use Beagle internally as their search mechanism.  None of these are necessarily changes to Beagle itself but how we can broadly improve the user experience.

Enjoy related news on Mono brought to you by InfoQ.
