
.NET Spotlight on Open Source: Beagle

by James Vastbinder on Aug 12, 2007
In this .NET Spotlight on Open Source, InfoQ interviewed Joe Shaw and Pierre Ostlund about Beagle.  Beagle is one of the best-known Mono applications on Linux and provides desktop search to that community.


InfoQ: What prompted the creation of Beagle?

Joe Shaw: Beagle really grew out of a project called Dashboard (http://nat.org/dashboard) that I worked on with Nat Friedman, Alex Graveley, Jim Krehl and others primarily in the summer of 2003.  The idea behind Dashboard (which predated Apple's own very different Dashboard) was that the computer knows what you're doing at any given time -- reading emails, IMing with a friend, working on a document -- so it should be able to show information relevant to what you're doing right then.  A lot of the work we were doing was people-based: if I was IMing with Nat, it would show me Nat's latest blog entries, or the last few emails I received from him.  It'd show me his email address and IM nickname and phone number.  And if he typed someone else's phone number in the window, it would look it up and provide me info there too.

In doing Dashboard we found two real deficiencies of our platform: (1) data (and metadata) was difficult to access for a variety of reasons and (2) we were losing vast amounts of metadata which established relationships between various pieces of data.

Beagle was largely created to address problem number 1.  It would index textual content and metadata so that it could be efficiently searched, and applications only had to go to a single location to find it and quickly retrieve it.
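
Editor's sketch: the idea of indexing text and metadata into a single searchable place can be illustrated with a toy inverted index in Python. This is not Beagle's implementation, which the interview does not describe; the class name `TinyIndex`, the URI strings, and the sample items are all hypothetical.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the set of item URIs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {uri, ...}
        self.metadata = {}                # uri -> metadata dict

    @staticmethod
    def tokenize(text):
        # Lowercase and split into alphanumeric terms
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, uri, text, **metadata):
        self.metadata[uri] = metadata
        # Index the body text and the metadata values alike
        fields = [text] + [str(value) for value in metadata.values()]
        for field in fields:
            for term in self.tokenize(field):
                self.postings[term].add(uri)

    def search(self, query):
        # Return the URIs that contain every term of the query
        terms = self.tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


# Hypothetical items from three different sources (file, email, IM log)
index = TinyIndex()
index.add("file:///home/me/notes.txt", "Lunch with Nat on Friday", kind="file")
index.add("email://inbox/42", "Re: Friday plans", sender="Nat", kind="email")
index.add("im://log/7", "see you monday", buddy="Alex", kind="im")

print(index.search("nat friday"))
```

The point of the sketch is the single lookup: a file and an email from different sources both come back from one query, because body text and metadata (here the hypothetical `sender` field) land in the same term table.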

InfoQ: How is development of Beagle funded?

Joe Shaw: From the beginning Novell has funded at least one person full-time to develop it.  Initially it was Jon Trowbridge.  A little bit after that, Dave Camp and I were part-time contributors.  After Dave went to work on Hula I became a full-time contributor with Jon, and when Jon left last January it was just me.  For a period there Dan Winship was also working on it part-time with me.

In addition to that, Google has indirectly funded development this summer and last summer through its Summer of Code program.

But of course, Beagle's strength is that it's an open source project.  Large amounts of effort have been provided by individual contributors, and Beagle would not be possible without them.  A little over a year ago I listed all of the contributors up to that point, and the list numbered over one hundred.

InfoQ: What is the current status of Beagle?

Joe Shaw: At this point Beagle is by far the most featureful, usable desktop search system on Linux today.  We support over 20 data sources (file system, email, IM logs, etc.) and over 60 data formats (MS Office, ODF, PDF, MP3, etc.) which I think is the most of any desktop search system on any operating system.

We're shipped on most Linux distributions and some of them integrate Beagle pretty deeply in the desktop experience.

As for the project itself, we're working toward a 0.3.0 release -- a major upgrade from our 0.2.x series -- which will feature faster indexing, more complete indexing of archive contents, better support for externally stored metadata like tags and annotations, etc. 

InfoQ: What is it like competing with Google Desktop and MS Desktop Search?

Joe Shaw: Well, MS Desktop Search doesn't run on Linux and Beagle doesn't run on Windows (yet), so I don't even see them as competitors.

Google Desktop just came out for Linux and although it indexes Gmail (and we don't... yet) it lacks the wide coverage Beagle has.  It doesn't index IM conversations or integrate well with mail clients other than Thunderbird.  It taxes the system while it indexes and has no integration with existing desktop applications.  It isn't open source, which fundamentally limits its ability to be extended to support new and existing data types and means that it'll never achieve tight integration in the Linux desktop.  Beagle's permissive open source license is a strength in this area.

Google Desktop for Linux (GDL) has some nice features: it seems to do some sort of version control and storing of cached data; it handles plain mailbox files on disk better; and it supports indexing of Gmail, but none of these are radical features that Beagle can't implement.

InfoQ: Who is the target user of Beagle?

Joe Shaw: Beagle targets both users and developers.  For developers, we provide some really nice APIs for extending the types of data Beagle can index and then searching those indexes.  This means that developers can integrate index and search into their applications, or build entirely new user interfaces around search.

For users, the goal is simply to make it easier to find your data.  The file system is a fairly arcane metaphor that users have to deal with, and in many cases people simply ignore it.  They just dump all their files into their Documents folder.  I do this to an extent myself; everything I download goes into a special folder, things pile up over time, and then it's impossible to extract a needle from the haystack.  Then you have things like email that abstract away the storage (either on the file system or a server) but only allow you to access the mail through the email program.  Ditto for addressbook contacts or calendar events.  Until recently on Linux, there was no user-accessible (non-command line) way to access IM chat logs at all.  Web pages are cached by your browser but essentially inaccessible to you.

Beagle solves these problems by making them all readily and easily accessible through a graphical search interface.  You don't need to navigate a folder hierarchy anymore.  You don't need to go one-by-one through a list of files in a directory trying to remember what you named that document.  Your emails and IM logs and RSS feeds and web history and addressbook contacts are right there alongside files.

Of course, that's the idealistic view.  Some people, like me, are just disorganized and a tool like this helps me.  Some people are highly organized, love folders, and desktop search might be completely superfluous to them.  That's fine, it's not for everybody.  In the future, however, we might see some really innovative applications built on top of desktop search that can benefit even these organized individuals, like the Dashboard project I mentioned earlier.

InfoQ: What does the future hold for Beagle?

Joe Shaw: There is always more data to index, performance optimizations to make, etc.  That's the boring future. :)

Beyond that, we're looking at adding networked searches, so that you'll be able to run searches against several machines.  We'd like to use Zeroconf here with multicast DNS and service discovery to be able to search machines on your local network without needing any configuration.  Another potential feature is automatically determining what language a document is in by doing some statistical analysis on it... we have patches floating around for that.
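
Editor's sketch: one common way to guess a document's language statistically is to compare its character-trigram frequencies against per-language profiles. The Python below is a minimal illustration of that general technique, not the Beagle patches mentioned above (the interview does not say how they work). The function names, the two-language `SAMPLES` table, and the tiny unaccented training sentences are all hypothetical; a usable profile would need far more text.

```python
import math
from collections import Counter

# Tiny hypothetical training samples; real profiles need much more text
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat "
          "is on the table with the other animals",
    "es": "el rapido zorro marron salta sobre el perro perezoso y el gato "
          "esta en la mesa con los otros animales",
}


def trigram_profile(text):
    # Normalize whitespace and case, then count overlapping 3-character windows
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    # Cosine similarity between two trigram count vectors
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def guess_language(text):
    # Pick the language whose profile is most similar to the text's profile
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))


print(guess_language("the dog is on the table"))
print(guess_language("el gato esta en la mesa"))
```

Character trigrams are used here because they need no dictionary or tokenizer, which makes them easy to apply to arbitrary document text before indexing.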

I'd like to see the platform on the Linux desktop expand to do del.icio.us-style tagging of any piece of data -- files, emails, web pages -- and make that data available for Beagle to index.  I'd like to see applications evolve so that they stop siloing their data and make it more available to other applications, including Beagle.  I'd like to see applications storing implicit relationships between data -- when I save an email attachment, store the relationship of that file on disk to the person who sent it to me -- and make that available to Beagle for indexing.  I'd like to see more apps use Beagle internally as their search mechanism.  None of these are necessarily changes to Beagle itself, but rather ways we can broadly improve the user experience.
