InfoQ

.NET Spotlight on Open Source: Beagle

Posted by James Vastbinder on Aug 12, 2007 09:45 PM

Community: .NET
Topics: Open Source
Tags: Mono, Beagle

In this .NET Spotlight on Open Source, InfoQ interviewed Joe Shaw and Pierre Ostlund about Beagle.  Beagle is one of the most famous Mono applications on Linux and provides desktop search to that community.


InfoQ: What prompted the creation of Beagle?

Joe Shaw: Beagle really grew out of a project called Dashboard (http://nat.org/dashboard) that I worked on with Nat Friedman, Alex Graveley, Jim Krehl and others primarily in the summer of 2003.  The idea behind Dashboard (which predated Apple's own very different Dashboard) was that the computer knows what you're doing at any given time -- reading emails, IMing with a friend, working on a document -- so it should be able to show information relevant to what you're doing right then.  A lot of the work we were doing was people-based: if I was IMing with Nat, it would show me Nat's latest blog entries, or the last few emails I received from him.  It'd show me his email address and IM nickname and phone number.  And if he typed someone else's phone number in the window, it would look it up and provide me info there too.

In doing Dashboard we found two real deficiencies of our platform: (1) data (and metadata) was difficult to access for a variety of reasons and (2) we were losing vast amounts of metadata which established relationships between various pieces of data.

Beagle was largely created to address problem number 1.  It would index textual content and metadata so that it could be efficiently searched, and applications only had to go to a single location to find and quickly retrieve it.
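
As a rough illustration of that idea (and not Beagle's actual implementation, which is a Mono application), a single searchable store for text and metadata can be sketched as a toy inverted index.  Every name here (`TinyIndex`, `add`, `search`, the sample URIs and the `key:value` metadata convention) is hypothetical and chosen only for the example.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the set of item URIs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercase and split on runs of non-alphanumeric characters.
        return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

    def add(self, uri, text, metadata=None):
        # Index body text as plain terms.
        for term in self.tokenize(text):
            self.postings[term].add(uri)
        # Index metadata as "key:value" terms so it can be queried by field.
        for key, value in (metadata or {}).items():
            for term in self.tokenize(str(value)):
                self.postings[f"{key}:{term}"].add(uri)

    def search(self, query):
        # Return the URIs that match every whitespace-separated query term.
        terms = query.lower().split()
        if not terms:
            return set()
        results = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*results)


# Hypothetical items from two different sources (a file and an email).
index = TinyIndex()
index.add("file:///home/joe/notes.txt", "Beagle release notes", {"type": "file"})
index.add("email://inbox/42", "Re: Beagle release plan", {"type": "email", "from": "Nat"})

print(index.search("beagle release"))
print(index.search("beagle from:nat"))
```

The point of the sketch is that files and emails land in one index, so an application asks a single place for "everything that mentions Beagle" instead of opening each data source itself.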

InfoQ: How is development of Beagle funded?

Joe Shaw: From the beginning Novell has funded at least one person to develop it full-time.  Initially it was Jon Trowbridge.  A little bit later, Dave Camp and I were part-time contributors to it.  After Dave went to work on Hula I became a full-time contributor with Jon, and when Jon left last January it was just me.  For a period there Dan Winship was also working on it part-time with me.

In addition to that, Google has indirectly funded development this summer and last summer through its Summer of Code program.

But of course, Beagle's strength is that it's an open source project.  Large amounts of effort have been provided by individual contributors, and Beagle would not be possible without them.  A little over a year ago I listed all of the contributors up to that point, and it numbered over one hundred.

InfoQ: What is the current status of Beagle?

Joe Shaw: At this point Beagle is by far the most featureful, usable desktop search system on Linux today.  We support over 20 data sources (file system, email, IM logs, etc.) and over 60 data formats (MS Office, ODF, PDF, MP3, etc.) which I think is the most of any desktop search system on any operating system.

We're shipped on most Linux distributions and some of them integrate Beagle pretty deeply in the desktop experience.

As for the project itself, we're working toward a 0.3.0 release -- a major upgrade from our 0.2.x series -- which will feature faster indexing, more complete indexing of archive contents, better support for externally stored metadata like tags and annotations, etc. 

InfoQ: What is it like competing with Google Desktop and MS Desktop Search?

Joe Shaw: Well, MS Desktop Search doesn't run on Linux and Beagle doesn't run on Windows (yet), so I don't even see them as competitors.

Google Desktop just came out for Linux, and although it indexes Gmail (and we don't... yet), it lacks the wide coverage Beagle has.  It doesn't index IM conversations or integrate well with mail clients other than Thunderbird.  It taxes the system while it indexes and has no integration with existing desktop applications.  Because it is not open source, its ability to be extended to support new and existing data types is fundamentally limited, and it will never achieve tight integration with the Linux desktop.  Beagle's permissive open source license is a strength in this area.

Google Desktop for Linux (GDL) has some nice features: it seems to do some sort of version control and storing of cached data; it handles plain mailbox files on disk better; and it supports indexing of Gmail.  But none of these are radical features that Beagle can't implement.

InfoQ: Who is the target user of Beagle?

Joe Shaw: Beagle targets both users and developers.  For developers, we provide some really nice APIs for extending the types of data Beagle can index and then searching those indexes.  This means that developers can integrate indexing and search into their applications, or build entirely new user interfaces around search.

For users, the goal is simply to make it easier to find your data.  The file system is a fairly arcane metaphor that users have to deal with, and in many cases people simply ignore it.  They just dump all their files into their Documents folder.  I do this to an extent myself; everything I download goes into a special folder, things pile up over time, and then it's impossible to extract a needle from the haystack.  Then you have things like email that abstract away the storage (either on the file system or a server) but only allow you to access the mail through the email program.  Ditto for address book contacts or calendar events.  Until recently on Linux, there was no user-accessible (non-command line) way to access IM chat logs at all.  Web pages are cached by your browser but essentially inaccessible to you.

Beagle solves these problems by making them all readily and easily accessible through a graphical search interface.  You don't need to navigate a folder hierarchy anymore.  You don't need to go one-by-one through a list of files in a directory trying to remember what you named that document.  Your emails and IM logs and RSS feeds and web history and address book contacts are right there alongside files.

Of course, that's the idealistic view.  Some people, like me, are just disorganized and a tool like this helps me.  Some people are highly organized, love folders, and desktop search might be completely superfluous to them.  That's fine, it's not for everybody.  In the future, however, we might see some really innovative applications built on top of desktop search that can benefit even these organized individuals, like the Dashboard project I mentioned earlier.

InfoQ: What does the future hold for Beagle?

Joe Shaw: There is always more data to index, performance optimizations to make, etc.  That's the boring future. :)

Beyond that, we're looking at adding networked searches, so that you'll be able to run searches against several machines.  We'd like to use Zeroconf here with multicast DNS and service discovery to be able to search machines on your local network without needing any configuration.  Another potential feature is automatically determining what language a document is written in by doing some statistical analysis on it... we have patches floating around for that.
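
The interview does not say which statistical analysis those patches use.  One common approach, shown here purely as an assumption, is to compare a document's character trigram frequencies against per-language profiles.  The function names and the tiny training samples below are hypothetical; a real detector would need much larger corpora.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Return the relative frequency of each character trigram in the text."""
    cleaned = " ".join(text.lower().split())
    padded = f" {cleaned} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Tiny illustrative samples (assumption: stand-ins for real training corpora).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
    "french": "le renard brun rapide saute par-dessus le chien paresseux et puis le chien dort au soleil",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Pick the language whose trigram profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))


print(guess_language("the dog jumps over the fox"))
print(guess_language("der hund schläft in der sonne"))
```

With samples this small the guess is only reliable for text that closely resembles the training sentences, and an empty document simply falls back to the first language in the table.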

I'd like to see the platform on the Linux desktop expand to do del.icio.us-style tagging of any piece of data -- files, emails, web pages -- and make that data available for Beagle to index.  I'd like to see applications evolve so that they stop siloing their data and make it more available to other applications, including Beagle.  I'd like to see applications storing implicit relationships between data -- when I save an email attachment, store the relationship of that file on disk to the person who sent it to me -- and make that available to Beagle for indexing.  I'd like to see more apps use Beagle internally as their search mechanism.  None of these are necessarily changes to Beagle itself, but they are ways we can broadly improve the user experience.
