
Google Releases Search Engine Specifically For Code

by Scott Delap on Oct 05, 2006
Google has released Google Code Search, a search engine built specifically for source code. The release comes only a few months after that of Google Code Hosting. Google is crawling all the publicly available code it can find, including archives (.tar.gz, .tar.bz2, .tar, and .zip), CVS repositories and Subversion repositories. Developers can block Google from crawling their files if desired. An API is available to support the creation of plugins and to add code search to existing websites. The search options include:
  • Use regular expressions to search more precisely
  • Restrict your search by language, license or filename
  • View the source file with links back to the entire package and the webpage where it came from
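To make the first two options concrete, here is a minimal local sketch of a regular-expression search restricted by filename. It does not use Google Code Search's API or query syntax; the function `search_code`, the `SAMPLE_FILES` data and the glob-based filename filter are hypothetical names chosen purely for illustration.

```python
import fnmatch
import re

# Hypothetical in-memory "index": filename -> file contents.
SAMPLE_FILES = {
    "md5.c": "void md5_init(MD5_CTX *ctx) {\n    ctx->count = 0;\n}\n",
    "util.py": (
        "import hashlib\n\n"
        "def md5_hex(data):\n"
        "    return hashlib.md5(data).hexdigest()\n"
    ),
}


def search_code(files, pattern, filename_glob="*"):
    """Return (filename, line_number, line) for every line matching the regex.

    Only files whose name matches filename_glob are searched.
    """
    regex = re.compile(pattern)
    hits = []
    for name, text in files.items():
        # Restrict the search by filename (case-sensitive glob match).
        if not fnmatch.fnmatchcase(name, filename_glob):
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Find calls or definitions of functions named md5_<something>, in C files only.
    for hit in search_code(SAMPLE_FILES, r"\bmd5_\w+\s*\(", "*.c"):
        print(hit)
```

A regular expression such as `\bmd5_\w+\s*\(` narrows the results to identifiers followed by an opening parenthesis, which a plain keyword search for "md5" cannot do; the filename filter then stands in for a crude language restriction.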
Google Code Search competes with other search engines for code such as Krugle and Koders. Nik Cubrilovic did a quick comparison of the three on TechCrunch.

To test Google Code Search out against both Krugle and Koders, I ran a search for “md5 in C”, hoping to find an implementation of the MD5 hash algorithm in C. In Google, I can specify the implementation language I would like in the search query, while in both Krugle and Koders I needed to select the language from a drop down. Krugle and Koders didn’t seem to filter the results based on language too well as they both had results that were implementations in other languages. One problem here is that the search engines don’t actually know you are looking for a simple implementation of md5, they are just string-matching against their indexes so you get some very poor results (such as functions that call an MD5 library). Across the 3 search engines, I could not find a good, pure MD5 implementation – just a lot of header files and functions that had the string ‘md5’ within them.

Developers have begun to comment, on the official forum for the service and in other outlets, about support for additional languages and about missing or incorrectly identified licenses for various code entries. A number of concerns have already been raised. A simple search for the @ symbol reveals hundreds of code snippets containing developer email addresses that could be harvested by spammers. Digg.com carries an entry in which a developer reports having found the WinZip serial number algorithm. Among the novel uses of the new service, Joe Walker of DWR has commented that it can also serve as a web-based repository viewer.
