Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Google Releases Search Engine Specifically For Code

Google has released Google Code Search, a search engine built specifically for code. The release comes only a few months after the launch of Google Code Hosting. Google is crawling all the publicly available code it can find, including archives (.tar.gz, .tar.bz2, .tar, and .zip), CVS repositories and Subversion repositories. Developers can block Google from crawling their files if desired. An API is available to support the creation of plugins and to add code search to existing websites. The search options let you:
  • Use regular expressions to search more precisely
  • Restrict your search by language, license or filename
  • View the source file with links back to the entire package and the webpage where it came from
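To make the first two options concrete, here is a minimal local sketch of a regular-expression search restricted by filename. It is not Google's implementation or API: the `search_code` function, the `SAMPLE_FILES` dictionary and its contents are all hypothetical, invented purely for illustration.

```python
import re
from fnmatch import fnmatchcase


def search_code(files, pattern, filename_glob="*"):
    """Return (path, line_number, line) for every line matching the regex.

    `files` maps a path to the file's text. Only paths matching
    `filename_glob` are searched.
    """
    regex = re.compile(pattern)
    hits = []
    for path, text in files.items():
        # Filename restriction: skip files that do not match the glob.
        if not fnmatchcase(path, filename_glob):
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, number, line.strip()))
    return hits


# Hypothetical in-memory "index" standing in for crawled source files.
SAMPLE_FILES = {
    "src/md5.c": "void md5_init(MD5_CTX *ctx) {\n    ctx->count = 0;\n}\n",
    "src/md5.h": "void md5_init(MD5_CTX *ctx);\n",
    "lib/hash.py": "import hashlib\ndigest = hashlib.md5(data).hexdigest()\n",
}

# A regex for md5_* function names followed by "(", limited to .c files.
c_hits = search_code(SAMPLE_FILES, r"\bmd5_\w+\s*\(", "*.c")

# A plain substring-style search over every file, for comparison.
all_hits = search_code(SAMPLE_FILES, r"md5")

for hit in c_hits:
    print(hit)
```

In this toy data the plain `md5` search matches a line in all three files, while the regex combined with the `*.c` filter narrows the result to the single function definition in `src/md5.c`.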
Google Code Search competes with other search engines for code such as Krugle and Koders. Nik Cubrilovic did a quick comparison of the three on TechCrunch.

To test Google Code Search out against both Krugle and Koders, I ran a search for “md5 in C”, hoping to find an implementation of the MD5 hash algorithm in C. In Google, I can specify the implementation language I would like in the search query, while in both Krugle and Koders I needed to select the language from a drop down. Krugle and Koders didn’t seem to filter the results based on language too well as they both had results that were implementations in other languages. One problem here is that the search engines don’t actually know you are looking for a simple implementation of md5, they are just string-matching against their indexes so you get some very poor results (such as functions that call an MD5 library). Across the 3 search engines, I could not find a good, pure MD5 implementation – just a lot of header files and functions that had the string ‘md5’ within them.

Developers have begun to comment, on the official forum for the service and elsewhere, about support for additional languages and about missing or incorrectly identified licenses for various code entries. A number of concerns have already been raised. A simple search for the @ symbol reveals hundreds of code snippets containing developer email addresses that spammers could harvest. Digg.com has an entry in which a developer reports having found the WinZip serial number algorithm. Among the novel uses of the new service, Joe Walker of DWR has commented that it can also serve as a web-based repository viewer.
