
Code Search Now Available to Browse Google's Open-Source Projects


Code Search is used by Google developers to search through Google's huge internal codebase. Now, Google has made it accessible to everyone to explore and better understand Google's open source projects, including TensorFlow, Go, Angular, and many others.

Code Search aims to make it easier for developers to move through a codebase, find functions and variables using a powerful search language, readily locate where those are used, and so on.

Code Search provides a sophisticated UI with suggest-as-you-type help that includes information about the type of an object, the path of the file, and the repository to which it belongs. This behaviour is supported by code-savvy textual searches that use a custom search language. For example, to search for a function foo in a Go file, you can use lang:go function:foo.

For repositories that include cross-reference information, Code Search can also display richer information, including a list of the places where a given symbol is referenced. Repositories that provide cross-reference information include Angular, Bazel, Go, and others.

Cross-reference searches are powered by Kythe, another Google open-source project that aims to provide a standard, language-agnostic interchange mechanism. This can be used to share information across different development tools, such as editors, compilers, code-review tools, and so on. As a first step, Code Search uses Kythe to create a graph from compiled code.

Google then runs an internal pipeline that combines these graphs for the different languages, prunes unnecessary pieces, and optimizes the result for serving cross-references. The whole process runs several times per day to keep the data fresh.
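To make the combine, prune, and serve steps more concrete, here is a minimal Python sketch of the general idea. It is not Google's pipeline and does not use Kythe's actual schema or API: the `Edge` type, the edge kinds `ref` and `defines`, the file locations, and the `build_xref_index` function are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """A toy graph edge (hypothetical, not Kythe's real data model)."""
    source: str  # location of the reference, e.g. "main.go:12"
    kind: str    # toy edge kind, e.g. "ref" or "defines"
    target: str  # symbol being referenced or defined


def build_xref_index(graphs, keep_kinds=("ref",)):
    """Merge per-language edge lists, prune, and index locations by symbol."""
    index = defaultdict(set)
    for edges in graphs:                     # combine the per-language graphs
        for edge in edges:
            if edge.kind not in keep_kinds:  # prune edges we do not serve
                continue
            index[edge.target].add(edge.source)
    # Sort locations so that lookups return a stable order.
    return {symbol: sorted(sources) for symbol, sources in index.items()}


# Made-up sample data standing in for graphs extracted from compiled code.
go_graph = [
    Edge("main.go:12", "ref", "foo"),
    Edge("lib.go:3", "defines", "foo"),
]
ts_graph = [
    Edge("app.ts:40", "ref", "foo"),
    Edge("app.ts:7", "ref", "bar"),
]

index = build_xref_index([go_graph, ts_graph])
print(index["foo"])  # every location that references the symbol "foo"
```

Inverting the graph ahead of time is what makes a "where is this symbol used?" lookup cheap at query time. Rebuilding the index on a schedule, as the article describes, trades some freshness for that lookup speed.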

As a Google engineer explained on Hacker News, Code Search does not give access to the real repositories used at Google. It just exposes indexed versions of those repositories to make their content available through search. Additionally, the public Code Search interface does not include all the features available to Google engineers, such as automatic code analysis and linting, code coverage, and fuzzing integration. As another commenter on Hacker News pointed out, some of those features rely on Bazel, and so they cannot be easily exported.

Developers and organizations wishing to replicate this kind of infrastructure for their own repositories could look into Kythe together with TreeTide underhood, an open-source project that provides an advanced code browsing experience on top of Kythe.

A commercially available alternative to Kythe+TreeTide underhood is Sourcegraph, which is used at many large companies, including Uber and Cloudflare.
