
Google Research Use of Concept Vectors for Image Search

Google recently released research about a tool called Similar Medical Images Like Yours (SMILY) that uses concept vectors to enhance searching for medical images. The research uses embeddings for image-based search and allows users to influence the search through the interactive refinement of concepts.

Google released two peer-reviewed academic papers in succession. The first paper, "Similar image search for histopathology: SMILY," focused on the deep neural network architecture used to create the embeddings necessary to find similar images. The second paper, "Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making," focused on the human-interaction aspects needed to improve the usability of the tool created in the first paper. The second paper describes a novel use of directional vectors as concepts within the dimensions of the embeddings. The user could tune how prevalent a concept should be in an image, which in turn influenced the selection of similar images by shifting the query's location in the embedding space.

The deep neural network used to create the embeddings is an architecture known as a deep ranking network. The network consists of three parallel neural networks that receive three separate inputs: the first receives the query image, the second receives an image from the same class as the query, and the third receives an image from a different class. All three networks produce embeddings and are trained so that the distance between the embeddings of the two same-class images is shorter than the distance to the embedding of the third, different-class image. The network Google created generates embeddings with 128 dimensions for 300 × 300 pixel images. Google communicated the following about creating the network:

Our network was trained on about 500,000,000 "natural images" (e.g., dogs, cats, trees, man-made objects etc) from 18,000 distinct classes. In this way, the network learned to distinguish similar images from dissimilar ones by computing and comparing the embeddings of input images.
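The triplet training objective described above can be sketched with a standard triplet (ranking) loss. This is a minimal NumPy illustration under the assumption of Euclidean distance, not Google's actual training code; the toy embeddings are illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet (ranking) loss: push the anchor-positive distance to be
    smaller than the anchor-negative distance by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 128-dimensional embeddings, matching SMILY's embedding size.
rng = np.random.default_rng(0)
anchor   = rng.normal(size=128)
positive = anchor + 0.1 * rng.normal(size=128)   # same-class image: nearby
negative = rng.normal(size=128)                  # different-class image: far

loss = triplet_loss(anchor, positive, negative)
```

Minimizing this loss over many triplets is what teaches the three parallel networks (which share weights) to place same-class images close together in the embedding space.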

In the first paper, "Similar image search for histopathology: SMILY," Google showed that a user could select a segment of an image, create the embedding for that section, and then use the k-nearest neighbors algorithm to retrieve similar images from the embedding space. However, the researchers identified that as users searched for similar images, there was no way for them to communicate the intent of the search. This inability to convey meaning limited engagement with the tool, so research continued in the second paper to improve the interactive search.
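The retrieval step amounts to a nearest-neighbor lookup in the embedding space. A brute-force sketch in NumPy (the database and query here are hypothetical; a production system would use an approximate nearest-neighbor index):

```python
import numpy as np

def nearest_neighbors(query, database, k=3):
    """Return indices of the k database embeddings closest to the query,
    using brute-force Euclidean distance."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical database of 1,000 image-patch embeddings (128-d, as in SMILY).
rng = np.random.default_rng(1)
database = rng.normal(size=(1000, 128))
query = database[42] + 0.01 * rng.normal(size=128)  # patch similar to item 42

top_k = nearest_neighbors(query, database, k=3)
```

Because the query embedding sits almost on top of entry 42, that entry comes back as the closest match.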

In the second paper, "Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making," Google improved the SMILY tool by incorporating a feature called refine-by-concept. This feature exploits the fact that directions in the embedding space can represent concepts. A direction was identified by selecting a sample of images, having a specialist label each image as exhibiting either a concept or its opposite, and then training a linear classifier to find a plane in the embedding space separating the two groups. The vector orthogonal to that plane is the direction of the concept. Users could then refine the search by adjusting the prevalence of a concept, which moved the embedding generated from the selected image and, in turn, changed the k-nearest-neighbor selection of similar images.
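The refine-by-concept mechanism can be sketched as follows: fit a linear classifier on specialist-labeled embeddings, take its weight vector (which is orthogonal to the separating hyperplane) as the concept direction, and offset the query embedding along it. This is a minimal NumPy illustration with a hand-rolled logistic regression and synthetic data, not the paper's actual implementation:

```python
import numpy as np

def concept_direction(embeddings, labels, steps=500, lr=0.1):
    """Fit a linear classifier separating concept-positive (label 1) from
    concept-negative (label 0) embeddings via logistic regression.
    The weight vector is orthogonal to the separating hyperplane,
    i.e. it points along the concept direction."""
    w = np.zeros(embeddings.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(embeddings @ w + b)))
        w -= lr * embeddings.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    return w / np.linalg.norm(w)

def refine_by_concept(query_embedding, direction, amount):
    """Offset the query along the concept direction; the shifted point is
    then used for the usual nearest-neighbor search."""
    return query_embedding + amount * direction

# Toy data: the concept varies along the first embedding dimension.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(float)   # specialist labels: concept present or not
d = concept_direction(X, y)
shifted = refine_by_concept(rng.normal(size=16), d, amount=2.0)
```

On this synthetic data the recovered direction aligns with the first dimension, and sliding `amount` up or down moves the query toward, or away from, images exhibiting the concept.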
