
Convolutional Neural Network Deep Learning Techniques for Crowd Counting

Deep learning techniques such as Convolutional Neural Networks (CNNs) are a better choice for crowd-counting use cases than traditional detection- or regression-based models. Ganes Kesari, co-founder and head of analytics at Gramener, spoke last week at the AnacondaCON 2019 conference about how to count objects using artificial intelligence (AI) models.

CNN-based algorithms use density estimation to preserve spatial information, so they can localize counts within an image while also estimating the overall tally. They also capture both global and local features accurately. Factors that make counting crowds and lines difficult include occlusion, density differences, perspective distortion, and camera angle.
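To make the density idea concrete, here is a minimal sketch in plain Python of how point annotations (one click per person or penguin) can be turned into a ground-truth density map. The fixed Gaussian kernel size and `sigma`, and all function names, are illustrative assumptions rather than details from the talk. Practical pipelines typically use NumPy or SciPy and may adapt the kernel width to perspective. Each kernel sums to 1, so integrating the map over the whole image gives the total count, and integrating over a sub-region gives a localized count.

```python
import math

def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    k = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def density_map(points, height, width, size=5, sigma=1.0):
    """Place one normalized Gaussian at each annotated (row, col) point.

    Assumption: a fixed kernel for every point. Kernel mass that falls outside
    the image is clipped, so points near the border contribute slightly less than 1.
    """
    dmap = [[0.0] * width for _ in range(height)]
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    for r, c in points:
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    dmap[y][x] += kernel[dy][dx]
    return dmap

def region_count(dmap, top, left, bottom, right):
    """Integrate (sum) the density over rows [top, bottom) and cols [left, right)."""
    return sum(dmap[y][x] for y in range(top, bottom) for x in range(left, right))

# Hypothetical annotations: three (row, col) clicks in a 20 x 20 image.
points = [(5, 5), (6, 7), (15, 15)]
dmap = density_map(points, height=20, width=20)
print(round(region_count(dmap, 0, 0, 20, 20), 6))  # 3.0 (whole image)
print(round(region_count(dmap, 0, 0, 20, 10), 6))  # 2.0 (left half only)
```

The overall tally and the per-region tallies come from the same map. A density-predicting CNN is trained to reproduce maps like this, which is what lets it localize counts.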

Kesari discussed three business case studies: in-store marketing conversions, counting Antarctic penguins in camera images, and counting biological cells in microscopic images. The penguin tracking project involved about 100 cameras set up in 16 different locations, taking images hourly over several years. Crowdsourced annotations were used to identify the penguins captured in the images.

InfoQ spoke with Kesari about his conference talk and how deep learning can be used in crowd-counting use cases.

InfoQ: Can you define deep learning and explain how it differs from traditional machine learning techniques?

Ganes Kesari: The basic premise of machine learning (ML) is to teach programs to learn how to solve a problem, rather than just execute predetermined logic. Equipped with this learning, the programs can then reach a desired outcome when presented with unseen input. For example, they can predict tomorrow’s stock price from the past six months’ history, or classify whether a customer will churn next month.

While deep learning (DL) falls under the umbrella of machine learning, there is a key difference. Let us say we are building a system that detects human faces. With traditional ML, we would have to identify and extract the key features (eyes, nose, chin, and so on) and let the software learn to match just these features across faces.

But with DL, we show the machine hundreds of face pictures labelled with just a person’s name, and let it figure out what is really unique (the curvature of the cheek, or something more subtle that humans may not even notice). The machine decides which attributes are significant enough to distinguish faces (called feature learning). It does this over hundreds or thousands of passes through the data, where each pass is called an epoch.

Deep learning has taken the machine learning world by storm over the past decade and is particularly effective with pictures, video, and audio. It is used with structured data as well, and it powers much of the technology around us, such as Facebook friend recommendations and article recommendations in most news feeds.

InfoQ: Can you talk about crowd-counting use cases and what algorithms and technologies you used?

Kesari: Crowd counting can be applied in a variety of scenarios to count people, animals, objects, or other entities. Here are the three use cases I presented:
- Counting people: This has great potential in the retail industry to get reliable counts of store footfalls, estimate conversions, and measure the success of campaigns.
- Counting penguins: Researchers in Antarctica are studying the impact of global warming and human interventions on penguin populations. Millions of images from camera traps are finally being counted, thanks to these algorithms.
- Counting biological cells: Drug characterization is a key step in the pharma drug discovery process. Scientists need counts of different cell types from microscopic images. This tedious, manual process is being automated by the algorithms.

There are several published approaches to crowd counting. Traditional methods have used detection-based approaches, which scan the image to identify people or their heads. The total count is estimated by drawing bounding boxes around all such matches.

This suffers from several disadvantages: a) occlusion, where people at the back are hidden from view; b) perspective distortion, where faces in the front appear bigger than those at the back; c) density differences within the same picture, where some portions have unusually dense clusters of people; and d) camera angle, where top-down pictures need separate training from frontal ones.
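The detection-based approach and its occlusion problem can be illustrated with a small sketch. It assumes some detector has already produced scored bounding boxes; the boxes and scores below are hypothetical. The sketch keeps boxes above a confidence threshold, removes duplicates with greedy non-maximum suppression (NMS), and reports the number of surviving boxes as the count. The thresholds are illustrative defaults, not values from the talk.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_by_detection(detections, score_thresh=0.5, iou_thresh=0.5):
    """Count = number of boxes kept after score thresholding and greedy NMS.

    detections: list of ((x1, y1, x2, y2), score) pairs from some detector.
    """
    candidates = sorted((d for d in detections if d[1] >= score_thresh),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, _score in candidates:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return len(kept)

# Hypothetical scene with five people.
detections = [
    ((0, 0, 10, 10), 0.9),
    ((30, 0, 40, 10), 0.8),
    ((60, 0, 70, 10), 0.85),  # person in front
    ((62, 0, 72, 10), 0.7),   # person behind, largely occluded
    ((90, 0, 100, 10), 0.3),  # low-confidence partial view
]
print(count_by_detection(detections))  # 3
```

The partly hidden person is suppressed as a duplicate, and the low-confidence partial view is dropped. As a result, the sketch counts 3 people even though the hypothetical scene contains 5. Loosening the thresholds recovers them here but also admits more false positives. That trade-off is hard to tune in dense scenes.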

Density-based counting approaches can handle most of these challenges by approximating the number of people in clusters of different sizes. Published papers have shown them to perform far better. Several architectures are being explored, such as cascaded CNNs and multi-column CNNs.

InfoQ: How can cascaded convolutional neural networks help solve crowd-counting problems?

Kesari: We used cascaded CNNs to solve the problems in the three use cases I mentioned above. The key highlight of this architecture is its two stages. Stage 1, called the ‘high-level prior’, makes an initial estimate of the count and classifies the image into a broad bucket. Stage 2 performs density estimation and also takes inputs from the earlier stage to generate highly refined density maps. (image below)

These density maps are used to estimate the count in each image. Cascaded CNNs can be adapted fairly easily to train on people, animals, or even entities with custom-defined shapes. More technical details on the architecture can be found in this published paper.
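One way to train the two stages together is a multi-task objective: a classification loss on the stage 1 count bucket plus a pixel-wise loss on the stage 2 density map. The sketch below is a simplified, framework-free illustration built on several assumptions:

- ten equal-width count buckets,
- a halved sum-of-squared-errors density loss, and
- a hypothetical `weight` that balances the two terms.

A real implementation would compute these terms on PyTorch tensors, and the exact bucketing and weighting in the published paper may differ.

```python
import math

def count_bucket(count, max_count, num_buckets=10):
    """Map a ground-truth count to one of num_buckets equal-width classes.

    Assumption: this bucket index is the classification target for stage 1.
    """
    if count >= max_count:
        return num_buckets - 1
    return int(count * num_buckets / max_count)

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def density_loss(pred, gt):
    """Half the sum of squared pixel errors between two density maps."""
    return 0.5 * sum((p - g) ** 2
                     for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))

def count_from_density(dmap):
    """Final count estimate: the sum of the predicted density map."""
    return sum(sum(row) for row in dmap)

def cascaded_loss(logits, true_count, max_count, pred_map, gt_map, weight=0.01):
    """Joint objective: stage 2 density loss + weighted stage 1 bucket loss.

    weight is a hypothetical balance factor, not a value from the talk.
    """
    bucket = count_bucket(true_count, max_count, len(logits))
    return density_loss(pred_map, gt_map) + weight * cross_entropy(logits, bucket)

# Hypothetical stage 1 logits and a tiny 2 x 2 predicted vs ground-truth map.
logits = [0.1, 2.0, 0.3, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pred = [[0.4, 0.1], [0.2, 0.3]]
gt = [[0.5, 0.0], [0.25, 0.25]]
print(count_from_density(pred))  # estimated count for this image
print(cascaded_loss(logits, true_count=12, max_count=100, pred_map=pred, gt_map=gt))
```

The coarse bucket gives the density stage a global hint about how crowded the image is. At inference time, only the summed density map is needed for the count.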

InfoQ: Can you discuss more about the penguin tracking case study in terms of the technical solution details? What challenges did you face and how did you solve them?

Kesari: Sure. The objective here was to count the number of penguins in time-lapse pictures taken by the camera traps set up in Antarctica. The Penguin Watch website, an initiative of the University of Oxford, crowdsourced the task of labelling the data. Volunteers from around the world placed markers on thousands of pictures to help the models learn what penguins look like. The challenge was to build models that could identify penguins in pictures with different exposures. Some pictures had a high density of penguins, and in others they were hardly visible.

Gramener partnered with Microsoft AI for Earth to solve this challenge. We used PyTorch to build the cascaded CNN architecture. With the Adam optimizer, we set a very low learning rate and momentum. We used a cross-entropy loss function, and the error metrics were Mean Absolute Error (MAE) and Mean Squared Error (MSE). Running on NC6 v3 machines with a V100 GPU, training took 3-4 days for 200 epochs. The final model achieved an MAE of about 10, and it was then hosted as an API on the Microsoft AI for Earth infrastructure.
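The two error metrics are computed on per-image counts, not on individual pixels. A minimal sketch with hypothetical predicted and ground-truth counts follows. Several crowd-counting papers report the square root of the mean squared error under the name MSE, so it is worth checking which definition a published number uses. Both are shown here.

```python
import math

def _check(pred_counts, true_counts):
    if len(pred_counts) != len(true_counts) or not true_counts:
        raise ValueError("need two non-empty lists of equal length")

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    _check(pred_counts, true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)

def mse(pred_counts, true_counts):
    """Mean squared error between predicted and ground-truth per-image counts."""
    _check(pred_counts, true_counts)
    return sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / len(true_counts)

def rmse(pred_counts, true_counts):
    """Square root of the MSE (the quantity some papers label 'MSE')."""
    return math.sqrt(mse(pred_counts, true_counts))

# Hypothetical counts for three test images.
predicted = [12, 48, 100]
actual = [10, 50, 110]
print(mae(predicted, actual))   # 14 / 3, about 4.67
print(mse(predicted, actual))   # 36.0
print(rmse(predicted, actual))  # 6.0
```

MAE reflects the typical miss per image. MSE (or its root) penalizes occasional large misses, such as a badly undercounted dense colony, much more heavily.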

InfoQ: You also discussed counting the biological cells case study in your presentation. How is this different from other AI models?

Kesari: We worked on the biological cell counting problem for a pharma client. The counting part of the application used a model architecture similar to the one for penguins, but this use case differed from the others in several ways. To start with, there were only hundreds of labelled samples, unlike the tens or hundreds of thousands available for people or penguins. Most of the labelling had to be done from scratch to make the data machine-understandable.

We were counting biological cells that don’t have perfect shapes: anything that looks close to a circle is considered to be of the same cell type. This can be subjective even for humans, and it is all the more challenging for machines. In addition, the microscopic images had extraneous regions that were not to be counted, so we used outlier detection to identify and exclude those areas.

To make this actionable, the model results had to be reviewed and potentially corrected by humans. A big challenge was finding the (x, y) coordinates of every shape counted in the density maps. We used a contour detection approach to locate the cells counted by the model, and then presented them in a UI for further manual inspection. We had to package all of this into a business workflow to ensure seamless integration and adoption by the users.
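The step of recovering (x, y) coordinates from a density map can be sketched as follows. This version is a simple stand-in for the contour detection the team used, not a reproduction of it. It thresholds the map, groups above-threshold pixels into 4-connected regions with a breadth-first search, and returns each region's centroid as a candidate cell location. The threshold and the tiny density map below are hypothetical. In practice, a library routine such as OpenCV's contour functions would typically be applied to a NumPy array instead.

```python
from collections import deque

def cell_centroids(dmap, threshold):
    """Return (x, y) centroids of 4-connected regions where density > threshold."""
    h, w = len(dmap), len(dmap[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or dmap[y][x] <= threshold:
                continue
            # Breadth-first search collects every pixel of this region.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and dmap[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            mean_x = sum(px for _, px in pixels) / len(pixels)
            mean_y = sum(py for py, _ in pixels) / len(pixels)
            centroids.append((mean_x, mean_y))
    return centroids

# Hypothetical 6 x 6 density map with two blobs.
example_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
print(cell_centroids(example_map, threshold=0.1))  # [(1.5, 1.5), (4.0, 4.0)]
```

Under this simple scheme, touching cells merge into a single region. That limitation is one reason a human review step in the UI remains useful.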

InfoQ: How can our readers learn more about your projects and try it in their development environments?

Kesari: Most of our work with Microsoft AI for Earth is published as open APIs and notebooks, so that NGOs and researchers can reuse the models directly. The penguin counting work is not productionized yet, but other deep learning work, such as the one on species classification, is available.

We have recently made our data science platform, Gramex, open source. This visual analytics platform has libraries of data handlers, computational modules, and visualization charts that let anyone build highly interactive data applications.

For more details on the AnacondaCON 2019 conference, held in Austin, Texas, last week, check out the conference website and agenda.
