Size of SARS-CoV-2 virions

© 2021 Tom Röschinger. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This notebook is heavily inspired by this lesson by Justin Bois.


In this notebook we look at electron microscopy images of SARS-CoV-2 virions outside a cell. These images were sent to us by Elizabeth Fischer. We will try to identify the virion particles in the image and then measure their diameter.

Image analysis is far from trivial. There are often many different routes to a conclusion, and there is rarely a single optimal one. Here we explore two different ways of identifying virions, but there are many more ways to get equivalent or even better results.

Load images

Here we define the path to the location where the images are stored. Change this accordingly. We use skimage to read the image.

Here we can see that this image is simply a two-dimensional array of integers. From this matrix of integers we are going to identify virions and determine their size.
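As a rough sketch of the loading step, the snippet below reads an image with skimage.io.imread and inspects the resulting array. The file name is a placeholder and the pixel values are random stand-ins written to a temporary file, so this only illustrates the calls and is not the actual micrograph.

```python
import os
import tempfile

import numpy as np
import skimage.io

# Hypothetical stand-in for the micrograph: random 8-bit pixel values.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(64, 48), dtype=np.uint8)

# Write it to a temporary file so the read step has something to load.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.png")  # placeholder path, change accordingly
skimage.io.imsave(path, fake_image, check_contrast=False)

# Reading returns a plain NumPy array of integers.
im = skimage.io.imread(path)
print(im.shape, im.dtype)
print(im[:3, :5])
```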

We write a short function to plot images, since we will be doing this task quite a lot.
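A plotting helper along these lines could look as follows. This is only a sketch that assumes matplotlib; the function name plot_image and its arguments are made up for illustration and may differ from what the notebook uses.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_image(im, title=None, cmap="gray", figsize=(6, 6)):
    """Show a 2D array as a grayscale image and return the axes."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im, cmap=cmap)
    ax.set_axis_off()  # pixel coordinates are rarely informative here
    if title is not None:
        ax.set_title(title)
    return ax


# Usage with a hypothetical test pattern instead of the micrograph.
demo = np.linspace(0, 1, 100).reshape(10, 10)
ax = plot_image(demo, title="demo")
```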

For image analysis, we cut out the scale bar. We will come back to the scale bar later.
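Cutting out the scale bar is plain array slicing. The sketch below assumes the scale bar sits in a strip at the bottom of the image and that the strip is 10 rows high; both are assumptions for illustration, so read the actual position off your own image.

```python
import numpy as np

# Hypothetical image: 100 rows, the bottom 10 rows stand in for the scale bar strip.
im = np.zeros((100, 80), dtype=np.uint8)
im[90:, :] = 255

scale_bar_rows = 10  # assumed height of the strip

# Keep everything above the strip for analysis, and the strip itself for later.
im_cropped = im[:-scale_bar_rows, :]
im_scale_bar = im[-scale_bar_rows:, :]

print(im_cropped.shape, im_scale_bar.shape)
```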

Thresholding

First we are going to try to identify particles by thresholding, meaning we tell the computer that a pixel belongs to a particle if its intensity is larger than a certain value. Before we can do that, we need a few preparation steps.

Background subtraction

First, we try to get rid of any background, so that the pixel intensity corresponding to a virion is the same across the image. As a first step, we transform the image to an array of floats with values between 0 and 1.
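One way to do this conversion is skimage.util.img_as_float, which rescales integer images to the range 0 to 1. Whether the notebook uses exactly this function is an assumption; the tiny array below is made up to show the effect.

```python
import numpy as np
from skimage.util import img_as_float

# Hypothetical 8-bit image with values between 0 and 255.
im = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# For uint8 input, each value is divided by 255.
im_float = img_as_float(im)
print(im_float.dtype)
print(im_float)
```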

As we can see, we have not changed the image yet, but simply transformed the scale and type of the values in the array.

The background can be obtained by using a Gaussian filter with a large radius. With this filter, each pixel is replaced by a weighted average of its surrounding pixels, where the pixels are weighted by a 2D Gaussian distribution. Any details in the image get averaged out, and only large-scale variations remain.

We cannot identify a single particle in the image anymore. However, we can see the edges of the cell in the background. Let's subtract the background from the image and take a look.
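To see the weighting explicitly, one can filter an image that contains a single bright pixel: the output is then the set of Gaussian weights itself. This is a small sketch using skimage.filters.gaussian; the image size and sigma=2 are arbitrary illustrative choices.

```python
import numpy as np
import skimage.filters

# Image with a single pixel of value 1 in the center.
impulse = np.zeros((21, 21))
impulse[10, 10] = 1.0

# The filtered impulse shows the weights each neighbor contributes.
weights = skimage.filters.gaussian(impulse, sigma=2)

print(weights.sum())      # the weights are normalized
print(weights[10, 8:13])  # falls off symmetrically around the center
```

A larger sigma spreads the weights over more neighbors, which is why a large radius washes out individual particles.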
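The background estimate and the subtraction can be sketched on a synthetic image. Here the "cell" is a made-up linear brightness gradient with one bright square as a particle, and sigma=20 is an assumed value, not the one used for the micrograph.

```python
import numpy as np
import skimage.filters

# Hypothetical image: left-to-right brightness gradient plus one bright particle.
im_float = np.tile(np.linspace(0.2, 0.6, 200), (200, 1))
im_float[98:103, 98:103] += 0.3

# Large-radius Gaussian blur as the background estimate (sigma is assumed).
im_bg = skimage.filters.gaussian(im_float, sigma=20)

# Subtracting it removes the gradient but keeps the particle.
im_no_bg = im_float - im_bg

print(im_no_bg[:, 30:50].mean(), im_no_bg[:, 150:170].mean())
print(im_no_bg[100, 100])
```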

To simplify further analysis, we can look at the part of the image that contains most virions, excluding the part which looks like a pore and sets the lower bound for pixel intensities.

This gave us a better contrast between the virions and the cell. The next step is to identify the virion particles and separate them from the cell in the background. Let's zoom in so we can get a better look at some virions.

Filtering

There is a lot of "noise" going on in the background. We can smooth this out by using a Gaussian filter with a small radius.

Choosing a threshold

Now that we have put a filter on the image, let's look at the distribution of pixel intensities in the image.

We can see that there are two modes in the distribution. The large peak at lower pixel intensities belongs to the background, while the second mode at higher intensities belongs to pixels of particles. We want to find the optimal threshold that keeps as many virion pixels as possible, while keeping the background pixels at a minimum. Here we are simply eyeballing a single value for the threshold, but there are many ways of computing an "optimal" threshold.

We did a good job at identifying particles. However, we also removed pixels that belonged to particles. We could spend more time trying to optimize the thresholding, but we will proceed to another method of identifying particles called edge detection.
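As an illustration, the sketch below applies an eyeballed threshold to a synthetic bimodal image and, for comparison, computes Otsu's threshold with skimage.filters.threshold_otsu, which is one common automatic choice. The intensities and the value 0.5 are made up; Otsu's method is named here as an example and is not necessarily what one would pick for the micrograph.

```python
import numpy as np
import skimage.filters

rng = np.random.default_rng(0)

# Hypothetical bimodal image: dark background near 0.2, bright particle near 0.8.
im_filt = rng.normal(0.2, 0.03, size=(100, 100))
im_filt[40:60, 40:60] = rng.normal(0.8, 0.03, size=(20, 20))

thresh_manual = 0.5  # eyeballed from the histogram (assumed value)
thresh_otsu = skimage.filters.threshold_otsu(im_filt)

# Thresholding gives a binary image: True where a pixel counts as particle.
im_bw = im_filt > thresh_manual

print(thresh_otsu, im_bw.sum())
```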

Edge detection

In this part we will try to identify particles by their edges. Edges are regions of high gradient in pixel intensity, where the intensity changes rapidly from background to the particle.

Filtering

Before we try to identify the edges of particles, we have to filter the image again. Here we are using a ranking median filter. This filter takes all pixels in the neighborhood of a pixel, ranks them by their intensity, and then replaces the pixel value by the median of the ranked intensities. Therefore, we need to decide which pixels we are ranking, meaning we need to define the geometry of the neighborhood. There are many geometries to choose from, but here we are choosing a disk. One also has to define the size of the neighborhood. Here we are going with a rather small one, since we want to separate virions that are close to each other.
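A minimal sketch of such a filter, assuming skimage.filters.rank.median with a disk from skimage.morphology. The radius of 2 and the test image are illustrative assumptions.

```python
import numpy as np
from skimage.filters import rank
from skimage.morphology import disk

# Disk-shaped neighborhood; the radius is an assumed value.
footprint = disk(2)
print(footprint)

# Hypothetical 8-bit image: flat background with one outlier pixel.
im = np.full((20, 20), 100, dtype=np.uint8)
im[10, 10] = 255

# Each pixel is replaced by the median over its disk neighborhood,
# so the single outlier disappears.
im_median = rank.median(im, footprint)
print(im_median[10, 10])
```

The rank filters work on integer images, which is why the example uses 8-bit values.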

Let's apply this filter to the image and see what we get.

Finding edges

While the background is still not that smooth, the particles stand out more against it, which makes it easier to detect them. To detect the edges of the particles, we use the Canny edge detector, which is a series of operations that includes computing the gradient in pixel values and finding areas with high gradients. The result is a binary image, similar to an image obtained by thresholding, where the highlighted pixels are the detected edges.
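The call itself can be sketched with skimage.feature.canny on a synthetic bright disk. The disk and sigma=1.5 are made-up choices for illustration; the parameters for the micrograph would need tuning.

```python
import numpy as np
import skimage.feature

# Hypothetical image: one bright disk of radius 12 on a dark background.
yy, xx = np.mgrid[:60, :60]
im = ((yy - 30) ** 2 + (xx - 30) ** 2 < 12 ** 2).astype(float)

# Canny returns a boolean image that is True on detected edges.
im_edges = skimage.feature.canny(im, sigma=1.5)

# Distance of every edge pixel from the disk center.
rows, cols = np.nonzero(im_edges)
radii = np.hypot(rows - 30, cols - 30)
print(im_edges.dtype, radii.min(), radii.max())
```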

There is still some gibberish around. However, we can try to fill all closed areas, followed by removing small objects (much smaller than the virions).
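These two cleanup steps can be sketched with scipy.ndimage.binary_fill_holes and skimage.morphology.remove_small_objects. The hand-drawn edge image and min_size=20 below are assumptions chosen so that the effect is easy to check.

```python
import numpy as np
import scipy.ndimage
import skimage.morphology

# Hypothetical edge image: one closed square outline and one short open fragment.
im_edges = np.zeros((40, 40), dtype=bool)
im_edges[10, 10:21] = True
im_edges[20, 10:21] = True
im_edges[10:21, 10] = True
im_edges[10:21, 20] = True
im_edges[30, 30:33] = True

# Closed outlines become solid objects; open fragments stay as they are.
im_filled = scipy.ndimage.binary_fill_holes(im_edges)

# Drop everything smaller than the assumed minimum size (in pixels).
im_clean = skimage.morphology.remove_small_objects(im_filled, min_size=20)

print(im_filled.sum(), im_clean.sum())
```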

This looks really good! We got rid of most gibberish, while keeping the virions we are interested in. Also, they kept a nice round shape, as we could see by eye in the original image. Let's have a look at the entire image to see if there are regions where our edge detection failed.

Wow. With a couple of exceptions we were able to extract plenty of virions from the image.

Extract features

Now we need to label the particles so we can extract some features, such as size and eccentricity. To do this, we can use the skimage.measure.label function, which takes as input the image we just obtained. We also need to tell the function which value in the array belongs to the background, so it can correctly identify the virions.
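A small sketch of the labeling call on a made-up binary image with two separate objects:

```python
import numpy as np
import skimage.measure

# Hypothetical binary image with two separate square objects.
im_clean = np.zeros((20, 20), dtype=bool)
im_clean[2:6, 2:6] = True
im_clean[10:15, 10:15] = True

# Pixels equal to `background` get label 0; each connected object gets its own integer.
im_labeled, n_objects = skimage.measure.label(
    im_clean, background=0, return_num=True
)

print(n_objects)
print(np.unique(im_labeled))
```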

We can use this labeled image to compute properties of each object. The function skimage.measure.regionprops does exactly that. It computes various properties for each object which we can extract and write into a dataframe.
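One convenient route into a dataframe is skimage.measure.regionprops_table, a companion of regionprops that returns the chosen properties as columns. Using it here is an assumption about convenience, not necessarily what the notebook does, and the labeled image is made up.

```python
import numpy as np
import pandas as pd
import skimage.measure

# Hypothetical labeled image: a square (label 1) and an elongated rectangle (label 2).
im_labeled = np.zeros((30, 30), dtype=int)
im_labeled[2:8, 2:8] = 1
im_labeled[15:18, 10:25] = 2

# One row per object, one column per requested property.
props = skimage.measure.regionprops_table(
    im_labeled, properties=("label", "area", "eccentricity")
)
df = pd.DataFrame(props)
print(df)
```

The eccentricity is close to 0 for round or square objects and approaches 1 for elongated ones, which is what makes it useful for filtering later.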

Now it is time to look at the scale bar. We want to find the physical distance that one pixel corresponds to, so that we can compute the diameter of the virions. A single interval of the scale bar is 500 nm. Now we need to count how many pixels separate neighboring tick marks of the bar.

I think we are good with calling the distance between tick marks 75 pixels, which gives 500 nm / 75 pixels, or about 6.7 nm per pixel. Let's compute the diameter. We also exclude some particles that are shaped more like an ellipse than like a circle. Finally, we look at the distribution of obtained diameters.
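Counting the pixels between tick marks can be done on an intensity profile taken along one row of the scale bar. The profile below is synthetic, with tick positions chosen for illustration; on the real image one would slice a row out of the scale bar region instead.

```python
import numpy as np

# Hypothetical intensity profile along the scale bar: dark, with bright ticks 2 pixels wide.
profile = np.zeros(200)
tick_starts = [20, 95, 170]  # made-up positions
for start in tick_starts:
    profile[start:start + 2] = 1.0

# Left edge of each tick: positions where the profile switches from dark to bright.
is_tick = profile > 0.5
rising_edges = np.flatnonzero(is_tick[1:] & ~is_tick[:-1]) + 1
pixels_per_interval = np.diff(rising_edges)

# One interval of the scale bar corresponds to 500 nm.
interval_nm = 500
nm_per_pixel = interval_nm / pixels_per_interval.mean()

print(pixels_per_interval, nm_per_pixel)
```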
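The text does not spell out how the diameter is derived from the region properties. One plausible sketch, assuming the diameter of a circle with the same area as the object and an arbitrary eccentricity cutoff of 0.5, is shown below. The table contains made-up numbers, not measurements from the image.

```python
import numpy as np
import pandas as pd

nm_per_pixel = 500 / 75  # 500 nm per scale bar interval, 75 pixels per interval

# Hypothetical region properties (area in pixels); not real data.
df = pd.DataFrame({"area": [177, 180, 400], "eccentricity": [0.2, 0.9, 0.3]})

# Assumption: diameter of the circle with the same area, converted to nm.
df["diameter_nm"] = 2 * np.sqrt(df["area"] / np.pi) * nm_per_pixel

# Assumed cutoff: keep only objects that are reasonably round.
max_eccentricity = 0.5
df_round = df[df["eccentricity"] < max_eccentricity]

print(df_round)
print(df_round["diameter_nm"].median())
```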

Most virions seem to have a diameter of around 100 nm, which is close to the value described in the literature. There are some outliers, which could be overlapping virions or other objects that we were not able to remove in the filtering process. Let's look at the median of our obtained values.