Work in Progress: How to Find Gold with FLOSS
Software content: Nicolas Robidoux, John Cupitt and Chantal Racette (2009)
Non-software content: Nicolas Robidoux with contributions from John Cupitt, Stanly Steinberg, Janet Oliver, Bridget Wilson, Mary-Ann Raymond-Stintz, Nick Andrews, and other associates of the NIH/NIGMS Center for the Spatiotemporal Modeling of Cell Signaling (2009).
(Title inspired by Stani's "How to make money with free software.")
Note: The methods have seen major improvements since this presentation was put together.
Introduction and Experimental Context
This document discusses an ongoing research project which concerns the automatic and semi-automatic (human-assisted) identification of approximately spherical gold nanoparticles in Transmission Electron Microscope photographs of cell membranes. Nip2 is a key component of the project's workflow. A related project, involving the separate identification of two different sizes of gold nanoparticles, is presented following the discussion of the single particle type case.
Stanly Steinberg, co-director of the Center for the Spatiotemporal Modeling of Cell Signaling (STMC), a NIH/NIGMS Center of Excellence in Complex Biomedical Systems Research, brought to our attention the need for better software in this context. Following is an excerpt of Characterizing the Topography and Interactions of Membrane Receptors and Signaling Molecules from Spatial Patterns Obtained using Nanometer-scale Electron-dense Probes and Electron Microscopy by J. Zhang, K. Leiderman, J. Pfeiffer, B. Wilson, J. Oliver and S. Steinberg, Micron, Volume 37, Number 1, pages 14-34 (2006):
"The flow of information through a cell requires the constant remodeling of cell signaling networks. Thus, spatially- and temporally-resolved microscopy of signaling components is needed to understand the behavior of normal cells as well as to uncover abnormal behavior leading to human disease. Nanoprobe labeling and transmission electron microscopy of cytoplasmic face-up sheets of cell membrane has been developed as a high resolution approach to map the interactions of proteins and lipid during cell signaling. Membrane sheets are labeled with 3-15 nm electron-dense probes for receptors, signaling proteins and lipids and micrographs record the distributions of the probes relative to each other and to surface features."
Hence the need for identifying and finding the locations of gold nanoparticles, the "3-15 nm electron dense probes," which are attached to molecules of interest to locate them.
Because the medical researchers are primarily interested in the distribution of the molecules of interest on the cell membrane (for example, they use statistical analysis to determine whether they are uniformly distributed or appear in clusters), large swaths of cell membrane are photographed at once at a somewhat low resolution. This implies that the smallest nanoprobes are just a few pixels wide in the images, a scale at which noise is significant.
Automatic Detection of a Single Type of Visually Small Nanoparticles in Transmission Electron Microscope Photographs of Cell Membranes
Sample Test Image With Only the Smallest Type of Nanoparticles (5 nm Nominal)
3330_cropped.tif is a typical photograph of a cell membrane with 5nm (nominal size: there is significant variation in actual size) nanoparticles, which are visible as dark, approximately circular, disks with a diameter between 6 and 14 pixels. (An uncropped version of the test image, 3330.tif, with some typed data about the image at the bottom, is the actual input of the Nip2 workspace.)
According to the gold standard prepared by Nick Andrews of the University of New Mexico, this test image contains 235 gold disks. (Note that the gold standard is not set in stone: Different experts have identified slightly different sets of locations as being nanoparticles. They all agree to within a handful of nanoparticles. In particular, the majority of experts agree that nanoparticles that touch are most likely nanoparticles that stuck to each other prior to the coating process, and consequently that such contiguous pairs should be counted as one.)
(Warning: If your image viewer is strongly smoothing, uses nearest neighbour resizing, or fits to window size on a small screen, you may have trouble seeing the nanoparticles. Use another viewer (Nip2 or GIMP, for example) and/or a different magnification.)
Sample Final Result (5 nm (Nominal Size) Nanoparticles)
Note: The following results are outdated. For current results, see the "Automatic Detection and Classification of Two Types of Nanoparticles" section below.
3330_result.tif is the result of the particle identifying filtering and thresholding, with nanoparticles identified by red circles.
There is no false negative, and only two false positives. Notwithstanding the fact that the software was specifically tuned for this particular image, it performs well, since the error rate is about the same as that of careful experts. On the down side, the two false positives are fairly easy to identify as such by most people; in other words, the "machine" fails in "obvious" ways.
One particular pair of nanoparticles is very tricky to identify properly because the two particles touch each other. If the workspace is allowed to detect only one of the two contiguous nanoparticles, it is easy to tune parameters so that there is only one false negative (one of the two "siamese twins") and no false positive. For this reason, one probably should leave the identification of "touching nanoparticles" to a human operator (this is relatively easy because contiguous pairs of nanoparticles are "big and dark") or to a filter specifically designed for them.
BREAKING NEWS: For physical reasons, nanoparticles that "touch" actually must be fused together, and consequently should be counted as one. Given this, it is easy to set parameters so that there are no false negatives and no false positives. (Details: The gold nanoparticles are "functionalized:" they are coated with a biological substance that is about 5nm thick. Consequently, they must be at least 10nm or so away from each other, so that nanoparticles that are closer than this must actually have fused together prior to or during the coating process, which implies that they should be counted as one. The only other way that nanoparticles could appear closer than 10nm in the image without being stuck to each other is if they sit "on top of each other" in a membrane fold. This is very unlikely.) This property of the nanoparticles makes our life considerably easier.
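The "count touching nanoparticles as one" rule can be sketched as a post-processing step on a list of detected centres. This is only an illustration, not part of the Nip2 workspace: the function name, the (x, y) coordinate format and the `min_sep_px` parameter are assumptions, and the conversion of the roughly 10nm minimum separation into pixels depends on the image scale, which is not given here.

```python
import math

def merge_close_detections(points, min_sep_px):
    """Group detections closer than min_sep_px; return one centroid per group.

    points: list of (x, y) pixel coordinates (assumed format).
    min_sep_px: hypothetical pixel equivalent of the ~10nm minimum separation.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of detections that is closer than the minimum separation.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < min_sep_px:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)

    # Each group of fused particles is reported once, at its centroid.
    return [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups.values()
    ]
```

With this kind of merging, two detections a few pixels apart are reported as a single particle, while isolated detections are left untouched.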
One open question is whether the software can be made to automatically adjust to varying picture taking conditions (in particular, variable microscope gain).
Visual Properties of the Smaller Type of Nanoparticles
In summary, the nominal size 5nm nanoparticles appear as dark, blurry and noisy "gaussian" disks with a diameter in the 7-13 pixel range. Specifically, the smaller type of nanoparticles
- have darker pixels than their surroundings over an area with diameter at least 7 pixels (a scale at which there is quite a bit of noise) and no more than 13 pixels (note that because of the noise, not all pixels within the diameter are darker than a disk's surroundings);
- become lighter as one goes from their "center" to their "edge" (with an approximately gaussian profile);
- are among the darkest objects in the image;
- are approximately circular.
In addition, although nanoparticles are often close to each other, they rarely touch. (This holds for both types of nanoparticles.)
The small size of many of the nanoparticles, combined with their being noisy and blurry, contributes to them being difficult to detect. The factor of two variation in visible size (within the same type) does not help matters.
Unfortunately, other objects ("cell domains") in the images are also dark. Such "non-nanoparticle" dark objects are either longer or larger. In addition, they often are located near very bright (near white) features: cell domains that visually look like "holes" or "folds" often display a quick alternation between dark and light "edges" (somewhat reminiscent of, although unrelated to, the "haloing" effect often seen when images are enlarged with Lanczos or Catmull-Rom). The larger type of nanoparticles sometimes have this feature (near white "halo"), but not the smaller type.
To some extent, this "bright haloing" can be mitigated by changing the white point of the image. This does not appear to be necessary or even useful. For this reason, white point (and black point) manipulation is not performed in the current versions of the Nip2 workspaces.
Description of the One-Type Particle Detection Code
The Nip2 workspace small_gold_disks.ws, programmed by N. Robidoux, J. Cupitt and C. Racette, performs the small disk identification. (Warning: Because Nip2 stores workspace information in xml format, some browsers may complain when you download this file. Your browser may also try to execute it; this will fail since it can't understand Nip2 xml.) This workspace is very fast: The 3330_result.tif image takes about 2-3 seconds to produce on a Core2 2GHz laptop with 2GB RAM running Ubuntu 9.04 in 32 bit mode.
Combination of Two Tests With No False Negatives
Given the noisiness and small size of the nanoparticles, it was decided to design tests from the ground up instead of relying on standard blob detectors. The reason for this is that second derivatives are particularly sensitive to noise at small length scales.
Nanoparticles are identified by "and"ing the result of two separate no false negative tests. A "no false negative" test is a test which correctly tags at least one pixel location per nanoparticle as belonging to one. In earlier versions of the Nip2 workspace, a third test was used.
Main Test: Comparison of the Average Near the Center and the Average Over Octant Rings
This is the main (and only truly novel?) test used in this work. The motivation for this test is as follows.
Roughly speaking, the "center" of a nanoparticle can be described as being a local minimum of the brightness (a "darkness maximum") when compared to values located roughly a radius away. This characterization cannot be used quite as stated given the amount of noise present at the relevant scale.
Given that the smaller visible nanoparticles are the hardest to detect, the parameters are essentially tuned for them. Averaging must be used sparingly because otherwise the smallest visible nanoparticles are "washed out" or "blended together."
The approach relies on comparing the average of the values near the center (using a very tight Gaussian) with the values averaged over eight "octant rings."
The "octant ring" masks are used to answer the following question: "Are the values near the center darker than the averages, near the typical radius of the smallest visible nanoparticles, in every one of eight directions (pi/8, 3pi/8, 5pi/8, 7pi/8, 9pi/8, 11pi/8, 13pi/8 and 15pi/8)?"
The octant ring masks are constructed as follows: Construct a "ring" mask by rotating a Gaussian curve with a peak at a distance equal to roughly the radius of the smallest nanoparticles (about 3 pixels, corresponding to a diameter of about 6) and radius set so that the effective extent of the mask just about covers the largest nanoparticles (that is, so that the effective diameter of the ring mask is about 13). Now, split this ring mask into eight similar pieces, each obtained by intersecting the ring with a mask corresponding to a standard octant, with values at the lines which correspond to multiples of 45 degrees set to half of those in the interior of the octants. The resulting eight masks respect the symmetries of a square grid---that is to say that applying such a symmetry to one of the masks returns the same or another mask.
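The construction just described can be sketched in NumPy as follows. This is a minimal illustration, not the code of the Nip2 workspace: the function name and the default values of `peak_radius`, `sigma` and `half_size` are assumptions chosen to roughly match the sizes quoted above (peak near 3 pixels, overall extent of 13 pixels), and normalizing each mask to unit sum is a choice made here so that filtering with a mask yields a weighted average.

```python
import numpy as np

def octant_ring_masks(peak_radius=3.0, sigma=1.5, half_size=6):
    """Return eight masks, one per octant of a Gaussian ring.

    All default parameter values are illustrative guesses.
    """
    coords = np.arange(-half_size, half_size + 1)
    x, y = np.meshgrid(coords, coords)
    r = np.hypot(x, y)

    # Gaussian profile in the radial direction, peaking at peak_radius.
    ring = np.exp(-0.5 * ((r - peak_radius) / sigma) ** 2)
    ring[half_size, half_size] = 0.0  # the centre pixel belongs to no octant

    theta = np.mod(np.arctan2(y, x), 2 * np.pi)
    # Pixels lying exactly on a line at a multiple of 45 degrees.
    on_line = (x == 0) | (y == 0) | (np.abs(x) == np.abs(y))
    interior_octant = np.floor(theta / (np.pi / 4)).astype(int) % 8
    line_index = np.rint(theta / (np.pi / 4)).astype(int) % 8

    masks = []
    for k in range(8):
        # Octant k spans the angles from k*45 to (k+1)*45 degrees.
        weight = np.where(~on_line & (interior_octant == k), 1.0, 0.0)
        # A pixel on a boundary line is shared equally (weight one half)
        # by the two octants that the line separates.
        shared = on_line & ((line_index == k) | (line_index == (k + 1) % 8))
        weight = np.where(shared, 0.5, weight)
        mask = ring * weight
        masks.append(mask / mask.sum())  # unit sum: filtering gives an average
    return masks
```

Because boundary pixels are split evenly, reflecting or rotating any of the eight masks by a symmetry of the square grid gives back one of the eight masks, which is the property stated above.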
The result of this filtering is seen in 3330_octant_ring_no_thresholding.tif. In this image, nanoparticles appear as small "60s daisies" with eight white "petals."
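One possible reading of the octant ring test, written with plain NumPy, is shown below. It is a sketch under stated assumptions rather than the workspace's actual filter chain: the function names are hypothetical, the masks are assumed to be odd-sized and normalized to unit sum, image borders are handled by edge replication, and the response is defined here as the darkest ring average minus the center average, so that it is positive only where the center is darker than all eight rings.

```python
import numpy as np

def correlate_same(image, mask):
    """Weighted neighbourhood sum at every pixel (odd-sized mask, edge padding)."""
    half_h, half_w = mask.shape[0] // 2, mask.shape[1] // 2
    padded = np.pad(image.astype(float),
                    ((half_h, half_h), (half_w, half_w)), mode="edge")
    height, width = image.shape
    out = np.zeros((height, width), dtype=float)
    for dy in range(mask.shape[0]):
        for dx in range(mask.shape[1]):
            if mask[dy, dx] != 0.0:
                out += mask[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out

def octant_ring_response(image, center_mask, ring_masks):
    """Darkest ring average minus center average, per pixel.

    Positive values mean the center is darker than every ring average.
    """
    center = correlate_same(image, center_mask)
    rings = np.stack([correlate_same(image, m) for m in ring_masks])
    return rings.min(axis=0) - center
```

Thresholding this response at a small positive value would give a candidate map; how the unthresholded image mentioned above was actually rendered is not specified here.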
Possible Improvements of the Octant Ring Test
It would be nice to know how to adjust the geometric parameters of the convolutions based on estimates of the parameters of the gaussian profile of typical nanoparticles in the image, as well as characteristics of the noise.
One other issue which needs to be explored is whether convolutions based on constant (characteristic function) masks should be used instead of convolutions based on gaussians. This may be a better approach, for smaller nanoparticles especially. (The new workspaces, used to detect and classify two different types of nanoparticles, do just that; see below.)
Finally, we have found that non-overlapping octant rings are more discriminatory. To restore reflexion symmetry, we intend to use 16 octant-rings (the second set being obtained from the first by reflexion).
Another Future Direction: Using the Octant Ring Test to Identify Contiguous Pairs of Nanoparticles
Contiguous pairs of nanoparticles, which should be counted as one nanoparticle since such pairs are almost certainly nanoparticles that "stuck" to each other in the manufacturing process (instead of separate nanoparticles which "just happen" to have attached to nearby proteins), are tricky to detect with the "plain" octant ring test. It would seem that comparing the darkness around the center with the darkness of the second or third darkest octant ring (instead of the very darkest one), should allow one to specifically detect contiguous pairs. The result would then be used to create a mask which ensures that such pairs of nanoparticles are only counted once. (The new workspaces, used to detect and classify two kinds of nanoparticles, do just that; see below. A current version of a workspace using this technique is disks.ws, programmed by N. Robidoux, J. Cupitt and C. Racette.)
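The idea of comparing the center with the second or third darkest octant ring can be sketched as follows. This is an interpretation, not the logic of disks.ws: the function names, the `margin` parameter and the rule "passes with the k-th darkest ring but fails with the very darkest" are assumptions made for illustration. The inputs are the per-pixel center average and a stack of the eight per-pixel ring averages.

```python
import numpy as np

def kth_darkest_ring_margin(center_avg, ring_avgs, k=0):
    """Return the (k+1)-th darkest ring average minus the center average.

    center_avg: array of shape (H, W).
    ring_avgs: array of shape (8, H, W), one layer per octant ring.
    k=0 is the plain test (very darkest ring); k=1 or 2 ignores the one
    or two darkest rings, for example one covered by a touching neighbour.
    """
    ordered = np.sort(ring_avgs, axis=0)  # darkest (smallest) values first
    return ordered[k] - center_avg

def pair_candidates(center_avg, ring_avgs, k=1, margin=0.0):
    """Pixels that pass the relaxed test but fail the plain one (hypothetical rule)."""
    relaxed = kth_darkest_ring_margin(center_avg, ring_avgs, k) > margin
    plain = kth_darkest_ring_margin(center_avg, ring_avgs, 0) > margin
    return relaxed & ~plain
```

For example, a center average of 60 with ring averages of 55 in one direction and 180 or more in the other seven fails the plain test (55 is darker than 60) but passes with k=1, which flags it as a possible member of a contiguous pair.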
Second Test: Is the Average Near the Center Dark Enough?
The above "octant ring" test is powerful enough on its own for some images. For some others, it is complemented by the following second test: The average near the center is computed using gaussian blur, and a pixel location is "passed" if this average is dark enough.
Again, it would be nice to be able to ballpark good values of the gaussian blur (or constant disk) convolution parameters based on the typical geometric properties of the nanoparticles, their typical diameter and typical minimal distance between them, for example.
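A minimal NumPy sketch of the "center is dark enough" test follows. The function names and the default `sigma` and `radius` are illustrative assumptions, not the values used in the workspace, and the blur is implemented here as a separable convolution with edge replication at the image borders.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(image, sigma=1.0, radius=3):
    """Separable Gaussian blur; the output has the same shape as the input."""
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Blur along rows, then along columns.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def dark_enough(image, threshold, sigma=1.0, radius=3):
    """True where the blurred brightness falls below the threshold."""
    return gaussian_blur(image, sigma, radius) < threshold
```

The threshold is the quantity a user would adjust interactively: it should be just high enough that every nanoparticle still passes.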
Issues Related to Criterion Alignment
If several of the criteria identify nanoparticles with very few pixels, there is a risk that none of the criteria ring "true" at the exact same pixel location---even though they ring true near the same candidate nanoparticle "center"---which leads to a false negative. This can be fixed by "dilating" the islands which identify positives for all but, say, one of the criteria. This trick is not needed for the 3330.tif test image, for which the two criteria align themselves well enough. However, it appears to make the process more robust when "quick and dirty" threshold adjustment is used. Given that the "octant ring" test uses a very tight gaussian, hence is less smooth than the "center is dark enough" test, this is the one to which 8-connected Erosion (which Dilates white islands) was applied.
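The alignment fix can be illustrated with boolean masks in NumPy. This sketch does not use Nip2's morphology operators: `dilate8` and `combine_tests` are hypothetical names, and the dilation is written out directly as an 8-connected growth of the True ("pass") islands of the octant ring test before the two tests are combined with AND.

```python
import numpy as np

def dilate8(mask, iterations=1):
    """Grow True islands by one pixel per iteration, using 8-connectivity."""
    out = mask.astype(bool)
    height, width = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # A pixel becomes True if any pixel of its 3x3 neighbourhood is True.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + height, dx:dx + width]
        out = grown
    return out

def combine_tests(octant_pass, dark_pass, iterations=1):
    """AND of the two tests after dilating the less smooth one."""
    return dilate8(octant_pass, iterations) & dark_pass
```

If the octant ring test passes at one pixel and the darkness test passes only at its immediate neighbour, a plain AND yields nothing, whereas the dilated version keeps the neighbouring pixel and so avoids the false negative.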
False Positives Are Easier to Fix than False Negatives
Note that a conjunction ("this pixel location is deemed to be part of a nanoparticle if it passes the first test AND the second test AND ... the last test") of no false negative tests is also a test with no false negatives provided one ensures that the criteria are sufficiently well aligned, that is, that they simultaneously give a "pass" at at least one pixel location per nanoparticle. If alignment is not taken into account, one test could give a pass at one "true positive" pixel location, and another test could give a pass at a nearby, but not identical, pixel location, with none of them being in agreement at exactly the same location, leading to an avoidable false negative.
Likewise, a disjunction of no false positive tests is also a no false positive test. The goal of this project is to produce a robust nanoparticle detector with no false positive and no false negative. Failing that, we are willing to let a few false positives slip by.
The justification for the "no false negative" policy is that it is much easier for a human operator to identify false positives (which the program gives coordinates for) than hunt for false negatives (which may be anywhere and which, by virtue of being false negatives, are generally hard to see).
Adjustment of the Black and White Points
In order to minimize the impact of the "white ridges" near cell domains which look like "folds," the white point was changed in the earlier versions of the workspaces. The black point was changed as well, to minimize the relative variation in darkness between the smallest visible disks and the larger ones, and between them and the darkest non-disk features. We do not know yet whether this should be brought back into the code to make it better, given that it does not seem to have as much of an impact as initially thought.
Key Issue To Be Addressed
Individual TEM images, even those involving the same nominal type of nanoparticles, are quite different in terms of the visual appearance (esp. size, but also darkness) of the nanoparticles. At this point, it is not clear how the workspace(s) should take such large variations into account.
One possible approach, which assumes that the software has been adjusted for the "visual" size of the particles: Have the user visually adjust the "center is dark enough" test threshold in such a way that all the nanoparticles "pass" yet so that it is more or less as low as possible. This is not too difficult because this secondary criterion is fairly "loose" (so that this can be done somewhat sloppily) and because this can be done at interactive speed. Then, rank the detected nanoparticles in decreasing order of "score" with respect to the "octant ring" test, and ask the user to "cut off" the list at a number which they feel includes all the actual nanoparticles. Given that the software which labels detected nanoparticles runs at near interactive speed (about 2s runtime on a very ordinary laptop), this should be reasonably fast, even if the user has to give it a few tries. Using different colours for the nanoparticles which are "near" the cut off (above and below) should help.
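The rank-and-cut-off step proposed above could look like the following sketch. Nothing here exists in the workspace: the function name, the (score, x, y) tuple format and the `window` parameter (how many ranks on each side of the cut-off count as "near" it) are assumptions for illustration.

```python
def rank_and_label(detections, keep, window=2):
    """Rank detections by decreasing score and flag those near the cut-off.

    detections: list of (score, x, y) tuples (assumed format).
    keep: number of top-ranked detections the user chooses to accept.
    window: ranks on either side of the cut-off flagged as borderline.
    Returns a list of (x, y, score, accepted, borderline) tuples.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    labelled = []
    for rank, (score, x, y) in enumerate(ranked):
        accepted = rank < keep
        borderline = keep - window <= rank < keep + window
        labelled.append((x, y, score, accepted, borderline))
    return labelled
```

The `borderline` flag is what a viewer would map to a different colour, so that the user can concentrate on the few detections on either side of the chosen cut-off.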
It appears that if contiguous nanoparticles are ignored, the key thresholds are very easy to set.
Automatic Detection and Classification of Two Types of Nanoparticles
Test Image With Two Types of Nanoparticles
5-14413.tif is a typical photograph of a cell membrane with both 5nm and 10nm (nominal sizes) nanoparticles. Note that the 10nm nanoparticles appear much larger than the 5nm ones. Almost invariably, they are also more uniformly dark: a gaussian is probably not as good an approximation for the 10nm nanoparticles as it is for the 5nm nanoparticles.
Note: This particular test image is probably not the best choice, because all the nanoparticles are very dark compared to their surroundings. Further tests need to involve situations in which some nanoparticles are located within a darker region (like those found in the above single particle type test picture).
General Line of Attack
The overall plan is as follows: Construct a no false negative test for the small type of nanoparticles, and also construct a no false negative test for the large type which is such that no small nanoparticle is mislabeled as being a large one. Using the result of the second test to eliminate false positives of the first test that are actually large nanoparticles, one can obtain a test with no false large negative, and no false small negative, with the two types nicely separated.
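The separation step of this plan can be sketched on lists of detected centres. This is an illustration only and, as noted below, the current workspace does not quite follow this plan: the function name, the (x, y) coordinate format and the `large_radius_px` parameter (how close a small-type candidate must be to a large-type detection to be considered the same particle) are assumptions.

```python
import math

def separate_types(small_candidates, large_detections, large_radius_px):
    """Drop small-type candidates that coincide with a large-type detection.

    small_candidates: (x, y) positions passing the small-particle test,
        which may include large particles as false positives.
    large_detections: (x, y) positions passing the large-particle test,
        assumed to contain no small particles.
    large_radius_px: hypothetical matching distance in pixels.
    """
    small = [
        p for p in small_candidates
        if all(math.dist(p, q) > large_radius_px for q in large_detections)
    ]
    return small, list(large_detections)
```

Since the large-particle test is assumed never to fire on a small nanoparticle, removing its detections from the small-particle candidates cannot introduce a small false negative.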
Here is an early result, with the small nanoparticles labeled with a red 3x3 cross, and the large ones with a green 3x3 cross: 5-14413_result.png. There is no false positive or false negative. This result is obtained with a very recent workspace which does not quite follow the description given above: bigNsmallDisks.ws.