METHOD AND APPARATUS FOR USE IN THE IMAGE ANALYSIS OF BIOLOGICAL SPECIMENS
The present invention relates to digital image analysis, and more particularly to a method, and associated apparatus, for use in the analysis of digital images of biological specimens in which the boundaries of features or objects within the image are delineated.
In the field of biological specimen testing, numerous assay tests are known, and under continual development, the results of which rely on the inspection or analysis of a particular part of the biological tissue. For example, a specimen may be tested with one or more markers, such as antibodies, which attach to a certain part of a diseased or affected tissue, for example, a protein in the membrane, cytoplasm or nucleus of affected cells (or a combination thereof). A detection technique may be applied to a specimen under test, whereby a "stain" chemically attaches itself to the marker (e.g. chemically reacts with the marker to produce a coloured reaction product), to visually reveal affected areas of the specimen. In this way, the affected areas of the specimen can be seen by the human eye under a microscope. Conventionally, a pathologist or other expert is able to review slides of tested biological specimens under the microscope to manually assess the degree of staining, and thus the extent to which the cells are affected by the disease or condition under test. One such example is the test for the Epidermal Growth Factor Receptor (EGFR), which resides in the cell membrane and can be indicative of increased cell division and thus malignancy, leading to cancer.
More recently, software has been developed to analyse digital images of stained, biological specimens under test, in order to quantify the level or degree of staining of the relevant areas of the tissue. The processing involves the "segmentation" of areas of the image of the specimen coloured by the stain (thereby excluding areas of the specimen image not coloured by the stain), and analysing the intensity of the defined colour in the particular areas of the specimen that are relevant to the test (e.g. membrane, cytoplasm, nucleus).
However, such techniques require an expert user to manually define intensity thresholds, which identify the boundaries of the relevant, stained areas of the specimen. The selection of this threshold may significantly affect the results of the image analysis. This is because when a biological specimen under test is stained, the stain may not be taken up uniformly by the marker. In addition, optical effects and/or the nature of the specimen may lead to the appearance of the stain in parts of the cell to which the marker is not attached. Thus, for example, if a test is concerned with analysing the intensity of a stain attached to cell membranes (such as the aforementioned EGFR test), if the threshold is too low, the areas of the specimen having intensity values above the threshold may include parts of the cytoplasm and adjacent cells, which are not relevant to the test but may nevertheless appear lightly stained in the image. Conversely, if the threshold is too high, it may be that not all of the cell membranes will be analysed, because some membranes may not have been properly stained.
Whilst the use of computerised image analysis in such assay tests has led to advances in the field, since the determination of the intensity threshold is performed manually, it involves subjective assessment, which can lead to errors, or inconsistencies between results from different specimens.
Accordingly, it would be desirable to provide a fully automated technique of image analysis for such tests, which is entirely objective, and thus leads to more consistent test results.
The present invention provides aspects and embodiments as provided in the claims.
The present invention provides a method for automatically determining the boundaries of anatomical parts of cells within a stained biological specimen image. The boundaries may represent the boundary of cells (i.e. enclosing cell membranes) or the boundaries of the cell nuclei with the cytoplasm (i.e. enclosing cell nuclei), depending on the type of image analysis. Determination of the boundaries thus delineates the cells (or cell nuclei) within the image. This delineation may then be used to determine the relevant areas of the image to be considered in the image analysis. In one embodiment, a mask is generated based on the determined boundaries.
The mask defines the relevant areas of the image, which are to be considered in the image analysis. Such image analysis may determine an intensity value, representing the degree or extent of staining of the specimen. Further features and advantages of the present invention will be apparent from the following description and accompanying claims.
Embodiments of the present invention will now be described, with reference to the accompanying drawings, in which:
Figure 1 is a flow diagram showing a method for use in the analysis of an image of a stained, biological specimen under test, according to an embodiment of the present invention;
Figure 2 is an exemplary, stained biological specimen image;
Figures 3a, 3b and 3c each show skeleton masks, determined using different intensity thresholds for the stain of the specimen image of Figure 2, for use in analysis thereof, and
Figure 4 illustrates the specimen image of Figure 2, overlaid with a mask having the best selected threshold, as determined in accordance with a preferred embodiment of the present invention.
Figure 1 is a flow diagram illustrating a method performed in the analysis of stained, biological specimen images in accordance with an embodiment of the present invention. The method is preferably implemented in the form of a computer program. The computer program may be stored on a computer readable medium such as a magnetic or optical disc, or may be downloaded to a computer from a remote site over a network, such as the Internet, and thus embodied on a carrier wave. It will be appreciated that the present invention may be implemented in these and other forms.
Generally speaking, the method of Figure 1 automatically determines the intensity threshold to be used for identifying or delineating areas of a stained biological specimen image to be included in image analysis. In particular, the method determines the best intensity threshold for segmented areas of a stained biological specimen image (such as the image of Figure 2), from which a mask, defining the areas of the image for image analysis, may be generated. The illustrated method may be performed as part of such an image analysis method in the context of an assay test, although it will be appreciated that the illustrated method may be carried out separately from, but nevertheless prior to, the analysis of the intensity of the stained areas of the specimen image. The method of Figure 1 will be described with reference to the stained, biological specimen image of Figure 2. The biological specimen depicted in Figure 2 relates to the aforementioned EGFR test, which involves attachment of a brown stain to the membranes of cells, which is indicative of the tested condition (i.e. where the EGFR resides). Thus, the method of Figure 1 is concerned with identifying stained cell membranes. It will be appreciated that for other tests, the stain or colour may be associated with other parts of cells, such as the nuclei, and the method identifies the boundaries of such other features or objects within the cells.
Thus, as illustrated in Figure 2, the darkest area of staining is associated with the membranes of cells, but some staining appears in the cytoplasm of the cells, as well as adjacent areas of the specimen where cells are not clearly and distinctly visible. Thus, in this example, it is desirable to delineate the membrane of cells from other portions of the tissue, in order to identify areas of the image which represent cell membranes, whilst excluding other tissue areas.
Referring to Figure 1, the method starts at step 100, by obtaining a digital image of a stained, biological specimen under test, such as the image of Figure 2. The image is typically retrieved from memory, where it has been stored following capture of the image of the specimen on a slide, using a microscope and digital camera, in accordance with conventional techniques.
At this stage, parameters specific to, or associated with, the specimen may be manually or automatically set. For example, the colour of the stain to be identified in the image analysis is set for image segmentation. The colour may be automatically, digitally defined with reference to the test or stain associated with the image (i.e. a fixed colour corresponding to the stain is associated with a given test). In addition, a threshold illumination intensity range and a threshold illumination intensity increment within the range are set, as explained in more detail below. In addition, the image may be assigned an identifier for the purposes of result storage for different threshold values, again, as described in more detail below.
To set the colour of the stain, the image apparatus is arranged to identify the colours associated with a particular hue, for example the hue associated with brown. A hue range may be used. Colour saturation, along with hue, may be used. This process allows the apparatus to focus on the brown stained areas and ignore non-brown areas.
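By way of illustration only, the hue-based colour selection described above might be sketched as follows. The function name segment_by_hue, the hue range of 10 to 45 degrees and the minimum saturation of 0.2 are assumptions chosen for this example and are not values taken from the described embodiment.

```python
import colorsys


def segment_by_hue(image, hue_range=(10.0, 45.0), min_saturation=0.2):
    """Mark pixels whose hue (in degrees) lies within hue_range.

    image is a list of rows of (r, g, b) tuples with 0-255 channels.
    hue_range and min_saturation are illustrative assumptions only.
    """
    lo, hi = hue_range
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys works on 0-1 floats and returns hue in the range 0-1.
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(lo <= h * 360.0 <= hi and s >= min_saturation)
        mask.append(mask_row)
    return mask


# Toy 2x2 image: brown-stained pixels on the diagonal, blue and white elsewhere.
brown, blue, white = (139, 69, 19), (70, 90, 200), (255, 255, 255)
stain_mask = segment_by_hue([[brown, blue], [white, brown]])
```

In this sketch only the two brown pixels are retained, so that subsequent intensity thresholding would be applied to stain-coloured areas alone.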
At step 110, the method selects an illumination intensity value for the relevant colour, which is typically the lowest value within the threshold range set at step 100. The threshold range may be a default range, for example from 0 to 255 (representing all digital illumination intensity values), or more preferably, from 90 to 160. The illumination intensity threshold range is preferably set at step 100 based on previous results of best threshold values for specimens undergoing the same test, thus refining the threshold range.
At step 120, the method generates a "skeleton mask" based on the illumination intensity threshold set at step 110. The skeleton mask generation involves two main process steps.
Firstly, the method determines all the areas of the segmented image having an intensity value greater than the set threshold. For example, referring to Figure 2, if the intensity threshold set is 90, then this will reveal all areas of the specimen with an intensity of brown above the intensity value of 90. This will include areas of the cytoplasm within the cells surrounding the membranes, as discussed above.
Secondly, using the identified areas meeting the threshold, the method performs a "skeletonisation" procedure which defines a median line through identified object areas. As the skilled person will appreciate, the "skeletonisation" of an object is typically performed by iteratively removing all pixels at the outer edge of the object, until a line of one pixel thickness is achieved, this line constituting the skeleton mask. The pixels at the edge of an object are determined by comparing pixels of the object with adjacent pixels; all pixels at the edge of an object will be adjacent pixels which do not form part of the object.
Thus, step 120 results in a "skeleton mask" of one pixel thickness, defining the cell boundaries (membranes), based on the particular selected threshold. The lines of the skeleton mask may or may not correspond to the correct location of the cell membranes, since the threshold value may take into account areas of the cytoplasm.
This is discussed below in more detail with reference to Figures 3a to 3c.
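The skeletonisation of step 120 may be illustrated by the following minimal sketch. It uses the well-known Zhang-Suen thinning scheme as one possible way of iteratively removing edge pixels; the description does not specify a particular thinning algorithm, so this choice, and the function name zhang_suen_thin, are assumptions made for the example.

```python
def zhang_suen_thin(mask):
    """Thin a binary mask (rows of 0/1) by repeatedly peeling edge pixels.

    Zhang-Suen thinning is used here as an illustrative stand-in for the
    skeletonisation procedure; it is not stated to be the one actually used.
    """
    rows, cols = len(mask), len(mask[0])
    # Pad with a one-pixel background border so every pixel has 8 neighbours.
    img = [[0] * (cols + 2)]
    img += [[0] + [1 if v else 0 for v in row] + [0] for row in mask]
    img += [[0] * (cols + 2)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows + 1):
                for c in range(1, cols + 1):
                    if not img[r][c]:
                        continue
                    # Neighbours clockwise from north: N, NE, E, SE, S, SW, W, NW.
                    n = [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                         img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                         img[r][c - 1], img[r - 1][c - 1]]
                    north, east, south, west = n[0], n[2], n[4], n[6]
                    count = sum(n)
                    # Number of 0 -> 1 transitions around the neighbourhood.
                    transitions = sum(
                        1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if step == 0:
                        edge = north * east * south == 0 and east * south * west == 0
                    else:
                        edge = north * east * west == 0 and north * south * west == 0
                    if 2 <= count <= 6 and transitions == 1 and edge:
                        to_clear.append((r, c))
            # Remove all flagged edge pixels at once, as the scheme requires.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in img[1:-1]]


# A solid bar three pixels thick is reduced to a line along its middle row.
bar = [[1] * 7 for _ in range(3)]
skeleton = zhang_suen_thin(bar)
```

In practice the input to such a routine would be the binary image of areas whose stain intensity exceeds the threshold selected at step 110.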
At step 130, the skeleton mask is analysed to determine a "fitness value" according to predefined criteria indicative of the likelihood that the skeleton mask closely corresponds to the relevant anatomical boundaries, in the present example the cell membranes. One example criterion suitable for biological specimens, in particular cell membranes or nuclei, is the number of "closed loops" within the skeleton mask. Such closed loops represent the number of complete cells (or cell nuclei). It will be appreciated that the number of closed loops is functionally related to the number of pixels in the skeleton mask. Thus, the number of pixels in the skeleton mask may be used to determine a "fitness value" (the larger the number of pixels, the better the fitness value). Alternative criteria will be apparent to the skilled person. For example, the fitness value may be determined by looking for a predetermined shape or size of closed loops in the skeleton mask (again, representing cells/cell nuclei) or a predetermined distance between lines (representing spacing between cells/nuclei).
Thus, step 130 uses an algorithm based on one or more of the discussed criteria, to determine a "fitness value" of the skeleton mask. The fitness value may be a number within a predetermined range, for example 0-1 or 0-10.
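One possible way of evaluating the closed-loop criterion is sketched below. A closed loop is counted here as a connected region of background pixels that does not reach the image border. The function names, and the scaling of the loop count to the range 0-1 by a hypothetical expected_loops parameter, are assumptions made for illustration.

```python
from collections import deque


def count_closed_loops(skeleton):
    """Count background regions fully enclosed by skeleton lines.

    Background pixels are joined with 4-connectivity, so that a diagonal
    (8-connected) skeleton line still separates inside from outside.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    seen = [[False] * cols for _ in range(rows)]
    loops = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if skeleton[r0][c0] or seen[r0][c0]:
                continue
            touches_border = False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not skeleton[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                loops += 1
    return loops


def fitness_value(skeleton, expected_loops):
    """Scale the loop count to 0-1; expected_loops is a hypothetical parameter."""
    return min(count_closed_loops(skeleton) / expected_loops, 1.0)


two_cells = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
broken_cell = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
loops_complete = count_closed_loops(two_cells)
loops_broken = count_closed_loops(broken_cell)
```

The first toy mask contains two adjoining closed loops, whereas the second, whose boundary is incomplete, contains none and would therefore receive a lower fitness value.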
At step 140, the method stores the fitness value together with the threshold value in memory associated with the image, using the image identifier.
At step 150, the method considers whether the fitness values for all thresholds (according to the threshold range and threshold increment set at step 100) have been determined. If the fitness values for all thresholds have not been determined, the method returns to step 110, and sets a new threshold value by adding the threshold increment to the previous threshold value. The method then continues with steps 120-150 using the new threshold, and is repeated until all thresholds within the set threshold range and increment have been considered.
It will be appreciated that rather than starting at a low threshold (e.g. 90) and incrementally increasing the value, it would be possible to start at a high value and incrementally decrease the threshold value. In such cases, it will be appreciated that the threshold increment would be a negative value.
When step 150 determines that all the thresholds have been considered, the method continues with step 160 by identifying the best fitness value within the stored results, and thus the corresponding best threshold for use in identifying the relevant boundaries.
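The loop of steps 110 to 160 may be summarised by the following sketch. The skeleton generation and fitness evaluation are passed in as functions, and in the usage example they are replaced by trivial stand-ins with made-up fitness values, purely so that the loop can be run; all names and numbers here are hypothetical.

```python
def threshold_values(start, stop, increment):
    """Yield start, start + increment, ... up to and including stop.

    A negative increment scans downwards from a high starting threshold.
    """
    if increment == 0:
        raise ValueError("increment must be non-zero")
    value = start
    while (increment > 0 and value <= stop) or (increment < 0 and value >= stop):
        yield value
        value += increment


def select_best_threshold(image, start, stop, increment, make_skeleton, fitness):
    """Return (best_threshold, results) after scanning the threshold range."""
    results = {}
    for threshold in threshold_values(start, stop, increment):
        skeleton = make_skeleton(image, threshold)   # step 120
        results[threshold] = fitness(skeleton)       # steps 130 and 140
    best = max(results, key=results.get)             # step 160
    return best, results


# Stand-ins for illustration only: the "skeleton" is just the threshold, and
# the fitness values are invented numbers, not measured results.
invented_fitness = {90: 0.1, 100: 0.3, 110: 0.6, 120: 0.9,
                    130: 0.7, 140: 0.4, 150: 0.2, 160: 0.1}
best, results = select_best_threshold(
    image=None, start=90, stop=160, increment=10,
    make_skeleton=lambda image, threshold: threshold,
    fitness=lambda skeleton: invented_fitness[skeleton],
)
```

Where two thresholds share the highest fitness value, this sketch keeps the one encountered first; other tie-breaking rules are equally possible.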
The threshold determined at step 160 may then be used to determine a mask for image analysis using conventional techniques, as described below.
Figures 3a to 3c depict skeleton masks generated by step 120 of the method of Figure 1, based on the stained, biological specimen image of Figure 2. In these diagrams, the mask is shown in white on a black background. The skeleton masks can be considered as binary pictures. In Figure 3a, the skeleton mask has been generated using a very low intensity threshold. In this case, wide areas surrounding the cell membranes of the specimen of Figure 2 are identified and skeletonised in step 120 of Figure 1, resulting in a mask having no interconnecting lines. Thus, when analysing this mask, using step 130 of Figure 1 as described above, a low "fitness value" will be determined. In particular, the number of pixels in the mask is relatively low, and there are no closed loops representing cells of appropriate size. Accordingly, a low fitness value is determined for this skeleton mask.
Figure 3c illustrates a skeleton mask generated at step 120 of Figure 1 in which the threshold is set very high. In this case, only very highly stained areas of the specimen will be identified and skeletonised. Thus, the skeleton mask does not identify some cell membranes, which have absorbed the stain to a lesser extent and so have low intensity staining. Thus, as shown in Figure 3c, the skeleton mask includes some closed loops which enclose more than one cell nucleus, when compared with Figure 2. When the fitness value of this mask is determined at step 130 of Figure 1, whilst the fitness value will be greater than that for the mask of Figure 3a, the number of closed loops and the number of pixels in the skeleton mask will be lower than for a best fit mask.
Finally, Figure 3b illustrates a skeleton mask generated using the correct (best) threshold. Thus, the skeleton mask of Figure 3b has a high number of closed loops, representing cell membrane boundaries, and thus at step 130 of Figure 1, a high fitness value will be generated.
Accordingly, in this example, step 160 of Figure 1 will determine that the threshold used to generate the mask of Figure 3b is the most appropriate threshold to use for the image analysis of the biological specimen image of Figure 2.
In accordance with an image analysis technique, as described above, the skeleton mask is used to generate a mask for the areas of the image to be used in image analysis. In particular, in the case of a test involving cell membranes, as in the Figure 2 test, the skeleton mask of Figure 3b is widened to an appropriate pixel width, representative of the anatomical size of the cell membranes. For example, the thickness of a cell membrane may be 10 pixels wide in the image of Figure 2. Other pixel widths may be appropriate, depending upon the magnification of the objective lens used to acquire the image, the specification of the digital camera etc.
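The widening of the skeleton mask may be illustrated by the following sketch, which stamps a square of the requested width around every skeleton pixel (a simple binary dilation). The square shape, the handling of even widths and the function name widen_skeleton are assumptions made for this example.

```python
def widen_skeleton(skeleton, width):
    """Widen one-pixel skeleton lines to approximately `width` pixels.

    Each skeleton pixel is replaced by a width x width square, clipped at
    the image edges. For even widths the square extends one pixel further
    towards increasing row and column indices.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    lo, hi = -((width - 1) // 2), width // 2
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            for dr in range(lo, hi + 1):
                for dc in range(lo, hi + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        mask[rr][cc] = 1
    return mask


# A vertical one-pixel line in column 2 becomes a band three pixels wide.
line = [[0, 0, 1, 0, 0] for _ in range(5)]
band = widen_skeleton(line, 3)
```

For the 10 pixel membrane width mentioned above, the same routine would be called with a width of 10 on the skeleton mask obtained for the best threshold.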
The generated mask, 10 pixels thick, is then overlaid on the image, as shown in white in Figure 4, and all areas not covered by the mask are removed. The remaining areas of the image are then analysed. For example, the intensity of each and every pixel underlying the mask is determined, and a histogram generated to show the intensity distribution, corresponding to the extent of the staining, across the membranes of the cells. Alternatively, or in addition, a mean intensity value for all of the pixels may be determined, by adding together the intensity values for all of the pixels, and dividing the sum by the number of pixels. The results can then be analysed by an expert, such as a pathologist.
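The histogram and mean intensity calculation may be sketched as follows, assuming integer intensity values in the range 0 to 255. The function name masked_intensity_stats and the toy values are illustrative only.

```python
def masked_intensity_stats(intensity, mask, levels=256):
    """Return (histogram, mean) of the intensities lying under the mask.

    intensity holds integer values from 0 to levels - 1; mask holds 0/1
    flags of the same shape. The mean is None if the mask is empty.
    """
    histogram = [0] * levels
    total = 0
    count = 0
    for intensity_row, mask_row in zip(intensity, mask):
        for value, keep in zip(intensity_row, mask_row):
            if keep:
                histogram[value] += 1
                total += value
                count += 1
    mean = total / count if count else None
    return histogram, mean


# Toy 2x2 example: only the left-hand column lies under the mask.
histogram, mean = masked_intensity_stats(
    intensity=[[100, 200], [150, 50]],
    mask=[[1, 0], [1, 0]],
)
```

Here the two masked pixels have intensities 100 and 150, so the mean is their sum divided by two, and the histogram has a single count in each of those two bins.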
It will be appreciated that the present invention, by automatically scanning through intensity thresholds at increments within an appropriate intensity threshold range, is able to objectively determine the best threshold to use in determining the mask. The technique is accordingly highly tolerant. Since the generation of a mask for each threshold is time consuming (typically about 2 minutes), it would be desirable to increase the increment between selected thresholds when a wide threshold range is specified.
Thus, the above-described example of the present invention provides a method for automatic selection of a correct membrane/cytoplasm threshold to obtain best membrane specification in EGFR testing.
A specimen image is iteratively processed by creating an image that defines a line representing cytoplasm connectivity. The threshold at which this line most accurately represents the characteristics of the cytoplasm boundary is used to specify the intensity boundary between cytoplasm and membrane stain. The line, representing cytoplasm connectivity, is determined using a skeleton process which reduces stained structures, according to the threshold, to a one pixel thick line through the centre of the object. Thus, the process is concerned only with the structure of an object, rather than with how thick the object is. By reducing the structure to its skeleton, the subjective effects of selecting different manual thresholds are obviated. Furthermore, the change that is seen in the final mask used for quantification is minimised. Although the starting point can differ depending upon the threshold selection (the thresholded membrane is thicker or thinner), the structure remains the same. In tests, not only was change minimised, but there was also no difference at all in the mask generated, even though the threshold selected changed by 20 grey levels.
From results, the method has been shown to detect the correct threshold for a variety of stains from different sections, and is independent of intensity of stain.
It will be appreciated that the present invention is applicable to other tests involving biological specimen images.
Various modifications and changes may be made to the described embodiments. The invention should be interpreted to embrace all such modifications, changes and equivalents, which fall within the spirit and scope of the present invention. All various combinations of embodiments of the invention are within the scope of the present invention.