The clinical implementation of optical spectroscopy in colorectal cancer surgery
Lisanne Baltussen is a PhD student in the research group Nanobiophysics (NBP). Her supervisor is prof.dr. T.J.M. Ruers from the Faculty of Science and Technology (TNW).
The standard treatment in patients with colorectal cancer is surgery. For locally advanced rectal cancer this is more often combined with neoadjuvant (chemo)-radiotherapy. During colorectal cancer surgery a balance should be found between complete removal of the tumor and sparing as much healthy tissue as possible to prevent complications. A technique that can provide real-time tissue classification during surgery might be of great benefit for the surgeon to prevent positive resection margins and complications due to too extensive surgery.
In this thesis fiberoptic diffuse reflectance spectroscopy (DRS) and diffuse reflection hyperspectral imaging (HSI) are examined as real-time tissue classification techniques that can ultimately be used during colorectal cancer surgery. Both techniques examine the interaction between tissue and light, which provides information on tissue constituents and structure. Based on this information healthy tissue could potentially be discriminated from tumor tissue in colorectal cancer.
In Chapter 2, fiberoptic DRS is first examined on tissue samples obtained at the pathology department after resection. At the pathology department three tissue samples were obtained (fat, healthy colorectal wall and tumor tissue), which were all placed in a pathology cassette. The samples remained in the cassette during the measurements and during further processing at the pathology department. Images were taken of each measurement location, so that each location could later be registered to the histological images obtained of each tissue sample. The fiberoptic DRS spectra were divided into spectral bands based on intensity values. The intensities of these spectral bands were used for classification of the spectra using a quadratic classifier and a support vector machine (SVM). The quadratic classifier was used to distinguish fat from healthy colorectal wall and tumor tissue. The SVM was trained to separate healthy colorectal wall from tumor tissue. To train and test both classifiers, the dataset was randomly divided into a training set (80%) and a test set (20%). The training and testing of the classifiers was repeated ten times, each time using different randomly selected training and test sets.
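The repeated random 80%/20% split described above can be sketched as follows. This is a minimal illustration and not the code used in the thesis: the function names, the fixed seed and the use of the sample standard deviation for the "±" value are assumptions, and splitting is done here per measurement index.

```python
import random
import statistics

def repeated_holdout(n_samples, n_repeats=10, test_fraction=0.2, seed=0):
    """Return a list of (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)  # fixed seed (assumption) so the sketch is reproducible
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # First n_test shuffled indices form the test set, the rest the training set.
        splits.append((sorted(indices[n_test:]), sorted(indices[:n_test])))
    return splits

def summarize(scores):
    """Mean and sample standard deviation of per-repetition scores."""
    return statistics.fmean(scores), statistics.stdev(scores)

splits = repeated_holdout(n_samples=100)
```

Each pair in splits would be used to train and test the classifiers once, after which summarize is applied to the ten test accuracies.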
In total 38 patients were included in this study. Of these 38 patients, 36 had colorectal cancer and 2 had an adenoma. The quadratic classifier obtained a training accuracy of 1.00. The training accuracy of the SVM was 0.92. When combining the two classifiers and applying this pipeline to the test dataset, a mean accuracy of 0.95 (± 0.03) over all tissue types and the ten repetitions was obtained. The best accuracy was obtained for fat (1.00 ± 0.00), followed by healthy colorectal wall (0.93 ± 0.05) and tumor (0.92 ± 0.09). When measurement locations on the border between two tissue types were classified, the classification of 80% of the locations was in accordance with the most prominent tissue type present at that location.
In Chapter 3, tissue samples similar to those used in Chapter 2 were imaged using two hyperspectral cameras. In this chapter registration of the pathology classification was done for the entire tissue sample. After registration, the diffuse reflection hyperspectral images were normalized using standard normal variate (SNV) normalization. Thereafter, the dataset was randomly divided into a training and a test set. The training set contained 75% of the patients and the test set contained the remaining 25%. Feature reduction was again done using spectral bands, based on the intensity values of the spectra. Using these features, two classifiers were trained: a quadratic classifier to classify fat and a SVM to separate tumor tissue from healthy colorectal wall.
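SNV normalization, as mentioned above, centers each spectrum on its own mean and scales it by its own standard deviation. A minimal sketch for a single spectrum is given below; the function name and the example values are hypothetical, and the choice of the sample standard deviation is an assumption (some implementations use the population standard deviation).

```python
import statistics

def snv(spectrum):
    """Standard normal variate: subtract the spectrum's mean, divide by its std."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample std (assumption)
    if std == 0:
        raise ValueError("a flat spectrum cannot be SNV-normalized")
    return [(value - mean) / std for value in spectrum]

# Hypothetical reflectance values for one pixel, for illustration only.
normalized = snv([0.2, 0.4, 0.6, 0.8])
```

After this step every spectrum has zero mean and unit standard deviation, so overall intensity differences between pixels no longer influence the classifier.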
In this study 54 patients were included. The tissue samples of 32 patients were imaged with both hyperspectral cameras; the samples of the remaining 22 patients were only imaged with the hyperspectral camera covering the near-infrared wavelength range. When data obtained with both hyperspectral cameras were used for classification, an average accuracy over all tissue types of 0.88 (± 0.13) was obtained. When data from just one of the cameras were used for classification, the average accuracy decreased to 0.67 and 0.83 for the camera in the visual wavelength range and the near-infrared wavelength range, respectively. From this it was concluded that HSI could be used in tissue classification during colorectal cancer surgery, but that the combination of both cameras seems necessary for optimal classification.
In locally advanced rectal cancer, patients often receive neoadjuvant (chemo)-radiotherapy before surgery. The neoadjuvant radiotherapy causes fibrosis around the tumor area. Fibrosis is classified by the pathologist as healthy tissue and can thus remain in the patient during surgery. However, because fibrosis and tumor give similar visual and tactile feedback, it is often hard for the surgeon to discriminate fibrosis from tumor. Here, again, a real-time tissue classification technology could be of great benefit to the surgeon. Therefore, in Chapter 4 a classifier is trained and tested on the discrimination of fibrosis from tumor.
In this study fiberoptic DRS measurements were performed on the entire specimen of 38 patients with rectal cancer. Due to the limited number of pure tumor measurements obtained in these 38 patients (5 measurements with tumor at the surface), tumor measurements from Chapter 2 were used in a first classification. In this classification a SVM was trained and tested on pure fibrosis and pure tumor measurements. Training and testing of the classifier was performed using ten-fold cross-validation, which was repeated ten times. The first classification resulted in a mean test accuracy of 0.88 (± 0.02), a mean sensitivity of 0.91 (± 0.01), a mean specificity of 0.86 (± 0.03) and a mean Matthews correlation coefficient (MCC) of 0.76 (± 0.03). Thereafter, a second SVM was trained and tested on data that were obtained only in the current study, which contained almost no pure tumor measurements. This classification resulted in a mean accuracy of 0.61 (± 0.05), a mean sensitivity of 0.51 (± 0.10), a mean specificity of 0.66 (± 0.11) and a mean MCC of 0.17 (± 0.08).
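The performance measures reported above all follow from the counts in a binary confusion matrix, with tumor taken as the positive class. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the thesis data.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of tumor measurements detected
    specificity = tn / (tn + fp)   # fraction of fibrosis measurements recognized
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}

# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Unlike accuracy, the MCC stays close to zero when a classifier mostly predicts the majority class, which makes it informative for datasets with very few pure tumor measurements.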
The decrease in all performance measures could have several explanations. First of all, in the first classification, data from two different studies were combined. Even though the measurement set-up and measurement probe used in both studies were the same, differences between the datasets could have been present due to the different nature of the tissues. In the study reported in Chapter 2, measurements were performed on tissue slices, whereas in the current study entire rectum specimens were used. This difference might have caused differences between the tumor measurements and fibrosis measurements. Another explanation is the limited number of pure tumor measurements that were obtained in the current study. Only 5 measurements were performed on locations at which tumor was found at the surface. Creating a classifier based on such limited data is hard. The spectral analysis performed in this chapter showed that there is a spectral difference between pure tumor and pure fibrosis measured in the current study. However, it is hard to discriminate tumor with a layer of fibrosis on top from pure fibrosis.
In Chapter 5, fiberoptic DRS measurements were performed during surgery. Measurements were performed by the surgeon on fat, healthy colorectal wall and on a location close to the tumor. All measurement locations were marked by the surgeon for pathology verification. The measurements were processed and classified after surgery. Before classification the spectra were normalized at 800 nm. Thereafter, two SVMs were trained and tested using ten-fold cross-validation. The first SVM was used to discriminate fat from healthy colorectal wall and tumor. The second SVM was used to discriminate healthy colorectal wall from tumor. Besides the classification, the influence of the depth of the tumor in the measurement volume on the classification was analyzed. This analysis was performed by increasing the maximum tumor depth up to which a measurement location was labeled as tumor. Furthermore, the classification results were compared to the clinical judgement of the surgeon to show the added value of this technology.
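Normalization at a single wavelength can be sketched as follows. The text does not specify the exact operation, so this sketch assumes that each spectrum is divided by its own intensity at the sampled wavelength closest to 800 nm; the function name and the example values are hypothetical.

```python
def normalize_at(wavelengths, intensities, reference_nm=800.0):
    """Scale a spectrum so that its intensity at the reference wavelength is 1."""
    # Index of the sampled wavelength closest to the reference wavelength.
    index = min(range(len(wavelengths)),
                key=lambda i: abs(wavelengths[i] - reference_nm))
    reference = intensities[index]
    if reference == 0:
        raise ValueError("zero intensity at the reference wavelength")
    return [value / reference for value in intensities]

# Hypothetical four-point spectrum, for illustration only.
wavelengths = [700.0, 750.0, 800.0, 850.0]
intensities = [0.30, 0.40, 0.50, 0.45]
normalized = normalize_at(wavelengths, intensities)
```

This removes overall intensity differences between measurements while keeping the shape of the spectrum relative to the reference wavelength.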
Measurements of 32 patients were included in the analysis. Classification of fat was done with a mean MCC of 0.83; for healthy colorectal wall the mean MCC was 0.77 and for tumor the mean MCC was 0.73. The depth analysis showed that the best accuracy and MCC values were obtained when a measurement was labeled as tumor if tumor was present up to a maximum depth equal to the fiber distance. With increasing tumor depth, the accuracy and MCC values decreased strongly. This was expected, because if tumor is only present at larger depths, the amount of tumor present in the measured volume is too small to detect using fiberoptic DRS. With decreasing depth, a decrease in accuracy and MCC values was shown as well. This was not expected, but could be explained by the limited number of measurement locations in which tumor was present this superficially.
Finally, the outcome of the classification was compared to the clinical judgement. For this analysis only those measurements were taken into account for which the surgeon indicated not being sure whether tumor was present, 54 out of 270 locations. No false negative classifications were allowed in this analysis. It was found that with a classification threshold allowing no false negative predictions, 25% of the healthy locations were falsely classified as tumor. By comparison, the surgeon's judgement falsely classified 69% of the healthy locations as tumor. This shows the potential added value of fiberoptic DRS in surgery.
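Choosing a threshold that allows no false negative predictions can be illustrated as follows. The sketch assumes that the classifier gives each location a continuous score where a higher score means tumor is more likely; the function name, scores and labels are hypothetical and not taken from the thesis data.

```python
def zero_false_negative_threshold(scores, labels):
    """Highest threshold that still classifies every tumor location as tumor.

    labels: 1 = tumor, 0 = healthy. A location is classified as tumor
    when its score is greater than or equal to the threshold.
    """
    tumor_scores = [s for s, y in zip(scores, labels) if y == 1]
    healthy_scores = [s for s, y in zip(scores, labels) if y == 0]
    threshold = min(tumor_scores)  # the lowest-scoring tumor is still detected
    false_positives = sum(s >= threshold for s in healthy_scores)
    return threshold, false_positives / len(healthy_scores)

# Made-up scores for three tumor and five healthy locations.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.35]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
threshold, false_positive_rate = zero_false_negative_threshold(scores, labels)
```

The resulting false positive rate is the fraction of healthy locations classified as tumor, the quantity that is compared with the surgeon's judgement above.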
In Chapter 6, a comparison is made between measurements obtained in an in vivo setting and measurements obtained ex vivo. For new technologies it is common practice to first perform ex vivo measurements, which are thereafter repeated in vivo. The ex vivo measurements are used to prove a concept, but because in vivo optical properties might change, measurements are always repeated in vivo. In this chapter it was examined whether ex vivo measurements differ significantly from in vivo measurements and if ex vivo measurements can be used to train a classification that can be used on in vivo measurements.
For the analysis a cross-correlation of the data was assumed because of the correlation between measurements performed within one patient and the correlation between measurements performed either in vivo or ex vivo. The analysis was performed on parameters obtained using an analytical fit model based on optical diffusion theory. It was found that only the diameter of the blood vessels and the scale factor did not differ significantly between in vivo and ex vivo for all tissue types.
Even though almost all parameters differed significantly between in vivo and ex vivo, there was no significant difference between the classification outcomes of a classifier trained and tested on in vivo data and a classifier trained on ex vivo data and tested on in vivo data. Based on forward feature selection, four parameters were selected to create a classification with a reduced number of parameters. The four selected parameters were the blood volume fraction, the saturation of the blood, the volume fraction of water and fat, and the volume fraction of fat from water plus fat. With these selected parameters, results similar to those of the classification using all parameters were obtained. This shows that it is possible in colorectal cancer to include ex vivo measurements in the training of a classifier that is used for the classification of in vivo measurements.
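Forward feature selection, as used above, starts with an empty set and repeatedly adds the parameter that improves the classification score the most. The sketch below shows this greedy loop in general form. The score function is a toy stand-in for a cross-validated classification score, and the feature names and gains are hypothetical.

```python
def forward_select(feature_names, score_fn, n_select):
    """Greedy forward selection of n_select features using score_fn(subset)."""
    selected = []
    remaining = list(feature_names)
    while remaining and len(selected) < n_select:
        # Try each remaining feature and keep the one that scores best
        # when added to the features selected so far.
        best = max(remaining, key=lambda name: score_fn(selected + [name]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy additive score (assumption); in practice this would be the
# cross-validated performance of a classifier trained on the subset.
toy_gain = {"a": 0.10, "b": 0.40, "c": 0.30, "d": 0.20, "e": 0.05}

def toy_score(subset):
    return sum(toy_gain[name] for name in subset)

chosen = forward_select(list(toy_gain), toy_score, n_select=4)
```

With a real classifier the score is not additive, so the order in which parameters are selected can depend on the parameters chosen earlier.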
In this thesis several different analysis techniques were used. In the final chapter it is examined which analysis technique performs best for the classification of healthy colorectal wall versus tumor. In this chapter, normalization techniques, feature extraction techniques, and different classifiers are examined in nine different datasets.
It was found that overfitting is a problem in complex classifiers like k-nearest neighbor (kNN), linear discriminant analysis (LDA) and decision tree classifiers. Overfitting occurs in these three classifiers especially when the entire spectrum (1151 wavelengths) is used for classification. If feature reduction techniques are applied, the chance of overfitting decreases for both LDA and kNN. For decision tree classification, there is a large chance of overfitting even with feature reduction. Overall, SVM and neural network (NN) classifiers showed the best results with the least chance of overfitting.
Normalization overall did not seem to improve classification results. However, it was found that in the fiberoptic DRS datasets in which disposable needles were used, normalization did improve the classification outcome. This might have to do with small differences between the disposable needles used for these two datasets. Moreover, between the white reference and the actual measurements, a switch needed to be made between the white reference needle and the measurement needle. This switch might have introduced intensity differences which were not patient or tissue specific. Normalization might be a solution to overcome these intensity differences. If normalization is used, it is recommended to use normalization at 800 nm instead of normalization using the area under the curve or SNV normalization.
Comparing different feature extraction techniques to the use of the entire spectrum, it was found that it is most useful to use features that describe the shape of the spectrum as well as possible. The parameters obtained with the analytical model based on optical diffusion theory were found to be the best parameters for classification. However, these parameters cannot always be used, due to assumptions made in the model on fiber distance and the assumption that illumination was done with a collimated beam, which excludes diffuse reflection HSI. If fit parameters could not be used, shape-based features were the best feature extraction method. It was concluded that the optimal classification of healthy colorectal wall and tumor is done using a SVM or NN, using fit parameters or shape-based features, without normalization.