Detection of lung cancer in exhaled breath with electronic nose technology
Sharina Kort is a PhD student in the research group Cognition, Data and Education. The supervisor is prof.dr. J.A.M. van der Palen from the Faculty of Behavioural, Management and Social Sciences, and the co-supervisors are dr. M.G.J. Brusse-Keizer from MST (Medisch Spectrum Twente) and dr. H. Schouwink from the Faculty of Behavioural, Management and Social Sciences.
Lung cancer is the leading cause of cancer-related mortality worldwide. It is subdivided into two major types: non-small cell lung cancer (NSCLC) and small-cell lung cancer (SCLC), accounting for approximately 85% and 15% of cases, respectively. NSCLC can be further categorized into many subtypes, the two most common being adenocarcinoma and squamous cell carcinoma, each with different tumour characteristics, treatment options and prognosis. The 5-year survival rate for localized-stage NSCLC is approximately 60%, whereas for metastatic disease it is 5%. For SCLC, the 5-year survival rate is only approximately 30% for localized disease and 3% for metastatic disease. Despite substantial progress in treatment options, such as targeted therapies with tyrosine kinase inhibitors (TKIs), immune therapy, improvements in surgical options, and personalized treatment, the high lung cancer-related mortality reflects the fact that the majority of patients present with advanced-stage disease, which is not curable.
In the past decades, various non-invasive technologies have been investigated as potential tools to diagnose lung cancer. One of these is exhaled breath analysis based on pattern recognition by electronic nose technology. Besides inorganic compounds such as water vapour, nitrogen, and carbon monoxide, exhaled breath contains thousands of volatile organic compounds (VOCs) that reflect physiological and pathophysiological metabolic processes in the body. In case of disease, metabolism changes, leading to exhalation of a different composition of VOCs, which can be captured by highly sensitive sensors and analysed with artificial intelligence techniques. This type of technology mimics human olfaction, in which one needs to be trained to recognize familiar smells, and allows the electronic nose to recognize a ‘smell’ that matches lung cancer, or any other condition for which the electronic nose has been trained. Before a new diagnostic tool for lung cancer can be implemented, the technique must first be trained and validated to establish whether it offers sufficient additional value in clinical practice.
In this thesis, we investigated the potential of exhaled breath analysis to diagnose lung cancer by performing studies in which we trained and validated an electronic nose (Aeonose™) to distinguish patients with lung cancer from subjects without lung cancer. Chapter 2 and Chapter 3 mainly focus on methodological issues concerning exhaled breath analysis based on pattern recognition with machine learning techniques. Chapters 4-6 show results of clinical studies in which the Aeonose is trained and validated.
In Chapter 2 we outlined the proposed multicentre study design for training the Aeonose as a diagnostic tool for lung cancer. This manuscript mainly focused on the technical working mechanism of the device and on the statistical analyses, which incorporate artificial intelligence and internal validation techniques to classify subjects as having lung cancer or not. We showed how a large amount of training data could be handled in a way that reduces the risk of overfitting the prediction model.
As stated above, after a prediction model has been developed on training data, it is fundamental to validate this model on new data in order to assess its reproducibility and generalizability in independent subjects. Since diagnostic techniques evolve rapidly due to highly innovative technologies, inclusion of subjects for external validation often takes too long to properly assess the relevance and efficiency of the developed prediction model. In Chapter 3 we proposed a methodological study design to simultaneously develop and validate prediction models based on machine learning techniques in general. We used our training study as published in 2018 (Chapter 4) to demonstrate the application of this proposed study design. This type of study design is especially suitable for an innovative, but highly relevant, diagnostic technique that can change rapidly due to technological developments, or for a rare disease where inclusion of subjects takes a lot of time.
Chapters 4-6 show results of clinical multicentre studies in which the Aeonose is trained and validated. The prediction model developed on the training data has been extended with clinical data to improve the diagnosis of lung cancer. In Chapter 4 we performed an exploratory multicentre study to train the Aeonose to distinguish non-small cell lung cancer patients from subjects without lung cancer based on exhaled breath. Based on 290 subjects (144 NSCLC patients, 146 controls), the prediction model was able to discriminate both groups with a sensitivity of 94%, a specificity of 33%, a negative predictive value (NPV) of 86%, and an AUC of 0.76 (95% confidence interval (CI): 0.71-0.82). Since lung cancer is characterized by a high mortality when not detected in time, we focused on a high negative predictive value, which was obtained. This high negative predictive value implies that a large number of subjects suspected of lung cancer could be spared unnecessary, probably invasive, interventions. Besides evaluation of the discriminative performance between NSCLC patients and non-NSCLC subjects, additional sub-analyses were performed on the two most common NSCLC histology types, i.e. adenocarcinoma and squamous cell carcinoma. Squamous cell carcinoma showed an impressively high negative predictive value of 93% with an AUC of 0.78, indicating that in case of an Aeonose™ value lower than -0.015, there is a high certainty, with high clinical relevance, that squamous cell carcinoma is absent. Adenocarcinoma showed a slightly lower diagnostic accuracy with an AUC of 0.73, which might be explained by the heterogeneity of adenocarcinoma tumours. Also, in a small sub-analysis to evaluate differences in breath patterns between SCLC patients and non-lung cancer patients, we found promising results to exclude SCLC with a high NPV of 97% and an AUC of 0.86 (95% CI: 0.78-0.95).
However, it must be noted that all sub-analyses were performed in small groups of subjects and further research is needed. In addition, all analyses in the training study were performed with a non-CE-certified Aeonose™ device and have not been validated on independent data.
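The performance measures quoted above (sensitivity, specificity, NPV) all follow from a two-by-two table of test result against final diagnosis. The sketch below shows one way they could be computed; the class and function names are illustrative assumptions, and the counts are hypothetical values chosen for easy arithmetic, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # lung cancer, classified as lung cancer
    fn: int  # lung cancer, classified as no lung cancer
    tn: int  # no lung cancer, classified as no lung cancer
    fp: int  # no lung cancer, classified as lung cancer


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Compute standard diagnostic accuracy measures from a 2x2 table."""
    return {
        # Proportion of diseased subjects that the test flags.
        "sensitivity": c.tp / (c.tp + c.fn),
        # Proportion of non-diseased subjects that the test clears.
        "specificity": c.tn / (c.tn + c.fp),
        # Proportion of positive classifications that are correct.
        "ppv": c.tp / (c.tp + c.fp),
        # Proportion of negative classifications that are correct.
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only; NOT taken from the thesis.
example = ConfusionCounts(tp=90, fn=10, tn=40, fp=60)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A test tuned for a high NPV, as described above, accepts a low specificity (many false positives) in exchange for very few false negatives among the subjects it clears.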
In Chapter 5 we investigated the potential additional value of adding clinical parameters, which are also known to be predictive for lung cancer, to the prediction model based on exhaled breath data from the training cohort in Chapter 4. We found that variables such as age, sex, smoking status, number of pack-years, presence of COPD, and the absolute classification value of the Aeonose™ were associated with the presence of lung cancer. Two types of multivariable statistical analysis were performed to assess the additional value of the extended prediction models. First, we performed a multivariable logistic regression analysis in which the absolute classification value of the Aeonose™, as obtained by the neural network analysis, was entered as an independent variable together with the univariately associated clinical variables. This model showed a substantial increase in diagnostic performance, with an AUC of 0.86, a sensitivity of 96%, a specificity of 60%, and an NPV of 93%, compared to the original model based on exhaled breath data only, which had an AUC of 0.76. Second, we added the univariately associated variables a priori as an extension of the vector of breath data that forms the input of the artificial neural network used in the training study to build the prediction model. This neural network model also showed a great improvement in diagnostic performance, with an AUC of 0.84, a sensitivity of 94%, a specificity of 49%, and an NPV of 90%.
Not only did sensitivity and NPV increase; specificity also increased in the multivariable models, meaning that fewer subjects were incorrectly classified as having lung cancer.
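The first approach, in which the electronic nose value enters a logistic regression alongside clinical variables, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the subset of variables, their scaling, and all coefficient values are hypothetical and are not the estimates from the thesis.

```python
import math


def lung_cancer_probability(features: dict, coefficients: dict, intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for illustration only; NOT fitted on study data.
COEFFICIENTS = {
    "enose_value": 2.0,        # classification value from the breath model
    "age_per_10y": 0.3,        # age in decades
    "pack_years_per_10": 0.2,  # smoking exposure per 10 pack-years
    "copd": 0.5,               # 1 if COPD is present, else 0
}
INTERCEPT = -3.0

# Two hypothetical subjects.
low_risk = {"enose_value": -0.5, "age_per_10y": 5.0, "pack_years_per_10": 0.0, "copd": 0}
high_risk = {"enose_value": 0.5, "age_per_10y": 7.0, "pack_years_per_10": 4.0, "copd": 1}

p_low = lung_cancer_probability(low_risk, COEFFICIENTS, INTERCEPT)
p_high = lung_cancer_probability(high_risk, COEFFICIENTS, INTERCEPT)
print(f"low-risk subject:  {p_low:.3f}")
print(f"high-risk subject: {p_high:.3f}")
```

In this form the breath-based value is simply one predictor among several, so the clinical variables can shift the predicted probability up or down for subjects with the same breath profile.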
Since the training studies, with and without clinical variables, indicated promising results of the Aeonose to diagnose lung cancer, we performed a large, multicentre, multinational validation study with multiple Aeonose devices to assess reproducibility and robustness of the obtained results. The results of this validation study are presented in Chapter 6. Because of the continuous improvements in technology mentioned in Chapter 3, and the resulting use of a second-generation, CE-certified Aeonose, it was decided not to use the original data from the training cohort as described in Chapter 4. Instead, we recruited new subjects and performed a split-sample design, which enabled development and subsequent validation of a new prediction model. The training set consisted of 376 subjects (160 lung cancer patients, 216 clinically relevant controls) and the validation set consisted of 199 subjects (79 lung cancer patients, 120 controls). We observed a moderate model performance to discriminate between patients with NSCLC and controls based on exhaled-breath data only, at a cut-off probability of 20% for the diagnosis of lung cancer, with similar results in the validation set including ‘blind’ subjects. This prediction model showed a sensitivity of 88%, a specificity of 48%, a PPV of 52%, and an NPV of 87%, with an AUC of 0.79 (95% CI: 0.72-0.85) in the validation set. As seen in Chapter 5, adding relevant clinical variables that are also predictive for lung cancer substantially improved the diagnosis of lung cancer. Exhaled-breath data and clinical parameters from the training set were combined in a multivariable logistic regression analysis, with a cut-off of 16% probability of lung cancer, resulting in a sensitivity of 95%, a specificity of 51%, and an NPV of 94%. This corresponded to an AUC of 0.87 (95% CI: 0.83-0.90).
When applying the identical multivariable logistic regression model to the validation set, maintaining the selected cut-off probability of 16%, we observed a sensitivity of 95%, a specificity of 49%, a PPV of 54%, and an NPV of 94%, with a corresponding AUC of 0.86 (95% CI: 0.81-0.91). At this cut-off probability of 16%, this would mean that 63 of the 196 subjects (32%) were classified as “no lung cancer” and could, with high certainty, be spared unnecessary interventions.
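Applying a cut-off that was fixed on the training set to a separate validation set can be sketched as below. The 16% cut-off is the value reported in the text; the function name, the predicted probabilities, the diagnoses, and the choice to treat a probability exactly at the cut-off as positive are assumptions made for illustration only.

```python
CUTOFF = 0.16  # cut-off probability reported in the text


def rule_out(probabilities, has_cancer, cutoff=CUTOFF):
    """Return (n_ruled_out, fraction_ruled_out, npv) for a fixed cut-off.

    Subjects with a predicted probability below the cut-off are classified
    as "no lung cancer"; NPV is the share of those who truly have no cancer.
    """
    ruled_out = [truth for p, truth in zip(probabilities, has_cancer) if p < cutoff]
    n = len(ruled_out)
    true_negatives = sum(1 for truth in ruled_out if not truth)
    npv = true_negatives / n if n else float("nan")
    return n, n / len(probabilities), npv


# Hypothetical validation data for illustration only; NOT from the study.
validation_probabilities = [0.05, 0.10, 0.15, 0.20, 0.40, 0.60, 0.80, 0.12, 0.30, 0.90]
validation_has_cancer = [False, False, True, False, True, True, True, False, False, True]

n_out, fraction_out, npv = rule_out(validation_probabilities, validation_has_cancer)
print(f"ruled out: {n_out} subjects ({fraction_out:.0%}), NPV {npv:.2f}")
```

The key point of the split-sample design is that the cut-off is not re-tuned on the validation subjects, so the resulting rule-out fraction and NPV reflect how the model would behave on new patients.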
In Chapter 7 we place the main results of the performed studies in a broader context to discuss the relevance of the findings and future implications. Future research is needed to evaluate the value of exhaled breath analysis in lung cancer screening programmes, but also as an application to monitor treatment responses and detect early recurrence of the disease.