
Continuous prediction for tumor mutation burden based on transcriptional data in gastrointestinal cancers

Abstract

Background

Tumor mutation burden (TMB) is considered a biomarker for guiding the use of immune checkpoint inhibitors (ICIs), but whole exome sequencing (WES) and cancer gene panel (CGP) sequencing based on next-generation sequencing are costly methods of TMB detection. Here, we use transcriptome data from TCGA to construct a model for TMB prediction in gastrointestinal tumors.

Methods

Transcriptome data, somatic mutation data and clinical data of four gastrointestinal tumors, including esophageal cancer (ESCA), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD) and rectal adenocarcinoma (READ), were downloaded from TCGA. Using R, we performed visual analysis of somatic mutation data, functional enrichment analysis of differentially expressed genes (DEGs), and gene set enrichment analysis (GSEA), and evaluated the clinical value of TMB. Finally, a deep neural network (DNN) model was constructed for TMB prediction.

Results

Visualization of somatic mutation data summarized the classification of mutations, the frequency of each mutation type, and the top-mutated genes. GSEA showed enrichment of CD4+/CD8+ T cells in the high-TMB group and activation of tumor-suppressing pathways. Single-sample GSEA (ssGSEA) showed that the high-TMB group had higher levels of infiltration by multiple immune cell types. In addition, the distribution of TMB was related to clinical parameters such as age, M stage, N stage, AJCC stage, and overall survival (OS). After model optimization using a genetic algorithm, the Pearson correlation coefficient r between predicted and actual values reached 0.98, 0.82, and 0.92 in the training, validation, and testing sets, respectively; the coefficient of determination R2 was 0.95, 0.82, and 0.7, respectively.

Conclusion

TMB correlates with clinicopathological parameters in gastrointestinal carcinoma, and patients with high TMB have higher levels of immune infiltration. In addition, the DNN model based on 31 genes predicts the TMB of gastrointestinal tumors with high accuracy.


Introduction

The latest research shows that in 2020, the number of new cancer patients worldwide reached nearly 20 million, and cancer deaths amounted to nearly half that number. Breast cancer ranked first in incidence among all cancers, while colorectal cancer and gastric cancer ranked third and fifth in incidence and second and fourth in mortality, respectively. It is estimated that by 2040, the number of cancer patients will increase by nearly 50% globally, and the proportion and increase of tumor burden in transitional countries will be greater than those in post-transition countries [1]. Others predict that the global tumor burden will continue to increase over the next 50 years, and that the incidence of all types of tumors in 2070 will be twice that in 2020 [2]. In China, although the overall incidence of esophageal cancer and gastric cancer has shown a descending trend over the past 30 years [3, 4], the incidence and mortality of colorectal cancer have been rising [5]. Meanwhile, the number of patients suffering from these four types of gastrointestinal cancers and the deaths caused by them have been on an upward trend, which is predicted to last for the next 20 years. The prognosis of gastrointestinal tumors is closely related to the stage at diagnosis. However, since gastrointestinal endoscopy such as gastroscopy and colonoscopy is far from widely adopted, most patients present at an advanced stage when diagnosed, resulting in unsatisfactory post-operative survival. The five-year OS of esophageal cancer hardly reaches 20%.

Tumor immunotherapy relies on the application of immune checkpoint inhibitors, which mainly target three immune checkpoints that negatively regulate T cell activity: CTLA4, PD-1, and PD-L1. Since anti-PD-1 was applied to melanoma treatment in 2014 [6], anti-PD-1/PD-L1 drugs have been successively applied in other tumor types, including skin cancer, non-small cell lung cancer, gynecological tumors such as cervical cancer and ovarian cancer, and gastrointestinal cancers. Treatment options include the use of anti-CTLA4/PD-1/PD-L1 agents alone, combinations of the three ICIs, or combinations with other targeted agents. Although some patients achieve long-term response after receiving ICIs, especially those receiving combined immunotherapy, a considerable number of patients experience adverse reactions in diverse organs and systems, known as immune-related adverse events (irAEs). irAEs of the skin and digestive tract are most commonly seen [7]. In patients with gastrointestinal tumors, the effect of using ICIs alone is not satisfactory, but studies have shown that a combination of ICIs, or of ICIs with other targeted agents such as anti-vascular drugs, prolonged overall survival in advanced gastric cancer [8]. Hence, there is an urgent need to find suitable biomarkers to identify the patients who benefit from tumor immunotherapy, maximizing the anti-tumor effect of ICIs while minimizing the occurrence of irAEs.

TMB refers to the number of mutation events per million bases, counting non-synonymous mutations such as insertions, deletions, substitutions, and somatic coding errors. TMB has been approved by the FDA as a screening indicator for patients who may benefit from ICIs, because more aberrant proteins may be translated in those with high TMB, which may increase the number of immune neoantigens. As a result, the anti-tumor effect of T cells can be enhanced after using ICIs [9]. Patients with rectal cancer and high TMB tend to have a higher neoantigen load and are thus more likely to achieve pathological complete response after neoadjuvant chemotherapy [10]. In colorectal cancer, microsatellite instability (MSI) combined with TMB can more accurately classify the tumor immune microenvironment of patients, thereby guiding tumor immunotherapy [11]. Among microsatellite-stable (MSS) colorectal cancer patients, those with high TMB may be more likely to benefit from anti-PD-1 immunotherapy [12]. Likewise, patients with resectable gastric cancer and high TMB have been confirmed by immunohistochemical staining to have a higher level of immune infiltration [13]. Similarly, advanced gastric cancer patients with high TMB have longer overall survival after toripalimab therapy, indicating that TMB can be used as an efficacy indicator for anti-PD-1 therapy in gastric cancer [14]. Besides, researchers have found that nearly 95% of gastric cancer (GC) patients in China are characterized as MSS, and for stage II/III MSS GC patients, high TMB has been shown to be positively associated with neoadjuvant response [15].

In practice, WES is recognized as the gold standard for TMB detection, followed by CGP sequencing based on next-generation sequencing. Both detection methods require tumor tissue from patients. The former is too expensive to be widely adopted, and the latter is more commonly used. However, results from different sequencing platforms have different baselines, and the TMB threshold has not yet been fixed. Studies have also shown a correlation between the blood tumor mutation burden (bTMB) measured from circulating tumor DNA and the tissue tumor mutation burden (tTMB) measured by WES: the Pearson correlation coefficient, r, reaches 0.62, and non-small cell lung cancer patients with high bTMB receiving anti-PD-1 treatment show a higher progression-free survival rate and objective response rate [16]. In addition, there are reports of models built with bioinformatics algorithms such as machine learning or deep learning that use lncRNA and miRNA data to predict TMB [17, 18]. Intriguingly, some morphological features in H&E slides, such as high tumor/lymphocytes and vasculature/red blood cells, have been found to be relevant to TMB, and a convolutional network built on these characteristics can be applied for TMB level prediction [19]. However, these studies are each limited to a single tumor type, and the target value is restricted to two levels of TMB rather than the exact numerical value, so they fall well short of becoming a fourth TMB detection method that can serve the clinic.

In this study, we first integrated the transcriptome data, somatic mutation data and clinical data of four gastrointestinal tumors (ESCA, STAD, COAD and READ), followed by a series of visualizations of the somatic mutation data and an investigation into the association of TMB with clinical parameters and the tumor microenvironment. Finally, a DNN model was constructed using the transcriptome data of 31 genes to accurately predict the TMB of patients with gastrointestinal carcinomas.

Methods

Data downloading and processing

Transcriptional data of 1194 samples encompassing ESCA, STAD, COAD and READ, somatic mutation data of 1152 samples (masked somatic mutation data based on VarScan software) and clinical information of 1177 samples, including age, gender, grade, T stage, N stage, M stage and overall stage, were downloaded from TCGA (https://portal.gdc.cancer.gov/). After removal of normal samples and incomplete data, 1152 samples were used for somatic mutation visualization analysis; 993 samples containing both clinical data and TMB were used to evaluate the clinical significance of TMB; 996 samples containing both transcription data and TMB were used for subsequent DEG analysis, GO and KEGG functional enrichment analysis, GSEA and model construction. Figure 1 shows the workflow of this research.

Fig. 1

Flowchart of the design including somatic mutation visualization and TMB prediction model construction. ESCA, esophageal carcinoma; STAD, stomach adenocarcinoma; COAD, colon adenocarcinoma; READ, rectum adenocarcinoma; TMB, tumor mutation burden; DEGs, differentially expressed genes; Lasso, least absolute shrinkage and selection operator; r, Pearson relevance coefficient; R2, the coefficient of determination

Visualization of somatic mutation data

The Maftools package was used to extract the TMB of all samples, counting non-synonymous mutations such as somatic coding errors, base substitutions, insertions and deletions. Split at the median TMB (3.84), the 996 samples were classified into two groups: high-TMB and low-TMB. We counted the classes of somatic mutation and their alteration frequencies, and the top-30 mutated genes are presented in a waterfall plot.
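The TMB calculation and median split described above can be sketched in Python as follows. This is only a minimal illustration and not the Maftools implementation: the function names, the toy mutation counts, and the 38 Mb capture size are assumptions made for the example, since the capture size used in this study is not stated.

```python
from statistics import median

# Hypothetical capture size in megabases; the value used in the study is not stated.
CAPTURE_SIZE_MB = 38.0


def tmb_per_mb(n_nonsynonymous, capture_size_mb=CAPTURE_SIZE_MB):
    """Return the number of non-synonymous mutations per megabase for one sample."""
    return n_nonsynonymous / capture_size_mb


def split_by_median(tmb_by_sample):
    """Label a sample 'high' if its TMB exceeds the cohort median, otherwise 'low'."""
    cutoff = median(tmb_by_sample.values())
    groups = {s: ("high" if v > cutoff else "low") for s, v in tmb_by_sample.items()}
    return cutoff, groups


# Toy non-synonymous mutation counts (illustrative only, not TCGA data).
counts = {"S1": 76, "S2": 152, "S3": 38, "S4": 380}
tmb = {sample: tmb_per_mb(n) for sample, n in counts.items()}
cutoff, groups = split_by_median(tmb)
```

Whether a sample lying exactly on the median is assigned to the high or the low group is a design choice; the sketch assigns it to the low group.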

DEGs and functional enrichment analysis

Using limma, DEG analysis was performed between the 2 TMB groups, and genes with P value < 0.05 and |Log FC| ≥ 1 were considered differentially expressed; the top 40 were displayed in a heatmap drawn with the pheatmap package. “clusterProfiler”, “org.Hs.eg.db”, “enrichplot”, and “ggplot2” were applied to perform GO and KEGG functional enrichment analysis based on the 256 DEGs. GSEA was also performed with the software GSEA_4.1.0, using the “immunological signatures” and “kegg pathways” gene sets. The “plyr”, “grid”, and “gridExtra” packages were then used to visualize the first 10 pathways, with FDR < 0.05 as the inclusion criterion.
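The DEG thresholds above amount to a simple filter on a table of per-gene statistics. The sketch below shows that filter in Python under stated assumptions: the gene names and values are hypothetical, and the statistics themselves would come from limma in R rather than from this code.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with P < p_cutoff and |log fold change| >= lfc_cutoff.

    results maps a gene name to a (log_fc, p_value) tuple.
    """
    return [
        gene
        for gene, (log_fc, p_value) in results.items()
        if p_value < p_cutoff and abs(log_fc) >= lfc_cutoff
    ]


# Hypothetical gene names and statistics, for illustration only.
toy_results = {
    "GENE_A": (1.8, 0.001),   # passes both thresholds
    "GENE_B": (-1.2, 0.03),   # passes: down-regulated but |log FC| >= 1
    "GENE_C": (0.4, 0.0001),  # fails: fold change too small
    "GENE_D": (2.5, 0.20),    # fails: not significant
}
degs = select_degs(toy_results)
```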

Tumor-infiltrating Immune cells analysis in 2 TMB groups

Based on the “GSVA” package and the CIBERSORT algorithm, we performed ssGSEA analysis on gastrointestinal tumors and obtained the infiltration scores of 22 immune cell types in each sample. We then compared the levels of the 22 tumor-infiltrating immune cell types between the 2 TMB groups, and the result was shown in a violin plot drawn with the “vioplot” package.

Estimation of clinical significance of TMB

Based on the characteristics of 993 samples, we analyzed the distribution of TMB across various clinicopathological parameters, including age, gender, grade, T stage, N stage, M stage, AJCC stage, and primary site, using the Wilcoxon rank-sum test and the Kruskal–Wallis test. Using Kaplan–Meier survival analysis and the log-rank test, we compared overall survival time between the 2 TMB groups.
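To make the survival comparison more concrete, the following Python sketch implements the Kaplan–Meier product-limit estimator for a single group. It is an illustration under stated assumptions, not the analysis code of this study: the follow-up times and event indicators are toy values, and the log-rank test is not included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (illustrative only).
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
```

Running the estimator once per TMB group gives the two curves that a log-rank test would then compare.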

Feature values screening

After integration of the transcriptome data and somatic mutation data, 996 tumor samples with records of 56,753 genes were obtained. Using Python 3.5, we first conducted a Pearson correlation analysis of TMB against the mRNA expression of all genes, and genes with correlation coefficients greater than 0.28 were retained for further screening. In the medical field, the Pearson coefficient is customarily considered very strong (|r| > 0.7), moderate (0.5 < |r| ≤ 0.7), fair (0.3 ≤ |r| ≤ 0.5), or poor (|r| < 0.3) [20]. To ensure that enough linearly correlated variables remained for the subsequent Lasso regression analysis, the cut-off value of r was set to 0.28. Because Pearson correlation analysis captures only linear association between each gene and TMB, Lasso regression was additionally conducted to account for relations among the variables. Using Lasso regression with five-fold cross-validation, 31 genes were determined as the final feature values of the DNN model, which helps avoid model overfitting by reducing the number of input variables.
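The first screening step, keeping genes whose Pearson coefficient with TMB exceeds 0.28, can be sketched as below. The helper names and the toy expression vectors are hypothetical, and the subsequent Lasso step with five-fold cross-validation is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant vector: r is undefined, treated as 0 in this sketch
    return cov / (sd_x * sd_y)


def screen_genes(expression, tmb, cutoff=0.28):
    """Return {gene: r} for genes whose correlation with TMB exceeds the cutoff."""
    selected = {}
    for gene, values in expression.items():
        r = pearson_r(values, tmb)
        if r > cutoff:
            selected[gene] = r
    return selected


# Toy data: five samples and three hypothetical genes.
toy_tmb = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with TMB
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with TMB
    "GENE_C": [3.0, 3.0, 3.0, 3.0, 3.0],   # constant
}
selected = screen_genes(toy_expression, toy_tmb)
```

Following the wording of the text, the sketch keeps only positive correlations above the cut-off; screening on |r| instead would also retain negatively correlated genes such as the toy GENE_B.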

DNN Model construction, optimization and evaluation

For model construction, we chose a deep neural network rather than conventional machine learning models such as support vector machines and random forests. Given that 31 features were selected as the final input, and since conventional machine learning models are generally considered prone to overfitting once the number of input variables increases and the signal-to-noise ratio of the data decreases, the evaluation results of such models in the training set are often far better than those in the testing set. The feature values of the DNN model were the mRNA expression of the 31 genes screened in the “Feature values screening” section, and the target value was the numerical TMB value. The 996 samples were randomly split 7:2:1 into a training set, a validation set and a testing set, used for the training, optimization and evaluation of the model, respectively. The basic structure of the model is a deep neural network (DNN) with one input layer, one output layer and two hidden layers. The “RELU” function was chosen as the activation function of the two hidden layers to enable nonlinear fitting, while the “LINEAR” function was applied in the output layer, with “ADAM” employed as the optimizer. The hyper-parameters of the DNN model are the parameters that need to be set manually before model training and cannot be learned automatically from the data and the training process. Once the data and network structure are fixed, the hyper-parameters have the greatest impact on the accuracy of the network. We first identified the hyper-parameters with a significant impact on the predictive performance of the DNN model through a simple sensitivity training test (Table 2), and then used the validation set to find the best combination of hyper-parameters under the guidance of a genetic algorithm. Adding a dropout layer to the hidden layers and regularizing the hidden layers were both included in the hyper-parameter optimization to avoid model overfitting.
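The random 7:2:1 split described above can be sketched as follows. The rounding rule and the random seed are assumptions made for the illustration; the exact set sizes used in this study are not stated.

```python
import random


def split_indices(n_samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle sample indices and cut them into training, validation and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * fractions[0])
    n_val = int(n_samples * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the testing set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(996)
```

With 996 samples and this rounding rule, the three parts contain 697, 199 and 100 samples.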
A further reason for choosing the DNN is that the genetic algorithm is considered one of the best methods for DNN optimization, which allows the predictive potential of the model to be fully explored. Finally, the testing set was used to verify the effect of hyper-parameter optimization on the DNN model. The genetic algorithm optimization was run with Python 3.5 on the Para Cloud Supercomputing-CPU platform. In this study, we use the Pearson correlation coefficient, r, and the coefficient of determination, R2, to evaluate the predictive ability of the DNN model. The coefficient of determination (R2) is calculated as follows:

$$R^{2} = \frac{\sum\limits_{i = 1}^{n} (\hat{y}_{i} - \overline{y})^{2}}{\sum\limits_{i = 1}^{n} (y_{i} - \overline{y})^{2}}$$

where: \(\overline{y}\) = sample mean of the actual values; \(n\) = sample size of the testing set; \(y_{i}\) = actual value in the testing set; \(\hat{y}_{i}\) = predicted value in the testing set.

R2 indicates the fluctuation percentage of the actual value that can be displayed by the fluctuation of the predicted value, or in other words: how much variation of the actual TMB can be represented by the predicted ones.
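The two evaluation metrics can be computed as in the sketch below, which follows the formula above (explained sum of squares divided by total sum of squares). The function names and the toy values are illustrative assumptions. Note that another common definition of R2, one minus the residual sum of squares over the total sum of squares, can give a different value for models that are not least-squares linear fits.

```python
from math import sqrt


def r_squared(actual, predicted):
    """R2 as the explained sum of squares over the total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    explained = sum((p - mean_actual) ** 2 for p in predicted)
    total = sum((a - mean_actual) ** 2 for a in actual)
    return explained / total


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy actual and predicted TMB values (illustrative only).
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
r2 = r_squared(actual, predicted)
r = pearson_r(actual, predicted)
```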

Statistical analysis

The statistical analysis methods involved in this research were all implemented with R packages. Spearman correlation analysis was used to evaluate the correlation between transcriptome data and TMB. The DESeq2 package was adopted for DEG acquisition, and FDR < 0.05 was considered statistically significant. The abundance of tumor-infiltrating immune cells was compared between groups with the Wilcoxon rank-sum test, while the distribution of TMB among subgroups of clinical parameters was evaluated with the Kruskal–Wallis test; p < 0.05 was considered statistically significant.

Results

Profiles of somatic mutation in gastrointestinal cancers

Among 1152 patients who underwent WES, 1098 (95.31%) harbored somatic mutations. As shown in Fig. 2A, among the 9 types of mutations, missense mutations accounted for the largest proportion. Single nucleotide polymorphisms (SNPs) were far more frequent than insertions and deletions (Fig. 2B), and C > T was the most common substitution (Fig. 2C). In addition, we calculated the number of mutations in each sample; the median number of variants was 90. Different mutation types were represented by box plots of various colors (Fig. 2D-E). Figure 2F and Fig. 2G display the top mutated genes; the 10 with the highest mutation frequency were TP53 (58%), TTN (46%), APC (41%), MUC16 (27%), SYNE1 (24%), KRAS (23%), LRP1B (19%), PIK3CA (18%), FAT4 (18%), and FLG (17%).

Fig. 2

Profile of somatic mutation of 996 gastrointestinal cancers. (A) Alteration forms in all samples, with missense mutation occupying the largest proportion. (B) Three variant types included for tumor mutation determination, among which single nucleotide polymorphism (SNP) was more common than deletion or insertion. (C) C > T transition was the most frequent of the six subclasses of SNP. D-E Variants per sample and per classification in the TCGA gastrointestinal cancer cohort. F Top 10 altered genes in all samples. G Waterfall plot showing the top-30 mutated genes with 8 variant types

DEGs determination and functional annotation

According to the preset conditions, P value < 0.05 and |Log FC| ≥ 1, a total of 256 genes were obtained as DEGs, and the top 40 of them are shown in Fig. 3A. As shown in Fig. 3B, the GO annotation revealed that the DEGs were concentrated in smooth muscle contraction and its regulation (BP), tissue components involved in muscle contraction such as extracellular matrix and contractile fibers (CC), and substantial combination and signal transduction (MF). The KEGG annotation showed enrichment in digestive juice secretion pathways, such as pancreatic juice, bile and gastric acid secretion (Fig. 3C). GSEA demonstrated CD4+/CD8+ T cell activation in the high-TMB group (Fig. 3D), together with activation of the P53 pathway, nucleotide excision repair and RNA degradation related pathways (Fig. 3E). All of the results above met the condition of FDR < 0.05.

Fig. 3

Differentially expressed genes between high- and low-TMB group and functional enrichment analysis. A Heatmap of top 40 differentially expressed genes (DEGs). B GO and (C) KEGG functional analysis for 256 DEGs. BP, biological process; CC, cellular component; MF, molecular function. Gene set enrichment analysis between 2 TMB groups for (D) immunological signatures and (E) kegg pathways using software GSEA_4.1.0

Infiltration of Immunocytes in 2 TMB groups

After calculation of immune cell infiltration scores with the CIBERSORT algorithm, the proportions of 22 immune cell types infiltrating each patient's tumor were obtained, and the detailed abundance information is documented in supplementary material S1. In addition, we compared the fraction of each immune cell type between the 2 TMB groups. As shown in Fig. 4, the high-TMB group had lower infiltration levels of resting CD4+ memory T cells (P = 0.042), regulatory T cells (P < 0.001), resting dendritic cells (P = 0.043), and resting mast cells (P = 0.001), but more activated CD4+ memory T cells (P < 0.001), follicular helper T cells (P = 0.012), M1 macrophages (P = 0.001), and activated mast cells (P = 0.040). These findings suggested that patients with high TMB had a higher level of immune cell infiltration in the tumor microenvironment.

Fig. 4

Levels of tumor-infiltrating immune cells in high- and low-TMB groups. Abundance of 22 immune cells in 2 TMB groups. High-TMB group showing higher level of activated CD4+ memory T cells, follicular helper T cells, resting NK cells, M1 macrophages, activated mast cells, and neutrophils, and lower level of resting CD4+ memory T cells, regulatory T cells, resting dendritic cells, resting mast cells

Association between TMB and clinical pathological parameters as well as prognosis

Integrating the mutation data of 1152 samples and the clinical data of 1171 samples, we analyzed the influence of each clinicopathological parameter on the distribution of TMB in 993 samples; details are shown in Table 1. As shown in Fig. 5, the elderly had a higher level of TMB (Fig. 5A), and TMB level rose with later N, M, and AJCC stages (Fig. 5E-G). In addition, the primary site of gastrointestinal tumors also affected the magnitude of TMB. Gastric adenocarcinoma had the widest TMB range, which also posed a challenge to the predictive ability of the DNN model (Fig. 5H). Survival analysis showed that the overall survival of the high-TMB group was significantly better than that of the low-TMB group (Fig. 5I). However, gender, grade, and T stage had no significant impact on the distribution of TMB (Fig. 5B-D). Overall, the TMB of gastrointestinal tumors correlated with multiple clinical characteristics, and the general prognosis of patients with high TMB might be better.

Table 1 Clinical characteristics of 993 TCGA samples with gastrointestinal cancers
Fig. 5

Relevance of TMB with clinical characteristics and prognosis. Wilcoxon rank-sum test for 993 samples by age, gender, grade, M stage (A-C, F) and Kruskal Wallis test for all samples by T stage, N stage, AJCC stage as well as primary site of gastrointestinal cancers (D-E, G-H). I Kaplan–Meier survival analysis exhibiting prognostic value of TMB

Genes were determined as feature values

With a Pearson correlation coefficient greater than 0.28 set as the condition, 81 genes were obtained. Using Lasso with five-fold cross-validation, 31 genes were screened out. As shown in Fig. 6A, the 31 feature values were AADACL4, AC092811.1, AC108515.1, AC138811.2, AL160236.2, ATPAF2, CXXC1, GON7, ISOC1, LETM1, MCUB, MFAP1, MIR6744, MRM3, MTHFD2, NARS, NDUFA6, NDUFAF1, P4HA1, PHF23, PTMAP4, RF00017, RPARP.AS1, RPL22L1, SAR1A, SCO1, SFXN1, TIMM21, TIMM22, TNFSF9, and ZNF232. The correlation coefficients between the feature values and TMB ranged from 0.28 to 0.44. Linear dependence among the 31 genes was generally low, but some of them still showed strong correlation, which demonstrated the necessity of dimensionality reduction in model construction. The coefficients of the 31 features are shown in Fig. 6B, from which we can conclude that AC108515.1 contributes the most and TIMM21 the least. The DNN model functions as a “black box” with low interpretability, but the result of Lasso regression provides an estimate of the significance of each feature value before modeling. Besides, to further confirm the impact of the 31 genes on TMB distribution, for each gene we divided the 996 samples into a high-expression and a low-expression group according to the median mRNA expression, giving 31 pairs of groups, and then compared the TMB level within each pair. As shown in Fig. 6C, a statistically significant difference was found in all of the pairs, suggesting a close relationship between TMB and these feature values.

Fig. 6

31 genes screened as feature values of deep neural network. (A) Heatmap of correlation showing the intrinsic linear connection among the 31 genes and the association of their transcriptional level with TMB. (B) Coefficients of the 31 features from lasso regression. The first 6 of them, with coefficients over 0.1, were AC108515.1, MIR6744, AC092811.1, AC138811.2, MCUB and AADACL4, with TIMM21 exerting the least impact. (C) Bar plot of TMB in the 2 transcriptional groups, in which samples were split by the median mRNA expression. *P < 0.05, **P < 0.01, ***P < 0.001

DNN Model optimized with genetic algorithm predicts TMB with high accuracy

In this study, we employed a deep neural network to construct a TMB prediction model, applied a genetic algorithm to optimize the hyper-parameters of the network, and finally calculated the Pearson correlation coefficient between the predicted and actual values, together with the coefficient of determination (R2), for model evaluation. As shown in Fig. 7A, the 31 feature values were mapped into 10 dimensions by PCA to form the input layer, which retained the key information of the input variables and excluded redundant information to avoid overfitting. After optimization by the genetic algorithm, the two hidden layers had 57 and 17 neurons, respectively. Figure 7B illustrates the process of network hyper-parameter optimization. First, we set the ranges of the hyper-parameters listed in Table 2, and each individual, made up of one group of hyper-parameters, took part in population initialization. Second, these individuals were passed to the DNN model for training, with the Pearson coefficient r set as the fitness function. The optimization continued through repeated selection, crossover, and mutation operations until the maximum number of iterations was reached. Ultimately, the best individual identified throughout the optimization was taken as the final hyper-parameter configuration of the model, and the optimized DNN model was evaluated on the testing set. Details and optimized results are shown in Fig. 7C. The run terminated after 30 iterations, and each generation contained 20 individuals (hyper-parameter combinations). After each iteration, the fitness of each individual was output, that is, the r between the predicted values and the corresponding actual values obtained when the validation set was fed into the network.
By the 30th generation, the fitness was stable above 0.8, with a median of about 0.84, and the difference in individual fitness between the first and last generations was non-negligible, suggesting both that the genetic algorithm had an obvious optimizing effect on the hyper-parameters and that the hyper-parameters had a great impact on the DNN model. The R2 of the training, validation, and testing sets reached 0.95, 0.82 and 0.7, respectively, suggesting that the model fitted the fluctuation of the data well, as displayed in Fig. 7D. Besides, Pearson correlation analysis demonstrated that the predicted values of the training set, validation set, and testing set fitted the actual values accurately, including those of the testing set; the r values of the three datasets were 0.98, 0.82 and 0.92, respectively, as shown in Fig. 7E. These findings revealed that, with transcriptome data as input, a DNN model optimized by a genetic algorithm could accurately predict the numerical TMB of four gastrointestinal tumors.
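The optimization loop described above (population initialization, fitness evaluation, then selection, crossover and mutation for 30 generations of 20 individuals) can be sketched in Python as follows. This is a toy illustration under several assumptions and not the code used in this study: the search ranges, the tournament selection, the uniform crossover and the mutation rate are hypothetical choices, and `toy_fitness` is a stand-in for the validation-set Pearson r that would be obtained by training a DNN with each hyper-parameter combination. Its optimum is placed at 57 and 17 neurons only to echo the layer sizes reported above.

```python
import random

# Hypothetical search ranges; the ranges actually used are those listed in Table 2.
BOUNDS = {"hidden1": (8, 128), "hidden2": (4, 64)}
POP_SIZE, N_GENERATIONS, MUTATION_RATE = 20, 30, 0.2


def toy_fitness(ind):
    """Stand-in for the validation-set Pearson r of a DNN trained with `ind`."""
    d1 = (ind["hidden1"] - 57) / 120
    d2 = (ind["hidden2"] - 17) / 60
    return 1.0 - (d1 * d1 + d2 * d2)


def random_individual(rng):
    """Draw one hyper-parameter combination uniformly within the bounds."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in BOUNDS.items()}


def tournament(pop, scores, rng, size=3):
    """Selection: return the fittest of `size` randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in BOUNDS}


def mutate(ind, rng):
    """Mutation: occasionally resample a gene within its bounds."""
    return {
        name: (rng.randint(*BOUNDS[name]) if rng.random() < MUTATION_RATE else value)
        for name, value in ind.items()
    }


def run_ga(seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(POP_SIZE)]
    best, best_score, history = None, float("-inf"), []
    for _ in range(N_GENERATIONS):
        scores = [toy_fitness(ind) for ind in pop]
        top = max(range(POP_SIZE), key=lambda i: scores[i])
        if scores[top] > best_score:
            # Remember the best individual seen in any generation.
            best, best_score = dict(pop[top]), scores[top]
        history.append(best_score)
        pop = [
            mutate(crossover(tournament(pop, scores, rng),
                             tournament(pop, scores, rng), rng), rng)
            for _ in range(POP_SIZE)
        ]
    return best, best_score, history


best, best_score, history = run_ga()
```

In a real run, each fitness evaluation would require training the network, which makes the evaluation step by far the most expensive part of the loop.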

Fig. 7

Deep neural network optimized with genetic algorithm performed excellently in predicting TMB of gastrointestinal cancers. A Structure of the deep neural network (DNN) with one input layer of 10 features after PCA, two hidden layers of 57 and 17 neurons, one output layer of TMB. B Flowchart of genetic algorithm during DNN optimization. C Record of optimization after evolution of 30 generations. Fitness referring to Pearson relevance coefficient, r, between actual value and predicted value in validation set. D Bar plot of TMB showing ability of the DNN model to fit the fluctuation of the data. Bars in blue and orange refer to actual values and predicted values, respectively. E Assessment of DNN model with Pearson relevance analysis in three datasets. Black circles, blue squares and red triangles stand for samples in training set, validation set and testing set, with relevance coefficient of 0.98, 0.82 and 0.92, respectively

Table 2 Hyper-parameters optimized with genetic algorithm

Discussion

Synthesizing the somatic mutation data of 996 patients with gastrointestinal tumors, we found that SNP was the most common alteration, of which C > T base substitutions accounted for the largest share. Additionally, TP53, TTN, APC, MUC16, SYNE1, KRAS, LRP1B, PIK3CA, FAT4, and FLG were the top 10 genes with the highest mutation rates. In line with prior reports, TP53, TTN, MUC16 and SYNE1 were all highly mutated genes in esophageal cancer, gastric cancer, and colon adenocarcinoma [21,22,23], indicating that these 4 genes are central to the somatic mutation landscape of gastrointestinal tumors. GO analysis and KEGG functional annotation did not reflect enrichment of immune-related BP/CC/MF or pathways. However, GSEA indicated that patients with high TMB showed activation of CD4+ and CD8+ T cells, as well as activation of the TP53 pathway, the nucleotide excision repair pathway and RNA degradation pathways. These findings revealed that a high-TMB state might suggest a stronger T cell immune response and activation of multiple tumor suppressor pathways, but the activation of these pathways requires further experimental verification.

Consistent with the results of GSEA, ssGSEA demonstrated that patients with high TMB had a greater abundance of activated CD4+ T lymphocytes, follicular helper T lymphocytes, M1 macrophages and neutrophils, but fewer resting CD4+ T cells and regulatory T cells. An analysis of large-scale data from GEO reported that patients with high levels of resting CD4+ T cells had a lower overall survival rate, while those with high levels of activated CD4+ T lymphocytes showed the opposite [24]. Follicular helper T cells are derived from naive CD4+ T cells stimulated by inflammation. It has been documented that follicular helper T cells may participate in anti-tumor immunity in non-small cell lung cancer, and high follicular helper T cell infiltration is associated with better clinical outcome [25]. However, in breast cancer, CD4+ follicular helper T lymphocytes serve as a prognostic risk factor and can be viewed as a biomarker of chemotherapy [26]. Recent research has asserted that Treg infiltration correlates with progression of gastric cancer, and that patients with reduced Treg levels are more likely to achieve pathological complete remission and longer overall survival during neoadjuvant chemotherapy [27, 28]. In addition, stable/progressive metastatic foci of colon cancer have been found to be mainly infiltrated by PD1+ T cells and Treg. The abundance of Treg in the lesions before neoadjuvant radiotherapy has been shown to correlate negatively with general prognosis. Compared with radiotherapy alone, radiotherapy combined with regulatory T cell suppression significantly improves the anti-tumor effect [29]. Consistent with the research mentioned above, in mouse inflammatory bowel disease-related colorectal cancer, regulatory T cells co-expressing the Th17-related transcription factor (RORγt) secreted by FoxP3+ Treg can regulate expression of IL16 in dendritic cells to maintain tumor growth [30]. Macrophages usually polarize into M1 and M2 types.
M1 macrophages play a pro-inflammatory, anti-tumor role, whereas M2 macrophages exert a tumor immunosuppressive effect. A previous study demonstrated that the injection of mimic exosomes derived from M1 macrophages into tumor-bearing mice induced the repolarization of macrophages from M2 to M1, and that the combination of mimic exosomes and PD-L1 inhibitors enhanced the anti-tumor effect of ICIs [31]. Bioinformatic data mining showed that the abundance of M1 macrophages was negatively associated with T stage, and that M1 macrophage culture medium inhibited migration and invasion of esophageal cancer cells [32]. These findings indicate that patients with high TMB have a greater abundance of activated immune cells, which may exert a stronger anti-tumor effect in gastrointestinal cancers. The specific mechanism is not clear; it may be stimulated by the neoantigens produced in the high-TMB state, which needs in-depth investigation.

In addition, TMB was statistically associated with several common clinicopathological parameters, indicating its clinical significance in gastrointestinal carcinomas. We found that age, lymph node metastasis, distant metastasis, AJCC stage, and primary tumor site were all associated with the distribution of TMB. Elderly patients had higher TMB, and TMB was also positively correlated with N stage, M stage, and AJCC stage. The median and range of TMB varied with the primary tumor site. TMB in STAD showed the widest range, which made model prediction more challenging and had the greatest impact on the R² of the DNN. Survival analysis showed a positive association between high TMB and overall survival. In light of previous research, the clinical significance of TMB can be considered from three aspects: as a biomarker of the efficacy of ICI treatment, for stratification of patients undergoing neoadjuvant chemotherapy or targeted therapy, and for prognosis prediction. TMB-high status is associated with better relapse-free survival in patients with colorectal cancer undergoing curative surgery followed by adjuvant fluoropyrimidine and oxaliplatin chemotherapy [33]. Moreover, patients with TMB-high gastric cancer (GC) had favorable OS and disease-free survival (DFS), whereas those who received postoperative chemotherapy or radiochemotherapy showed the opposite. Li et al. identified TMB as an early indicator of response to neoadjuvant chemotherapy in a stage II/III microsatellite-stable (MSS) cohort [15]. In HER2-positive advanced GC, TMB status could be a novel biomarker for predicting the efficacy of trastuzumab plus chemotherapy, because the TMB-high group achieved higher complete response and objective response rates after first-line trastuzumab plus chemotherapy [35]. However, Hsieh et al. 
found that GC patients with both Fusobacterium nucleatum infection and high TMB tended to have shorter OS, indicating that the prognostic role of TMB can differ between cohorts and should be analyzed individually [36]. In addition, previous research found that patients with esophageal cancer and high TMB had a worse prognosis, whether or not they received radiotherapy; this issue therefore warrants further discussion [23].

Artificial intelligence technology, including machine learning and deep learning, has penetrated various fields of medicine, and models that take medical data as input have been used to assist diagnosis, treatment and prognostic evaluation. Deep learning, here referring to the application of DNNs, has been applied to dichotomous TMB prediction with miRNA and lncRNA data as features in gastric adenocarcinoma and head and neck squamous cell carcinoma. Features are usually extracted from miRNAs and lncRNAs that are differentially expressed between two TMB groups [17, 18]. However, the DEGs calculated between the two TMB groups showed little clear separation between the groups, as can be seen in the heatmap, and we therefore consider that the DEGs here do not capture the representative characteristics of the TMB-high population. In this study, we performed Pearson correlation analysis between TMB and the transcriptional data of 56,753 genes, with 0.28 preset as the cutoff value, to initially screen 81 genes; Lasso with five-fold cross-validation was then used to select 31 genes as final candidates according to their coefficients. Although the Lasso algorithm reduced multicollinearity among features to a large extent, some candidates remained strongly correlated. Therefore, during hyper-parameter optimization with the genetic algorithm, we used PCA to reduce the information of the 31 genes to 10 dimensions. The genetic algorithm is a commonly used method for network optimization in deep learning; here it was run on a supercomputer and implemented in Python. The validation set participated in the optimization process, and the Pearson correlation coefficient (r) between the predicted and actual values was preset as the fitness of individuals (hyper-parameter combinations) in each iteration. 
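The optimization loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this study: a distance-weighted nearest-neighbour regressor stands in for the DNN, and the search space (`K_CHOICES`, `POWER_CHOICES`), population size, mutation rate and synthetic data are all hypothetical. Only the use of the validation-set Pearson r as the fitness of each hyper-parameter combination follows the text.

```python
import math
import random

# Hypothetical search space; the hyper-parameters tuned in the study are not listed here.
K_CHOICES = list(range(1, 11))
POWER_CHOICES = [0.0, 0.5, 1.0, 2.0]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: r is undefined, treated as 0 in this sketch
    return cov / math.sqrt(var_x * var_y)


def knn_predict(train_x, train_y, x, k, power):
    """Toy stand-in for the DNN: distance-weighted mean of the k nearest targets."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    weights = [1.0 / (abs(tx - x) + 1e-6) ** power for tx, _ in nearest]
    return sum(w * ty for w, (_, ty) in zip(weights, nearest)) / sum(weights)


def fitness(individual, train, valid):
    """Fitness of one hyper-parameter combination: Pearson r on the validation set."""
    k, power = individual
    preds = [knn_predict(train[0], train[1], x, k, power) for x in valid[0]]
    return pearson_r(preds, valid[1])


def random_individual(rng):
    return (rng.choice(K_CHOICES), rng.choice(POWER_CHOICES))


def genetic_search(train, valid, pop_size=8, generations=10, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best validation r found in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(ind, train, valid), reverse=True)
        best = ranked[0]
        history.append(fitness(best, train, valid))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [best]                  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (rng.choice([a[0], b[0]]), rng.choice([a[1], b[1]]))  # uniform crossover
            if rng.random() < mutation_rate:
                child = random_individual(rng)  # mutation: resample the combination
            children.append(child)
        population = children
    best = max(population, key=lambda ind: fitness(ind, train, valid))
    return best, fitness(best, train, valid), history


# Synthetic one-feature data, split into interleaved training and validation sets.
data_rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + data_rng.gauss(0, 0.2) for x in xs]
train, valid = (xs[0::2], ys[0::2]), (xs[1::2], ys[1::2])

best, best_r, history = genetic_search(train, valid)
print("best hyper-parameters:", best, "validation r:", round(best_r, 3))
```

Because the best individual of each generation is carried over unchanged, the values in `history` cannot decrease from one generation to the next. Since the validation set drives the search in this way, a separate testing set is still needed for an unbiased estimate of performance.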
Although the FDA has set a cutoff value of 10 for TMB in head and neck squamous cell carcinoma, TMB thresholds for screening likely beneficiaries of ICIs have not yet been determined for most tumors, owing to the lack of clinical trial data as a basis. Therefore, we believe that a continuous prediction model of TMB is more practical. To our knowledge, this is the first time that transcriptome data have been used to construct a model predicting the numerical TMB of four gastrointestinal cancers. Moreover, the Pearson r of the testing set reached 0.92, which supports the generalization ability of the model and suggests that it could be extended to larger cohorts. The R² of the testing set was 0.7, indicating that the DNN model performed well in capturing the non-negligible fluctuation of numerical TMB, especially in STAD and COAD. Nevertheless, the application of this model has certain limitations. First, the model used TCGA gastrointestinal tumor data exclusively as input. Does it generalize well enough to other patients with gastrointestinal tumors, for instance patients in Asia or Africa? The study includes a training set, a validation set and a testing set, which are to some extent comparable to a training set, an internal validation set and an external validation set, because the validation set takes part only in hyper-parameter optimization rather than model construction, and the testing set participates in neither model training nor optimization. However, it would be much more convincing if another cohort were added for external validation, particularly if the DNN model could be validated in each cohort for each cancer type separately. To narrow the gap between this study and the clinical utility of estimating TMB, broader studies with prospective cohorts are needed. 
For example, samples of the four types of gastrointestinal cancer could be collected and screened from patients seeking medical care in Northeast China. The transcriptional data of the 31 genes would then be measured and input into the model to predict the TMB of each patient. Finally, WGS, as the gold standard, could be applied to each sample to obtain the actual TMB value, followed by assessment of the utility of the DNN model. Second, four cancer types were involved in this research. Does this DNN model show comparable predictive capability when applied to gastrointestinal cancer patients with particular pathological subtypes? What if we broaden the application of the TMB prediction model to a larger cohort covering all tumor types? We preliminarily estimate that constructing such a model would require more sample data, stronger data processing capacity, more powerful hardware, and a more solid programming foundation. This is also the next direction of our team.
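The gap between the two testing-set metrics quoted above (Pearson r of 0.92 against R² of 0.7) is easier to interpret with a toy calculation. The sketch below assumes that R² is computed as 1 − SS_res/SS_tot, which the text does not state, and uses made-up TMB values rather than study data. The same two functions could in principle be used to compare predicted TMB with sequenced TMB in a prospective cohort.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation: strength of the linear association, blind to scale and offset."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values: predictions follow the trend exactly but compress its range.
actual_tmb = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_tmb = [0.5 * a + 1.0 for a in actual_tmb]

r = pearson_r(actual_tmb, predicted_tmb)
r2 = r_squared(actual_tmb, predicted_tmb)
print(f"Pearson r = {r:.3f}, R2 = {r2:.3f}")  # r is 1.000 while R2 is only 0.625
```

In this example the predictions rank every sample correctly, so r equals 1, yet they underestimate high values and overestimate low ones, so R² drops to 0.625. Under this definition R² never exceeds r², which is why the two metrics are best reported together for a continuous TMB predictor.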

In summary, comprehensive bioinformatic analysis of somatic mutation data shed preliminary light on the mutational landscape of gastrointestinal cancers, the association of TMB with tumor microenvironment infiltration and tumor-suppressing pathways, and the clinical significance of TMB. Most importantly, a DNN model that predicts continuous TMB values with high accuracy was constructed, and we believe that it will facilitate TMB detection once applied in the clinic.

Data availability

The transcriptional, clinical and somatic mutation datasets are all available at https://portal.gdc.cancer.gov/repository. All information and code involved in this study can be obtained from the corresponding author on reasonable request.

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71(3):209–49. https://doiorg.publicaciones.saludcastillayleon.es/10.3322/caac.21660.

  2. Soerjomataram I, Bray F. Planning for tomorrow: global cancer incidence and the role of prevention 2020–2070. Nat Rev Clin Oncol. 2021;18:663–72. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41571-021-00514-z.

  3. Zhang TC, Chen H, Yin XL, He QF, Man JY, Yang XR, et al. Changing trends of disease burden of gastric cancer in China from 1990 to 2019 and its predictions: Findings from Global Burden of Disease Study. Chin J Cancer Res. 2021;33(1):11–26. https://doiorg.publicaciones.saludcastillayleon.es/10.21147/j.issn.1000-9604.2021.01.02.

  4. Thrift AP. Global burden and epidemiology of Barrett oesophagus and oesophageal cancer. Nat Rev Gastroenterol Hepatol. 2021;18(6):432–43. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41575-021-00419-3.

  5. Yin J, Bai ZG, Zhang J, Zheng Z, Yao HG, Ye PP, et al. Burden of colorectal cancer in China, 1990–2017: Findings from the Global Burden of Disease Study 2017. Chin J Cancer Res. 2019;31(3):489–98. https://doiorg.publicaciones.saludcastillayleon.es/10.21147/j.issn.1000-9604.2019.03.11.

  6. Robert C, Schachter J, Long GV, Arance A, Grob JJ, Mortier L, et al. Pembrolizumab versus Ipilimumab in Advanced Melanoma. N Engl J Med. 2015;372(26):2521–32. https://doiorg.publicaciones.saludcastillayleon.es/10.1056/NEJMoa1503093.

  7. Sharma P, Siddiqui BA, Anandhan S, Yadav SS, Subudhi SK, Gao J, et al. The Next Decade of Immune Checkpoint Therapy. Cancer Discov. 2021;11(4):838–57. https://doiorg.publicaciones.saludcastillayleon.es/10.1158/2159-8290.CD-20-1680.

  8. Song X, Qi W, Guo J, Sun L, Ding A, Zhao G, et al. Immune checkpoint inhibitor combination therapy for gastric cancer: Research progress. Oncol Lett. 2020;20(4):46–53. https://doiorg.publicaciones.saludcastillayleon.es/10.3892/ol.2020.11905.

  9. Riaz N, Morris L, Havel JJ, Makarov V, Desrichard A, Chan TA. The role of neoantigens in response to immune checkpoint blockade. Int Immunol. 2016;28(8):411–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/intimm/dxw019.

  10. Stockton JD, Tee L, Whalley C, James J, Dilworth M, Wheat R, et al. Complete response to neoadjuvant chemoradiotherapy in rectal cancer is associated with RAS/AKT mutations and high tumour mutational burden. Radiat Oncol. 2021;16(1):129. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13014-021-01853-y.

  11. Xiao J, Li WY, Huang Y, Huang ML, Li SS, Zhai XH, et al. A next-generation sequencing-based strategy combining microsatellite instability and tumor mutation burden for comprehensive molecular diagnosis of advanced colorectal cancer. BMC Cancer. 2021;21(1):282–93. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12885-021-07942-1.

  12. Gong J, Robertson MD, Kim E, Fakih M, Schrock AB, Tam KW, et al. Efficacy of PD-1 Blockade in Refractory Microsatellite-Stable Colorectal Cancer With High Tumor Mutation Burden. Clin Colorectal Canc. 2019;18(4):307–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.clcc.2019.08.001.

  13. Wang DQ, Wang N, Li XQ, Chen XF, Shen B, Zhu DQ, et al. Tumor mutation burden as a biomarker in resected gastric cancer via its association with immune infiltration and hypoxia. Gastric Cancer. 2021;24(4):823–34. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10120-021-01175-8.

  14. Wang F, Wei XL, Wang FH, Xu N, Shen L, Dai GH, et al. Safety, efficacy and tumor mutational burden as a biomarker of overall survival benefit in chemo-refractory gastric cancer treated with toripalimab, a PD-1 antibody in phase Ib/II clinical trial NCT02915432. Ann Oncol. 2019;30(9):1479–86. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/annonc/mdz197.

  15. Li Z, Jia Y, Zhu H, Xing X, Pang F, Shan F, et al. Tumor mutation burden is correlated with response and prognosis in microsatellite-stable (MSS) gastric cancer patients undergoing neoadjuvant chemotherapy. Gastric Cancer. 2021;24(6):1342–54. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s10120-021-01207-3.

  16. Wang ZJ, Duan JC, Cai SL, Han M, Dong H, Zhao J, et al. Assessment of Blood Tumor Mutational Burden as a Potential Biomarker for Immunotherapy in Patients With Non-Small Cell Lung Cancer With Use of a Next-Generation Sequencing Cancer Gene Panel. JAMA Oncol. 2019;5(5):696–702. https://doiorg.publicaciones.saludcastillayleon.es/10.1001/jamaoncol.2018.7098.

  17. Yang DD, Yu JL, Han B, Sun Y, Mo S, Hu J. Long Non-coding RNA Expression Patterns in Stomach Adenocarcinoma Serve as an Indicator of Tumor Mutation Burden and Are Associated With Tumor-Infiltrating Lymphocytes and Microsatellite Instability. Front Cell Dev Biol. 2021;9:618313. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fcell.2021.618313.

  18. Xia Y, Wang Q, Huang XL, Yin XH, Song JK, Ke Z, et al. miRNA-Based Feature Classifier Is Associated with Tumor Mutational Burden in Head and Neck Squamous Cell Carcinoma. Biomed Res Int. 2020;2020:1686480. https://doiorg.publicaciones.saludcastillayleon.es/10.1155/2020/1686480.

  19. Joshi RP, Kruger AJ, Sha L, Kannan M, Khan AA, Stumpe M. Learning relevant H&E slide morphologies for prediction of colorectal cancer tumor mutation burden using weakly supervised deep learning. J Clin Oncol. 2020;38(15_suppl):e15244. https://doiorg.publicaciones.saludcastillayleon.es/10.1200/JCO.2020.38.15_suppl.e15244.

  20. Akoglu H. User’s guide to correlation coefficients. Turk J Emerg Med. 2018;18(3):91–3. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.tjem.2018.08.001.

  21. Zhou ZJ, Xie X, Wang X, Zhang X, Li WX, Sun TH, et al. Correlations Between Tumor Mutation Burden and Immunocyte Infiltration and Their Prognostic Value in Colon Cancer. Front Genet. 2021;12:623424. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fgene.2021.623424.

  22. Zhao DY, Sun XZ, Yao SK. Mining The Cancer Genome Atlas database for tumor mutation burden and its clinical implications in gastric cancer. World J Gastro Oncol. 2021;13(1):37–57. https://doiorg.publicaciones.saludcastillayleon.es/10.4251/wjgo.v13.i1.37.

  23. Yuan C, Xiang LY, Cao K, Zhang JG, Luo Y, Sun WJ, et al. The prognostic value of tumor mutational burden and immune cell infiltration in esophageal cancer patients with or without radiotherapy. Aging (Albany NY). 2020;12(5):4603–16. https://doiorg.publicaciones.saludcastillayleon.es/10.18632/aging.102917.

  24. Ning ZK, Hu CG, Huang C, Liu J, Zhou TC, Zong Z. Molecular Subtypes and CD4(+) Memory T Cell-Based Signature Associated With Clinical Outcomes in Gastric Cancer. Front Oncol. 2020;10:626912. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fonc.2020.626912.

  25. Ma QY, Huang DY, Zhang HJ, Chen J, Miller W, Chen XF. Function of follicular helper T cell is impaired and correlates with survival time in non-small cell lung cancer. Int Immunopharmacol. 2016;41:1–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.intimp.2016.10.014.

  26. Gu-Trantien C, Loi S, Garaud S, Equeter C, Libin M, de Wind A, et al. CD4(+) follicular helper T cell infiltration predicts breast cancer survival. J Clin Invest. 2013;123(7):2873–92. https://doiorg.publicaciones.saludcastillayleon.es/10.1172/Jci67428.

  27. Li K, Chen FC, Xie HJ. Decreased FOXP3+ and GARP+ Tregs to neoadjuvant chemotherapy associated with favorable prognosis in advanced gastric cancer. Onco Targets Ther. 2016;9:3525–33. https://doiorg.publicaciones.saludcastillayleon.es/10.2147/OTT.S101884.

  28. Nakanishi Y, Hirota S, Hojo Y, Nakamura T, Kumamoto T, Kurahashi Y, et al. Pathological Complete Remission of Liver Metastases Correlates With Elimination of Tumor-infiltrating Tregs in Gastric Cancer. Anticancer Res. 2021;41(3):1571–7. https://doiorg.publicaciones.saludcastillayleon.es/10.21873/anticanres.14917.

  29. Ji DB, Song C, Li YH, Xia JH, Wu YJ, Jia JY, et al. Combination of radiotherapy and suppression of Tregs enhances abscopal antitumor effect and inhibits metastasis in rectal cancer. J Immunother Cancer. 2020;8(2):e000826. https://doiorg.publicaciones.saludcastillayleon.es/10.1136/jitc-2020-000826.

  30. Rizzo A, Di Giovangiulio M, Stolfi C, Franzè E, Fehling HJ, Carsetti R, et al. RORγt-Expressing Tregs Drive the Growth of Colitis-Associated Colorectal Cancer by Controlling IL6 in Dendritic Cells. Cancer Immunol Res. 2018;6(9):1082–92. https://doiorg.publicaciones.saludcastillayleon.es/10.1158/2326-6066.Cir-17-0698.

  31. Choo YW, Kang M, Kim HY, Han J, Kang S, Lee JR, et al. M1 Macrophage-Derived Nanovesicles Potentiate the Anticancer Efficacy of Immune Checkpoint Inhibitors. ACS Nano. 2018;12(9):8977–93. https://doiorg.publicaciones.saludcastillayleon.es/10.1021/acsnano.8b02446.

  32. Jiang CH, Liang WH, Li FP, Xie YF, Yuan X, Zhang HJ, et al. Distribution and prognostic impact of M1 macrophage on esophageal squamous cell carcinoma. Carcinogenesis. 2021;42(4):537–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/carcin/bgaa128.

  33. Lee DW, Han SW, Bae JM, Jang H, Han H, Kim H, et al. Tumor Mutation Burden and Prognosis in Patients with Colorectal Cancer Treated with Adjuvant Fluoropyrimidine and Oxaliplatin. Clin Cancer Res. 2019;25(20):6141–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1158/1078-0432.Ccr-19-1105.

  34. Huang J, Koulaouzidis A, Marlicz W, Lok V, Chu C, Ngai CH, et al. Global Burden, Risk Factors, and Trends of Esophageal Cancer: An Analysis of Cancer Registries from 48 Countries. Cancers (Basel). 2021;13(1):141. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/cancers13010141.

  35. Kim HR, Ahn S, Jo H, Kim H, Hong J, Lee J, et al. The Impact of Tumor Mutation Burden on the Effect of Frontline Trastuzumab Plus Chemotherapy in Human Epidermal Growth Factor Receptor 2-Positive Advanced Gastric Cancers. Front Oncol. 2021;11:792340. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fonc.2021.792340.

  36. Hsieh YY, Kuo WL, Hsu WT, Tung SY, Li C. Fusobacterium Nucleatum-Induced Tumor Mutation Burden Predicts Poor Survival of Gastric Cancer Patients. Cancers. 2022;15(1):269. https://doiorg.publicaciones.saludcastillayleon.es/10.3390/cancers15010269.

Acknowledgements

We acknowledge our use of R, Python, and GSEA software. The results are based upon data derived from the TCGA database. We are grateful to these platforms and to the authors who uploaded their data.

Funding

No funding supported this research.

Author information

Contributions

Beibei Hu, Guohui Yin, Jialin Zhu, and Yi Bai participated in the study design. Beibei Hu performed most of the bioinformatic analysis and wrote the original draft. Guohui Yin completed all the work on model construction, optimization and evaluation. Xuren Sun gave significant advice on the final revision of the paper and supervised the whole study. All authors approved the final manuscript.

Corresponding author

Correspondence to Xuren Sun.

Ethics declarations

Ethics approval and consent to participate

Not applicable. Transcriptional, clinical and somatic mutation data were downloaded from a public database, and no private information is disclosed.

Consent for publication

Not applicable. Clinical characteristics were obtained from TCGA, and no consent for publication is needed.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Hu, B., Yin, G., Zhu, J. et al. Continuous prediction for tumor mutation burden based on transcriptional data in gastrointestinal cancers. BMC Med Inform Decis Mak 24, 384 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12911-024-02794-8

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12911-024-02794-8

Keywords