Construction of a Wilms tumor risk model based on machine learning and identification of cuproptosis-related clusters
BMC Medical Informatics and Decision Making volume 24, Article number: 325 (2024)
Abstract
Background
Cuproptosis, a recently identified type of programmed cell death triggered by copper, has mechanisms in Wilms tumor (WT) that are not yet fully understood. This research focuses on examining the link between WT and Cuproptosis-related genes (CRGs), with the goal of developing a predictive model for WT.
Methods
Four gene expression datasets related to WT were sourced from the GEO database. Expression profiles of CRGs were then extracted for differential analysis and immune infiltration studies. Using 105 WT samples, clusters related to cuproptosis were identified, followed by analysis of the associated immune cell infiltration and functional enrichment analysis. Disease-characteristic genes were pinpointed using weighted gene co-expression network analysis. Finally, the WT risk prediction model was constructed with four machine learning methods: random forest, support vector machine (SVM), generalized linear model, and extreme gradient boosting. The best-performing machine learning model was chosen, and a nomogram was created. The effectiveness of this predictive model was validated using the calibration curve, decision curve analysis, and by applying it to the TARGET-GTEx dataset.
Results
Thirteen differentially expressed cuproptosis-related genes were identified. The infiltration level of CD8+ T cells was lower in WT than in normal tissue (NT), whereas the infiltration levels of M0 macrophages and T follicular helper cells were higher in WT than in NT. In addition, two clusters of cuproptosis-related WT were identified. Enrichment analysis indicated that genes in cluster 2 were primarily involved in cell division, regulation of nuclear division, the DNA biosynthetic process, and ubiquitin-mediated proteolysis. The SVM model, built on 5 genes, was judged to be the optimal model. Its accuracy was confirmed through a calibration curve and decision curve analysis, and it demonstrated satisfactory performance on the TARGET-GTEx validation dataset. Additional analysis revealed that these five genes exhibited high expression in both the TARGET-GTEx validation dataset and sequencing data.
Conclusion
This research established a link between WT and Cuproptosis. It developed a predictive model for assessing the risk of WT and pinpointed five key genes associated with the disease.
Introduction
Wilms tumor (WT), which accounts for 95% of pediatric kidney cancers, is seen predominantly in children aged 0–4 years and is the leading type of kidney tumor in children [1]. WT can affect one or both kidneys and can be sporadic or multifocal. About 10% of children with WT have associated congenital defect syndromes. WT with bilateral or multifocal tumors or with congenital defect syndromes (median age at diagnosis two years) occurs earlier than sporadic WT (median age at diagnosis three years). At present, treatment methods for WT mainly include radiotherapy, chemotherapy, and surgery. Under the leadership of the International Society of Pediatric Oncology and the American Children’s Oncology Group Renal Tumor Committee, ongoing advancements in treatment have resulted in a 90% 5-year survival rate for low-risk WT in children, yet managing bilateral, recurrent, and high-risk WT remains a complex challenge [2, 3]. The diagnosis of WT relies primarily on clinical symptoms and imaging techniques; however, these tools can only establish how far the disease has already progressed. Early and accurate diagnosis and timely treatment are key to improving the long-term survival of children with WT [4]. Hence, it is vital to identify molecular subtypes of WT and to formulate a corresponding risk assessment model.
Immunotherapy aims to enhance natural defense mechanisms to eliminate malignant cells and represents a significant breakthrough in cancer treatment. Immune infiltration in the tumor microenvironment has been proven to play a crucial role in tumor development, and immune cells are the foundation of immunotherapy. A comprehensive analysis of tumor-infiltrating cells will reveal mechanisms of immune escape, providing opportunities for developing immunotherapy [5]. To date, except for neuroblastoma, the outcomes of immunotherapy for pediatric solid tumors have been disappointing. The lack of understanding of the immune landscape in pediatric solid tumors might be a reason for the failure of immunotherapy [6]. Common genetic alterations and unique histological features in WT, along with the tumor microenvironment, suggest that the immune system may play a significant role in this disease. Currently, various immunotherapeutic approaches, including monoclonal antibodies, adoptive cell therapy, and immune checkpoint inhibitors, are applied to WT. Evidence suggests that immunotherapy can improve the prognosis of patients with WT [7], but it is only used in patients with WT who are undergoing clinical trials [6]. Therefore, immunotherapy in WT still requires further exploration, and a comprehensive analysis of the immune infiltration landscape in WT could help identify new immunotherapeutic targets.
Cuproptosis is a novel type of cell death in which copper ionophores transport excess copper ions into cells, where the copper ions bind to lipoylated proteins and cause them to oligomerize, thereby disrupting the tricarboxylic acid cycle. Lipoylated proteins are key cellular components and include metabolic enzymes that control flow into the tricarboxylic acid cycle. Additionally, copper can impair Fe-S cluster proteins, resulting in proteotoxic stress and cellular demise [8]. Elevated concentrations of copper have been reported in tumor tissue or serum of patients with various cancers, and elevated serum copper levels are associated with tumor stage and disease progression in patients with colorectal, lung, and breast cancers [9]. Furthermore, numerous studies link cuproptosis-related genes (CRGs) to the prognosis of various types of tumors. For example, CRGs may forecast the outcome for individuals with clear cell renal cell carcinoma [10], pancreatic cancer [11], liver cancer [12], glioma [13], and gastric cancer [14]. However, CRGs have rarely been reported in solid tumors in children. In neuroblastoma (NB), Wei et al. comprehensively analyzed the expression of CRGs in NB patients, identified prognosis-related cuproptosis subtypes, constructed a prognostic model for children with NB, and examined the interplay between CRGs and the tumor microenvironment. Finally, real-time PCR was used to confirm the expression of the risk genes in NB cell lines, and silencing the cuproptosis-related gene PDHA1 was shown to significantly inhibit the proliferation, migration, and invasion of NB cells and to promote S-phase cell cycle arrest and apoptosis. This offers a fresh potential target for identifying and treating NB patients [15]. Similarly, another study demonstrated that PDHA1 is involved in the cell cycle and proliferation-related pathways of NB cells and is associated with NB tumor staging and NK cell infiltration.
In vitro, it promotes the proliferation, invasion, and lymph node metastasis of NB cells through the cell cycle pathway [16]. Furthermore, studies have shown that copper can regulate the expression of PD-L1 and affect tumor immune escape. PD-L1 is a cancer immune escape-related checkpoint, and the depletion of copper promotes the degradation of PD-L1. The use of the copper chelator TEPA can increase tumor-infiltrating immune cells, thereby reducing tumor growth and improving the survival rate of the NB mouse model [17]. Interestingly, PD-L1 is upregulated in 29% of primary WT and 35% of metastatic WT, and is associated with advanced WT, unfavorable histology, and disease progression. The upregulation of PD-L1 in WT indicates a poor prognosis [18]. In addition, the cuproptosis-associated gene LIPT2 is significantly upregulated in WT, and high expression of LIPT2 is associated with a poor prognosis in high-risk WT [19], suggesting that CRGs are involved in the development and progression of WT. Another study found that amine oxidase copper-containing 1 (AOC1) is a downstream target gene of WT1; electrophoretic mobility shift assays and chromatin immunoprecipitation confirmed that binding of the WT1 protein to cis-regulatory elements in the AOC1 promoter activates the transcription of AOC1, and that the two jointly regulate kidney development [20]. WT1 is a key transcription factor in Wilms tumor: WT1 mutations are found in 10–15% of Wilms tumors and are associated with genetic syndromes of glomerular and reproductive tract dysplasia [21]. Therefore, cuproptosis-related genes may be involved in the occurrence of WT as downstream target genes.
This research represents the inaugural investigation into the differential expression of CRGs in WT, alongside an analysis of the variations in immune infiltration landscapes. Two Cuproptosis-associated subtypes were identified and subjected to related analyses of immune cell infiltration and functional enrichment. Subsequently, a WT risk prediction model based on 5 disease characteristic genes was constructed using four machine learning methods: Random Forest (RF), Support Vector Machine (SVM), Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGB). Finally, the expression of the 5 disease characteristic genes was validated through sequencing data, providing new directions for the personalized diagnosis and treatment of WT in the future.
Materials and methods
Data source and preprocessing
Four microarray datasets were obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/): GSE66405 (GPL17077), comprising 28 WTs and 4 NTs; GSE73209 (GPL10558), comprising 32 WTs and 6 NTs; GSE2712 (GPL96), comprising 18 WTs and 3 NTs; and GSE11024 (GPL6671), comprising 27 WTs and 12 NTs. A Perl program was used to organize the matrix of each dataset, and the original matrix of each dataset was then entered in R for data correction and log2 (x + 1) transformation. The four datasets were combined, and the batch effect was removed using the surrogate variable analysis (SVA) package [22]. The output matrix was used as the training set for this study. From the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database (https://ocg.cancer.gov/programs/target), we downloaded RNA-seq data for 124 primary renal tumors and 6 normal tissues (NT). Forty-five normal kidney tissues were sourced from the Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/) as additional controls. Following log2 (x + 1) normalization of both TARGET and GTEx data, the batch effects were eliminated using the SVA R package, resulting in the TARGET–GTEx matrix, which was designated as the validation set for this study.
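The transformation and merging steps described above can be illustrated with a minimal sketch. It is not the authors' Perl/R pipeline: the function names and the toy expression values are hypothetical, and the batch correction itself (performed in the study with the SVA R package) is not reproduced. The sketch only applies log2 (x + 1), keeps the genes shared by all datasets, and records a batch label per sample, which is the information a batch-correction step would need.

```python
import math

def log2p1(matrix):
    """Apply log2(x + 1) to every value of a {gene: [values per sample]} mapping."""
    return {gene: [math.log2(v + 1.0) for v in values]
            for gene, values in matrix.items()}

def merge_on_common_genes(datasets):
    """Concatenate samples across datasets, keeping only genes found in all of them.

    Returns (merged, batch), where batch[i] is the index of the dataset
    that sample i came from.
    """
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)
    merged = {g: [v for d in datasets for v in d[g]] for g in sorted(common)}
    batch = []
    for i, d in enumerate(datasets):
        n_samples = len(next(iter(d.values())))
        batch.extend([i] * n_samples)
    return merged, batch

# Hypothetical toy data: two small "datasets" with one shared gene.
a = {"FDX1": [0.0, 3.0], "DLAT": [1.0, 7.0]}
b = {"FDX1": [15.0], "GLS": [2.0]}
merged, batch = merge_on_common_genes([log2p1(a), log2p1(b)])
print(merged, batch)
```

In this toy case only FDX1 survives the intersection, and the batch vector marks the first two samples as coming from one dataset and the third from the other.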
Identification of differentially expressed cuproptosis genes
A total of 19 CRGs were obtained from the work of Tsvetkov et al. [8, 16]. The expression matrix of the 19 CRGs was obtained by intersecting the CRGs with the training set matrix. The limma R package [23] was employed for differential expression analysis, with differentially expressed cuproptosis genes (DECRGs) identified by the criteria |logFC| > 1 and adj.p < 0.05. The results were visualized as heat maps and box plots using the heatmap and ggpubr R packages.
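As an illustration of the selection criteria, the following sketch filters genes by |logFC| > 1 and adjusted p < 0.05. It assumes the Benjamini-Hochberg adjustment (the default in limma, although the text does not state which method was used); the function names and the example values are hypothetical, and the moderated statistics computed by limma are not reproduced.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, logfc, pvalues, lfc_cut=1.0, alpha=0.05):
    """Keep genes with |logFC| > lfc_cut and adjusted p-value < alpha."""
    adj = bh_adjust(pvalues)
    return [g for g, lfc, q in zip(genes, logfc, adj)
            if abs(lfc) > lfc_cut and q < alpha]

# Hypothetical example values, not taken from the study.
genes = ["A", "B", "C", "D"]
logfc = [-1.5, 2.0, -0.5, 3.0]
pvals = [0.01, 0.04, 0.03, 0.5]
print(select_degs(genes, logfc, pvals))
```

Note that gene B passes the fold-change cut and has a raw p-value below 0.05, but is excluded once the p-values are adjusted for multiple testing.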
Correlation analysis of CRGs and infiltrating immune cells
Using the GEO dataset, the relative abundance of immune cells in each sample was analyzed with the CIBERSORT algorithm and the LM22 feature matrix. The proportions of the 22 immune cell types in each specimen summed to 1, and a p-value less than 0.05 was considered statistically significant. We examined the relationship between CRG expression and the relative percentages of immune cells using the Spearman correlation, considering a p-value below 0.05 as significant.
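For readers unfamiliar with the Spearman correlation, a minimal sketch is given below: it is the Pearson correlation computed on ranks, with tied values receiving their average rank. The function names and the example vectors are hypothetical, and the significance test (p-value) is omitted.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: expression of one gene and the fraction of one cell type.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
fraction = [0.10, 0.20, 0.30, 0.40, 0.90]
print(spearman(expression, fraction))
```

Because only ranks matter, any monotonically increasing relationship yields a coefficient of 1 regardless of its shape.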
Unsupervised clustering of WT
Using the ConsensusClusterPlus R package [24] with the k-means algorithm and 1,000 iterations, the 105 WT specimens were grouped into clusters. The optimal number of clusters was determined from the cumulative distribution function (CDF) curve, the consensus matrix, and the cluster-consensus score.
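The consensus matrix at the core of this procedure can be sketched as follows. This is a simplified illustration rather than the ConsensusClusterPlus implementation: the resampling and the k-means runs are assumed to have been done already, and the function name, the input format, and the three toy runs are hypothetical. Each entry is the fraction of runs in which two samples were clustered together, among the runs in which both were drawn.

```python
def consensus_matrix(runs, n_samples):
    """Build a consensus matrix from repeated clustering runs.

    runs: list of dicts {sample_index: cluster_label}, one per resampling run;
          samples not drawn in a run are simply absent from its dict.
    Entry (i, j) is the share of runs containing both i and j in which the
    two samples received the same label (0.0 if they never co-occurred).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for a in labels:
            for b in labels:
                sampled[a][b] += 1
                if labels[a] == labels[b]:
                    together[a][b] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_samples)]
            for i in range(n_samples)]

# Hypothetical toy input: three runs over four samples.
runs = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1, 3: 0},
    {0: 0, 2: 0, 3: 1},
]
for row in consensus_matrix(runs, 4):
    print(row)
```

Values close to 0 or 1 indicate stable assignments, which is why a CDF of the matrix entries that is flat over the middle of the consensus range is read as evidence for a good choice of k.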
Gene set variation analysis
Gene set variation analysis (GSVA) is a bioinformatics method used to estimate the variation of pathway and gene set activity over a sample population in an unsupervised manner [25]. We employed the “GSEABase”, “GSVA”, and “limma” packages to conduct a GSVA enrichment analysis between the different CRG subtypes. We obtained the c2.cp.kegg.symbols and c5.go.symbols gene sets from the Molecular Signature Database (http://www.gsea-msigdb.org/gsea/msigdb/). We chose the top 10 terms from both Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) for visualization, considering a GSVA score with an absolute t value greater than 2 as significant.
Weighted gene co-expression network analysis (WGCNA)
Utilizing the “WGCNA” R package [26], we pinpointed the weighted co-expression module. Subsequent analysis focused on the quarter of genes exhibiting the highest variability. We chose an ideal soft-thresholding power to create a weighted adjacency matrix, which was then transformed into a topological overlap matrix (TOM). Utilizing the hierarchical clustering algorithm, we derived the module using the TOM dissimilarity measure (1-TOM), with the smallest allowable module size established as 100. Modules were each given a unique color, with their characteristic genes depicting the gene profiles. Module membership denotes the proximity between a gene and its module, while gene significance indicates how strongly a gene correlates with a clinical trait.
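The step from correlations to the TOM dissimilarity can be made concrete with a small sketch. It assumes an unsigned network, in which the adjacency is the absolute correlation raised to the soft-thresholding power (the text does not state whether a signed or unsigned network was used), and it uses the standard topological overlap formula. The function name and the toy correlation matrix are hypothetical; this is not the WGCNA package code.

```python
def tom_dissimilarity(cor, beta):
    """Return the 1 - TOM matrix for a gene-gene correlation matrix.

    cor:  square list-of-lists of pairwise correlations
    beta: soft-thresholding power
    """
    n = len(cor)
    # Unsigned soft-threshold adjacency (assumption), diagonal set to zero.
    adj = [[abs(cor[i][j]) ** beta if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    connectivity = [sum(row) for row in adj]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a gene overlaps fully with itself, so 1 - TOM = 0
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom = (shared + adj[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j])
            diss[i][j] = 1.0 - tom
    return diss

# Hypothetical toy matrix: three genes with pairwise correlation 0.5.
cor = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
print(tom_dissimilarity(cor, beta=2)[0][1])
```

Raising the correlations to a power suppresses weak correlations more strongly than strong ones, and the overlap term rewards gene pairs that share neighbors, so hierarchical clustering on 1 - TOM tends to yield more cohesive modules than clustering on raw correlations.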
Constructing predictive models utilizing machine learning techniques
Four distinct machine learning techniques (RF, SVM, GLM, and XGB) were employed in developing the WT prediction model. In total, 105 WT samples were divided randomly into two groups: a training set comprising 70% and a validation set making up 30%. All models were run with default settings and assessed through five-fold cross-validation. The DALEX package facilitated an explanatory analysis of the four models, enabling the plotting of the cumulative residual distribution and residual boxplots. The “pROC” package [27] was utilized for plotting the ROC curve. The top five genes of the optimal machine learning model were selected as relevant predictors of WT. The TARGET–GTEx dataset was ultimately employed to confirm the diagnostic efficacy of the model.
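The evaluation scheme described above (a 70/30 split, five folds, and the area under the ROC curve) can be sketched as follows. The sketch does not train any of the four models; the function names, the random seed, and the example scores are hypothetical. The AUC is computed from its probabilistic definition: the chance that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
import random

def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]

def k_fold_indices(indices, k=5):
    """Assign indices to k folds of near-equal size in round-robin fashion."""
    return [indices[i::k] for i in range(k)]

def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

train, valid = train_test_split_indices(105)
folds = k_fold_indices(train, k=5)
print(len(train), len(valid), [len(f) for f in folds])

# Hypothetical predicted scores for four samples (1 = positive class).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In each cross-validation round, one fold would be held out for evaluation while the model is fitted on the remaining four.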
Development and validation of a nomogram diagram
The “rms” R package was utilized to develop a nomogram for predicting the occurrence of WT. Each gene had a corresponding score, and the scores of the five genes were added together to obtain a total score. Decision curves and calibration curves were employed to evaluate the predictive accuracy of the nomogram.
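The quantity plotted in a decision curve is the net benefit at a given threshold probability, which weighs true positives against false positives according to the odds of the threshold. A minimal sketch under that standard definition is shown below; the function names and the example labels and probabilities are hypothetical and are not derived from the study data.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold).

    A sample counts as predicted positive when its probability is at or
    above the threshold.
    """
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy in which every sample is classified as positive."""
    return net_benefit(labels, [1.0] * len(labels), threshold)

# Hypothetical outcomes and predicted probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
print(net_benefit(labels, probs, 0.5), net_benefit_treat_all(labels, 0.5))
```

A model is considered useful over the range of thresholds in which its net benefit lies above both the treat-all reference and the treat-none reference, whose net benefit is zero.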
Independent verification analysis
The TARGET–GTEx matrix was used to determine the mRNA expression in WT and NT of the top five genes identified by the optimal model. To further verify the expression of the five genes, we acquired WT and NT samples from three patients who underwent WT surgery at the Department of Pediatric Surgery, First Affiliated Hospital of Guangxi Medical University. All tumors were confirmed by histopathology as WT, and RNA sequencing was performed using the NovaSeq 6000 platform (Illumina, USA). The workflow of this research is shown in Fig. 1.
Results
CRG expression and immune activation in WT
To investigate the role of CRGs in WT progression, we integrated datasets GSE66405, GSE73209, GSE2712, and GSE11024, eliminating the batch effect (Fig. 2). The expression of 19 CRGs was assessed in WT and NT using the merged matrix. LIPT2 and GCSH were not present in this dataset, leaving 17 CRGs. Thirteen CRGs were identified as DECRGs: NFE2L2, SLC31A1, FDX1, LIAS, DLD, DLAT, PDHA1, PDHB, MTF1, GLS, CDKN2A, DBT, and DLST. The expression of these 13 DECRGs in WT was lower than in NT (Fig. 3a, b). We then performed a correlation analysis on these DECRGs and found that DLAT, DLD, PDHB, PDHA1, and DBT had obvious synergistic effects, while GLS and MTF1 showed antagonistic effects with DLAT, DLD, and PDHB (Fig. 3c). The circle diagram in Fig. 3d further illustrates the close relationship between these DECRGs.
To assess the immune system differences between WT and NT, we performed an immune infiltration analysis. The results indicated that CD8+ T cell infiltration was lower in WT, while T follicular helper cells and M0 macrophages showed higher infiltration in WT than in NT (Fig. 3e, f). Meanwhile, M0 macrophages were negatively correlated with PDHB, NFE2L2, GLS, and DBT, while activated NK cells were positively correlated with NFE2L2, LIAS, and DBT. T follicular helper cells exhibited the strongest negative correlation with GLS, while resting dendritic cells showed the strongest positive correlation with NFE2L2 (Fig. 3g).
CRGs expression and immune activation in WT. a, b: Expression of CRGs in WT and NT - heat map and box diagram; (Blue: low expression Red: high expression; * : p < 0.05; ** : p < 0.01; ***:p < 0.001); c, d: Correlation between CRGs - pie chart and circle chart; (Red: positive correlation Green: negative correlation); e: Relative abundance of 22 infiltrating immune cells in WT and NT; f: Difference of immune infiltration between WT and NT - box plot; g: Correlation analysis between CRGs and immune infiltrating cells. (Red: positive correlation Blue: negative correlation)
Identification of cuproptosis-related subtypes in WT
To discern CRG-associated expression patterns in WT, we classified 105 WT samples using 13 DECRGs. At k = 2, the cluster stability was highest (Fig. 4a), with the CDF curve showing minimal fluctuation in the 0.2–0.8 consistency index range (Fig. 4b). For k values from 2 to 9, the area under the CDF curve reflected the discrepancy between the two CDF curves (k and k-1) (Fig. 4c). At k = 2, the consistency score exceeded 0.9 for each cluster (Fig. 4d). Consequently, the 105 WT samples were categorized into Cluster 1 (N = 59) and Cluster 2 (N = 46). Principal component analysis (PCA) revealed significant distinctions between these clusters (Fig. 4e).
CRG expression and immunological traits in WT clusters
To investigate the molecular differences between clusters, we assessed the expression of the 13 DECRGs in Clusters 1 and 2. DLD, DLAT, and DLST were more highly expressed in Cluster 1, whereas NFE2L2, MTF1, GLS, and DBT were more highly expressed in Cluster 2 (Fig. 5a, b). Immune infiltration analysis revealed higher proportions of T follicular helper cells, resting NK cells, and M0 macrophages in Cluster 1. Conversely, Cluster 2 exhibited higher proportions of gamma–delta T cells, activated NK cells, and resting dendritic cells (Fig. 5c, d).
GSVA analysis
To examine the functions of the different clusters in GO and KEGG, we conducted a GSVA enrichment analysis. GO analysis results indicated that Cluster 2 was predominantly enriched in establishment of RNA localization, mitotic metaphase plate congression, cell division, regulation of nuclear division, DNA replication factor C complex, the DNA biosynthetic process, microtubule cytoskeleton organization involved in mitosis, DNA recombination, positive regulation of mitotic sister chromatid separation, and cell cycle and replication (Fig. 6a). KEGG findings revealed that Cluster 2 was primarily enriched in mismatch repair, RNA polymerase, oocyte meiosis, pyrimidine metabolism, homologous recombination, nucleotide excision repair, ubiquitin-mediated proteolysis, aminoacyl tRNA biosynthesis, protein export, and proteasome (Fig. 6b).
WGCNA analysis and gene module screening
To identify gene modules significant for WT, we applied the WGCNA algorithm to construct the co-expression network and modules for both NT and WT. We selected the 25% of genes with the greatest standard deviation for analysis. At an optimal soft power value of seven, we identified co-expressed gene modules (Fig. 7a), resulting in four distinct modules, each represented by a different color. The topological overlap matrix is illustrated in Fig. 7b–d. The turquoise module, containing 877 genes, showed a strong correlation with WT, evidenced by a 0.59 correlation coefficient (Fig. 7e, f).
Co-expression network of differentially expressed genes between WT and NT children. a: Determination of soft threshold power; b: Cluster tree plot of co-expression module; c: representative of cluster of modular characteristic genes; d: Correlation heat map between 4 modules; e: Correlation analysis of module characteristic genes and clinical traits. Each row represents a module, and each column represents a clinical trait; f: Scatterplot of turquoise module membership and WT gene significance
We also used WGCNA to identify disease signature gene modules associated with cuproptosis. With a soft threshold of seven, co-expressed gene modules were constructed (Fig. 8a), resulting in seven different modules (Fig. 8b–d). Among them, the turquoise module exhibited the highest correlation with WT, indicated by a 0.89 correlation coefficient. The turquoise module genes were positively correlated with Cluster 2 (Fig. 8f).
Co-expression network of differentially expressed genes between two Cuproptosis clusters. a: Determination of soft threshold power; b: Cluster tree plot of co-expression module; c: representative of cluster of modular characteristic genes; d: Correlation heat map between 7 modules; e: Correlation analysis of module characteristic genes and clinical traits. Each row represents a module, and each column represents a clinical trait; f: Scatterplot of turquoise module membership and significance of cluster 2 gene
Building and assessing machine learning models
We intersected the genes from the two turquoise modules to identify eight common genes (Fig. 9a). Then, four machine learning algorithms (RF, SVM, GLM, and XGB) were employed to develop a WT occurrence prediction model. The model’s discriminative capability was assessed using the ROC curve. The results indicated lower residuals for the SVM and XGB models (Fig. 9b, c). Figure 9d shows the area under the ROC curve (AUC) for each model (RF: AUC = 0.935, SVM: AUC = 0.963, XGB: AUC = 0.940, GLM: AUC = 0.899). Because the SVM model had the highest AUC, it was chosen as the best prediction model for WT. The top five genes, including proline and serine-rich coiled-coil 1 (PSRC1), polycomb repressive complex 1 (PRC1), RAD51-associated protein 1 (RAD51AP1), tubulin β class I (TUBB), and myristoylated alanine-rich C-kinase substrate-like protein 1 (MARCKSL1), were selected as predictor genes for subsequent analysis.
To further assess the predictive efficiency of the SVM model, a nomogram was created (Fig. 10a) for estimating the risk of cuproptosis clustering in 105 children with WT. Calibration curves and decision curve analysis (DCA) were utilized to assess the nomogram’s predictive efficiency. The calibration curve indicated a minimal error between the actual and predicted WT clustering risk (Fig. 10b), while the DCA curve showed a significant deviation from all models’ curves (Fig. 10c). Subsequently, we validated the SVM prediction model based on five genes using the TARGET–GTEx dataset. The ROC curve revealed an AUC value of 0.847 for the predictive model, indicating satisfactory performance (Fig. 10d). In summary, the SVM diagnostic model effectively distinguished WT from NT.
In addition, we observed the expression of PSRC1, PRC1, RAD51AP1, TUBB, and MARCKSL1 in the TARGET–GTEx dataset. Interestingly, each of the five genes exhibited high expression in WT tissues (p < 0.05) (Fig. 11a–e). In our sequencing data, PSRC1, PRC1, RAD51AP1, and MARCKSL1 exhibited high expression in WT (p < 0.05), with TUBB also showing a trend towards high expression in WT (Fig. 11f–j).
Discussion
Due to the complex molecular characteristics of WT, the effectiveness of traditional treatment methods has not been satisfactory. To improve the survival rate and treatment outcomes for WT patients, researchers are actively investigating alternative treatment approaches, such as targeted molecular therapy and immunotherapy [28]. At present, there are many reports of immunotherapy and prognosis in other diseases. For example, Bo Hu et al. used Cox regression and LASSO models to identify prognosis-related genes for predicting the glycolysis status of hepatocellular carcinoma based on TCGA transcriptome data, identified seven genes to construct a glycolysis-related risk score, and finally verified by immunohistochemistry the differential protein expression between hepatocellular carcinoma and normal liver tissue [29]. In addition, Mukherjee et al. identified the hub gene of early T precursor acute lymphoblastic leukemia (ETP-ALL) through public databases, revealed the hub gene as a new indicator of lineage specification of incompletely differentiated ETP-ALL cells, developed a personalized lineage scoring algorithm, and promoted the development of new lineage-directed precision therapy [30]. Another study identified differentially expressed lncRNAs, miRNAs, and mRNAs in breast cancer, constructed a ceRNA regulatory network, and found 10 RNA-binding proteins related to the overall survival of breast cancer patients, which has important clinical applications in predicting the prognosis of invasive breast cancer [31]. Yu Zhou et al. used rheumatoid arthritis samples to analyze CRGs and immune infiltration, identified the characteristic genes of CRG subtypes, constructed an optimal model, and finally verified the expression of the predicted genes in animal experiments [32].
Therefore, for the individualized treatment of WT, it is important to identify its molecular subtypes, comprehensively analyze its immune infiltration landscape, and establish a risk prediction model. Cuproptosis involves copper accumulation in cells, causing lipoylated mitochondrial proteins to aggregate and Fe-S cluster proteins to be lost, leading to cell death. This process is closely linked to mitochondrial metabolism [33]. A study based on pan-cancer multi-omics and single-cell sequencing analysis found that cuproptosis is associated with an immunosuppressive tumor microenvironment. Researchers constructed a Cuproptosis Score (CS), where a higher CS indicates higher expression of CRGs. The study found that in 26 types of tumor tissues, CS was significantly higher than in NT, and CS was associated with poor tumor prognosis, tumor invasion, DNA damage, and DNA repair. Furthermore, they discovered that CS has a significant positive correlation with resting mast cells, eosinophils, activated memory CD4+ T cells, and resting bone marrow dendritic cells, and a significant negative correlation with T cells, resting NK cells, M0 macrophages, T follicular helper cells, memory B cells, and plasma cells. In summary, this study revealed that cuproptosis is closely related to the tumor microenvironment [34]. Nonetheless, the precise mechanism of cuproptosis’s impact on WT remains uncertain. This study was the first to detect disparities in both expression and immunological characteristics of CRGs between WT and NT. We divided 105 WT samples into two subtypes associated with cuproptosis and examined the variance in CRGs and immune cell infiltration between these categories. The WGCNA algorithm was subsequently employed to pinpoint the characteristic genes of the disease and conduct functional enrichment analysis. Finally, the WT risk prediction model was constructed by RF, SVM, GLM, and XGB, and a nomogram was constructed.
The calibration curve, decision curve, and TARGET–GTEx dataset were utilized to confirm the prediction model’s accuracy, offering a new direction for WT risk prediction and immunotherapy.
This study conducted the first comprehensive analysis of CRG expression in both WT and NT. Most of the known CRGs were abnormally expressed in WT, with lower expression than in NT. Consistent with our findings, most CRGs were expressed at lower levels in a pan-cancer analysis [34]. We speculated that these CRGs were critical in WT development. To further clarify the relationship between CRGs and WT, we performed a correlation analysis, revealing that CRGs were closely interconnected, exhibiting either mutual antagonism or synergy. Subsequently, our immune infiltration analysis indicated that children with WT exhibited low levels of CD8+ T cell infiltration, whereas the infiltration levels of T follicular helper cells and M0 macrophages were high. In pediatric solid tumors, immune infiltration is mainly composed of innate immune cells, especially macrophages. Immunohistochemistry results showed that WT could be infiltrated by adaptive immune cells and innate immune cells, and the proportion of M2 macrophages increased significantly with tumor stage (I–III) [18]. Gajewski et al. used public data to measure tumor inflammatory signature (TIS) scores in renal solid tumors and found that the WT samples had the lowest TIS scores relative to other pediatric and adult kidney tumors. Single-cell sequencing data showed a reduced percentage of CD8+ T cells in 13 cases of WT compared to normal kidney tissue [35]. This result is consistent with our study. Another study that included WT in an immune infiltration analysis of renal cancer showed that M0 macrophages are significantly increased in renal cancer samples compared to adjacent normal samples, consistent with our findings. However, in contrast to our study, CD8+ T cells were also significantly increased in renal cancer [36].
Additionally, researchers conducted a comprehensive analysis of the whole-genome genetics, transcriptomics, and tumor microenvironment landscape of diffuse anaplastic Wilms tumor (DAWT) and focal anaplastic Wilms tumor, identifying a DAWT subtype with global depletion of immune or stromal cells. This subtype exhibited reduced CD8 and CD3 cell infiltration, involved oncogenic pathways related to histone deacetylation and DNA repair activity, and was associated with poor clinical prognosis. They performed unsupervised clustering of the microenvironment cells, identifying two subtypes: iWT (with higher microenvironmental cell infiltration) and dWT (lacking immune cell infiltration), finding that most dWT cases have TP53 mutations and are associated with shorter overall survival and relapse-free survival. Further, differential expression analysis of the transcriptomes of dWT and iWT revealed negative enrichment of cytoplasmic DNA sensing and innate immune pathways in dWT. GSEA analysis showed significant downregulation of the P53 signaling pathway, while HDAC deacetylation of histones and DNA double-strand break repair pathways were activated. Significant transcriptomic similarities were observed in independent cohorts of TARGET and GEO [37]. Similarly, previous research found that CD8+ T cell infiltration can predict the clinical prognosis of WT; in recurrent WT, metastatic WT, and WT with shorter DFS, CD8+ T cell infiltration was significantly reduced, indicating a poor prognosis [38]. Therefore, differences in immune cell infiltration may be related to different WT types and transcriptomic changes.
Furthermore, we discerned two different cuproptosis-related clusters using unsupervised cluster analysis, based on CRG expression profiles in 105 WT cases. The immune infiltration results showed a higher abundance of resting NK cells in Cluster 1, while Cluster 2 displayed a greater proportion of activated NK cells. Azzarone et al. established primary WT cells from tumors in four different children, with immunohistochemistry showing that NK cells can infiltrate the blastemal and epithelial WT components. The findings indicated that activated NK cells eliminated approximately 70% of both blastemal and epithelial WT cells. In addition, they used newly isolated resting NK cells to study interactions with WT primary cultures and found that resting NK cells did not kill WT cells [39]. Similarly, activated NK cells were demonstrated to be effective in killing WT cells in another study [40]. Therefore, we hypothesize that the activation of NK cells in Cluster 2 contributes to the killing of WT cells, resulting in a better prognosis. In the GSVA enrichment analysis, GO analysis revealed that Cluster 2 predominantly showed enrichment in the DNA replication factor C complex, the DNA biosynthetic process, DNA recombination, and cell cycle and replication. Mirroring this study’s findings, TIS-related genes upregulated in WT were mainly enriched in DNA damage, cell cycle checkpoints, and DNA repair-related pathways, revealing that high expression of DNA repair genes is associated with a lack of tumor immune infiltration [35, 37].
To identify disease-characteristic genes, we performed WGCNA and intersected the WT module genes with the genes characteristic of the cuproptosis-related clusters, obtaining eight characteristic genes. Using the expression profiles of these eight genes, we developed a WT prediction model with four machine learning algorithms: RF, SVM, GLM, and XGB. The SVM-based model was the most effective in predicting WT. Finally, the top five key genes (PSRC1, PRC1, RAD51AP1, TUBB, and MARCKSL1) were selected to construct the SVM model nomogram. PSRC1, also referred to as DDA3, is a microtubule-associated protein that governs chromosome division and separation by regulating the mitotic spindle [41]. It also enhances cell growth by activating the β-catenin pathway and plays a role in regulating neurite formation and elongation [42]. DDA3 is expressed in multiple mouse tissues, including the kidneys, and its overexpression inhibited colony formation of lung cancer cells [43]. Elevated PSRC1 expression was identified as an independent risk factor for poor prognosis in low-grade glioma patients, and PSRC1 showed a positive correlation with the abundance of six immune cell types: CD4+ T cells, CD8+ T cells, B cells, macrophages, neutrophils, and dendritic cells [44]. PRC1, composed of Polycomb group (PcG) proteins, forms a multi-component complex that regulates the transcription of its target genes. PcG proteins are crucial for epigenetic regulation in various biological processes, including embryonic development and carcinogenesis [45]. PRC1 has been reported to play different roles in different cellular environments, acting either as a tumor suppressor or as an oncogene that promotes cancer [46]. PRC1 is highly expressed in liver cancer, where it serves as a prognostic marker, and inhibiting PRC1 suppressed the proliferation, migration, and invasion of liver cancer cells [46].
RAD51AP1 is a DNA-binding protein that stimulates RAD51 activity and is essential for maintaining genomic stability. Elevated RAD51AP1 may affect the development of cancer: it is expressed in a variety of cancers, including liver, breast, and ovarian cancer, and is associated with poor prognosis [47]. TUBB, a gene coding for β-tubulin, is widely expressed in the developing central nervous system and skin [48]. MARCKSL1, broadly expressed across tissues derived from all germ layers, modulates cytoskeletal actin dynamics and vesicle trafficking. It participates in the activation of various signaling pathways and contributes to the control of cell migration, proliferation, and differentiation [49]. MARCKSL1 enhances proliferation, migration, and invasion in lung adenocarcinoma [50] and is a diagnostic marker of metastatic colon cancer [51]. Notably, these five genes were highly expressed in both the TARGET-GTEx dataset and the sequencing data from WT tissue. We therefore speculate that these five genes contribute to the occurrence and development of WT, but this conjecture requires confirmation through additional in vitro and in vivo experiments.
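The model-selection step discussed above (training several classifiers and keeping the best performer) can be illustrated with a minimal sketch. It assumes that candidate models are compared by the area under the ROC curve (AUC); the function names, the placeholder model labels, and the toy scores below are hypothetical and are not data or code from this study.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(
    labels: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the highest AUC and all AUCs."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    best = max(aucs, key=lambda name: aucs[name])
    return best, aucs


# Toy example with made-up numbers: 1 = tumor, 0 = normal tissue.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = {
    "model_A": [0.9, 0.8, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1],
    "model_B": [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
    "model_C": [0.6, 0.5, 0.4, 0.7, 0.5, 0.6, 0.2, 0.1],
}

best, aucs = select_best_model(labels, toy_scores)
print(best, aucs)
```

In practice the scores would come from held-out predictions of each trained classifier, and a single AUC comparison would normally be complemented by calibration and decision curve analysis, as was done for the selected model here.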
The tumor microenvironment changes dramatically with disease stage, and transcriptomic profiles likewise vary with disease progression. Mukherjee et al. used in silico analysis to describe the transcriptomic changes in the transition from autoimmune liver disease to hepatocellular carcinoma and found that the expression kinetics of transcription modules in both conditions were mainly positively correlated with the progression of liver fibrosis, which suggested a new direction for the prognosis and treatment of autoimmune-related liver malignancies [52]. This study has several limitations. First, WT samples of all stages were pooled to construct the model, and stage-specific features were not examined in depth; the applicability of the model may therefore vary considerably between stages, which limits its personalized use. Second, the findings are rooted in bioinformatics analysis and lack corroborating in vitro and in vivo studies, so further experiments are needed to confirm them. Third, because data in public datasets are limited, the sample size was small and may have introduced bias; more WT samples are needed to validate our results. Finally, more detailed clinical features are required to test the performance of the predictive model.
Conclusion
In conclusion, the findings indicate a potential involvement of CRGs in the onset and progression of WT. We also uncovered a link between CRGs and immune infiltration, highlighting the immune diversity across WT clusters. The SVM risk prediction model, based on five genes (PSRC1, PRC1, RAD51AP1, TUBB, and MARCKSL1), offers a novel approach to WT diagnosis and points to potential therapeutic targets for future treatment.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- WT: Wilms tumor
- CRGs: Cuproptosis-related genes
- NB: Neuroblastoma
- GEO: Gene Expression Omnibus
- SVA: Surrogate variable analysis
- TARGET: Therapeutically Applied Research to Generate Effective Treatment
- NT: Normal tissues
- GTEx: Genotype-Tissue Expression
- DECRGs: Differentially expressed cuproptosis-related genes
- CDF: Cumulative distribution function
- GO: Gene Ontology
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- TOM: Topological overlap matrix
- RF: Random Forest
- SVM: Support Vector Machine
- GLM: Generalized Linear Model
- XGB: Extreme Gradient Boosting
- AUC: Area under the curve
- PSRC1: Proline and serine-rich coiled-coil 1
- PRC1: Polycomb repressive complex 1
- RAD51AP1: RAD51-associated protein 1
- TUBB: Tubulin β class I
- MARCKSL1: Myristoylated alanine-rich C-kinase substrate-like protein 1
- DCA: Decision curve analysis
- TIS: Tumor inflammatory signature
- PcG: Polycomb group
- AOC1: Amine oxidase copper-containing 1
- GSVA: Gene set variation analysis
- WGCNA: Weighted gene co-expression network analysis
- ETP-ALL: Early T precursor acute lymphoblastic leukemia
- CS: Cuproptosis Score
- DAWT: Diffuse anaplastic Wilms tumor
References
1. Cunningham ME, Klug TD, Nuchtern JG, Chintagumpala MM, Venkatramani R, Lubega J et al. Global disparities in Wilms Tumor. J Surg Res. 2020;34–51.
2. Treger TD, Chowdhury T, Pritchard-Jones K, Behjati S. The genetic changes of Wilms tumour. Nat Rev Nephrol. 2019;240–51.
3. Spreafico F, Fernandez CV, Brok J, Nakata K, Vujanic G, Geller JI et al. Wilms tumour. Nat Rev Dis Primers. 2021;75.
4. Jia ZK, Wang JX, Yang JJ, Xue R, Zhang D, Wang GN et al. Discovery and identification of serum biomarkers of Wilms’ tumor in mice using proteomics technology. Chin Med J (Engl). 2012;1727–32.
5. Zhang Y, Zhang Z. The history and advances in cancer immunotherapy: understanding the characteristics of tumor-infiltrating immune cells and their therapeutic implications. Cell Mol Immunol. 2020;807–21.
6. Hong B, Dong R. Research advances in the targeted therapy and immunotherapy of Wilms tumor: a narrative review. Transl Cancer Res. 2021;1559–67.
7. Sanatkar SA, Heidari A, Arya S, Ghasemi M, Rezaei N. The Potential Role of Immunotherapy in Wilms’ Tumor: Opportunities and Challenges. Curr Pharm Des. 2023;1617–27.
8. Tsvetkov P, Coy S, Petrova B, Dreishpoon M, Verma A, Abdusamad M et al. Copper induces cell death by targeting lipoylated TCA cycle proteins. Science. 2022;1254–1261.
9. Chen L, Min J, Wang F. Copper homeostasis and cuproptosis in health and disease. Signal Transduct Target Ther. 2022;378.
10. Zhang F, Lin J, Feng D, Liang J, Lu Y, Liu Z et al. Cuprotosis-related signature predicts overall survival in clear cell renal cell carcinoma. Front Cell Dev Biol. 2022;922995.
11. Chi H, Peng G, Wang R, Yang F, Xie X, Zhang J et al. Cuprotosis programmed-cell-death-related lncRNA signature predicts prognosis and Immune Landscape in PAAD Patients. Cells. 2022.
12. Xiao J, Liu Z, Wang J, Zhang S, Zhang Y. Identification of cuprotosis-mediated subtypes, the development of a prognosis model, and influence immune microenvironment in hepatocellular carcinoma. Front Oncol. 2022;941211.
13. Wu S, Ballah AK, Che W, Wang X. A Novel cuprotosis-related lncRNA signature effectively predicts prognosis in Glioma patients. J Mol Neurosci. 2023;185–204.
14. Han C, Zhang K, Mo X. Construction of a cuprotosis-related gene-based model to improve the prognostic evaluation of patients with gastric Cancer. J Immunol Res. 2022;8087622.
15. Tian XM, Xiang B, Yu YH, Li Q, Zhang ZX, Zhanghuang C et al. A novel cuproptosis-related subtypes and gene signature associates with immunophenotype and predicts prognosis accurately in neuroblastoma. Front Immunol. 2022;999849.
16. Zhou R, Huang D, Fu W, Shu F. Comprehensive exploration of the involvement of cuproptosis in tumorigenesis and progression of neuroblastoma. BMC Genomics. 2023;715.
17. Voli F, Valli E, Lerra L, Kimpton K, Saletta F, Giorgi FM et al. Intratumoral Copper modulates PD-L1 expression and influences Tumor Immune Evasion. Cancer Res. 2020;4129–44.
18. Hont AB, Dumont B, Sutton KS, Anderson J, Kentsis A, Drost J et al. The tumor microenvironment and immune targeting therapy in pediatric renal tumors. Pediatr Blood Cancer. 2023;e30110.
19. Wang W, Li S, Huang Y, Guo J, Sun L, Sun G. Comprehensive analysis of the potential biological significance of cuproptosis-related gene LIPT2 in pan-cancer prognosis and immunotherapy. Sci Rep. 2023;22910.
20. Kirschner KM, Braun JF, Jacobi CL, Rudigier LJ, Persson AB, Scholz H. Amine oxidase copper-containing 1 (AOC1) is a downstream target gene of the Wilms tumor protein, WT1, during kidney development. J Biol Chem. 2014;24452–62.
21. Torban E, Goodyer P. Wilms’ tumor gene 1: lessons from the interface between kidney development and cancer. Am J Physiol Renal Physiol. 2024;F3–F19.
22. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;882–883.
23. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;e47.
24. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics. 2010;1572–1573.
25. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics. 2013;7.
26. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;559.
27. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;77.
28. Zhao X, Chu X, Song L, Tang W. A novel model incorporating chromatin regulatory factors for risk stratification, prognosis prediction, and characterization of the microenvironment in Wilms tumor. J Gene Med. 2024;e3574.
29. Hu B, Qu C, Qi WJ, Liu CH, Xiu DR. Development and verification of the glycolysis-associated and immune-related prognosis signature for hepatocellular carcinoma. Front Genet. 2022;955673.
30. Mukherjee S, Kar A, Paul P, Dey S, Biswas A, Barik S. In Silico Integration of Transcriptome and Interactome Predicts an ETP-ALL-Specific Transcriptional Footprint that Decodes its Developmental Propensity. Front Cell Dev Biol. 2022;899752.
31. Dong Y, Xiao Y, Shi Q, Jiang C. Dysregulated lncRNA-miRNA-mRNA network reveals patient Survival-Associated modules and RNA binding proteins in invasive breast carcinoma. Front Genet. 2019;1284.
32. Zhou Y, Li X, Ng L, Zhao Q, Guo W, Hu J et al. Identification of copper death-associated molecular clusters and immunological profiles in rheumatoid arthritis. Front Immunol. 2023;1103509.
33. Tang D, Chen X, Kroemer G. Cuproptosis: a copper-triggered modality of mitochondrial cell death. Cell Res. 2022;417–8.
34. Qin Y, Liu Y, Xiang X, Long X, Chen Z, Huang X et al. Cuproptosis correlates with immunosuppressive tumor microenvironment based on pan-cancer multiomics and single-cell sequencing analysis. Mol Cancer. 2023;59.
35. Higgs EF, Bao R, Hatogai K, Gajewski TF. Wilms tumor reveals DNA repair gene hyperexpression is linked to lack of tumor immune infiltration. J Immunother Cancer. 2022.
36. Chen L, Yin L, Qi Z, Li J, Wang X, Ma K et al. Gene expression-based immune infiltration analyses of renal cancer and their associations with survival outcome. BMC Cancer. 2021;595.
37. Su X, Lu X, Bazai SK, Dainese L, Verschuur A, Dumont B et al. Delineating the interplay between oncogenic pathways and immunity in anaplastic Wilms tumors. Nat Commun. 2023;7884.
38. Mardanpour K, Rahbar M, Mardanpour S, Mardanpour N, Rezaei M. CD8+ T-cell lymphocytes infiltration predict clinical outcomes in Wilms’ tumor. Tumour Biol. 2020;1010428320975976.
39. Fiore PF, Vacca P, Tumino N, Besi F, Pelosi A, Munari E et al. Wilms’ Tumor Primary Cells Display Potent Immunoregulatory Properties on NK Cells and Macrophages. Cancers (Basel). 2021.
40. Pelosi A, Fiore PF, Di Matteo S, Veneziani I, Caruana I, Ebert S et al. Pediatric tumors-mediated Inhibitory Effect on NK cells: The Case of Neuroblastoma and Wilms’ Tumors. Cancers (Basel). 2021.
41. Jang CY, Coppinger JA, Yates JR 3rd, Fang G. Phospho-regulation of DDA3 function in mitosis. Biochem Biophys Res Commun. 2010;259–63.
42. Hsieh PC, Chiang ML, Chang JC, Yan YT, Wang FF, Chou YC. DDA3 stabilizes microtubules and suppresses neurite formation. J Cell Sci. 2012;3402–11.
43. Lo PK, Chen JY, Lo WC, Chen BF, Hsin JP, Tang PP et al. Identification of a novel mouse p53 target gene DDA3. Oncogene. 1999;7765–7774.
44. Liu Z, Liang W, Zhu Q, Cheng X, Qian R, Gao Y. PSRC1 regulated by DNA methylation is a Novel Target for LGG Immunotherapy. J Mol Neurosci. 2023;516–28.
45. Geng Z, Gao Z. Mammalian PRC1 complexes: compositional complexity and diverse molecular mechanisms. Int J Mol Sci. 2020.
46. Liao S, Wang K, Zhang L, Shi G, Wang Z, Chen Z et al. PRC1 and RACGAP1 are diagnostic biomarkers of early HCC and PRC1 drives Self-Renewal of Liver Cancer Stem cells. Front Cell Dev Biol. 2022;864051.
47. Pires E, Sung P, Wiese C. Role of RAD51AP1 in homologous recombination DNA repair and carcinogenesis. DNA Repair (Amst). 2017;76–81.
48. Sferra A, Petrini S, Bellacchio E, Nicita F, Scibelli F, Dentici ML et al. TUBB variants underlying different phenotypes result in altered vesicle trafficking and Microtubule dynamics. Int J Mol Sci. 2020.
49. El Amri M, Fitzgerald U, Schlosser G. MARCKS and MARCKS-like proteins in development and regeneration. J Biomed Sci. 2018;43.
50. Liang W, Gao R, Yang M, Wang X, Cheng K, Shi X et al. MARCKSL1 promotes the proliferation, migration and invasion of lung adenocarcinoma cells. Oncol Lett. 2020;2272–80.
51. Rong W, Shao S, Pu Y, Ji Q, Zhu H. Circulating extracellular vesicle-derived MARCKSL1 is a potential diagnostic non-invasive biomarker in metastatic colorectal cancer patients. Sci Rep. 2023;9957.
52. Mukherjee S, Kar A, Khatun N, Datta P, Biswas A, Barik S. Familiarity breeds strategy: in Silico Untangling of the Molecular Complexity on Course of Autoimmune Liver Disease-to-Hepatocellular Carcinoma Transition Predicts Novel Transcriptional Signatures. Cells. 2021.
Acknowledgements
The authors appreciate all researchers, patients, and affiliations involved in these studies, especially the Gene Expression Omnibus (GEO) Database, the Therapeutically Applied Research to Generate Effective Treatment (TARGET) database and the Genotype-Tissue Expression (GTEx) database.
Funding
This study was supported by the Guangxi Natural Science Foundation Program (grant numbers 2024GXNSFAA999062 and 2022GXNSFAA035641), the Innovation Project of Guangxi Graduate Education (grant number YCSW2023246), and the Guangxi Zhuang Autonomous Region Health Committee Self-financing Scientific Research Project (grant number Z20200317).
Author information
Contributions
The study was conceived and designed by JC and JH. JH and XP analyzed the data. The manuscript was written by JH, JW and YZ. The manuscript was revised by YL, QX, PC and JC. The study was supervised by JC. All authors read and approved the final manuscript.
Ethics declarations
Ethical approval and consent to participate
The study was approved by the Institutional Review Board of The First Affiliated Hospital of Guangxi Medical University.
Consent for publication
The study was approved by the Institutional Review Board of The First Affiliated Hospital of Guangxi Medical University (approval number: 2023-E734-01).
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, J., Li, Y., Pan, X. et al. Construction of a Wilms tumor risk model based on machine learning and identification of cuproptosis-related clusters. BMC Med Inform Decis Mak 24, 325 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12911-024-02716-8
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12911-024-02716-8