Identifying effective immune biomarkers in alopecia areata diagnosis based on machine learning methods

Abstract

Background

Alopecia areata (AA) is a common non-scarring hair loss disorder associated with autoimmune conditions. However, the pathobiology of AA is not well understood, and there is no targeted therapy available for AA. 

Methods

In this study, differential gene expression analysis, immune status assessment, weighted correlation network analysis (WGCNA), and functional enrichment analysis were performed to identify shared genes associated with both the immune response and AA. Machine learning methods were then used to identify three hub genes as potential diagnostic markers for AA. External validation was performed, and the correlations of the hub genes with immune infiltration, immune checkpoint genes, and key marker genes and pathways were evaluated.

Results

Three hub genes were identified, which accurately predicted the progression of AA and the immune status. The hub genes were found to be diagnostic markers for AA with high predictive accuracy. External validation confirmed the efficacy of these markers in identifying AA patients.

Conclusion

Overall, the study provides a novel approach for the diagnosis, prevention, and treatment of AA. The findings could potentially lead to the development of targeted therapies for AA based on the identified hub genes. The study also highlights the potential of machine learning and bioinformatics analysis in identifying new biomarkers for autoimmune diseases.

Introduction

Alopecia areata (AA) is a common non-scarring hair loss disorder that is believed to be associated with autoimmunity [1]. AA is typified by well-defined patchy hair loss and has a prevalence of 0.1% to 0.2%, with a lifetime risk of 2% to 3% [2, 3]. Although AA can resolve spontaneously, it can also progress to alopecia totalis or universalis, causing significant physical and psychological distress to patients [4, 5]. Presently, there are no targeted therapies for AA, and optimal treatment plans remain under investigation. Notably, AA is generally diagnosed from apparent clinical features, and early diagnostic and prognostic measures are currently lacking.

The mechanisms underlying AA are intricate, and immunity is closely linked to the entire disease process [6]. AA is commonly considered an autoimmune disorder, in which inflammation around hair follicles and changes in the immune environment lead to self-directed attack on, and damage to, hair follicles [7]. The pathobiology of AA has long been linked to CD8 T cells, which are directly involved in the infiltration of hair follicles [2, 8, 9]. Although CD8 T cells are indispensable for the development of AA, other immune-related cells provide auxiliary functions, and their interplay requires further exploration [10,11,12]. The immunological mechanisms of AA remain incompletely understood, and there is a pressing need for a systematic approach to assess immune cell contributions and explore critical genes related to immune cells.

A significant heterogeneity exists in alopecia areata, similar to other autoimmune diseases [13]. Therefore, it is urgent to explore the hallmark genes closely related to the development of alopecia areata, in order to provide better options for its early diagnosis and treatment. With the development and widespread use of microarrays and high-throughput sequencing technologies, bioinformatics analysis can be used to identify new genes and biomarkers for many diseases, including autoimmune diseases [14,15,16]. Machine learning (ML) based on a series of complex algorithm processes has recently been used to identify biomarkers and predict various diseases [17,18,19]. Therefore, using machine learning to analyze multiple microarrays and screen out key genes for disease diagnosis may be a promising method to address this heterogeneity.

In this study, we performed differential analysis on the dataset using R and identified differentially expressed genes (DEGs). We assessed the immune status using MCPcounter, ESTIMATE, and CIBERSORT analyses on the dataset. Subsequently, we conducted weighted correlation network analysis (WGCNA) to identify shared genes that are associated with both immunity and alopecia areata (AA), followed by functional enrichment analysis on these genes. We then employed machine learning methods, including the least absolute shrinkage and selection operator (LASSO), support vector machine-recursive feature elimination (SVM-RFE), and random forest analysis, to identify three hub genes. We evaluated the predictive accuracy of these hub genes by constructing a nomogram and ROC curves, and confirmed that they could serve as diagnostic markers for AA. Furthermore, we performed external validation on two additional datasets and assessed the correlation of the hub genes with immune infiltration and immune checkpoint genes. We also evaluated the correlation of the hub genes with key marker genes and pathways, with a focus on the CD8A gene and its association with AA. Finally, we analyzed normal samples, near-lesional samples, and AA samples to confirm that these three genes not only serve as diagnostic markers for AA, but also reflect the progression of AA and the immune status. Our results provide a novel approach for the diagnosis, prevention, and treatment of AA.

Article types

Original research.

Materials and methods

Acquisition and variance analysis of AA datasets

In this study, we downloaded all datasets from the GEO database (http://www.ncbi.nlm.nih.gov/geo), including the training set GSE68801, test set GSE80342, and test set GSE45512. The GSE68801 dataset comprised 36 normal control samples, 26 near-lesional samples, and 60 AA patient samples. The GSE80342 dataset included 3 normal samples and 12 AA patient samples, while the GSE45512 dataset included 5 normal samples and 5 AA patient samples. To convert the probe expression matrix to a gene expression matrix, we used the platform annotation file and averaged the values when multiple probes mapped to the same gene. Subsequently, we normalized the dataset using the "limma" package and performed differential expression analysis. We identified differentially expressed genes (DEGs) using p ≤ 0.05 and an absolute log-fold change (|logFC|) > 0.2 as the cut-off values. To visualize the gene expression data, we used the "ggplot2" and "pheatmap" R packages for plots and heatmaps, respectively.
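As an illustration of the probe-to-gene conversion and DEG screening described above, the following Python sketch averages probes that map to the same gene and then applies the p ≤ 0.05 and |logFC| > 0.2 cut-offs. It is not the pipeline used in this study, which relied on R and the limma package; the function names, the input layout, and any values passed to it are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_values, probe_to_gene):
    """Average, sample by sample, all probes that map to the same gene.

    probe_values: {probe_id: [expression per sample]}  (hypothetical layout)
    probe_to_gene: {probe_id: gene_symbol}             (from a platform annotation)
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene:  # probes without an annotated gene are skipped
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across a gene's probes
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def select_degs(stats, p_cutoff=0.05, logfc_cutoff=0.2):
    """Split genes into up- and down-regulated DEGs.

    stats: {gene: (logFC, p_value)}; a gene passes if
    p <= p_cutoff and |logFC| > logfc_cutoff.
    """
    up, down = [], []
    for gene, (logfc, p_value) in stats.items():
        if p_value <= p_cutoff and abs(logfc) > logfc_cutoff:
            (up if logfc > 0 else down).append(gene)
    return sorted(up), sorted(down)
```

The statistics themselves (logFC and p-values) would come from a differential expression model such as the one limma fits; this sketch only covers the bookkeeping before and after that step.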

Assessment of immune landscape and immune cell infiltration

To comprehensively assess the immune landscape of our dataset and compare immune scores and immune cell infiltration profiles under different conditions, three algorithms were used. The ESTIMATE algorithm was used to derive a composite immune score, while the MCPcounter algorithm was used to quantify the absolute abundance of eight immune cell populations and two stromal cell populations from transcriptome data. Additionally, the CIBERSORT algorithm was applied to analyze the composition of infiltrating immune cells based on gene expression matrices. Several R packages, including "corrplot", "vioplot", "ggplot2", and "glmnet", were used in this analysis and to visualize the results.

Weighted gene co-expression network analysis (WGCNA)

The WGCNA method was employed to identify gene modules with similar expression patterns and subsequently investigate the correlation between gene modules and specific features. We utilized the "WGCNA" package to perform WGCNA analysis on the data, identifying disease-related gene modules and immune-related gene modules. We further extracted the gene information from the respective modules for subsequent analysis.
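WGCNA builds its co-expression network by raising pairwise gene correlations to a soft-threshold power β, which suppresses weak correlations relative to strong ones. The sketch below shows that single step in plain Python for an unsigned network. The choice of an unsigned network, the example value β = 6, and the function names are assumptions for illustration and are not taken from this study, which used the "WGCNA" R package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no co-expression signal
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta for every gene pair.

    expr: {gene: [expression per sample]}  (hypothetical layout)
    Returns {(gene_i, gene_j): adjacency} with gene_i < gene_j.
    """
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g < h
    }
```

With β = 6, a correlation of 0.8 maps to an adjacency of about 0.26, whereas a correlation of 0.3 maps to less than 0.001, which is why the soft threshold sharpens module boundaries.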

Functional analysis

Enrichment analysis of Disease Ontology (DO), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) terms was carried out using the R package "clusterProfiler". DO enrichment analysis annotates genes from the perspective of diseases, so its results can link sequencing results to clinical implications. KEGG enrichment analysis is commonly used to explore potential signaling pathways. GO enrichment analysis, which covers three categories (biological processes (BP), cellular components (CC), and molecular functions (MF)), aids in understanding the biological functions of genes. Significantly enriched terms were filtered with thresholds of p-value < 0.05 and q-value < 0.05, and the results were visualized using the R packages "enrichplot" and "ggplot2".
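Over-representation analyses of this kind are based on a hypergeometric test: given a gene list, how surprising is its overlap with a pathway's gene set? The following Python sketch computes that p-value directly. The parameter names are hypothetical, and the sketch omits the multiple-testing correction that produces q-values.

```python
from math import comb


def enrichment_pvalue(overlap, list_size, set_size, universe_size):
    """One-sided hypergeometric p-value, P(X >= overlap).

    overlap:       genes in both the query list and the pathway gene set
    list_size:     number of genes in the query list
    set_size:      number of genes in the pathway gene set
    universe_size: number of background genes
    """
    total = comb(universe_size, list_size)
    upper = min(list_size, set_size)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value means the pathway contains more genes from the list than would be expected if the list had been drawn at random from the background.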

Identification of key diagnostic hub genes using ML

Support vector machine recursive feature elimination (SVM-RFE) is a machine learning technique that reduces feature sets and identifies the most predictive features. Least absolute shrinkage and selection operator (LASSO) regression fits a penalized linear model that retains only the most valuable variables. Random forest analysis was used to rank genes, and genes with importance values greater than 0.5 were considered major influencing genes. The shared genes were analyzed separately with these three methods, and the results were intersected to obtain the hub genes. The expression differences of these hub genes between alopecia areata patients and normal individuals were shown using boxplots.
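The final selection step, filtering by random forest importance and intersecting the three gene lists, can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function names and input layout are assumptions.

```python
def filter_by_importance(importances, threshold=0.5):
    """Keep genes whose random forest importance exceeds the threshold.

    importances: {gene: importance value}  (hypothetical layout)
    """
    return {gene for gene, value in importances.items() if value > threshold}


def intersect_selections(*selections):
    """Return, in sorted order, the genes present in every selection."""
    sets = [set(selection) for selection in selections]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Calling `intersect_selections(rf_genes, svm_rfe_genes, lasso_genes)` with the three method-specific lists returns only the genes that all three methods agree on, which is a conservative way to define hub genes.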

Diagnostic value of hub gene in alopecia areata

A nomogram for predicting alopecia areata based on the hub genes was constructed with the "rms" package. The diagnostic value and prediction accuracy of the nomogram and of each hub gene were assessed using ROC curves.
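The area under the ROC curve (AUC) summarizes such a curve as the probability that a randomly chosen AA sample receives a higher score than a randomly chosen control. The Python sketch below computes it from that definition, counting ties as one half. It illustrates the metric only; the function name and inputs are assumptions, and it does not reproduce the nomogram model.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    scores: predicted score or expression value per sample
    labels: 1 for a case (AA), 0 for a control
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # A correctly ordered pair scores 1, a tied pair scores 0.5
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.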

Gene set enrichment analysis (GSEA)

The functional enrichment of key genes between the normal group and the alopecia areata group was determined by GSEA (http://www.broadinstitute.org/gsea/index.jsp). The parameters were set to 1000 permutations, with maximum and minimum gene set sizes of 500 and 15, respectively, and P < 0.05 was taken to indicate significant enrichment. Finally, the top 5 enriched KEGG signaling pathways in each group were visualized.

Correlation analysis

Correlations of the identified hub genes with infiltrating immune cell levels, immune checkpoints, hallmark genes and hallmark pathways were explored by correlation analysis and the results were visualized using the "ggplot2" package.

Collection of clinical scalp samples and fluorescent staining analysis

Approval was obtained from the Medical Ethics Committee of Hangzhou Third People’s Hospital (No. 2022KA058), and informed consent was acquired from all participants. Scalp samples were collected from three patients with alopecia areata and three healthy controls. The alopecia areata patients exhibited hair loss covering 40–60% of the scalp area and had not received any treatment in the past six months.

After paraffin embedding, the scalp samples were sectioned. The sections underwent a standard deparaffinization process and antigen retrieval. Endogenous peroxidase was blocked using 3% hydrogen peroxide, followed by blocking with BSA. The sections were incubated overnight at 4°C with anti-CD8A antibody (66,868–1-Ig, Proteintech), anti-PIK3CG antibody (WL05069, Wanleibio), and anti-SKAP1 antibody (bs-13702R, Beijing Biosynthesis Biotechnology Co.,Ltd.). Afterward, they were treated with the corresponding secondary antibodies, and nuclei were stained with DAPI before mounting. Fluorescence was observed and recorded using a Nikon inverted fluorescence microscope (NIKON ECLIPSE CI-S). Additionally, semi-quantitative analysis of the fluorescence staining results was performed using ImageJ.

Statistical analysis

All data processing, statistical analysis, and graphing in this study were performed in R software v4.1.2 and GraphPad Prism 5. The Wilcoxon test or t-test was used to compare differences between two groups. P < 0.05 indicated that the difference was statistically significant.

Results

DEGs screening and enrichment analysis

The dataset was normalized and analyzed for differential expression with the "limma" package, using p ≤ 0.05 and |logFC| > 0.2 as thresholds. The selected DEGs were visualized with the "ggplot2" and "pheatmap" packages (Fig. 1A). The results showed that 806 genes had statistically significant differences in expression, of which 502 were up-regulated and 304 were down-regulated (Fig. 1B). Enrichment analysis of these 806 genes showed strong enrichment of various inflammation-related and immune-related pathways, especially the chemokine signaling pathway, T cell receptor signaling pathway, Th1 and Th2 cell differentiation, primary immunodeficiency, and the JAK-STAT signaling pathway, indicating that inflammation is closely related to immunity and AA (Fig. 1C, D).

Fig. 1

Screening and enrichment analysis results of DEGs. A Heatmap of DEGs. B Volcano plot of DEGs in normal and AA patients. The threshold of screening DEGs is set at |fold change| ≥ 0.2 and p (p.adjust) < 0.05. Green dots represent down-regulated genes and red dots represent up-regulated genes. C KEGG analysis results of DEGs. D GO analysis results of DEGs. * p < 0.05; ** p < 0.01; *** p < 0.001

Immune landscape in AA

The enrichment results of the DEGs showed that AA was related to immunity, so we combined several immune algorithms to evaluate the differences in immune landscape between the control group and the AA group. The ESTIMATE results indicated that the AA group had a higher immune score than the control group (Fig. 2A). The MCPcounter results showed that most immune-related cell populations differed significantly between the normal and AA groups, especially T cells and NK cells (Fig. 2B). We then assessed the proportions of 22 types of immune cells in each group with the CIBERSORT algorithm. The results showed that the proportions of the 22 immune cell types differed between groups, and that most immune cell types were correlated with one another (Fig. 2C-E).

Fig. 2

Immune landscape of AA based on multiple immune algorithms. A ESTIMATE results of normal and AA patients. B MCPcounter results of normal and AA patients. C Correlation analysis results of 22 immune cells between normal and AA patients based on CIBERSORT. D Distribution of 22 immune cells between normal and AA patients. E Heatmap of the proportions of 22 immune cells in normal and AA patients. * p < 0.05; ** p < 0.01; *** p < 0.001

AA and immune-related DEGs screening and functional enrichment

To screen for shared genes related to both AA and immunity, the relationship between DEGs and clinical features and the relationship between DEGs and the immune environment were each analyzed by WGCNA, with a minimum module size of 60 genes. The WGCNA results showed that 4 modules were associated with the clinical features of AA, of which the blue module had the highest correlation with the pathogenesis of AA (Fig. 3A-C). The WGCNA results for DEGs and the immune environment showed that 4 modules were related to immunity, of which the blue module had the highest correlation with immunity (Fig. 3D-F).

Fig. 3

WGCNA results for screening AA-related or immune-related DEGs. A Exploration of soft threshold in AA. B, C WGCNA results related to AA. D Exploration of soft threshold in immune. E, F WGCNA results related to immune. Blue color represents negative correlation and red color represents positive correlation. * p < 0.05; ** p < 0.01; *** p < 0.001

As a result, 136 shared genes were obtained by taking the intersection of the AA-related blue module and the immune-related blue module (Fig. 4A). We performed enrichment analysis on these 136 shared genes. The results showed that a variety of immune-related diseases were enriched among these shared genes, such as human immunodeficiency virus infectious disease, primary immunodeficiency disease, acquired immunodeficiency syndrome, and multiple sclerosis (Fig. 4B, C). KEGG results showed that a variety of AA-related immune pathways were highly enriched, especially the chemokine signaling pathway, T cell receptor signaling pathway, cell adhesion molecules, Th1 and Th2 cell differentiation, natural killer cell mediated cytotoxicity, and the Toll-like receptor signaling pathway (Fig. 4D–F). GO enrichment results showed that biological processes such as differentiation and chemotaxis of T cells and leukocytes were enriched in BP, various immune complexes were enriched in CC, and various immune factor activities and receptor activities were enriched in MF (Fig. 4G-I).

Fig. 4

Screening and enrichment analysis results of DEGs associated with AA and immunity. A Venn analysis of the shared DEGs between AA-related DEGs and immune-related DEGs. Bar chart (B) and bubble chart (C) showing the DO enrichment results of shared genes. Circular chart (D), bar chart (E), and bubble chart (F) showing the KEGG enrichment results of shared genes. Circular chart (G), bar chart (H), and bubble chart (I) showing the GO enrichment results of shared genes

Hub gene screening by ML

To further identify hub genes associated with AA and immunity, three ML methods were applied to assess the importance of the 136 shared genes. First, we performed random forest analysis on the 136 shared genes with an importance threshold of 0.5, and 21 genes were selected (Fig. 5A, B). Subsequently, we performed SVM-RFE analysis on the 136 shared genes. Among the top 40 genes ranked by importance, the highest accuracy was achieved when the feature set was truncated to the top 27 genes (Fig. 5C, D). We also performed LASSO regression analysis on the 136 shared genes, and a total of 19 genes were shown to be strongly associated with AA and immunity (Fig. 5E, F). Finally, we took the intersection of the three screening results, which yielded 3 hub genes: SKAP1, PIK3CG, and CD8A (Fig. 5G). These 3 genes were considered hub genes highly relevant to AA and immunity.

Fig. 5

Feature biomarker selection. A, B Random forest (RF) analysis of shared genes; the filter condition for screening feature variables is set at importance > 0.5. C, D SVM-RFE analysis of shared genes. E, F LASSO analysis of shared genes. G Venn analysis of RF, SVM-RFE and LASSO. The overlapping genes are considered hub genes. * p < 0.05; ** p < 0.01; *** p < 0.001

Establishment of nomogram and diagnostic accuracy analysis

We analyzed the expression differences of the 3 hub genes between the normal group and the AA group, and the results showed that all 3 hub genes were highly expressed in AA (Fig. 6A–C). A nomogram model was constructed based on the 3 hub genes, which could be used for the prediction of AA (Fig. 6D). The prediction accuracy of the nomogram and of the individual hub genes was assessed by ROC curves. The results show that the nomogram we constructed has good prediction accuracy (AUC = 0.941, Fig. 6E) compared with prediction by a single hub gene (AUC for CD8A = 0.934, SKAP1 = 0.831, PIK3CG = 0.883; Fig. 6F–H).

Fig. 6

Establishment of nomogram and diagnostic accuracy analysis based on hub genes. The expression of CD8A (A), SKAP1 (B) and PIK3CG (C) in normal and AA groups. D Nomogram construction of hub genes. Diagnostic accuracy analysis results of nomogram (E), CD8A (F), SKAP1 (G) and PIK3CG (H). * p < 0.05; ** p < 0.01; *** p < 0.001

External validation and enrichment analysis of hub genes

We verified the results with two other datasets (GSE80342 and GSE45512). Heatmaps of the 3 hub genes are shown in Fig. 7A, B. The ROC results show that our nomogram has good prediction accuracy for both datasets (Fig. 7C, G). We also analyzed the expression differences of the three hub genes between the normal group and the AA group, and the results were consistent with those of the training set. All three hub genes were highly expressed in AA, and the differences were statistically significant (Fig. 7D-F, H-J).

Fig. 7

A Heatmap of the hub genes in test dataset GSE80342. B Heatmap of the hub genes in test dataset GSE45512. C Diagnostic accuracy analysis results in test dataset GSE80342. The expression of CD8A (D), PIK3CG (E) and SKAP1 (F) in test dataset GSE80342. G Diagnostic accuracy analysis results in test dataset GSE45512. The expression of CD8A (H), PIK3CG (I) and SKAP1 (J) in test dataset GSE45512. * p < 0.05; ** p < 0.01; *** p < 0.001

Correlation of hub genes with immune cell, immune checkpoint genes, hallmark genes and hallmark pathways

To further determine the relationship between the 3 hub genes and immunity, we performed multiple correlation analyses. The immune cell correlation results showed that the hub genes were clearly correlated with immune cells, especially M1 macrophages, M2 macrophages, neutrophils, and gamma delta T cells (Fig. 8A). Immune checkpoints also play a key role in preventing autoimmunity, and it is worth noting that the hub genes were significantly associated with most immune checkpoints (Fig. 8B). We then performed correlation analysis of the 3 hub genes with hallmark genes and hallmark pathways, and the results showed that various inflammatory and immune pathways differed between the normal group and the AA group, especially TNFα signaling, WNT signaling, TGF-β signaling, IL6-JAK-STAT3 signaling, and the inflammatory response (Fig. 8C, D).

Fig. 8

Correlation analysis of hub genes with immune. Correlation heatmaps of hub genes with immune cell (A), immune checkpoint genes (B), hallmark genes (C) and hallmark pathways(D). * p < 0.05; ** p < 0.01; *** p < 0.001

CD8A may be the culprit in the pathogenesis of AA

Through nomogram and ROC analysis, we found that CD8A had the best specificity and sensitivity for the diagnosis of AA compared with the other two hub genes (AUC for CD8A = 0.934, SKAP1 = 0.831, PIK3CG = 0.883), which indicated that CD8A may play a decisive role in AA. Therefore, we divided the training set and each test set into a CD8A high expression group and a CD8A low expression group according to the median CD8A expression. Notably, nearly all samples in the CD8A high expression group were from AA patients (Fig. 9A). We further evaluated the immune differences between the CD8A high and low expression groups, and the results showed that the CD8A high expression groups all had higher immune scores (Fig. 9B-D) and a higher degree of immune cell infiltration (Fig. 9E-G).
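The median split used to define the CD8A high and low expression groups can be sketched in Python as follows. The function name and input layout are hypothetical, and assigning samples that sit exactly at the median to the low group is an assumption, since the handling of ties is not described here.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into high and low groups at the median expression.

    expression: {sample_id: expression value of one gene}  (hypothetical layout)
    Samples exactly at the median go to the low group (an assumption).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, value in expression.items() if value > cutoff)
    low = sorted(s for s, value in expression.items() if value <= cutoff)
    return cutoff, high, low
```

Because the cut-off is computed within each dataset, the split is applied separately to the training set and to each test set.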

Fig. 9

Effects of CD8A on AA pathogenesis. A Distribution of high CD8A expression in normal and AA patients. Immune score results of different CD8A expression groups in training set GSE68801 (B), test set GSE80342 (C) and test set GSE45512 (D). MCPcounter results of different CD8A expression groups in training set GSE68801 (E), test set GSE80342 (F) and test set GSE45512 (G). * p < 0.05; ** p < 0.01; *** p < 0.001

Hub genes can diagnose AA and are also associated with the disease process

We re-analyzed the training set (GSE68801) and divided it into normal group (36 samples), near-lesional group (26 samples) and lesion group (60 samples). The expression of the 3 hub genes in these three groups was displayed by heatmap (Fig. 10A). The results indicated that the 3 hub genes had a linear relationship among the three groups. The expression of CD8A was the highest in the lesion group, followed by the near-lesional group, and the lowest in the normal group (Fig. 10B). This pattern also exists for PIK3CG (Fig. 10C). However, SKAP1 was not significantly different between the normal group and the near-lesional group, but was highly expressed and statistically different in the lesion group (Fig. 10D). The immune score also showed a linear relationship with some key immune cells in these three groups (Fig. 10E–H). The different results of the 3 hub genes in different groups indicate that they can not only diagnose AA, but also evaluate the progress and immune infiltration, which will help doctors in the early diagnosis and immune intervention of AA.

Fig. 10

Effects of hub genes on disease process in AA. A Expression heatmap of hub gene in different lesions of AA. The expression of CD8A (B), PIK3CG (C) and SKAP1 (D) in different lesions of AA. E Immune score results of different lesions of AA. F T cell infiltration of different lesions of AA. G NK cell infiltration of different lesions of AA. H Neutrophils infiltration of different lesions of AA. N: normal; NL: near-lesional; L: lesional. * p < 0.05; ** p < 0.01; *** p < 0.001

Validation of hub gene expression in clinical scalp samples of alopecia areata

To further validate the accuracy and reliability of our constructed predictive model, we collected scalp tissue samples from AA patients and HC for immunofluorescence staining of hub genes. The results, illustrated in Figs. 11A-C, show that the expression levels of CD8A, PIK3CG, and SKAP1 in the scalp tissue of AA patients were significantly higher than those in the HC. Furthermore, localized magnification revealed a more pronounced expression around the hair follicles. Analysis using ImageJ software indicated that the differences in expression of CD8A, PIK3CG, and SKAP1 between AA patients and HC were statistically significant (Figs. 11 D-F, p < 0.05). These results suggest that the hub genes we identified play a critical role in AA and that our predictive model demonstrates good reliability.

Fig. 11

Expression differences of hub genes in clinical scalp samples of alopecia areata. A Expression of CD8A in alopecia areata samples and normal control samples (DAPI: blue; CD8A: red). B Expression of PIK3CG in alopecia areata samples and normal control samples (DAPI: blue; PIK3CG: yellow). C Expression of SKAP1 in alopecia areata samples and normal control samples (DAPI: blue; SKAP1: green). D-F Quantitative analysis of immunofluorescence images, n = 3. * p < 0.05; ** p < 0.01; *** p < 0.001

Discussion

AA is a complex autoimmune disease with pathological complexity and individual heterogeneity [20]. The immune response plays a crucial role in the pathogenesis of AA [21]. Studies have shown that changes in the immune environment lead to the attack and destruction of hair follicles, which is the root cause of AA [22]. However, the complex interactions of infiltrating immune cells still require further research [23]. Due to the heterogeneity and clinical characteristics of AA, effective biomarkers have not been established, particularly for early diagnosis and disease progression analysis. Therefore, there is an urgent need to explore hallmark genes closely related to the development of alopecia areata to provide better choices for early diagnosis and treatment. In this study, we comprehensively, objectively, and accurately explored immune changes in AA using bioinformatics methods and identified diagnostic markers for AA, providing a new approach for the diagnosis, prevention, and treatment of AA.

In this study, we performed differential analysis on the dataset, and a total of 806 DEGs were identified. Functional enrichment analysis revealed that inflammation and immune responses are closely associated with AA, particularly in terms of T cell activation, T cell differentiation, the chemokine-mediated signaling pathway, immune receptor activity, chemokine activity, and chemokine receptor binding. ESTIMATE results showed a higher immune score in the AA group. MCPcounter and CIBERSORT were further used to investigate the differences in immune cell infiltration between the AA and normal groups, with T cells, NK cells, monocytic lineage, and neutrophils highly infiltrated in AA. Based on the results of the DEG and immune analyses, we used WGCNA to identify both AA-related differential genes and immune-related genes, and 136 shared genes were discovered. Enrichment analysis showed that these 136 shared genes are related to many autoimmune diseases. Pathways related to immune responses, such as the chemokine signaling pathway, T cell receptor signaling pathway, cell adhesion molecules, Th1 and Th2 cell differentiation, natural killer cell-mediated cytotoxicity, and the Toll-like receptor signaling pathway, were enriched in KEGG. GO enrichment analysis revealed that inflammation and immune responses were also enriched. Next, three machine learning methods were employed to identify hub genes, and three hub genes (SKAP1, PIK3CG, and CD8A) were selected. A disease prediction nomogram was established using these three hub genes, and the ROC results showed good accuracy (AUC = 0.941). Furthermore, we validated our results using two additional datasets (GSE80342 and GSE45512), and all hub genes were highly expressed in AA compared to the normal group, which adds to the credibility of our results.

SKAP1 is a regulatory gene intimately associated with T cells and capable of regulating multiple functions within these cells [24]. It encodes an adaptor protein that activates TCR-mediated LFA-1 signaling [24], which promotes the interaction between T cells and antigen-presenting cells (APCs) [25]. SKAP1 can directly regulate serine and threonine kinases in T cells, thereby influencing T cell activation [26, 27]. SKAP1 also regulates PLK1 kinase activity, and the SKAP1-PLK1 complex is required for T cell cycling and cell expansion [26, 27]. In the pathological process of AA, T-cell-mediated immune responses are considered a key factor. Studies have shown that a significant infiltration of CD8 + effector memory T cells is present around the hair follicles of AA patients. These T cells may directly damage hair follicle structures by expressing cytotoxic molecules [28]. SKAP1 plays a central role in T cell activation, mainly by regulating integrin LFA-1 adhesion and migration, which are critical for T cell immune responses [29]. Dysregulation of SKAP1 may enhance T cell attacks on hair follicles [29]. By modulating T cell adhesion and activation, SKAP1 potentially influences the immune-mediated destruction of hair follicles, contributing to the onset of AA.

In our study, there is a close relationship between T cells and AA occurrence, and T cell activation may directly affect the progression of AA. Although SKAP1 has not been studied in immunosuppression in AA, we believe that SKAP1 can demonstrate T cell function and is a promising immunotherapeutic target for improving T cell function.

The PIK3CG gene encodes the catalytic γ subunit of the PI3K family, and its involvement in processes such as cell growth, differentiation, and survival has been widely studied [30,31,32]. Research has demonstrated that the PI3K/Akt signaling pathway is critical for hair follicle development and regeneration, as it regulates the proliferation and apoptosis of follicular cells, which in turn affects the hair growth cycle [33]. Thus, PIK3CG may directly influence the onset and progression of AA by modulating the PI3K/Akt pathway. In addition, PIK3CG acts as a switch that regulates immune responses. Through PI3Kγ signaling via Akt and mTOR, it inhibits NFκB activation while promoting C/EBPβ activation, thereby facilitating immune suppression. Selective inactivation of PIK3CG, in contrast, prolongs NFκB activation and inhibits C/EBPβ, promoting an immune-stimulatory transcription program that restores CD8+ T cell activation and cytotoxicity [34]. Furthermore, PIK3CG plays a critical role in regulating innate immune function and development in mice [35, 36], where its absence leads to reduced NK cell numbers and developmental defects [37]. PIK3CG also affects T cell activation, which is crucial in the immune environment of AA, where both CD8+ T cells and NK cells are key players [38]. This may explain the elevated expression of PIK3CG observed in AA. Therefore, the abnormal activation of immune cells driven by PIK3CG may be a key factor in triggering the immune system's attack on hair follicles.

Notably, NKG2D is a critical receptor on the surface of CD8+ T cells and NK cells and plays an important role in the immune mechanisms of AA [39]. NKG2D+ cells, such as CD8+ T cells and NK cells, promote immune attacks on hair follicles by recognizing stress-induced ligands within the follicles [40]. Moreover, activated NKG2D+ cells release cytokines such as interferon-γ, further exacerbating the local inflammatory response and leading to hair follicle damage and hair loss [41, 42]. In our study, we also observed elevated NKG2D expression in AA samples (data not shown), which was consistent with the expression of the hub genes. This suggests a potential link between the two, possibly a synergistic relationship or an upstream–downstream interaction. Overall, PIK3CG alterations leading to abnormal activation of T cells and NK cells, along with NKG2D receptor-mediated immune attacks, may jointly drive the onset and progression of AA.

CD8A encodes the CD8α chain of the CD8 protein dimer, which plays a crucial role in cell-mediated immune defense and T cell development [43, 44]. CD8A has been identified as a diagnostic and prognostic biomarker for various diseases, including cancer and inflammatory disorders [45,46,47]. CD8A also plays an indispensable role in AA, and in our study it demonstrated superior specificity and sensitivity for diagnosing AA. In all three datasets, the group with high CD8A expression was composed predominantly of AA patients. This high-expression group also exhibited higher immune scores and greater immune cell infiltration, suggesting that CD8A is likely a central player in the pathogenesis of AA.
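As an illustration of how the diagnostic value of a single marker such as CD8A can be quantified, the following minimal Python sketch computes the area under the ROC curve in its rank-based (Mann-Whitney) form, together with the proportion of AA samples in a high-expression group. The expression values, the labels, and the function names `roc_auc` and `aa_fraction_in_high_group` are hypothetical and are not taken from the study datasets. The median cut-off is also an assumption, since the threshold that defines the high-expression group is not specified in this section.

```python
from statistics import median
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case scores above a random control.

    A tie between a case and a control counts as half a win (Mann-Whitney form).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def aa_fraction_in_high_group(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of AA samples among samples above the median expression.

    The median split is an assumption made for this sketch.
    """
    cutoff = median(scores)
    high = [y for s, y in zip(scores, labels) if s > cutoff]
    if not high:
        raise ValueError("no sample lies above the median")
    return sum(high) / len(high)


# Made-up CD8A expression values (arbitrary units); 1 = AA, 0 = control.
cd8a_expression = [7.9, 8.4, 6.1, 8.8, 5.9, 6.4, 5.2, 6.0]
is_aa = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(cd8a_expression, is_aa)
high_fraction = aa_fraction_in_high_group(cd8a_expression, is_aa)
print(f"Illustrative CD8A AUC: {auc:.4f}")
print(f"AA fraction in high-expression group: {high_fraction:.2f}")
```

An AUC close to 1 means that almost every AA sample has higher expression than almost every control, while 0.5 corresponds to no discrimination.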

One of the longest-standing challenges in AA research has been to elucidate the pathogenesis and the complex interplay of immune cells in AA [48]. Previous studies have suggested that CD8+ T cells are crucial for AA development, but recent evidence suggests that they are not the sole drivers of the disease [9]. Therefore, we further explored the correlation of the three hub genes with immune cell infiltration and with immune checkpoints. The results indicated that other cells, such as NK cells and neutrophils, also play important roles, possibly because of their ability to produce IFN-γ [49]. This suggests that the attack on hair follicles may be related to changes in the immune environment. Immune checkpoints, as regulators of the immune system, have been shown to be a significant factor in the pathogenesis of many diseases because of their abnormal expression and function [50]. We found a high correlation between the hub genes and immune checkpoints associated with T cells, including CD27, CD28, CD40, and ICOS. Immune checkpoint analysis in AA may therefore provide insights into its immunotherapy. We also assessed the relationship between the hub genes and the marker gene sets using GSEA, which revealed that the hub genes were closely related to many inflammatory and immune signaling pathways and that these pathways differed significantly between the normal group and the AA group.
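The correlation between a hub gene and an immune checkpoint gene can be sketched as follows. This is a minimal example that assumes a Spearman rank correlation; the correlation method is not named in this section, and the expression values and function names (`average_ranks`, `pearson`, `spearman`) are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression values for one hub gene and one checkpoint gene
# measured in the same six samples (arbitrary units).
skap1_expression = [5.1, 6.3, 7.2, 8.0, 6.8, 5.5]
cd28_expression = [4.0, 5.2, 6.1, 6.0, 5.9, 4.4]

rho = spearman(skap1_expression, cd28_expression)
print(f"Illustrative SKAP1 vs CD28 Spearman rho: {rho:.3f}")
```

A rank-based coefficient is used here because it does not assume a linear relationship between expression levels; a Pearson coefficient on the raw values would be an equally plausible choice.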

In general, our study developed a model that accurately diagnoses AA and evaluates its immune response. Further analysis of this model revealed that the expression of the three hub genes also differs among the normal, perilesional, and lesional groups. Based on the differential expression of these three hub genes, AA can be diagnosed and its progression can be evaluated, which can greatly assist doctors in the diagnosis and early intervention of AA.
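A comparison of one hub gene across the normal, perilesional, and lesional groups could be sketched as below. The example assumes a Kruskal-Wallis rank test, which is one common choice for comparing three groups and is not necessarily the test used in this study; the expression values and the function name `kruskal_wallis_h` are hypothetical.

```python
from typing import Dict, List, Sequence


def kruskal_wallis_h(groups: Dict[str, Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    if len(groups) < 2 or any(len(values) == 0 for values in groups.values()):
        raise ValueError("need at least two non-empty groups")

    # Pool all observations and record the 1-based positions of each value.
    pooled = sorted(v for values in groups.values() for v in values)
    n_total = len(pooled)
    positions: Dict[float, List[int]] = {}
    for position, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(position)

    # Tied values receive the mean of the positions they occupy.
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}

    weighted_rank_sums = 0.0
    for values in groups.values():
        rank_sum = sum(rank_of[v] for v in values)
        weighted_rank_sums += rank_sum ** 2 / len(values)
    h = 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3.0 * (n_total + 1)

    # Standard tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Made-up expression values of one hub gene in three groups (arbitrary units).
expression_by_group = {
    "normal": [5.0, 5.4, 5.1],
    "perilesional": [6.0, 6.3, 5.8],
    "lesional": [7.5, 8.1, 7.9],
}

h_statistic = kruskal_wallis_h(expression_by_group)
print(f"Illustrative Kruskal-Wallis H: {h_statistic:.2f}")
```

Under the null hypothesis of identical distributions, H is approximately chi-squared distributed with (number of groups minus 1) degrees of freedom, so a large H indicates that at least one group differs from the others.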

This study also has some limitations. First, the clinical information available for the research data is very limited and lacks patient details such as medication history. Second, insufficient validation is a common limitation of bioinformatics research. Although we used data from two additional test groups to validate predictive accuracy, further validation through in vitro or in vivo studies will be the focus of our future work.

Conclusion

Our study identified SKAP1, PIK3CG, and CD8A as novel diagnostic biomarkers for AA. These genes enable not only accurate diagnosis of patients but also analysis of disease progression and evaluation of the immune cell landscape, providing a new perspective on the early diagnosis of AA and on the molecular mechanisms of the immune cells involved.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Abbreviations

AA: Alopecia areata
WGCNA: Weighted correlation network analysis
ML: Machine learning
DEGs: Differentially expressed genes
MCPcounter: Microenvironment cell populations-counter
LASSO: Least absolute shrinkage and selection operator
SVM-RFE: Support vector machine-recursive feature elimination
GEO: Gene Expression Omnibus
FC: Fold change
DO: Disease ontology
GO: Gene ontology
KEGG: Kyoto Encyclopedia of Genes and Genomes
BP: Biological processes
CC: Cellular components
MF: Molecular functions
ROC: Receiver operating characteristic
GSEA: Gene set enrichment analysis
PIK3CG: Phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit gamma
SKAP1: Src kinase-associated phosphoprotein 1
TNFα: Tumor necrosis factor α
TGF-β: Transforming growth factor-β
APCs: Antigen-presenting cells
LFA-1: Lymphocyte function-associated antigen-1
PI3K: Phosphatidylinositol 3-kinase
mTOR: Mammalian target of rapamycin
C/EBPβ: CCAAT enhancer binding protein beta
NFκB: Nuclear factor kappa-B
NKG2D: Natural killer cell group 2D
ICOS: Inducible T-cell co-stimulator

References

  1. Simakou T, Butcher JP, Reid S, Henriquez FL. Alopecia areata: a multifactorial autoimmune condition. J Autoimmun. 2019;98:74–85. https://doi.org/10.1016/j.jaut.2018.12.001.

  2. Gilhar A, Etzioni A, Paus R. Alopecia areata. N Engl J Med. 2012;366:1515–25. https://doi.org/10.1056/NEJMra1103442.

  3. Villasante Fricke AC, Miteva M. Epidemiology and burden of alopecia areata: a systematic review. Clin Cosmet Investig Dermatol. 2015;8:397–403. https://doi.org/10.2147/CCID.S53985.

  4. Kim JC, Lee ES, Choi JW. Impact of alopecia areata on psychiatric disorders: a retrospective cohort study. J Am Acad Dermatol. 2020;82:484–6. https://doi.org/10.1016/j.jaad.2019.06.1304.

  5. Lintzeri DA, et al. Alopecia areata - Current understanding and management. J Dtsch Dermatol Ges. 2022;20:59–90. https://doi.org/10.1111/ddg.14689.

  6. Bertolini M, McElwee K, Gilhar A, Bulfone-Paus S, Paus R. Hair follicle immune privilege and its collapse in alopecia areata. Exp Dermatol. 2020;29:703–25. https://doi.org/10.1111/exd.14155.

  7. Bodemer C, et al. Role of cytotoxic T cells in chronic alopecia areata. J Invest Dermatol. 2000;114:112–6. https://doi.org/10.1046/j.1523-1747.2000.00828.x.

  8. Xing L, et al. Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition. Nat Med. 2014;20:1043–9. https://doi.org/10.1038/nm.3645.

  9. Trueb RM, Dias M. Alopecia areata: a comprehensive review of pathogenesis and management. Clin Rev Allergy Immunol. 2018;54:68–87. https://doi.org/10.1007/s12016-017-8620-9.

  10. Ghraieb A, et al. iNKT cells ameliorate human autoimmunity: Lessons from alopecia areata. J Autoimmun. 2018;91:61–72. https://doi.org/10.1016/j.jaut.2018.04.001.

  11. Paus R, Bulfone-Paus S, Bertolini M. Hair Follicle Immune Privilege Revisited: The Key to Alopecia Areata Management. J Investig Dermatol Symp Proc. 2018;19:S12–7. https://doi.org/10.1016/j.jisp.2017.10.014.

  12. Rajabi F, Drake LA, Senna MM, Rezaei N. Alopecia areata: a review of disease pathogenesis. Br J Dermatol. 2018;179:1033–48. https://doi.org/10.1111/bjd.16808.

  13. Dudda-Subramanya R, Alexis AF, Siu K, Sinha AA. Alopecia areata: genetic complexity underlies clinical heterogeneity. Eur J Dermatol. 2007;17:367–74. https://doi.org/10.1684/ejd.2007.0231.

  14. Shen-Orr SS, Gaujoux R. Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr Opin Immunol. 2013;25:571–8. https://doi.org/10.1016/j.coi.2013.09.015.

  15. Qiu L, Liu X. Identification of key genes involved in myocardial infarction. Eur J Med Res. 2019;24:22. https://doi.org/10.1186/s40001-019-0381-x.

  16. Zhao X, et al. Identification of key biomarkers and immune infiltration in systemic lupus erythematosus by integrated bioinformatics analysis. J Transl Med. 2021;19:35. https://doi.org/10.1186/s12967-020-02698-x.

  17. Ding H, et al. In vivo analysis of mucosal lipids reveals histological disease activity in ulcerative colitis using endoscope-coupled Raman spectroscopy. Biomed Opt Express. 2017;8:3426–39. https://doi.org/10.1364/BOE.8.003426.

  18. Gubatan J, et al. Artificial intelligence applications in inflammatory bowel disease: Emerging technologies and future directions. World J Gastroenterol. 2021;27:1920–35. https://doi.org/10.3748/wjg.v27.i17.1920.

  19. Kraszewski S, Szczurek W, Szymczak J, Regula M, Neubauer K. Machine Learning Prediction Model for Inflammatory Bowel Disease Based on Laboratory Markers. Working Model in a Discovery Cohort Study. J Clin Med. 2021;10. https://doi.org/10.3390/jcm10204745.

  20. Hordinsky MK. Overview of alopecia areata. J Investig Dermatol Symp Proc. 2013;16:S13–15. https://doi.org/10.1038/jidsymp.2013.4.

  21. Islam N, Leung PS, Huntley AC, Gershwin ME. The autoimmune basis of alopecia areata: a comprehensive review. Autoimmun Rev. 2015;14:81–9. https://doi.org/10.1016/j.autrev.2014.10.014.

  22. Strazzulla LC, et al. Alopecia areata: disease characteristics, clinical evaluation, and new perspectives on pathogenesis. J Am Acad Dermatol. 2018;78:1–12. https://doi.org/10.1016/j.jaad.2017.04.1141.

  23. Abou Rahal J, Kurban M, Kibbi AG, Abbas O. Plasmacytoid dendritic cells in alopecia areata: missing link? J Eur Acad Dermatol Venereol. 2016;30:119–23. https://doi.org/10.1111/jdv.12932.

  24. Raab M, et al. T cell receptor “inside-out” pathway via signaling module SKAP1-RapL regulates T cell motility and interactions in lymph nodes. Immunity. 2010;32:541–56. https://doi.org/10.1016/j.immuni.2010.03.007.

  25. Witte A, et al. D120 and K152 within the PH Domain of T Cell Adapter SKAP55 Regulate Plasma Membrane Targeting of SKAP55 and LFA-1 Affinity Modulation in Human T Lymphocytes. Mol Cell Biol. 2017;37. https://doi.org/10.1128/MCB.00509-16.

  26. Raab M, Strebhardt K, Rudd CE. Immune adaptor SKAP1 acts a scaffold for Polo-like kinase 1 (PLK1) for the optimal cell cycling of T-cells. Sci Rep. 2019;9:10462. https://doi.org/10.1038/s41598-019-45627-9.

  27. Zhang Z, et al. Identification of important modules and biomarkers that are related to immune infiltration cells in severe burns based on weighted gene co-expression network analysis. Front Genet. 2022;13:908510. https://doi.org/10.3389/fgene.2022.908510.

  28. Lee EY, et al. 065 Longitudinal analysis of T cell dynamics in alopecia areata at single-cell resolution. Journal of Investigative Dermatology. 2022. https://doi.org/10.1016/j.jid.2022.05.119.

  29. Liu C, et al. Multi-functional adaptor SKAP1: regulator of integrin activation, the stop-signal, and the proliferation of T cells. Front Immunol. 2023;14:1192838. https://doi.org/10.3389/fimmu.2023.1192838.

  30. Hawkins PT, Stephens LR. PI3Kgamma is a key regulator of inflammatory responses and cardiovascular homeostasis. Science. 2007;318:64–6. https://doi.org/10.1126/science.1145420.

  31. Han J, et al. Knockdown of lncRNA H19 restores chemo-sensitivity in paclitaxel-resistant triple-negative breast cancer through triggering apoptosis and regulating Akt signaling pathway. Toxicol Appl Pharmacol. 2018;359:55–61. https://doi.org/10.1016/j.taap.2018.09.018.

  32. Chang J, et al. Targeting PIK3CG in combination with paclitaxel as a potential therapeutic regimen in claudin-low breast cancer. Cancer Manag Res. 2020;12:2641–51. https://doi.org/10.2147/CMAR.S250171.

  33. Fang T, et al. Caizhixuan hair tonic regulates both apoptosis and the PI3K/Akt pathway to treat androgenetic alopecia. PLoS One. 2023;18:e0282427. https://doi.org/10.1371/journal.pone.0282427.

  34. Kaneda MM, et al. PI3Kγ is a molecular switch that controls immune suppression. Nature. 2016;539:437–42. https://doi.org/10.1038/nature19834.

  35. Tassi I, et al. p110gamma and p110delta phosphoinositide 3-kinase signaling pathways synergize to control development and functions of murine NK cells. Immunity. 2007;27:214–27. https://doi.org/10.1016/j.immuni.2007.07.014.

  36. Kaneda MM, et al. PI3Kgamma is a molecular switch that controls immune suppression. Nature. 2016;539:437–42. https://doi.org/10.1038/nature19834.

  37. Henter JI, et al. HLH-2004: diagnostic and therapeutic guidelines for hemophagocytic lymphohistiocytosis. Pediatr Blood Cancer. 2007;48:124–31. https://doi.org/10.1002/pbc.21039.

  38. Thian M, et al. Germline biallelic PIK3CG mutations in a multifaceted immunodeficiency with immune dysregulation. Haematologica. 2020;105:e488. https://doi.org/10.3324/haematol.2019.231399.

  39. Xing L, et al. Alopecia areata is driven by cytotoxic T lymphocytes and is reversed by JAK inhibition. Nat Med. 2014;20:1043–9. https://doi.org/10.1038/nm.3645.

  40. Fukuyama M, et al. Alopecia areata: current understanding of the pathophysiology and update on therapeutic approaches, featuring the japanese dermatological association guidelines. J Dermatol. 2022;49:19–36. https://doi.org/10.1111/1346-8138.16207.

  41. Gilhar A, et al. Alopecia areata: Animal models illuminate autoimmune pathogenesis and novel immunotherapeutic strategies. Autoimmun Rev. 2016;15:726–35. https://doi.org/10.1016/j.autrev.

  42. Ito T, et al. Understanding the significance of cytokines and chemokines in the pathogenesis of alopecia areata. Exp Dermatol. 2020;29:726–32. https://doi.org/10.1111/exd.14129.

  43. Tregaskes CA, et al. Identification and analysis of the expression of CD8 alpha beta and CD8 alpha alpha isoforms in chickens reveals a major TCR-gamma delta CD8 alpha beta subset of intestinal intraepithelial lymphocytes. J Immunol. 1995;154:4485–94.

  44. Xu Q, et al. DNA methylation and regulation of the CD8A after duck hepatitis virus type 1 infection. PLoS One. 2014;9:e88023. https://doi.org/10.1371/journal.pone.0088023.

  45. Kristensen LK, et al. CD4(+) and CD8a(+) PET imaging predicts response to novel PD-1 checkpoint inhibitor: studies of Sym021 in syngeneic mouse cancer models. Theranostics. 2019;9:8221–38. https://doi.org/10.7150/thno.37513.

  46. Kristensen LK, et al. Monitoring CD8a(+) T Cell Responses to Radiotherapy and CTLA-4 Blockade Using [(64)Cu]NOTA-CD8a PET Imaging. Mol Imag Biol. 2020;22:1021–30. https://doi.org/10.1007/s11307-020-01481-0.

  47. Ma K, Qiao Y, Wang H, Wang S. Comparative expression analysis of PD-1, PD-L1, and CD8A in lung adenocarcinoma. Ann Transl Med. 2020;8:1478. https://doi.org/10.21037/atm-20-6486.

  48. Anzai A, Wang EHC, Lee EY, Aoki V, Christiano AM. Pathomechanisms of immune-mediated alopecia. Int Immunol. 2019;31:439–47. https://doi.org/10.1093/intimm/dxz039.

  49. Waggoner SN, et al. Roles of natural killer cells in antiviral immunity. Curr Opin Virol. 2016;16:15–23. https://doi.org/10.1016/j.coviro.2015.10.008.

  50. Zhang Y, Zheng J. Functions of immune checkpoint molecules beyond immune evasion. Adv Exp Med Biol. 2020;1248:201–26. https://doi.org/10.1007/978-981-15-3266-5_9.

Acknowledgements

Not applicable.

Funding

This work was supported by the Zhejiang Provincial Health Science and Technology Program (2021KY906), the National Natural Science Foundation of China (82473549) and Construction Fund of Medical Key Disciplines of Hangzhou. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Contributions

ZQ and WW conceived the study. ZQ, WW and LL collected and analyzed the data. WW and XX evaluated the findings. ZQ and WW wrote and presented the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Wei Wang or Xinchang Xu.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee of the Third People's Hospital of Hangzhou (Approval number: 2022KA058). Written informed consent was obtained from all participants involved in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhou, Q., Lan, L., Wang, W. et al. Identifying effective immune biomarkers in alopecia areata diagnosis based on machine learning methods. BMC Med Inform Decis Mak 25, 23 (2025). https://doi.org/10.1186/s12911-025-02853-8

  • DOI: https://doi.org/10.1186/s12911-025-02853-8

Keywords