Exploring Biomarkers for Depression in Hair – Universität Innsbruck

Exploring Biomarkers for Depression in Hair

To explore the feasibility of biomarker research with hair strands for depression, metabolites from 32 hair samples were analyzed with uni- and multivariate methods regarding differences between depressed and non-depressed individuals.

Introduction

Depression is a widespread mental health challenge, affecting one in twenty people worldwide. This makes depression the third greatest cause of disease burden, with an increasing tendency (WHO, 2012; WHO, 2017). The high heterogeneity in treatment response poses a particular challenge. A prominent example is the differing response to antidepressants: while they work in most cases, up to 30% of people with depression show no response (Trivedi et al., 2006). Developing better treatments therefore requires a more thorough understanding of the mechanisms behind depression. To this end, I used an exploratory, data-driven approach to discover potential biomarkers of depression from metabolites in hair.

Sample description and preprocessing

The sample consisted of 32 female participants, 20 of whom were diagnosed with Major Depressive Disorder (MDD) and 12 of whom served as healthy controls. Participants were selected using the Structured Clinical Interview for DSM-IV (SCID); DSM-IV is the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders, an internationally accepted reference book in psychology and psychiatry. Exclusion criteria for the healthy control group were previous (known) depressive episodes in the participant or in the previous two generations of their family. Participants in the MDD group were excluded if they had other neurological or mental disorders, or diseases that are likely to affect the metabolism.

Traditionally, metabolic analyses for psychiatric conditions have relied on blood samples. However, hair sampling offers several advantages: it is non-invasive, easier to store and transport, and reflects biochemical changes over an extended period. These practical benefits could open new possibilities for large-scale screenings.

To identify the metabolites in the hair strands, quadrupole time-of-flight mass spectrometry (qTOF-MS) was used. The raw data from the mass spectrometry experiments initially included nearly 430 detected metabolites. However, this dataset required further cleaning: metabolites with more than 50% missing values in either the control or the MDD group were excluded. In addition, compounds identified as exogenous (such as metabolites of contaminants or medications) were removed, reducing the dataset to 91 endogenous metabolites for data analysis. To implement the data cleaning steps, I was able to build on my knowledge of the R programming language, which I consolidated throughout the courses of the Minor Digital Science.
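As an illustration only, the missing-value filter described above could be sketched as follows. The original cleaning was done in R; this sketch uses plain Python, and the function names, metabolite names and intensities are hypothetical.

```python
def missing_fraction(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def filter_metabolites(data, groups, max_missing=0.5):
    """Keep metabolites whose missing share is at most max_missing in every group.

    data:   dict mapping metabolite name -> list of intensities (None = missing)
    groups: list of group labels, one per sample, aligned with the intensities
    """
    kept = {}
    for name, values in data.items():
        within_limit = True
        for group in set(groups):
            in_group = [v for v, lab in zip(values, groups) if lab == group]
            # Exclude when MORE than max_missing of the group's values are missing.
            if missing_fraction(in_group) > max_missing:
                within_limit = False
                break
        if within_limit:
            kept[name] = values
    return kept


# Hypothetical toy data: four MDD samples followed by two control samples.
groups = ["MDD", "MDD", "MDD", "MDD", "control", "control"]
data = {
    "metabolite_A": [1.2, 0.9, None, 1.1, 0.8, 1.0],    # 25% missing in MDD
    "metabolite_B": [None, None, None, 2.0, 1.5, 1.4],  # 75% missing in MDD
    "metabolite_C": [0.4, 0.5, 0.6, 0.7, None, 0.3],    # 50% missing in control
}
filtered = filter_metabolites(data, groups)
print(sorted(filtered))
```

Because the rule excludes only metabolites with more than 50% missing values, a metabolite with exactly 50% missing values in one group is kept in this sketch.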

Analysis

Before deploying multivariate methods, I conducted Mann-Whitney U tests to compare each metabolite’s concentration between depressed and non-depressed participants. The U test is a non-parametric statistical test for comparing two groups, and an example of a statistical method taught in the module Data Analysis I, which I attended as part of my DiSC minor studies. To account for multiple comparisons, p-values were adjusted using False Discovery Rate (FDR) correction at a level of q < .2. I chose this more liberal level because I consider the present study exploratory and hypothesis-generating; it needs to be followed up by statistically conservative, confirmatory studies. For the present study, I therefore consider missing potentially relevant findings (beta errors) a greater concern than reporting false findings (alpha errors). This thresholding identified nine metabolites that differed significantly between the groups (Table 1).
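The two building blocks of this step can be sketched in plain Python. The first function computes the Mann-Whitney U statistic by counting pairs. The second applies the Benjamini-Hochberg procedure, which is one common FDR correction; the text does not state which FDR variant was used, so this choice is an assumption. The p-values below are made up for illustration.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: pairs (a, b) with a > b count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


u_stat = mann_whitney_u([3, 4, 5], [1, 2, 3])

# Hypothetical p-values, one per metabolite.
p_values = [0.001, 0.04, 0.03, 0.5, 0.2]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.2 for q in q_values]
print(u_stat, significant)
```

Turning U into a p-value additionally requires its null distribution (exact or normal approximation), which a statistics package would normally provide.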

Complementary to the univariate analysis, two multivariate approaches were implemented: Partial Least Squares-Discriminant Analysis (PLS-DA) and Random Forest (RF). Metabolites were included in the list of potential biomarkers if they appeared among the 20 most relevant compounds for both models.
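The selection rule amounts to intersecting two top-k lists. A minimal sketch, assuming each model yields one relevance score per metabolite (the text does not say which importance measure was used; names and scores here are hypothetical):

```python
def top_k(importance, k):
    """Names of the k features with the highest importance score."""
    return set(sorted(importance, key=importance.get, reverse=True)[:k])


def shared_candidates(importance_a, importance_b, k=20):
    """Features ranked in the top k by both models, sorted alphabetically."""
    return sorted(top_k(importance_a, k) & top_k(importance_b, k))


# Hypothetical relevance scores from two models; k=2 keeps the toy example small.
plsda_scores = {"m1": 2.1, "m2": 1.7, "m3": 0.4, "m4": 0.2}
rf_scores = {"m1": 0.30, "m2": 0.05, "m3": 0.25, "m4": 0.10}
candidates = shared_candidates(plsda_scores, rf_scores, k=2)
print(candidates)
```

Requiring agreement between two differently built models is a conservative filter: a metabolite that only one model ranks highly is not carried forward.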

For the multivariate analysis, casewise exclusion of missing values was not possible, as PLS-DA requires a complete dataset. A comparison of different missing value imputation methods on simulated missing data showed non-linear iterative partial least squares (NIPALS) to be the most accurate approach. Therefore, NIPALS was chosen as the imputation method.
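The general idea of comparing imputation methods on simulated missing data can be sketched as follows: hide known values, impute them, and measure the error against the hidden truth. This sketch does not implement NIPALS; column mean and column median imputation are simple stand-ins, the data are randomly generated, and the root mean squared error is an assumed accuracy measure.

```python
import math
import random
import statistics


def mask_values(matrix, n_hidden, rng):
    """Return a copy with n_hidden random cells set to None, plus their positions."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, n_hidden)
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_columns(matrix, summary):
    """Fill each missing cell with a summary (e.g. mean) of the column's observed values."""
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        fill = summary(observed)
        for row in filled:
            if row[j] is None:
                row[j] = fill
    return filled


def rmse(truth, imputed, positions):
    """Root mean squared error over the hidden cells only."""
    squared = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in positions)
    return math.sqrt(squared / len(positions))


rng = random.Random(0)
# Hypothetical complete data: 10 samples x 3 metabolites.
complete = [[rng.gauss(10, 2) for _ in range(3)] for _ in range(10)]
masked, hidden = mask_values(complete, 6, rng)

methods = {"mean": statistics.mean, "median": statistics.median}
results = {name: rmse(complete, impute_columns(masked, fn), hidden)
           for name, fn in methods.items()}
best = min(results, key=results.get)
print(results, best)
```

In practice the masking would be repeated many times, and at a missingness rate similar to that of the real data, before one method is preferred.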

I trained both models on 75% of the data, using 10-fold cross validation to select the optimal parameters. Because cross validation tends to overestimate the accuracy in datasets such as the one used in this study (Rodríguez-Pérez et al., 2018), the best models were afterwards used for prediction on a holdout test set comprising the remaining 25% of the data.
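The resampling scheme can be illustrated with index bookkeeping alone. The sketch below assumes a stratified split (class proportions preserved in the holdout set), which the text does not state, so the resulting group sizes need not match the original split. The seed and function names are hypothetical.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Split sample indices into train/test, keeping class proportions roughly equal."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_folds(indices, k, rng):
    """Partition indices into k folds of near-equal size for cross validation."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


labels = ["MDD"] * 20 + ["control"] * 12
rng = random.Random(42)
train_idx, test_idx = stratified_split(labels, 0.25, rng)
folds = k_folds(train_idx, 10, rng)

# Each fold serves once as validation data; the other nine folds are used for fitting.
for fold in folds:
    fit_idx = [i for i in train_idx if i not in fold]
    assert len(fit_idx) + len(fold) == len(train_idx)

print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

With 24 training samples, each of the ten folds holds only two or three samples, which is one reason why the cross-validated estimates for such a small dataset are unstable.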

PLS-DA is a supervised dimensionality-reduction technique for classification. It works by projecting the high-dimensional data into a lower-dimensional space where the differences between groups (MDD vs. control) become more pronounced. Values of one to five for ncomp (the number of components) were tested with cross validation to find the best model. The model using just one component achieved the highest accuracy (71%) and was therefore chosen as the final model.

The line graph shows the accuracy, sensitivity and specificity of the PLS-DA model for different values of ncomp on the x-axis. The y-axis represents the values of these metrics (range 0-1). The highest value is reached for ncomp = 1.
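To make the projection idea concrete, here is a minimal one-component PLS-DA sketch in plain Python. It assumes the classes are coded 0 (control) and 1 (MDD), centres but does not scale the variables, and classifies by thresholding the predicted value at 0.5. These are illustrative assumptions, not the settings of the original R analysis, and the data are made up.

```python
import math


def fit_pls_da_one_component(X, y):
    """Fit a one-component PLS model on centred data; y is coded 0/1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in feature space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: the projection of each sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Least-squares regression of the centred class code on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict_class(model, x, threshold=0.5):
    """Project a new sample onto the component and threshold the predicted class code."""
    score = sum((xj - mj) * wj
                for xj, mj, wj in zip(x, model["x_mean"], model["w"]))
    y_hat = model["y_mean"] + score * model["q"]
    return 1 if y_hat >= threshold else 0


# Hypothetical data: two features, two samples per class.
X = [[1.0, 0.0], [2.0, 0.0], [8.0, 1.0], [9.0, 1.0]]
y = [0, 0, 1, 1]
model = fit_pls_da_one_component(X, y)
print(predict_class(model, [1.0, 0.0]), predict_class(model, [9.0, 1.0]))
```

Further components would be obtained by deflating X with the first component and repeating the same steps, which is what the ncomp parameter controls.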

Additionally, I employed Random Forest, an ensemble method that builds multiple decision trees and aggregates their predictions. RF is more resistant to overfitting than PLS-DA, which makes it a good complement. The key parameters optimized with cross validation were mtry (the number of variables sampled at each split) and ntree (the number of trees in the forest). The optimal RF model was reached for ntree = 50 and mtry = 13 and achieved an accuracy of 71%, with a sensitivity of 51% and a specificity of 85%.

The line graph displays the accuracy, sensitivity and specificity of the RF model for different values of mtry on the x-axis. The y-axis represents the values of these metrics (range 0-1). The peak value for accuracy is reached at mtry = 13.

As described above, to further evaluate the models, I tested them on a holdout test set that was not used during parameter optimization with cross validation. PLS-DA reached an accuracy of 71%, a specificity of 75% and a sensitivity of 67% on this test set. RF reached an accuracy of 81%, a specificity of 100% and a sensitivity of 67%. However, due to the very small sample size, these results should be interpreted with caution.
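For readers unfamiliar with the three metrics, the following sketch shows how they are derived from true and predicted labels, treating MDD as the positive class. The labels are made up and do not reproduce the study's test set.

```python
def classification_metrics(y_true, y_pred, positive="MDD"):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of MDD cases correctly detected
        "specificity": tn / (tn + fp),  # share of controls correctly recognized
    }


# Hypothetical labels for eight test samples.
y_true = ["MDD", "MDD", "MDD", "MDD", "control", "control", "control", "control"]
y_pred = ["MDD", "MDD", "MDD", "control", "control", "control", "control", "MDD"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With so few test samples, a single misclassified participant shifts sensitivity or specificity by many percentage points, which illustrates why the holdout results call for caution.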

The potential biomarkers

Metabolites that were significant in the univariate analysis were included as potential biomarkers. Furthermore, those among the 20 most relevant compounds from both PLS-DA and RF were also included (Table 1).

Table 1

Potential Biomarker                                   Substance Class
(1S,2S)-3-oxo-2-pentyl-cyclopentanehexanoic acid (a)  Eicosanoids
Prostaglandin A1-Biotin (b)                           Eicosanoids
12,13S-epoxy-9Z,11-octadecadienoic acid (a)           Fatty Acids
4,12-Dimethyl-tridecanoic acid (b)                    Fatty Acids
PI(12:0/12:0) (a)                                     Glycerophospholipids
PI(13:0/0:0) (b)                                      Glycerophospholipids
PG(6:0/6:0)[U] (a)                                    Glycerophospholipids
PI(18:1(9Z)/20:5(5Z,8Z,11Z,14Z,17Z)) (a)              Glycerophospholipids
PI(P-16:0/22:6(4Z,7Z,10Z,13Z,16Z,19Z)) (a)            Glycerophospholipids
PI(20:4(8Z,11Z,14Z,17Z)/16:0) (b)                     Glycerophospholipids
Coproporphyrin (a)                                    Porphyrins
11-Oxo-androsterone-glucuronide (a)                   Steroid Glucosiduronic Acids
Testosterone-glucuronide (a)                          Steroid Glucosiduronic Acids

Note. (a) Metabolite appeared in univariate and multivariate analyses as a potential biomarker for depression; (b) Metabolite appeared only in multivariate analysis as a potential biomarker for depression.

Potential biomarkers included metabolites from the following substance classes: steroid glucosiduronic acids (specifically metabolites of testosterone), porphyrins (molecules involved in heme biosynthesis), glycerophospholipids (main components of cell membranes), eicosanoids (lipids involved in inflammatory processes) and fatty acids. These results indicate possible connections to energy metabolism, inflammation, stress and the neuroendocrine system. Regarding the dysregulation of fatty acids, dietary intake of saturated and (poly-)unsaturated fatty acids could play a role as well.

Both univariate and multivariate results supported the feasibility of biomarker identification based on metabolites from hair.

In the future, this could be used for depression screenings based on metabolic profiles, which could in turn lead to more effective interventions, treatment monitoring and personalized medicine. For that, of course, further studies with randomized controlled trial and longitudinal designs are necessary.

References

Rodríguez-Pérez, R., Fernández, L., & Marco, S. (2018). Overoptimism in cross-validation when using partial least squares-discriminant analysis for omics data: a systematic study. Analytical and Bioanalytical Chemistry, 410(23), 5981–5992. https://doi.org/10.1007/s00216-018-1217-1

Trivedi, M. H., Rush, A. J., Wisniewski, S. R., Nierenberg, A. A., Warden, D., Ritz, L., Norquist, G., Howland, R. H., Lebowitz, B., McGrath, P. J., Shores-Wilson, K., Biggs, M. M., Balasubramani, G. K., Fava, M., & STAR*D Study Team. (2006). Evaluation of Outcomes With Citalopram for Depression Using Measurement-Based Care in STAR*D: Implications for Clinical Practice. American Journal of Psychiatry, 163(1), 28–40. https://doi.org/10.1176/appi.ajp.163.1.28

Wieser, L. (2024). Eine massenspektrometrische Vergleichsanalyse des biochemischen Stoffwechselprofils zwischen Haarproben und Blutserum zur hypothesenfreien Identifikation neuer Biomarkerkandidaten für die Depression. [Master’s thesis, University of Innsbruck]. Ulb:Dok. https://resolver.obvsg.at/urn:nbn:at:at-ubi:1-145863

World Health Organization. (2012). Global burden of mental disorders and the need for a comprehensive, coordinated response from health and social sectors at the country level: report by the Secretariat. World Health Organization. https://iris.who.int/handle/10665/78898

World Health Organization. (2017). Depression and other common mental disorders: global health estimates. http://apps.who.int/iris/bitstream/10665/254610/1/WHO-MSD-MER-2017.2-eng.pdf

This guest blog post is based on a master's thesis by Lennart Wieser, who completed the Minor Digital Science in 2023. With his thesis, which was supervised by Ass.-Prof. Dr. Alexander Karabatsiakis, he won the 2024 Award for Digitalization Research, sponsored by BE-terna, in the Master's or Diplom thesis category.


Lennart Wieser

Student at the University of Innsbruck until March 2024

Master’s program in Psychology

About the author

I graduated in Psychology and completed the Minor Digital Science at the University of Innsbruck. My research interest is at the intersection of biological psychology and data science.

