PCOS Subtypes & Outcomes: A Data-Driven Approach

by Archynetys Health Desk

Ethics statement

Study protocols were approved by the relevant ethics review committees (China, [2021] IRB-No.140; Singapore, 2011/01716-SRF0012)22,23,24,25. All participants provided written informed consent. For informational purposes, all cohorts were registered as an international multicenter study on clinicaltrials.gov (NCT06124391).

Study populations

This study involved a discovery cohort and five validation cohorts of participants with PCOS. All participants in the cohorts were between 20 and 45 years old. No transgender participant was included. PCOS was diagnosed using the broader Rotterdam diagnostic criteria26, which require the presence of any two of the following: (1) menstrual cycle length of <21 days or >35 days, and/or fewer than 8 cycles per year; (2) hyperandrogenism defined as an elevated total testosterone level according to local laboratory criteria, and/or a modified Ferriman–Gallwey score ≥5; (3) the presence of 12 or more follicles measuring 2–9 mm in diameter in each ovary and/or an ovarian volume >10 ml as determined by ultrasound. All clinical features were assessed and recorded at the time of diagnosis.
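The two-of-three rule above can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical, and "elevated testosterone" is passed in as a boolean because the text defers that threshold to local laboratory criteria.

```python
def meets_rotterdam(cycle_length_days, cycles_per_year, testosterone_elevated,
                    mfg_score, min_follicles_per_ovary, ovarian_volume_ml):
    """Return True when at least two of the three criteria in the text are present.

    All parameter names are illustrative assumptions, not taken from the study.
    """
    # (1) Cycle length <21 or >35 days, and/or fewer than 8 cycles per year.
    irregular_cycles = (cycle_length_days < 21 or cycle_length_days > 35
                        or cycles_per_year < 8)
    # (2) Elevated total testosterone (per local lab) and/or mFG score >= 5.
    hyperandrogenism = testosterone_elevated or mfg_score >= 5
    # (3) >= 12 follicles (2-9 mm) in each ovary and/or ovarian volume > 10 ml.
    polycystic_morphology = min_follicles_per_ovary >= 12 or ovarian_volume_ml > 10
    return sum([irregular_cycles, hyperandrogenism, polycystic_morphology]) >= 2


# Irregular cycles plus polycystic morphology: two criteria are met.
print(meets_rotterdam(40, 7, False, 2, 14, 8.0))
# Hyperandrogenism alone: only one criterion is met.
print(meets_rotterdam(28, 12, False, 6, 5, 6.0))
```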

The discovery cohort, recruited at the Center for Reproductive Medicine, Shandong University, Shandong Province, China, comprised 47,071 women with PCOS diagnosed using the broader Rotterdam diagnostic criteria between December 2013 and June 2020.

Among the full cohort, 11,908 women were not receiving any therapy that could alter hormone levels or the ovulation cycle (such as oral contraceptives, metformin or weight loss regimens) at the time of their first visit and were therefore included in the subsequent statistical analysis. Women with PCOS who had received such therapies at the time of enrollment, and who therefore did not meet the Rotterdam diagnostic criteria in terms of clinical manifestations or test parameters, were excluded (n = 35,163).

The validation cohorts were from China, the USA, Europe, Singapore and Brazil. The China validation cohort comprised 3,081 PCOS cases from various regions in China, including East China, South China, North China, Northwest China and Southwest China, excluding cases from the Center for Reproductive Medicine, Shandong University, Shandong Province, China. The US cohort had 750 participants with PCOS from the Pregnancy in Polycystic Ovary Syndrome II (PPCOS II, NCT00719186) trial, including European, African American and other races and ethnicities based on their country or area of birth23. The Europe cohort had 392 cases from Turkey27 and 180 cases from Sweden28,29,30,31. The Singapore cohort consisted of 428 cases32. The Brazil cohort had 100 cases24,25.

Feature selection

A total of 29 baseline clinical features that are routinely tested in the clinic and associated with endocrine or metabolic parameters in PCOS were initially examined in the discovery cohort (detailed in Extended Data Table 1). Categorical variables were excluded because most unsupervised clustering methods cannot accommodate them. Features with more than 30% missing data, which might be caused by the subjective bias of the physicians, were excluded from further analysis. For the remaining features, missing values were imputed by k-nearest neighbor imputation with k = 10, using the DMwR (v.0.1.4) package in R33. Spearman correlation was used to analyze the correlations among the clinical features. Principal component analysis was then used to evaluate each feature’s contribution to the overall variance in the dataset, identifying the variables that provided the most distinctive information relevant to clustering. To minimize covariance and redundancy, within each pair of strongly correlated features (correlation coefficient >0.7), the feature with the lower contribution (<10% in the first three principal components) was also excluded. Finally, an exploratory factor analysis was performed, and features with factor loadings <0.4 were excluded to ensure that only those strongly associated with the identified factors were retained. This approach resulted in the selection of nine continuous variables (BMI, LH, FSH, testosterone, SHBG, DHEA-S, AMH, fasting insulin and fasting glucose) for clustering analysis.
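Two of the filtering steps described above, the 30% missingness cutoff and the Spearman correlation threshold of 0.7, can be sketched in plain Python. This is only an illustration on made-up values, not the study's pipeline (which used R); the column names and numbers are hypothetical, and the imputation, principal component and factor analysis steps are not shown.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy table: feature name -> values, None marks a missing value.
table = {
    "weight": [55.0, 62.0, 70.0, 81.0, 90.0],
    "bmi": [21.0, 23.5, 26.0, 30.1, 33.4],
    "rarely_measured": [None, None, 1.2, None, 0.8],
}

# Keep features with at most 30% missing values.
kept = [name for name, col in table.items() if missing_fraction(col) <= 0.30]
print(kept)

# Flag strongly correlated pairs (Spearman coefficient > 0.7) among kept features.
rho = spearman(table["weight"], table["bmi"])
print(rho > 0.7)
```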

Feature measurements

Blood samples were drawn at the first consultation, and fasting plasma glucose and insulin were analyzed after overnight fasting. A total of 29 baseline clinical features related to endocrine or metabolic disorders were included in the study, comprising age, height, weight, BMI, systolic pressure, diastolic pressure, LH, FSH, estradiol, testosterone, prolactin, progesterone, SHBG, AMH, DHEA-S, thyroid stimulating hormone, alanine transaminase, aspartate transaminase, gamma-glutamyl transferase, albumin, triglyceride, total cholesterol, high-density lipoprotein, low-density lipoprotein, fasting glucose, fasting insulin, ultrasound antral follicle counts, menstrual cycle history and age at menarche.

Reproductive steroid hormone levels were measured by chemiluminescence immunoassay during days 1–3 of menstruation (corresponding to the early follicular phase) for ovulatory women or at any time for anovulatory women. The levels of AMH and biochemical parameters were measured by enzyme-linked immunosorbent assay. The antral follicle count was assessed by transvaginal ultrasound.

Unsupervised clustering analysis

An initial cluster analysis was conducted in the discovery cohort of Chinese PCOS cases. All continuous variables were normalized using z-score transformation. K-means clustering analysis was chosen for unsupervised clustering based on methods in a previous study34. To determine the optimal number of clusters, a range of k values (from 3 to 8) was first evaluated to identify the maximum average silhouette width (with 30 iterations) using the fpc package (v.2.2-10) in R v.4.0.3. We found that k = 4 resulted in the highest average silhouette width in sensitivity analyses assessing the fit of individual objects in the classification, thus indicating that four clusters provided the most stable classifications (Supplementary Fig. 2). Ultimately, clustering with k = 4 was performed with Manhattan distance as the dissimilarity measure using the cclust function in the flexclust package (v.1.4-0) in R. Cluster stability was assessed by computing Jaccard similarities through bootstrap resampling with 1,000 iterations35.
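As a rough illustration of the steps named above (z-score normalization, clustering with Manhattan distance, and the Jaccard similarity used for stability), here is a small Python sketch on made-up two-feature data. It is not the R code used in the study. It assumes column-wise medians as cluster centers, fixed initial centers and k = 2 for brevity, and all function names are hypothetical.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_manhattan(rows, init_centers, max_iter=100):
    """Alternate nearest-center assignment and median update until labels settle."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: manhattan(r, centers[k]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # leave an empty cluster's center unchanged
                centers[k] = [statistics.median(c) for c in zip(*members)]
    return labels, centers


def jaccard(a, b):
    """Jaccard similarity of two sets of member indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Hypothetical toy data: two features, two visibly separated groups.
raw = [[20.0, 4.0], [21.0, 5.0], [22.0, 4.5],
       [33.0, 12.0], [34.0, 13.0], [35.0, 12.5]]
z = zscore_columns(raw)
labels, centers = cluster_manhattan(z, init_centers=[z[0], z[3]])
print(labels)
```

In a bootstrap stability check, `jaccard` would compare the members of a cluster found in the full data with the members of its best-matching cluster in each resampled dataset.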

Subtype validation

Likewise, cases with eligible data from the validation cohorts were used to assess the reproducibility of the clustering results, as in previous studies34,36,37. The same nine clinical features in each validation cohort were first standardized using z-score transformation. Each cohort was individually clustered with the nine standardized features using k-means with the same parameter setting (k = 4) as the discovery cohort.

Ridge regression analysis

To maximize the potential of the subtyping for clinical application in PCOS populations, we performed ridge regression analysis in the discovery cohort using the glmnet R package (v.4.1-3) with nine normalized model variables and multinomial outcome prediction for four subtypes. The lambda value was determined by assessing the model performance using the ‘cv.glmnet’ function, which uses tenfold cross-validation. The ridge regression analysis output comprised four ridge regression equations, each of which corresponded to a subtype, of the form ‘Y = b1x1 + b2x2 + … + b9x9’ (in which x1 to x9 are the nine features, b1 to b9 are their coefficients and Y is the probability that a given individual belongs to this subtype, with values ranging from zero to one).

To verify the accuracy of the ridge regression equations derived for each subtype in the discovery cohort, we used the nine features from each participant in the validation cohorts. The features were used as inputs for each of the four equations to compute the probability of a participant belonging to that subtype. Each individual was then assigned the subtype label with the highest of the four calculated probabilities.
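The assignment rule described here (evaluate four equations, keep the subtype with the highest value) can be sketched as follows. The coefficients below are hypothetical placeholders chosen for readability, not the published values, and the generic subtype names are assumptions. The softmax step is also an assumption about how four linear scores are turned into probabilities that sum to one; it does not change which subtype scores highest.

```python
import math

FEATURES = ["BMI", "LH", "FSH", "testosterone", "SHBG",
            "DHEA-S", "AMH", "fasting insulin", "fasting glucose"]

# Hypothetical placeholder coefficients (b1..b9), NOT the study's values.
COEFS = {
    "subtype_1": [0, 0, 0, 1.0, 0, 1.0, 0, 0, 0],
    "subtype_2": [1.0, 0, 0, 0, 0, 0, 0, 1.0, 0],
    "subtype_3": [0, 0, 0, 0, 1.0, 0, 0, 0, 0],
    "subtype_4": [0, 1.0, 0, 0, 0, 0, 1.0, 0, 0],
}


def linear_scores(z_values):
    """Evaluate b1*x1 + ... + b9*x9 for each subtype on z-scored features."""
    return {name: sum(b * x for b, x in zip(coefs, z_values))
            for name, coefs in COEFS.items()}


def softmax(scores):
    """Map scores to probabilities that sum to one (assumed normalization)."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def assign_subtype(z_values):
    """Return the label with the highest probability, plus all probabilities."""
    probs = softmax(linear_scores(z_values))
    return max(probs, key=probs.get), probs


# Hypothetical z-scored participant with high BMI and fasting insulin.
example_z = [1.5, -0.2, 0.0, 0.1, -0.8, 0.0, -0.3, 1.2, 0.4]
label, probs = assign_subtype(example_z)
print(label)
```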

Following this process, the subtype labels previously obtained in each validation cohort via unsupervised k-means clustering served as the reference for ROC curve analysis. Each individual’s subtype label, assigned from the predictive values obtained with the ridge regression equations, was then used to generate ROC curves. The AUC for each subtype was subsequently calculated to evaluate the accuracy of the predictions made by the ridge regression equations. An AUC closer to 1 indicates better consistency between the ridge regression results and the unsupervised clustering results.
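A one-versus-rest AUC for a single subtype can be computed directly from its rank interpretation: the share of (member, non-member) pairs in which the member receives the higher predicted value, with ties counted as one half. The sketch below uses made-up labels and scores, and the function name is hypothetical. It is meant only to show what the AUC measures, not to reproduce the study's ROC analysis.

```python
def auc_one_vs_rest(reference_labels, scores, positive):
    """AUC for one subtype against all others, via pairwise comparisons.

    reference_labels: labels from unsupervised clustering (the reference).
    scores: predicted probability of the `positive` subtype per individual.
    """
    pos = [s for lab, s in zip(reference_labels, scores) if lab == positive]
    neg = [s for lab, s in zip(reference_labels, scores) if lab != positive]
    # A pair counts 1 if the member scores higher, 0.5 if tied, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reference labels and predicted probabilities for subtype "A".
reference = ["A", "A", "B", "B", "C"]
prob_a = [0.9, 0.4, 0.5, 0.1, 0.2]
print(auc_one_vs_rest(reference, prob_a, "A"))
```

Here five of the six member/non-member pairs are ordered correctly, so the value is 5/6; a value of 1 would mean every member of the subtype outranks every non-member.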

Longitudinal follow-up

A total of 9,601 women with PCOS from the discovery cohort were diagnosed between 2014 and 2018. Of these, 4,542 women voluntarily participated in telephone follow-ups between March 2021 and August 2024. In addition, 523 participants volunteered to undergo a physical examination, which included blood sample collection at the hospital. The median follow-up period for this cohort was 6.5 years (interquartile range 6.16–6.85). Remission of PCOS and its associated endocrinal features was defined as the percentage of people who still had the disease or features at the time of the follow-up. PCOS was diagnosed at follow-up using the same Rotterdam criteria as at baseline. Use of hormonal contraception in the past six months was adjusted for in the hyperandrogenic and oligo-ovulation assessments.

Content of follow-up

The follow-up telephone interview included: (1) current height, weight and menstrual cycle status; (2) number of pregnancies and births within the past years; (3) whether the participant had received IVF treatment during the follow-up years; and (4) current disease status of PCOS, T2DM, hypertension and dyslipidemia based on whether the participant had received treatment or been diagnosed through physical examination within the last six months.

Physical examination included the following measurements: (1) anthropometric measurements such as height, weight, blood pressure, waist and hip circumference; (2) clinical evaluation for signs of hyperandrogenism; (3) laboratory tests for endocrine and metabolic disorders (same as at baseline); (4) ultrasound examination of the ovaries (antral follicle number and ovarian volume) and liver (fat content); and (5) medication use within the past six months.

Outcomes of chronic metabolic diseases

BMI was calculated as weight (kg) divided by the square of height (m) and was classified based on criteria for the Chinese population38: (1) normal weight, BMI of 18.5–23.9 kg m−2; (2) overweight, BMI of 24–27.9 kg m−2; and (3) obesity, BMI ≥ 28 kg m−2. Degree of MASLD was assessed using abdominal ultrasound performed by an experienced ultrasonographer. Hypertension was defined as a systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg. T2DM was defined as a fasting glucose ≥7.0 mmol l−1 (ref. 39). Dyslipidemia was defined as the presence of any of the following abnormalities: (1) total cholesterol ≥5.2 mmol l−1; (2) triglycerides ≥1.7 mmol l−1; (3) high-density lipoprotein <1.0 mmol l−1; and (4) low-density lipoprotein ≥3.35 mmol l−1 (ref. 40).
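The thresholds in this paragraph translate directly into a few small functions. The sketch below is illustrative: the function names are hypothetical, values between the listed BMI bands (for example 23.95) are assigned by the lower bound of the next band, and BMI below 18.5 is reported as outside the categories listed in the text.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg per square meter."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify BMI using the cut-offs quoted in the text."""
    if value >= 28:
        return "obesity"
    if value >= 24:
        return "overweight"
    if value >= 18.5:
        return "normal weight"
    return "below listed range"  # the text gives no category under 18.5


def has_hypertension(systolic_mmhg, diastolic_mmhg):
    """Systolic >= 140 mmHg and/or diastolic >= 90 mmHg."""
    return systolic_mmhg >= 140 or diastolic_mmhg >= 90


def has_t2dm(fasting_glucose_mmol_l):
    """Fasting glucose >= 7.0 mmol/l."""
    return fasting_glucose_mmol_l >= 7.0


def has_dyslipidemia(total_chol, triglycerides, hdl, ldl):
    """Any one lipid abnormality is sufficient (all values in mmol/l)."""
    return (total_chol >= 5.2 or triglycerides >= 1.7
            or hdl < 1.0 or ldl >= 3.35)


# Hypothetical participant: 70 kg, 1.60 m, BP 138/92, low HDL only.
print(bmi_category(bmi(70, 1.60)))
print(has_hypertension(138, 92))
print(has_dyslipidemia(4.8, 1.2, 0.9, 3.0))
```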

Outcomes of IVF

In the discovery cohort, IVF data were available for 5,418 participants. To compare the IVF outcomes of different PCOS subtypes with those of a control population, we included a control group that underwent IVF treatment in the same clinic and met one of the following criteria: (1) infertility due to fallopian tubal adhesion or blockage, without any PCOS features; (2) infertility due to oligozoospermia, asthenospermia or abnormal spermatozoa in their male partner.

The live birth rate, pregnancy rate and pregnancy loss were calculated as the primary outcomes of IVF. Conception is diagnosed by serum human chorionic gonadotropin ≥10 mIU ml−1. Clinical pregnancy is defined as detection of a gestational sac in the uterine cavity. We define first trimester pregnancy loss as pregnancy loss before the end of the 11th gestational week by miscarriage or stillbirth, and second trimester pregnancy loss as pregnancy loss during the 12th gestational week to the end of the 27th gestational week by miscarriage or stillbirth arising from fetal abnormalities or maternal factors, extreme spontaneous preterm birth or iatrogenic preterm birth.

Secondary outcomes were maternal and neonatal complications. Preterm delivery is defined as a live birth from the 28th gestational week to the end of the 36th gestational week, including iatrogenic preterm delivery and spontaneous preterm delivery. Premature rupture of membranes is defined as membrane rupture after the 28th gestational week, including preterm premature rupture of membranes, which is also classified as spontaneous preterm delivery. SGA and LGA were determined on the basis of birth weight reference percentiles for Chinese populations, which were adjusted for sex and gestational age41. Birth weight lower than the 10th percentile of the reference was defined as SGA and birth weight higher than the 90th percentile as LGA.
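The gestational-week and percentile definitions used for the IVF outcomes can be written as simple classification rules. In this sketch the week argument is the ordinal gestational week as phrased in the text, the percentile cut-offs are passed in because the reference tables are not reproduced here, and the label "AGA" for births between the two percentiles is an assumption that the text does not use. All function names and example values are hypothetical.

```python
def pregnancy_loss_category(gestational_week):
    """Classify a pregnancy loss by the gestational week in which it occurred."""
    if gestational_week <= 11:
        return "first trimester"
    if gestational_week <= 27:
        return "second trimester"
    return "outside the defined windows"


def is_preterm_delivery(gestational_week):
    """Live birth from the 28th to the end of the 36th gestational week."""
    return 28 <= gestational_week <= 36


def size_for_gestational_age(birth_weight_g, p10_g, p90_g):
    """Compare birth weight with sex- and age-specific reference percentiles."""
    if birth_weight_g < p10_g:
        return "SGA"
    if birth_weight_g > p90_g:
        return "LGA"
    return "AGA"  # assumed label for births between the 10th and 90th percentile


# Hypothetical newborn: 2,400 g against made-up cut-offs of 2,600 g and 3,900 g.
print(size_for_gestational_age(2400, 2600, 3900))
print(is_preterm_delivery(36), is_preterm_delivery(37))
print(pregnancy_loss_category(12))
```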

Logistic regression was used to calculate the ORs for each subtype compared to controls.
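For intuition about what an OR expresses, the unadjusted case can be computed from a 2×2 table: with a single binary predictor and no covariates, logistic regression yields the same cross-product ratio. The sketch below uses hypothetical counts and a Wald 95% confidence interval. It does not include the covariate adjustment described in the statistical analysis, so it is a simplification rather than the study's model.

```python
import math


def odds_ratio(events_cases, nonevents_cases, events_controls, nonevents_controls):
    """Unadjusted odds ratio with a Wald 95% confidence interval."""
    a, b, c, d = events_cases, nonevents_cases, events_controls, nonevents_controls
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


# Hypothetical counts: 30 of 100 events in one subtype, 20 of 100 in controls.
estimate, lower, upper = odds_ratio(30, 70, 20, 80)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```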

Statistical analysis

All statistical analyses were performed using SPSS v.26 and R v.4.0.3. Shapiro–Wilk tests were used to assess the normality of the variables. Continuous variables were compared using Student’s t-test or analysis of variance, with natural logarithmic transformation applied to non-normally distributed data. Post-hoc comparisons among subtypes were performed using either Bonferroni or Dunnett T3 correction. Categorical variables were compared with either the χ2 test or Fisher’s exact test. ORs were calculated using logistic regression analysis, comparing cases with controls while adjusting for confounders in two models: (1) age and ovarian stimulation methods, and (2) age and fresh or frozen embryo transfer.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
