Effect of variable selection strategy on the predictive models for adverse pregnancy outcomes of pre-eclampsia: A retrospective study
DOI:
https://doi.org/10.54844/prm.2024.0318
Keywords:
pre-eclampsia, feature selection, variable selection, logistic regression, forward stepwise, LASSO, recursive feature elimination
Abstract
Objectives: Improving the prediction of adverse pregnancy outcomes is essential for women with pre-eclampsia, and selecting suitable predictive indicators is a prerequisite. The traditional knowledge-based strategy for variable selection is challenged by datasets that are high-dimensional or contain unfamiliar variables. In this study, we used five automatic variable selection methods to screen for influential indicators and evaluated the performance of the resulting predictive models.
Methods: Seven hundred and thirty-three Han Chinese women were enrolled, and 56 clinical and laboratory variables were recorded. After the women were grouped by binary pregnancy outcome, descriptive and inferential statistical analyses were performed. Then, with forward stepwise logistic regression (FSLR) as the reference method, four other variable selection strategies were each used to select contributing variables as predictive subsets. Finally, logistic regression prediction models were built from the five subsets and evaluated with the receiver operating characteristic curve.
Results: The variables that differed significantly between the adverse and satisfactory outcome groups did not overlap with the variables chosen by the selection strategies. “Platelet” and “Creatinine clearance rate” were the most influential indicators for predicting adverse maternal outcome, while “Birth weight of neonates” was the best indicator for predicting adverse neonatal outcome. On average, the predictive models for neonatal outcomes performed better than those for maternal outcomes. “Mutual information” and “Recursive feature elimination” were the best strategies for the current dataset and study design.
Conclusions: Variable selection strategies may provide an alternative to choosing influential indicators by statistical significance.
Future work will focus on applying different variable selection methods to high-dimensional datasets that include novel or unfamiliar variables, with the aim of identifying the set of predictors that best improves prediction and clinical decision-making.
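As a rough illustration of the filter-then-evaluate idea described in the abstract, the sketch below ranks candidate variables by their mutual information with a binary outcome and then summarizes discrimination with the area under the receiver operating characteristic curve. It is not the study's pipeline: the data are simulated, the variable names `informative_flag` and `noise_flag` are hypothetical, the variables are binary for simplicity, and no logistic regression model is fitted (the top-ranked variable is used directly as the score).

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over the observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def roc_auc(scores, labels):
    """AUC as the chance that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(n=733, seed=0):
    """Simulated data only: one variable tracks the outcome, one is pure noise."""
    rng = random.Random(seed)
    outcome = [rng.randint(0, 1) for _ in range(n)]
    # Hypothetical informative variable: agrees with the outcome about 80% of the time.
    informative = [y if rng.random() < 0.8 else 1 - y for y in outcome]
    noise = [rng.randint(0, 1) for _ in range(n)]
    return {"informative_flag": informative, "noise_flag": noise}, outcome


def rank_by_mutual_information(features, outcome):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, outcome) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


features, outcome = simulate()
ranked = rank_by_mutual_information(features, outcome)
for name, score in ranked:
    print(f"{name}: MI = {score:.4f} nats")

best_name = ranked[0][0]
print(f"AUC using {best_name} as the score: {roc_auc(features[best_name], outcome):.3f}")
```

In a fuller analysis, the selected subset would be passed to a logistic regression model and the AUC estimated on held-out data, for example with cross-validation, rather than computed on the same sample used for selection.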
License
Copyright (c) 2024 Placenta and Reproductive Medicine

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.