Using an innovative stacked ensemble algorithm for the accurate prediction of preterm birth
    Original Investigation
    P: 70-78
    June 2019

    J Turk Ger Gynecol Assoc 2019;20(2):70-78
    1. Department of Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India
    2. Department of Computer Science and Engineering, B. S. Abdur Rahman Crescent Institute of Science and Technology, Chennai, India
    Received Date: 01.08.2018
    Accepted Date: 30.11.2018
    Publish Date: 28.05.2019

    ABSTRACT

    Objective:

    A birth before 37 weeks of gestation is called a preterm birth (PTB). It is one of the major causes of neonatal death. The objective of this article was to predict PTB well in advance so that it could be converted to a term birth.

    Material and Methods:

    This study uses the historical data of expectant mothers and an innovative stacked ensemble (SE) algorithm to predict PTB. The proposed algorithm stacks classifiers in multiple tiers. The accuracy of the classification is improved in every tier.

    Results:

    The experimental results from this study show that PTB can be predicted with more than 96% accuracy using innovative SE learning.

    Conclusion:

    The proposed approach helps physicians in Gynecology and Obstetrics departments to decide whether the expectant mother needs treatment. Treatment can be given to delay the birth only in patients for whom PTB is predicted, or in many cases to convert the PTB to a normal birth. This, in turn, can reduce the mortality of babies due to PTB.

    Keywords: Preterm birth, neonatal death, risk factors of preterm birth, stacked ensemble, stacked generalization, meta-learning

    Introduction

    Births that happen after 37 weeks of gestation and before 39 weeks are termed term births (TB). Babies born before 37 weeks of gestation are considered premature, and such births are termed preterm births (PTB) (1,2). Premature babies typically have many severe complications such as breathing/respiratory problems (apnea), chronic lung disease, jaundice, anemia, infections, and bleeding in the brain (intraventricular hemorrhage). In the worst cases, premature babies die in the early days of life. Such deaths are termed neonatal deaths (3). The United Nations International Children's Emergency Fund report published in 2015 stated that PTB was a major cause of neonatal death (4,5,6). Due to PTB, some women also have poor mental health, and in some extreme cases have mental disorders (7). The long-term consequences of PTB for the babies are cognitive problems (intellectual disability and learning disability), asthma, intestinal problems, vision problems, hearing loss, dental problems, poor growth, and an increased risk of sudden infant death syndrome.

    When the expectant mother undergoes prenatal checkups, the clinical pathologic status may indicate the possibility of a PTB. In the Obstetrics and Gynecology (O&G) world, these indicators are called risk factors of PTB (8,9,10,11). The physician analyzes these risk factors and diagnoses the birth as either TB or PTB. While diagnosing PTB, the physician also takes into consideration the behavioral and social characteristics of the expectant mother (12). Hence, these characteristics are also considered risk factors of PTB. Not all risk factors are critical, and they do not contribute equally to PTB. Hence, risk factors are categorized as primary or secondary based on their criticality. The primary and secondary risk factors associated with PTB are listed in Table 1.

    Table 1

    Obtaining evidence for PTB in clinical pathology is a challenging task. Moreover, some clinical tests are too expensive for patients in developing countries. Accordingly, predictive analytics is the way forward. Predicting a PTB as a TB can lead to fatal consequences; thus, learning algorithms with high accuracy are needed. Ensemble learning gives better accuracy than individual learning algorithms and hence is suitable for predicting PTB (15,16). Ensembles perform effectively, especially if the base learners are diverse and moderately performing (17). Using a trainable combiner to learn from the predictions of base learners generalizes better than traditional ensembles (18). Such learning systems are termed stacked ensemble (SE) systems (19,20). They use base classifiers to train level-0 models and a generalizer to learn from the predictions of level-0 models. Thus, the predictions of base classifiers form the input space for the generalizer. These predictions are termed meta-features and the generalizer is said to perform meta-learning (21,22,23).

    This study uses an innovative SE algorithm for the accurate prediction of PTB. It differs from traditional SEs in how it produces the meta-features. Rather than using the predictions of level-0 models directly as meta-features, it combines them using multiple combination schemes to produce meta-features. The meta-features, along with the critical features, are used to train the generalizer. The combination schemes produce the joint distributions of the level-0 predictions. The predictions from level-0 models are an abstraction of the mapping between the input space and the actual labels. Hence, these joint distributions map the level-0 predictions to the actual label and indirectly map the input space to the actual label. This in turn produces meta-features that better abstract the relationship between the input space and the actual labels. In doing so, the proposed algorithm performs better than traditional SE algorithms. The performance of the algorithm is measured using its accuracy and recall.

    The contributions of this study are: (i) the introduction of an innovative SE algorithm to improve prediction accuracy, (ii) accurate prediction of PTB using this algorithm, and (iii) motivation for the research community to use this algorithm for classification problems. The remaining sections are organized as follows: Section II describes the work conducted in predicting PTB. Section III describes the proposed algorithm in detail. Experimental results along with the inferences are reported in Section IV. Section V presents the conclusion and the scope for future work.

    Material and Methods

    When the traditional SE is used, the level-1 generalizer inherits some level of bias and variance from level-0 models. Hence, the problem of overfitting or underfitting is not eliminated to the maximum possible extent. To address this issue, this study proposes an innovative SE algorithm that stacks classifiers in multiple tiers (31). The proposed algorithm stacks the classifiers in three tiers, namely (i) the base tier, (ii) the ensemble tier, and (iii) the generalization tier (32). The base tier focuses on training a set of suitable learners to achieve moderate accuracy. The second tier uses a set of combination schemes to combine the predictions from the base learners. The outputs from the combination schemes form the input space for the next tier. The third tier performs the meta-learning using the newly formed input space. The performance of this algorithm is optimized using a suitable number of base learners and a suitable number of combination schemes. The choice of meta-learner in the third tier also plays a vital role in improving the accuracy of this algorithm. The base tier reduces bias, while the ensemble tier and the generalization tier reduce variance. As a result, the bias is balanced against the variance, and the proposed algorithm improves the classification accuracy. Hence, this algorithm is suitable for classifying PTB based on the historical data of expectant mothers.
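
    The three tiers described above can be sketched with scikit-learn, the library the study reports using. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data, the three base learners, the two combination schemes (mean probability and majority vote), and the logistic-regression meta-learner are all choices made for this example, because the study's actual choices are the ones listed in Table 2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data (assumption: not the study's dataset).
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Tier 1 (base tier): hypothetical base learners, 10-fold cross-validated.
base_learners = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=3, random_state=0),
    GaussianNB(),
]
oof = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=10, method="predict_proba")[:, 1]
    for m in base_learners
])


def combine(probs):
    """Tier 2 (ensemble tier): two assumed combination schemes."""
    averaged = probs.mean(axis=1)                    # averaging
    votes = (probs >= 0.5).sum(axis=1)               # votes for class 1
    voted = (votes * 2 > probs.shape[1]).astype(float)  # majority voting
    return np.column_stack([averaged, voted])


# Critical features: the three most correlated with the label (training data only).
corr = np.array([abs(np.corrcoef(X_tr[:, j], y_tr)[0, 1])
                 for j in range(X_tr.shape[1])])
top3 = np.argsort(corr)[-3:]

# Tier 3 (generalization tier): meta-learner on meta-features + critical features.
meta_train = np.hstack([combine(oof), X_tr[:, top3]])
meta_learner = LogisticRegression(max_iter=1000).fit(meta_train, y_tr)

# At prediction time the base learners are refit on the full training set.
test_probs = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_learners
])
meta_test = np.hstack([combine(test_probs), X_te[:, top3]])
accuracy = meta_learner.score(meta_test, y_te)
print(f"hold-out accuracy: {accuracy:.3f}")
```

    The out-of-fold predictions keep the meta-learner from seeing base-learner outputs produced on data those learners were trained on, which is the usual motivation for cross-validating the base tier in stacking.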

    In general, for any classifier, cross-validation helps in reducing bias (33,34). In the proposed algorithm, a 10-fold cross validation is also used to train the base learners. The dataset is partitioned into 10 disjoint sets. Each of these 10 sets is used one after the other as a test set. Each fold is used nine times as a training set. As a result, the base learners produce the cross-validated predictions. Figure 1 depicts the cross-validation of the base learners. The output of the base tier is multiple sets of cross-validated predictions because this process is repeated for each base learner. This serves as the input for the next tier.
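
    The fold bookkeeping described above can be illustrated in plain Python. This is only a sketch: the helper names, the toy data, and the majority-class "learner" are hypothetical stand-ins, and the folds are assigned round-robin without the shuffling or stratification a real experiment would likely use.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k disjoint, near-equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds


def cross_validated_predictions(X, y, fit, predict, k=10):
    """Return one out-of-fold prediction per sample."""
    preds = [None] * len(X)
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train on the other k-1 folds, predict only the held-out fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return preds


# Hypothetical base learner: predicts the majority class of its training set.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


X_demo = [[i] for i in range(20)]
y_demo = [0] * 14 + [1] * 6
oof_preds = cross_validated_predictions(X_demo, y_demo,
                                        fit_majority, predict_majority)
```

    Every sample receives exactly one prediction, made by a model that never saw that sample during training. Repeating the call once per base learner yields the multiple sets of cross-validated predictions that feed the second tier.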

    Figure 1

    As depicted in Figure 2, the set of cross-validated predictions from each base learner is used as input in the second tier. The goal of the second tier is to combine the predictions from the base tier and to map them to one of the class labels. Hence, it creates a joint distribution of the base learners’ predictions. The output from each combination scheme provides a meta-feature. The quality of the meta-features depends on the choice of the combination schemes used in this tier. The better the combination schemes, the better the meta-features capture the inherent relationship between the input space and the actual labels. Therefore, the meta-features play an important role in the accuracy of this innovative SE algorithm. The combination schemes to be used in this tier are chosen depending on the problem at hand. Popular combination schemes such as averaging and majority voting work well for most problems.
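
    As a small illustration of the two popular schemes named above, the following sketch turns each sample's row of base-learner predictions into one meta-feature per combination scheme. The function names, the toy probabilities, and the 0.5 voting threshold are assumptions made for this example.

```python
def average_scheme(predictions):
    """Mean of the base learners' predicted PTB probabilities for one sample."""
    return sum(predictions) / len(predictions)


def majority_vote_scheme(predictions, threshold=0.5):
    """1 if more than half of the base learners predict PTB, else 0."""
    votes = sum(1 for p in predictions if p >= threshold)
    return 1 if votes * 2 > len(predictions) else 0


def build_meta_features(base_predictions, schemes):
    """Apply every combination scheme to each sample's row of predictions."""
    return [[scheme(row) for scheme in schemes] for row in base_predictions]


# One row per sample, one column per (hypothetical) base learner.
base_predictions = [
    [0.9, 0.8, 0.4],
    [0.2, 0.6, 0.1],
    [0.5, 0.7, 0.3],
]
meta_features = build_meta_features(
    base_predictions, [average_scheme, majority_vote_scheme]
)
```

    Adding another combination scheme only means appending one more function to the list, which adds one more meta-feature column.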

    Figure 2

    As depicted in Figure 3, the meta-features and the top three critical features selected from the original input space form the input space for the meta-learner. The top three features are selected by analyzing the correlation of each feature with the class labels. The features that have high correlation with the class label are selected.
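
    One plausible reading of this selection step is to rank the features by the absolute value of their Pearson correlation with the class label and keep the top three. The study does not state which correlation measure was used, so the Pearson coefficient, the helper names, and the toy data below are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def top_k_features(rows, labels, k=3):
    """Indices of the k features most correlated (in magnitude) with the label."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest correlation first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [
    # f0 = label, f1 = inverted label, f2 = constant, f3 = mostly label, f4 = noise
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1],
]
selected = top_k_features(rows, labels, k=3)
```

    Using the absolute value keeps features that are strongly but negatively correlated with the label, such as the inverted column in the toy data.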

    Figure 3

    After analyzing the historical data of the patients, we decided to use the following list of base learners, the combination schemes, and the meta-learner in this study. This is shown in Table 2.

    Table 2

    The experiment was implemented using Python and the Scikit-learn library (35). A dataset consisting of the historical data of 2600 patients was used to carry out this study. The dataset was a masked dataset without any reference to the personal details of the patients. Accordingly, the need to obtain informed consent and ethics committee approval did not arise. The details of the dataset are given in Table 3. The data were thoroughly reviewed to check that the dataset had a good mix of all the possible cases: (i) mothers with risk factors who had a PTB, (ii) mothers without risk factors who had a PTB, (iii) mothers with risk factors who had a TB, and (iv) mothers without risk factors who had a TB. This mix of all the possible cases was also ensured in the training and testing data.

    Table 3

    The distribution analysis of the class labels reveals that the dataset was asymmetric and skewed towards TB. PTB was the minority class and TB was the majority class. Hence, the SMOTE (Synthetic Minority Over-sampling Technique) algorithm was used to balance the dataset. Balancing the dataset increases the number of minority-class instances to match the number of majority-class instances. This in turn increases the total number of instances. The count of TB and PTB in the dataset before and after SMOTE is shown in Figure 4.
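
    The idea behind SMOTE is to create each synthetic minority instance by interpolating between a minority instance and one of its nearest minority-class neighbours. The sketch below shows that idea in plain Python; it is a simplified illustration, not the implementation used in the study (which is not stated), and the function names, the toy points, and the value of k are assumptions. In practice a maintained implementation such as the SMOTE class in the imbalanced-learn package would be the more usual choice.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Assumes the minority class has at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    extra = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + extra, y + [minority_label] * len(extra)


# Toy data: 8 majority (TB, label 0) and 3 minority (PTB, label 1) points.
X_demo = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
          [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3],
          [4.0, 4.0], [5.0, 4.0], [4.0, 6.0]]
y_demo = [0] * 8 + [1] * 3
X_bal, y_bal = balance(X_demo, y_demo, minority_label=1, k=2, seed=0)
```

    Because every synthetic point lies on a segment between two real minority points, the new instances stay inside the region the minority class already occupies rather than being exact duplicates.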

    Figure 4

    The dataset was thoroughly analyzed for missing data or null values because missing data plays a large role in pulling down the accuracy of models. If missing data or null values were found in any of the features, the criticality of the affected feature was analyzed. For all critical features, mean values were used to replace missing data. For all non-critical features, the default values were used. A scatter plot of the historical data was plotted to reveal the outliers. The top five critical features were selected and concatenated because most of the features were binary in nature. The concatenated feature was plotted along the x-axis and the class label along the y-axis. The central mass of the plot was identified and the points that were further away from this central mass were analyzed to identify the outliers. The identified outliers were removed from the dataset. Normalizing the dataset also helps in improving the performance of the learning algorithm. Accordingly, different normalization methods were analyzed to select a suitable one for the problem at hand. In this study, the dataset was normalized by performing mean cancellation.
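
    The imputation and normalization steps can be sketched per feature column as follows. The helper names and toy values are hypothetical, missing entries are represented by None, and the default value of 0 for non-critical features is an assumption, since the study does not state which defaults were used. Mean cancellation is taken here to mean subtracting the column mean.

```python
def impute_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def impute_with_default(column, default=0):
    """Replace None entries with a fixed default (assumed to be 0 here)."""
    return [default if v is None else v for v in column]


def mean_cancel(column):
    """Subtract the column mean so that the feature is centred on zero."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]


# Hypothetical critical feature with one missing value.
critical = impute_with_mean([30.0, None, 34.0, 32.0])
# Hypothetical non-critical binary feature with one missing value.
non_critical = impute_with_default([1, None, 0, 1])
centred = mean_cancel(critical)
```

    After mean cancellation each feature has a mean of zero, so no feature dominates the learners merely because its raw values sit at a larger offset.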

    To assess the impact of primary and secondary factors on the classification accuracy, multiple experiments were conducted with different subsets of features in the dataset. The list of experiments conducted is given in Table 4. These experiments help to understand the contributions of primary and secondary risk factors to PTB. Each experiment was repeated 10 times to ensure the consistency of the results. The average values of the accuracy, precision, recall, and F1 score across the trials are reported in this study. Receiver operating characteristic (ROC) curves were also drawn for each experiment and are reported in this study.

    Table 4

    Results

    A comparative analysis of different performance metrics for SE and the proposed algorithm was conducted. In addition, an analysis of how the feature subsets improved or degraded the performance metrics was performed. This analysis helps in understanding the factors that make a major contribution to PTB. The results reveal that the performance metrics reached their maximum when all the features in the dataset were used for training the algorithms. Irrespective of the number of risk factors used for training, the proposed algorithm performed better than SE. The results of the experiments in which all the risk factors were used for training are summarized in Table 5.

    Table 5

    The analysis of accuracy for SE and the proposed algorithm is depicted in Figure 5. Among the five experiments conducted, the accuracy was lowest for the experiment conducted with only the top five secondary risk factors. There was a large jump in accuracy when the other secondary risk factors were also used for training the algorithms. The improvement in accuracy of the proposed algorithm over SE reached its maximum of 12% when only the secondary risk factors were used for training. This implies that, when only trivial features are available, the proposed algorithm can still perform much better than SE, mainly because the proposed algorithm is less affected by overfitting or underfitting. When all the factors were used for training, there was an improvement of more than 3% in accuracy over SE. When only the primary risk factors were used for training, the accuracy of the proposed algorithm was just 1.3% below its accuracy when all the risk factors were used. Hence, the contribution of the secondary risk factors to PTB prediction is not significant. Even the maximum accuracy of 93.8% achieved by SE when all the factors were used for training was 1.8% less than the accuracy achieved by the proposed algorithm with only primary factors. Hence, for high-dimensional datasets too, the proposed algorithm can use minimal features and achieve better accuracy than SE.

    Figure 5

    The analysis of precision for SE and the proposed algorithm is depicted in Figure 6. The observed values of precision follow a pattern similar to that of accuracy. The precision of the proposed algorithm reached its peak value of 98.56% when all the risk factors were used for training. The high value of precision for the proposed algorithm implies that the number of false positives was low. The high precision implies that most of the TB cases were predicted as TBs. This avoids unnecessary treatment being given to expectant mothers who would otherwise have undergone treatment. The improvement in precision reached a maximum of 10% when only secondary risk factors were used for training. Even with trivial factors, the proposed algorithm performed better than SE.

    Figure 6

    The analysis of sensitivity for SE and the proposed algorithm is depicted in Figure 7. The high value of sensitivity for the proposed algorithm implies that the number of false negatives was low. The high sensitivity implies that most of the PTB cases were predicted as PTBs. This indicates that the patients who need immediate medication are not adversely affected by the predictions of the proposed algorithm. The improvement in sensitivity reached its maximum of 8.5% when only primary risk factors were used for training. When the top five secondary risk factors were used for training, there was no improvement in sensitivity.

    Figure 7

    The analysis of F1 scores for SE and the proposed algorithm is depicted in Figure 8. The F1 score is the harmonic mean of precision and sensitivity. As the proposed algorithm achieved improvement in both precision and sensitivity, its F1 score was also better than that of SE for all five experiments. The F1 score reached the minimum when only the top 5 secondary risk factors were used to train the algorithms. The difference in the F1 score of SE and the proposed algorithm was as high as 13% when only the secondary risk factors were used for training. The F1 score reached the maximum when all the risk factors were used for training.
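
    For reference, the four reported metrics can be computed from the confusion-matrix counts as in the sketch below. It takes PTB (label 1) as the positive class, which is an assumption of this example, and the function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall), and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

    Because the harmonic mean is pulled towards the smaller of its two inputs, the F1 score is high only when precision and sensitivity are both high.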

    Figure 8

    ROC curves were drawn to analyze the area under the curve (AUC). The ROC curves for SE and the proposed algorithm for the five experiments are depicted in Figure 9. The graphs in the first row correspond to SE and the graphs in the second row correspond to the proposed algorithm. In these graphs, the 50% midway mark is shown as a dotted line. As expected, the AUC was lowest when only the top five secondary risk factors were used to train the algorithms. The AUC increased with the number of critical factors used for training, and it peaked for both SE and the proposed algorithm when all risk factors were used. The low AUC values for the top five secondary risk factors imply that the true positive rate did not reach its peak even when the false positive rate reached its minimum. This means that false negatives were high in the prediction. From the perspective of PTB, this is alarming. A high number of false negatives implies that a patient who needs immediate attention and treatment may not receive it.
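
    The AUC can be read as the probability that a randomly chosen PTB case receives a higher score than a randomly chosen TB case, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the function name and toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


auc_mixed = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc_perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
auc_chance = roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

    A classifier that cannot separate the classes scores 0.5, which corresponds to the dotted reference line in an ROC plot, while perfect separation scores 1.0.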

    Figure 9

    The application of the innovative SE algorithm for predicting PTB achieved better performance than SE in all the experiments conducted in this study. For all the performance metrics considered, the innovative SE algorithm performed considerably better than the traditional SE algorithm. Primary risk factors play a major role in predicting PTB. When secondary factors were used along with primary risk factors, the performance metrics improved only marginally (by little more than 1%). Hence, using only primary risk factors with the proposed algorithm is an efficient method for PTB prediction. The time complexity of the proposed algorithm with different sets of factors can be considered for future work. The accuracy may be further improved by using a larger number of base learners and combination schemes because the proposed algorithm is scalable in these terms. Finding the optimal number of base learners and combination schemes is also an interesting area to explore further. In order to increase the clinical use of this algorithm, we are considering the possibility of designing a mobile app with a wrapper around the algorithm. The mobile app would allow physicians to enter the results of clinical tests of expectant mothers through an interface and would provide the corresponding prediction. Such an app hides the complexities of the statistical methods from the end user, and thus a greater number of physicians can benefit from this algorithm. We are also exploring whether this algorithm can be enhanced and extended to analyze other maternal complications.

    References

    1. Di Renzo GC, Roura LC; European Association of Perinatal Medicine-Study Group on Preterm Birth. Guidelines for the management of spontaneous preterm labor. J Perinat Med 2006; 34: 359-66.
    2. World Health Organization. Preterm Birth. November 2015.
    3. Rahman S. Neonatal Mortality: Incidence, Correlates and Improvement Strategies. In: Perinatal Mortality, O. Ezechi, Ed. Rijeka: InTech, 2012.
    4. Loftin RW, Habli M, Snyder CC, Cormier CM, Lewis DF, Defranco EA. Late preterm birth. Rev Obstet Gynecol 2010; 3: 10-9.
    5. UNICEF. Levels and trends in child mortality. 2015.
    6. MacDorman MF, Matthews TJ, Mohangoo AD, Zeitlin J. International Comparisons of infant mortality and related factors: United States and Europe, 2010. Nat Vital Stat Rep 2014; 63: 1-6.
    7. Misund AR, Nerdrum P, Diseth TH. Mental health in women experiencing preterm birth. BMC Pregnancy Childbirth 2014; 14: 263.
    8. Perry RJ, Samuel VT, Petersen KF, Shulman GI. The role of hepatic lipids in hepatic insulin resistance and type 2 diabetes. HHS Public Access 2014; 510: 84-91.
    9. Chistyakova G, Gazieva I, Remizova I, Ustyantseva L, Lyapunov V, Bychkova S. Risk factors vary early preterm birth and perinatal complications after assisted reproductive technology. Gynecol Endocrinol 2016; 32(Suppl 2): 56-61.
    10. Rao CR, Bhat P, KE V, Kamath V, Kamath A, Nayak D, Shenoy RP, Bhat SK. Assessment of risk factors and predictors for spontaneous pre-term birth in a South Indian antenatal cohort. Clin Epidemiol Glob Heal 2018; 6: 10-6.
    11. Fuchs F, Monet B, Ducruet T, Chaillet N, Audibert F. Effect of maternal age on the risk of preterm birth: A large cohort study. PLoS One 2018; 13: e0191002.
    12. Morisaki N, Togoobaatar G, Vogel JP, Souza JP, Rowland Hogue CJ, Jayaratne K, et al. Risk factors for spontaneous and provider-initiated preterm delivery in high and low Human Development Index countries: a secondary analysis of the World Health Organization Multicountry Survey on Maternal and Newborn Health. BJOG 2014; 121(Suppl 1): 101-9.
    13. Di Renzo GC, Pacella E, Di Fabrizio L, Giardina I. Preterm Birth: Risk Factors, Identification and Management. In: Management and Therapy of Late Pregnancy Complications, Cham: Springer International Publishing, 2017: 81-94.
    14. Ahumada-Barrios ME, Alvarado GF. Risk Factors for premature birth in a hospital. Rev Lat Am Enfermagem 2016; 24: 2750.
    15. Wang S, Yao X. Relationships between diversity of classification ensembles and single-class performance measures. IEEE Trans Knowl Data Eng 2013; 25: 206-19.
    16. Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag 2006; 3: 21-45.
    17. Brown G, Wyatt J, Harris R, Yao X. Diversity creation methods: A survey and categorisation. Inf Fusion 2005; 6: 5-20.
    18. Tumer K, Ghosh J. Error correlation and error reduction in ensemble classifiers. Taylor Francis Online 1996: 1-24.
    19. Wolpert DH. Stacked Generalization. Neural Networks 1992; 5: 241-59.
    20. Opitz DW, Maclin R. Popular Ensemble Methods: An Empirical Study. J Artif Intell Res 1999; 11: 169-98.
    21. Breiman L. Stacked regressions. Mach Learn 1996; 24: 49-64.
    22. Vilalta R, Drissi Y. A perspective view and survey of meta-learning. Artif Intell Rev 2002; 18: 77-95.
    23. Todorovski L, Džeroski S. Combining multiple models with meta decision trees. Zhurnal Ekspi Teor Fiz 2000.
    24. Bittar RE, da Fonseca EB, de Carvalho MH, Martinelli S, Zugaib M. Predicting preterm delivery in asymptomatic patients with prior preterm delivery by measurement of cervical length and phosphorylated insulin-like growth factor-binding protein-1. Ultrasound Obstet Gynecol 2007; 29: 562-7.
    25. Care AG, Sharp AN, Lane S, Roberts D, Watkins L, Alfirevic Z. Predicting preterm birth in women with previous preterm birth and cervical length ≥25 mm. Ultrasound Obstet Gynecol 2014; 43: 681-6.
    26. Catley C, Frize M, Walker CR, Petriu DC. Predicting high-risk preterm birth using artificial neural networks. IEEE Trans Inf Technol Biomed 2006; 10: 540-9.
    27. Ren P, Yao S, Li J, Valdes-Sosa PA, Kendrick KM. Improved prediction of preterm delivery using empirical mode decomposition analysis of uterine electromyography signals. PLoS One 2015; 10: 1-16.
    28. Naeem SM, Seddik AF, Eldosoky MA. New technique based on uterine electromyography nonlinearity for preterm delivery detection. J Eng Technol Res 2014; 6: 107-14.
    29. Wang SQ, Yang J, Chou KC. Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. J Theor Biol 2006; 242: 941-6.
    30. Chou KC, Elrod DW. Prediction of membrane protein types and subcellular locations. Proteins Struct Funct Genet 1999; 34: 137-53.
    31. Pari R, Sandhya M, Sankar S. A Multi-Tier Stacked Ensemble Algorithm to Reduce the Regret of Incremental Learning for Streaming Data. IEEE Access 2018; 6: 48726-39.
    32. Pari R, Sandhya M, Sankar S. A Multi-Tier Stacked Ensemble Algorithm for Improving Classification Accuracy. Comput Sci Eng 2018: 1.
    33. Kohavi R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Appear Int Jt Conf Artificial Intell 1995; 5: 1-7.
    34. Stone M. Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B 1974: 111-47.
    35. Pedregosa F. Scikit-learn: Machine Learning in Python. J Mach Learn Res 2011; 12: 2825-30.