TY - JOUR
T1 - Using Multivariate Imputation by Chained Equations to Predict Redshifts of Active Galactic Nuclei
AU - Gibson, Spencer James
AU - Narendra, Aditya
AU - Dainotti, Maria Giovanna
AU - Bogdan, Malgorzata
AU - Pollo, Agnieszka
AU - Poliszczuk, Artem
AU - Rinaldi, Enrico
AU - Liodakis, Ioannis
PY - 2022
Y1 - 2022
N2 - Redshift measurement of active galactic nuclei (AGNs) remains a time-consuming and challenging task, as it requires follow-up spectroscopic observations and detailed analysis. Hence, there is an urgent need for alternative redshift estimation techniques. The use of machine learning (ML) for this purpose has grown over the last few years, primarily due to the availability of large-scale galactic surveys. However, due to observational errors, a significant fraction of these data sets often has missing entries, rendering that fraction unusable for ML regression applications. In this study, we demonstrate the performance of an imputation technique called Multivariate Imputation by Chained Equations (MICE), which addresses the issue of missing data entries by imputing them using the available information in the catalog. We use the Fermi-LAT Fourth Data Release Catalog (4LAC) and impute 24% of the catalog. Subsequently, we follow the methodology described in Dainotti et al. (ApJ, 2021, 920, 118) and create an ML model for estimating the redshift of 4LAC AGNs. We present results that highlight the positive impact of the MICE imputation technique on the performance of the machine learning models and on the accuracy of the obtained redshift estimates.
AB - Redshift measurement of active galactic nuclei (AGNs) remains a time-consuming and challenging task, as it requires follow-up spectroscopic observations and detailed analysis. Hence, there is an urgent need for alternative redshift estimation techniques. The use of machine learning (ML) for this purpose has grown over the last few years, primarily due to the availability of large-scale galactic surveys. However, due to observational errors, a significant fraction of these data sets often has missing entries, rendering that fraction unusable for ML regression applications. In this study, we demonstrate the performance of an imputation technique called Multivariate Imputation by Chained Equations (MICE), which addresses the issue of missing data entries by imputing them using the available information in the catalog. We use the Fermi-LAT Fourth Data Release Catalog (4LAC) and impute 24% of the catalog. Subsequently, we follow the methodology described in Dainotti et al. (ApJ, 2021, 920, 118) and create an ML model for estimating the redshift of 4LAC AGNs. We present results that highlight the positive impact of the MICE imputation technique on the performance of the machine learning models and on the accuracy of the obtained redshift estimates.
KW - AGNs
KW - BLLs
KW - FERMI 4LAC
KW - FSRQs
KW - imputation
KW - machine learning regressors
KW - MICE
KW - redshift
U2 - 10.3389/fspas.2022.836215
DO - 10.3389/fspas.2022.836215
M3 - Article
AN - SCOPUS:85127149465
VL - 9
JO - Frontiers in Astronomy and Space Sciences
JF - Frontiers in Astronomy and Space Sciences
SN - 2296-987X
M1 - 836215
ER -