Ground-level fine particulate matter (PM2.5) and ozone (O3) are air pollutants that can pose severe health risks. Surface PM2.5 and O3 concentrations can be monitored from satellites, but most retrieval methods estimate PM2.5 or O3 separately and disregard the information the two pollutants share, for example through common emission sources. Using surface observations across China spanning 2014–2021, we found a strong relationship between PM2.5 and O3 with distinct spatiotemporal characteristics. In this study, we therefore propose a new deep learning model, the Simultaneous Ozone and PM2.5 inversion deep neural Network (SOPiNet), which retrieves PM2.5 and O3 simultaneously and allows for daily, real-time monitoring with full spatial coverage at a resolution of 5 km. SOPiNet employs a multi-head attention mechanism to better capture the temporal variations in PM2.5 and O3 based on previous days’ conditions. Applying SOPiNet to MODIS data over China in 2022, with data from 2019–2021 used to construct the network, we found that retrieving PM2.5 and O3 simultaneously performed better than retrieving them independently: the temporal R² increased from 0.66 to 0.72 for PM2.5 and from 0.79 to 0.82 for O3. The results suggest that near-real-time satellite-based air quality monitoring can be improved by the simultaneous retrieval of different but related pollutants. The SOPiNet code and its user guide are freely available online at https://github.com/RegiusQuant/ESIDLM.
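
To make the idea of attention over previous days with a shared representation concrete, the following is a minimal NumPy sketch and not the SOPiNet implementation. It assumes a short sequence of daily feature vectors ending on the target day, applies standard scaled dot-product multi-head self-attention across the days, and feeds the target-day representation to two linear output heads, one for PM2.5 and one for O3. All names (`multi_head_self_attention`, `joint_retrieval`, `params`), the layer sizes, and the random weights and inputs are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over a sequence of days.

    x has shape (days, d_model). Returns the attended sequence with shape
    (days, d_model) and the attention weights with shape (heads, days, days).
    """
    days, d_model = x.shape
    d_head = d_model // num_heads

    def split(w):
        # Project, then split the feature axis into heads: (heads, days, d_head).
        return (x @ w).reshape(days, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(wq), split(wk), split(wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    context = weights @ v
    # Merge the heads back into one feature axis: (days, d_model).
    merged = context.transpose(1, 0, 2).reshape(days, d_model)
    return merged @ wo, weights


def joint_retrieval(x, params, num_heads=4):
    # Hypothetical two-output design: one shared attended representation
    # feeds a separate linear head for each pollutant.
    attended, weights = multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads
    )
    shared = attended[-1]  # representation of the target (most recent) day
    pm25 = float(shared @ params["w_pm25"])
    o3 = float(shared @ params["w_o3"])
    return pm25, o3, weights


# Illustrative random weights and inputs; these are not real data.
rng = np.random.default_rng(0)
days, d_model = 5, 16
params = {
    name: rng.normal(scale=0.1, size=(d_model, d_model))
    for name in ("wq", "wk", "wv", "wo")
}
params["w_pm25"] = rng.normal(scale=0.1, size=d_model)
params["w_o3"] = rng.normal(scale=0.1, size=d_model)
x = rng.normal(size=(days, d_model))

pm25, o3, weights = joint_retrieval(x, params)
print(weights.shape)
```

In this sketch both outputs depend on the same attended features, which is one simple way for information shared between the two pollutants to influence both estimates. An independent-retrieval baseline would instead use a separate set of attention weights for each pollutant.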