Integrating Machine Learning for Predictive Epidemiological Modeling of Toxoplasma Gondii Infection in Occupationally Exposed Populations of Lahore

Authors

  • Namrah Salahudin, Lahore College for Women University, Lahore, Pakistan
  • Sheikh Muhammad Ibraheem, The University of Lahore, Lahore, Pakistan
  • Irfa Khurram, Information Technology University, Lahore, Pakistan
  • Raheel Muzzammel, The University of Lahore, Lahore, Pakistan

DOI:

https://doi.org/10.69591/jcihs.3.1.1

Keywords:

Toxoplasma gondii, Seroprevalence, Machine Learning, Occupational Health, AI Epidemiology, Risk Prediction, MATLAB

Abstract

Background: Toxoplasma gondii infection poses a significant health risk, particularly among occupationally exposed populations. Early detection and risk prediction are crucial for effective public health interventions.

Objective: This study aims to develop a machine learning-driven approach to predict and evaluate the risk of Toxoplasma gondii infection among occupationally exposed employees in Lahore, Pakistan.

Methods: A total of 120 participants (60 sewage workers, 30 gardeners, and 30 construction workers) were assessed using biological assays (ELISA and PCR) alongside socio-demographic information, including age, education, hygiene practices, and pet ownership. Three supervised learning algorithms (Logistic Regression, Decision Tree, and Random Forest) were applied to model infection risk. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. Feature importance analysis identified the key predictors of infection.
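
As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, precision, recall, F1-score, and ROC-AUC from binary labels and predicted risk scores using only the Python standard library. It is not the authors' code (the keywords point to MATLAB for the study itself); the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made up for this example and are not study data.

```python
# Illustrative sketch only: metric definitions for a binary infection classifier.
# All names and the toy data below are hypothetical, not taken from the study.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = infected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): true status and model risk scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["roc_auc"] = roc_auc(y_true, scores)
print(metrics)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the ranking of scores alone, which is one reason the two kinds of metric are usually reported together.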

Results: The Random Forest classifier outperformed the other models, achieving 92% accuracy and an AUC of 0.90. Cat ownership, poor hygiene, and low educational level were the strongest predictors of infection risk.

Conclusion: Integrating machine learning with traditional serological assays provides a reliable, data-driven framework for early detection and risk stratification of Toxoplasma gondii infection. This approach can inform targeted public health interventions for high-risk occupational groups.

Author Biographies

Namrah Salahudin, Lahore College for Women University, Lahore, Pakistan

Raheel Muzzammel, The University of Lahore, Lahore, Pakistan

Published

2025-06-30

Section

Articles