Normative and Toxic Reference Values for 7-Ring mfERG in Hydroxychloroquine Retinopathy: Machine Learning Prediction of Diagnostic Thresholds

Walker C1,2, Eiffert S3, Ali H1,2, Grigg P1,2,5, McCluskey P1,2,4, Cornish E1,2,6

1Save Sight Institute, University of Sydney, 2Sydney Eye Hospital, 3School of Computer Science, University of Sydney, 4Royal Prince Alfred Hospital, 5Westmead Children’s Hospital, 6Westmead Hospital

Biography:

Charles Walker is a medical officer at Royal North Shore Hospital and a former senior resident at Sydney Eye Hospital. He is an Associate Clinical Lecturer at the Save Sight Institute, University of Sydney, and is pursuing a career in ophthalmology.

Abstract:

Background

Hydroxychloroquine retinopathy is a dose-dependent, irreversible complication of hydroxychloroquine therapy for autoimmune disease. Multifocal electroretinography (mfERG) is widely used to detect it, but diagnostic mfERG reference values have so far been restricted to lower-resolution protocols. We aimed to establish the first normative and toxic thresholds for high-resolution 103-hexagon, 7-ring mfERG and to evaluate whether machine learning (ML) classifiers outperform traditional ratio-based interpretation.

Methods

In this retrospective case-control study, we analyzed 103-hexagon mfERG responses from 59 hydroxychloroquine-toxic and 39 non-toxic eyes at a quaternary referral center. We systematically evaluated all 42 possible ordered pairwise ring amplitude ratios (7 × 6 numerator-denominator combinations) to determine optimal diagnostic cut-offs. The diagnostic performance of these standard clinical indices was then benchmarked against three ML models (Random Forest, Logistic Regression, and Support Vector Machines) trained on standardized ring amplitude data.
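As an illustration of where the figure of 42 comes from, the following minimal sketch enumerates every ordered ratio Ri/Rj (i ≠ j) across seven rings for a single eye. The function name, the dictionary layout, and the amplitude values are hypothetical and chosen only for illustration; they are not taken from the study data or its analysis code.

```python
from itertools import permutations

# Ring labels for a 7-ring mfERG layout (R1 = innermost ring).
RINGS = ["R1", "R2", "R3", "R4", "R5", "R6", "R7"]


def pairwise_ring_ratios(amplitudes):
    """Return every ordered ratio Ri/Rj (i != j) for one eye.

    `amplitudes` maps ring label -> ring amplitude (any consistent unit).
    Ratios with a zero denominator are skipped because they are undefined.
    """
    ratios = {}
    for numerator, denominator in permutations(RINGS, 2):
        if amplitudes[denominator] == 0:
            continue
        ratios[f"{numerator}/{denominator}"] = (
            amplitudes[numerator] / amplitudes[denominator]
        )
    return ratios


# Hypothetical amplitudes for a single eye (illustrative values only).
example_eye = {
    "R1": 60.0, "R2": 30.0, "R3": 20.0, "R4": 15.0,
    "R5": 12.0, "R6": 10.0, "R7": 8.0,
}

ratios = pairwise_ring_ratios(example_eye)
print(len(ratios))      # number of ordered pairs: 7 * 6
print(ratios["R1/R2"])  # ratio of ring 1 to ring 2 for this eye
```

Ordered pairs are used here because Ri/Rj and Rj/Ri are counted as distinct indices; treating them as one would give 21 ratios instead.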

Results

Foveal-to-parafoveal ratios (R1/R2 and R1/R3) emerged as the strongest univariate markers (AUC ~0.69). We defined specific “rule-in” thresholds (R1/R2 ≥1.83; R1/R3 ≥3.31) that achieved >92% specificity, though sensitivity remained moderate (~49%). Consequently, standard ratio-based classification yielded a maximum balanced accuracy of only 0.70. In contrast, ML models significantly outperformed traditional methods; Random Forest and Logistic Regression achieved balanced accuracies of 0.88 and F1 scores >0.90, demonstrating superior precision in distinguishing toxic from non-toxic eyes.
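The link between the reported sensitivity, specificity, and balanced accuracy can be sketched as follows. Balanced accuracy is the mean of sensitivity and specificity, so a sensitivity near 0.49 and a specificity near 0.92 give roughly 0.70. The helper names and the eight toy eyes below are hypothetical and serve only to show how a single rule-in cut-off (here R1/R2 ≥ 1.83) would be applied; they do not reproduce the study data.

```python
R1_R2_CUTOFF = 1.83  # rule-in threshold for R1/R2 reported in the abstract


def flags_toxic(r1, r2, cutoff=R1_R2_CUTOFF):
    """Flag an eye as toxic when its R1/R2 ratio meets or exceeds the cut-off."""
    return r1 / r2 >= cutoff


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (True = toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    positives = sum(1 for t in y_true if t)
    negatives = len(y_true) - positives
    sensitivity = tp / positives
    specificity = tn / negatives
    return (sensitivity + specificity) / 2


# Hypothetical eyes: (R1 amplitude, R2 amplitude, truly toxic?)
toy_eyes = [
    (60.0, 30.0, True), (55.0, 25.0, True), (40.0, 30.0, True), (45.0, 35.0, True),
    (40.0, 30.0, False), (50.0, 35.0, False), (42.0, 28.0, False), (38.0, 30.0, False),
]

y_true = [toxic for _, _, toxic in toy_eyes]
y_pred = [flags_toxic(r1, r2) for r1, r2, _ in toy_eyes]
toy_score = balanced_accuracy(y_true, y_pred)

# Arithmetic check using the approximate figures quoted in the abstract.
approx_from_abstract = (0.49 + 0.92) / 2  # about 0.70

print(toy_score)
print(round(approx_from_abstract, 3))
```

In the toy data the cut-off flags two of four toxic eyes and none of the non-toxic eyes, which mirrors the pattern described above: a high-specificity threshold can confirm toxicity but misses many affected eyes.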

Conclusions

This study establishes the first comprehensive reference dataset for 7-ring mfERG in hydroxychloroquine retinopathy. While standard ratios provide robust confirmatory markers, they lack sensitivity as standalone screening tools. We demonstrate that algorithmic analysis of high-resolution electrophysiology offers a superior diagnostic paradigm, supporting the integration of AI-driven strategies into future toxicity screening.