Background/Aim: Post-authorization vaccine safety surveillance is highly sensitive but constrained by noise, reporting delays, and symptom-level fragmentation, leading to late and clinically diffuse signals. We aimed to develop a machine learning–driven heterogeneous graph framework to detect emerging vaccine safety signals earlier and identify clinically coherent, multisystem adverse-event phenotypes from real-world surveillance data.
Methods: We analyzed publicly available raw VAERS data spanning 1990–2024, comprising over 2.3 million individual reports with linked vaccine administration records, demographics, MedDRA-coded symptoms, and onset-time information. Weekly heterogeneous graphs were constructed with node types representing vaccine product, manufacturer, dose series, age–sex strata, onset-interval bins, symptom terms, co-reported conditions or medications, and report identifiers. Relation-aware graph neural networks were trained to learn temporally resolved embeddings that capture higher-order co-occurrence patterns and dynamic structure not represented in tabular surveillance models. Emerging safety signals were quantified using a hybrid anomaly detection strategy that combined graph autoencoder reconstruction error with contrastive temporal drift relative to a rolling 52-week reference window, producing a calibrated Signal Acceleration Score (SAS). Syndrome-like adverse-event phenotypes were identified by hierarchical clustering in the learned embedding space with bootstrap-based stability selection. Validation included comparison against empirical Bayes disproportionality benchmarks, negative-control vaccines and outcomes, temporal robustness metrics, and week-level block bootstrap uncertainty estimation.
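The Methods describe the SAS only at a high level, so the following is a minimal sketch of one way a hybrid score of this kind could be computed for a single entity (for example, one vaccine–symptom cluster), assuming weekly reconstruction errors and embeddings are already available. Every detail is an illustrative assumption rather than the study's method: the function names are hypothetical, Euclidean distance from the reference-window centroid stands in for the learned contrastive drift, robust (median/MAD) z-scoring on the 52-week window stands in for calibration, and the two components are weighted equally.

```python
import math
from statistics import mean, median


def robust_z(value, reference, eps=1e-12):
    """Robust z-score of `value` against `reference` (median and MAD scaled by 1.4826)."""
    med = median(reference)
    mad = 1.4826 * median(abs(x - med) for x in reference)
    return (value - med) / max(mad, eps)  # eps guards against a zero MAD


def signal_acceleration_score(recon_errors, embeddings, window=52,
                              w_recon=0.5, w_drift=0.5):
    """Hypothetical SAS for the most recent week of one entity.

    recon_errors: weekly graph-autoencoder reconstruction errors, oldest first.
    embeddings:   weekly embedding vectors for the same entity, oldest first.
    The last element is the week being scored; the preceding `window` weeks
    form the rolling reference window.
    """
    if len(recon_errors) != len(embeddings) or len(recon_errors) < window + 1:
        raise ValueError("need at least window + 1 aligned weeks")
    ref_err = recon_errors[-window - 1:-1]
    ref_emb = embeddings[-window - 1:-1]

    # Drift proxy (assumption): Euclidean distance from the reference centroid,
    # standing in for a learned contrastive drift measure.
    dim = len(embeddings[-1])
    centroid = [mean(e[i] for e in ref_emb) for i in range(dim)]
    drift_now = math.dist(embeddings[-1], centroid)
    drift_ref = [math.dist(e, centroid) for e in ref_emb]

    # Calibrate both components on the same reference window, then combine.
    z_recon = robust_z(recon_errors[-1], ref_err)
    z_drift = robust_z(drift_now, drift_ref)
    return w_recon * z_recon + w_drift * z_drift
```

Under this construction, a week whose reconstruction error and drift sit at their reference medians scores zero, and the score rises as either component departs upward from the preceding 52 weeks; the weights, the MAD scaling, and any alert threshold would need calibration against known benchmark signals.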
Results: Across 12 historically established vaccine safety benchmarks, the proposed framework detected signals a median of 9.5 weeks (IQR 6.0–14.0) earlier than classical disproportionality methods. At a fixed alert budget, Precision@20 was 0.80 compared with 0.55 for baseline methods (absolute improvement 0.25; 95% CI 0.12–0.38; p=0.001), and false-positive alerts among negative controls were reduced by 37% (95% CI 24–48; p<0.001). Clustering identified seven reproducible multisystem phenotypes with moderate internal coherence (mean silhouette 0.41; 95% CI 0.36–0.45) and high temporal stability (median Jaccard overlap 0.72). Phenotypes differed in onset-time signatures (p<1×10⁻⁴), including immediate hypersensitivity (≤1 day), neuro-sensory syndromes (2–5 days), and cardio-inflammatory patterns (3–7 days), supporting clinically meaningful stratification beyond single-term signals. High-SAS clusters showed consistently elevated week-on-week growth rates, 1.8–2.4-fold above baseline expectations; persisted for ≥6 consecutive weeks; converged in symptom composition; and showed reproducible temporal ordering across independent bootstrap windows. Together, these features support emergent syndromic patterns rather than transient reporting artifacts.
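The Precision@20 and median Jaccard figures reported in the Results can be made concrete with two small helpers. This is a hedged sketch: the function names are hypothetical, the precision denominator is fixed at the alert budget k, and temporal stability is assumed to be the median, over reference phenotypes, of each phenotype's best Jaccard match among clusters from a resampled window; the study's exact matching rule is not stated. For scale, at k=20 a Precision@20 of 0.80 means 16 of the top 20 alerts were confirmed benchmark signals, versus 11 at 0.55.

```python
from statistics import median


def precision_at_k(ranked_alerts, true_signals, k=20):
    """Share of the top-k ranked alerts that are reference (true) signals.

    The denominator is the alert budget k, so a short alert list is penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(true_signals)
    hits = sum(1 for alert in ranked_alerts[:k] if alert in truth)
    return hits / k


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| between two sets of member IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def median_best_match_jaccard(reference_clusters, replicate_clusters):
    """Median over reference clusters of each one's best Jaccard match in a replicate."""
    best = [max((jaccard(ref, rep) for rep in replicate_clusters), default=0.0)
            for ref in reference_clusters]
    return median(best)


if __name__ == "__main__":
    alerts = [f"alert_{i}" for i in range(20)]
    confirmed = {f"alert_{i}" for i in range(16)}
    print(precision_at_k(alerts, confirmed))  # 0.8
```

Matching each reference phenotype to its best counterpart avoids depending on arbitrary cluster labels, which generally change between bootstrap replicates.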
Conclusions: Heterogeneous graph machine learning enables earlier detection of phenotype-level vaccine safety signals, with quantifiable acceleration and clinical interpretability. This framework advances scalable, decision-relevant pharmacovigilance by prioritizing emerging multisystem safety patterns in real-world surveillance data.