BACKGROUND: The Vestibular/Ocular-Motor Screening (VOMS) is a valuable component of acute (<72 hours) sports-related concussion (SRC) assessments and is increasingly used alongside the Immediate Post-concussion Assessment and Cognitive Testing (ImPACT) instrument and the third edition of the Sport Concussion Assessment Tool (SCAT3). Research has suggested that acute postinjury VOMS scores are useful in identifying acute concussion. However, whether preseason baseline measurements improve diagnostic accuracy remains unclear. There is therefore a need to determine how reliable VOMS baseline assessments are across years and whether incorporating an individual's baseline performance improves diagnostic yield for acute concussion.

PURPOSE: To evaluate the test-retest reliability of consecutive-year preseason baseline VOMS, SCAT3, and ImPACT assessments and to directly compare the diagnostic utility of these tools for identifying acute SRC when incorporating baseline assessments versus using postinjury data alone.

STUDY DESIGN: Cohort study (diagnosis); Level of evidence, 2.

METHODS: Preseason and postinjury VOMS, SCAT3, ImPACT Post-Concussion Symptom Scale (PCSS), and ImPACT composite scores were analyzed for 3958 preseason (47.7% female) and 496 acute (≤48 hours) SRC (37.5% female) collegiate athlete evaluations in the National Collegiate Athletic Association-Department of Defense Concussion Assessment Research and Education Consortium. Descriptive statistics, Kolmogorov-Smirnov significance tests, and Cohen d effect sizes were calculated. Consecutive-year baseline reliability was evaluated for a subset of 447 athlete encounters using Pearson r, Cohen kappa, Cohen d, and 2-way mixed intraclass correlation coefficients (ICCs). Wilcoxon signed rank tests were used to determine the statistical significance of differences between population performances, and the 90% reliable change index (RCI) was calculated from the test-retest results.
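The reliability statistics named above can be illustrated with a short Python sketch. It is not the consortium's analysis code: the pooled-SD form of Cohen d, the consistency form of the 2-way mixed ICC (ICC(3,1)), and the RCI formula (SEM = SD × √(1 − r) for each year, SE of the difference = √(SEM₁² + SEM₂²), 90% cutoff = 1.645 × SE) are common textbook choices assumed here, because the abstract does not state which variants were used. Using Pearson r as the reliability coefficient in the RCI is also an assumption. All function names and example scores are hypothetical.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cohen_d_pooled(year1, year2):
    """Mean year-to-year change divided by the pooled SD (assumed variant)."""
    pooled_sd = math.sqrt((stdev(year1) ** 2 + stdev(year2) ** 2) / 2)
    return (mean(year2) - mean(year1)) / pooled_sd


def icc_consistency(year1, year2):
    """Two-way mixed, consistency, single-measure ICC(3,1) for two sessions."""
    n, k = len(year1), 2
    rows = list(zip(year1, year2))            # one row per athlete
    grand = mean(list(year1) + list(year2))
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in (year1, year2))
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def rci_cutoff(year1, year2, reliability, z=1.645):
    """Smallest absolute change that exceeds the 90% reliable change interval."""
    sem1 = stdev(year1) * math.sqrt(1 - reliability)   # SEM in baseline year 1
    sem2 = stdev(year2) * math.sqrt(1 - reliability)   # SEM in baseline year 2
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)         # SE of the difference
    return z * se_diff


if __name__ == "__main__":
    # Hypothetical VOMS Total scores for eight athletes in consecutive preseasons.
    y1 = [0, 2, 5, 1, 3, 0, 4, 2]
    y2 = [1, 0, 3, 4, 2, 1, 6, 0]
    r = pearson_r(y1, y2)
    print(f"r = {r:.2f}, ICC(3,1) = {icc_consistency(y1, y2):.2f}, "
          f"d = {cohen_d_pooled(y1, y2):.2f}")
    print(f"90% RCI cutoff = {rci_cutoff(y1, y2, reliability=r):.2f} points")
```

Under this formula, a lower reliability coefficient widens the RCI cutoff, so a postinjury change must be larger before it counts as reliable; this is one plausible mechanism by which poor year-to-year reliability limits what a baseline can add.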
Preseason-to-postinjury change scores were then calculated from each tool's RCI. Finally, receiver operating characteristic (ROC) curve analyses were conducted, and the DeLong method was used to compare the area under the curve (AUC) of raw postinjury scores with that of change scores from preseason baseline assessments. Potential effects of sex, medical history (learning disorders or attention-deficit/hyperactivity disorder), and outlier data were also explored.

RESULTS: Effect sizes were large, and overall predictive utility was clinically useful for postinjury VOMS Total (d = 2.44; AUC = 0.85), the SCAT3 Symptom Evaluation total severity score (d = 1.74; AUC = 0.82), and the ImPACT PCSS total severity score (d = 1.67; AUC = 0.80). By comparison, effect sizes were small and predictive utility was poor for the Standardized Assessment of Concussion (SAC), the modified Balance Error Scoring System (mBESS), and all ImPACT composites (d = 0.11-0.46; AUC = 0.48-0.59). Preseason baseline test-retest reliability was poor to moderate (r = 0.23-0.52; kappa = 0.32-0.36; ICC = 0.36-0.68) for all assessments except ImPACT Visual Motion Sensitivity (r = 0.73; ICC = 0.85). Incorporating baseline scores for VOMS Total, SCAT3 (Symptom Evaluation, SAC, mBESS), ImPACT PCSS, or ImPACT composites did not significantly improve AUCs.

CONCLUSION: VOMS Total and symptom severity (SCAT3, PCSS) total scores had large effect sizes and clinically useful AUCs for identifying acute concussion. However, all tools demonstrated high within-patient test-retest variability, resulting in poor reliability. The findings in this sample of collegiate athletes suggest that incorporating baseline assessments does not significantly increase diagnostic yield for acute concussion.
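The DeLong comparison of two correlated AUCs mentioned in the Methods can likewise be sketched from first principles in Python. This is a from-scratch illustration of the paired DeLong test, not the study's software. It assumes that both score types (for example, raw postinjury VOMS Total and a baseline-adjusted change score) are available for the same concussed athletes and the same controls, in the same order, and that higher values indicate a greater likelihood of concussion; for measures where lower scores indicate impairment, negate the scores first. The abstract does not state how control-group change scores were formed, so the inputs and function names here are hypothetical.

```python
import math


def _psi(a, b):
    """Kernel: 1 if a outranks b, 0.5 for a tie, 0 otherwise."""
    return 1.0 if a > b else 0.5 if a == b else 0.0


def _components(pos, neg):
    """DeLong structural components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(pos_a, neg_a, pos_b, neg_b):
    """Two-sided DeLong test for the difference between two correlated AUCs.

    pos_* hold scores of concussed athletes and neg_* scores of controls;
    position i must refer to the same athlete under score types a and b.
    Returns (auc_a, auc_b, z, p).
    """
    v10a, v01a = _components(pos_a, neg_a)
    v10b, v01b = _components(pos_b, neg_b)
    auc_a = sum(v10a) / len(v10a)   # equals the Mann-Whitney AUC
    auc_b = sum(v10b) / len(v10b)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(pos_a)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(neg_a))
    if var <= 0:
        raise ValueError("AUC difference has zero variance; the test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return auc_a, auc_b, z, p


if __name__ == "__main__":
    # Hypothetical scores: a = raw postinjury total, b = change from baseline.
    raw_pos, raw_neg = [5, 6, 7, 4], [1, 2, 3, 5]
    chg_pos, chg_neg = [2, 6, 1, 4], [3, 2, 5, 1]
    auc_a, auc_b, z, p = delong_paired(raw_pos, raw_neg, chg_pos, chg_neg)
    print(f"AUC raw = {auc_a:.3f}, AUC change = {auc_b:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because both AUCs come from the same athletes, the covariance terms matter: an unpaired comparison would ignore the correlation between the two score types and would generally misstate the variance of the AUC difference.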