The Rater Is the Variable: PES/WES Inter-Rater Reliability After Two Decades of Use

The pink esthetic score (PES) has been in clinical use since 2005, and for most of that time the research literature has treated it as a settled instrument. For an instrument carrying that institutional weight, the question of whether a camera or a scanner produces the better image for scoring seems almost administrative. Braun and colleagues at the University of Bern set out to answer exactly that question in “Reproducibility and Reliability of Intraoral Scanners for Evaluating Peri-Implant Tissues and Implant-Supported Prostheses: A Cross-Sectional Study,” published in the Journal of Esthetic and Restorative Dentistry in 2025. They found that the imaging method is not the variable. The rater is.

The Data Anchor

Forty adult patients with single-tooth implant-supported prostheses in the maxillary aesthetic zone were enrolled at Bern between November 2023 and March 2024. Each was photographed with a Canon EOS 80D and scanned with a Trios 5 wireless intraoral scanner (3Shape); PLY true-colour files were aligned to matching orientation before scoring.

The 90 images (45 per modality: the 40 patient images plus 12.5% duplicated for intra-rater assessment) were distributed via REDCap to 20 calibrated evaluators (five periodontists, five prosthodontists, five oral surgeons, five undergraduates), who scored each image against the modified PES (max 14) and the white esthetic score (WES, max 10). The exercise generated 12,600 individual PES measures and 9,000 WES measures. ICCs were classified by Cicchetti’s guidelines: < 0.40 poor, 0.40–0.59 fair, 0.60–0.74 good, ≥ 0.75 excellent.
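The counts and bands above can be checked with a short sketch. The per-image item counts (seven for PES, five for WES, each scored 0 to 2) are an assumption: the text does not state them, but they are consistent with the stated maxima of 14 and 10 and with the reported totals. The function names are illustrative only.

```python
# Cicchetti's bands as quoted in the text, as (lower bound, label) pairs.
CICCHETTI_BANDS = [(0.75, "excellent"), (0.60, "good"), (0.40, "fair")]


def classify_icc(icc: float) -> str:
    """Return the Cicchetti label for an ICC value."""
    for lower_bound, label in CICCHETTI_BANDS:
        if icc >= lower_bound:
            return label
    return "poor"  # anything below 0.40, including negative ICCs


def total_measures(raters: int, images: int, items: int) -> int:
    """Number of individual item scores produced by the rating exercise."""
    return raters * images * items


# ASSUMPTION (not stated in the text): 7 PES items and 5 WES items per image.
PES_ITEMS = 7
WES_ITEMS = 5

print(total_measures(20, 90, PES_ITEMS))  # 12600
print(total_measures(20, 90, WES_ITEMS))  # 9000
print(classify_icc(0.61))                 # good
```

Under these assumptions, 20 raters times 90 images times 7 items reproduces the 12,600 PES measures, and the same calculation with 5 items reproduces the 9,000 WES measures.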

Key Findings

Overall inter-rater agreement was only fair to good: PES ICCs ranged from 0.41 to 0.61 and WES ICCs from 0.42 to 0.69, a ceiling that gives pause for a measure embedded in two decades of research.
The imaging method had no significant effect on PES scores (p = 0.51), and its interaction with clinical background was not significant either (p = 0.42). For WES the method effect fell just short of conventional significance (p = 0.06), while its interaction with clinical background was significant (p = 0.02), though mean values remained comparable.
Periodontists scored consistently lower than all other groups for both PES and WES (PES means: 7.4 camera, 7.3 scanner; WES means: 5.0 camera, 5.7 scanner). All comparisons reached p < 0.0001, with estimated effects more negative than −1 for PES.
Intra-rater reliability varied enormously within each group. Best-case ICCs reached 0.94–0.98 (excellent); worst-case values dropped to fair, poor, or negative. One undergraduate’s worst PES ICC was −0.56, a negative value indicating that the repeat scores agreed less than chance would predict.
Oral surgeons and undergraduates produced the highest mean scores, reversing the pattern in two earlier studies where periodontists ranked first.
Limitation: static 2D screenshots of PLY files were scored rather than interactive 3D scans; single-centre design and narrow inclusion criteria (no peri-implantitis, no mucosal discolouration) limit generalisability.
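To make the negative intra-rater ICC concrete, here is a minimal sketch of how such a value can arise. It uses the one-way random-effects form, ICC(1,1), purely for illustration. The text does not say which ICC model the authors used, so that choice is an assumption, and the scores below are invented toy values, not study data.

```python
from statistics import mean


def icc_one_way(scores: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1).

    scores[i] holds the k repeated scores one rater gave to image i.
    ASSUMPTION: the ICC form is chosen for illustration only.
    """
    n = len(scores)     # number of images
    k = len(scores[0])  # number of repeated scores per image
    grand_mean = mean([x for row in scores for x in row])
    row_means = [mean(row) for row in scores]
    # Between-image mean square: how much the images differ from each other.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-image mean square: how much repeat scores of one image differ.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical PES totals for five duplicated images, scored twice each.
steady_rater = [[6, 6], [8, 8], [10, 10], [12, 12], [9, 9]]
erratic_rater = [[6, 12], [12, 6], [8, 10], [10, 8], [9, 9]]

print(icc_one_way(steady_rater))   # 1.0
print(icc_one_way(erratic_rater))  # -1.0
```

The ICC turns negative whenever repeat scores of the same image vary more than scores of different images do, which is the situation the worst-case intra-rater values describe.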

The intraoral scanner performed comparably to the digital camera, which is genuinely useful. But that finding is quietly upstaged by the rater data. If the same images produce a mean PES of 7.4 under one specialist’s gaze and 9.5 under another’s, the number being reported is partly a biography of the examiner.

💡 The Clinical Bottom Line

The practical headline: intraoral scanners substitute competently for a DSLR when scoring PES and WES, removing a friction point for practices already running digital workflows. But the more consequential finding is the fair-to-good inter-rater ceiling and the periodontist effect. Any implant aesthetic study that pools PES or WES scores across examiners from different specialty backgrounds is, to some degree, pooling different measurement instruments and calling them the same thing.

When reading a PES-based comparative study, it is worth asking who scored the images and whether specialty mix was reported. Twenty years of accumulated PES data may be rather less comparable across studies than the shared label implies.

Dr Samuel Rosehill is a general dentist with a prosthodontic focus, practising at Ethical Dental in Coffs Harbour, NSW. He holds a BDSc (Hons) from the University of Queensland, an MBA, an MMktg, and an MClinDent in Fixed & Removable Prosthodontics (Distinction) from King’s College London.

Reference: Braun D, Chappuis V, Fonseca M, Raabe C, Suter VGA, Couso-Queiruga E. Reproducibility and Reliability of Intraoral Scanners for Evaluating Peri-Implant Tissues and Implant-Supported Prostheses: A Cross-Sectional Study. Journal of Esthetic and Restorative Dentistry, 2025. DOI: 10.1111/jerd.13408