The Rater Is the Variable: PES/WES Inter-Rater Reliability After Two Decades of Use
A cross-sectional study from the University of Bern tested whether intraoral scanner images could replace digital camera photographs for PES/WES scoring — and found that the imaging method barely matters. What matters is who is holding the score sheet.
The rater, not the camera
Source Paper
Reproducibility and Reliability of Intraoral Scanners for Evaluating Peri-Implant Tissues and Implant-Supported Prostheses: A Cross-Sectional Study
The pink esthetic score has been in clinical use since 2005, and the research literature has treated it as a settled instrument for most of that time. For an instrument carrying that institutional weight, the question of whether a camera or a scanner produces the better image for scoring seems almost administrative. Braun and colleagues at the University of Bern set out to answer exactly that in “Reproducibility and Reliability of Intraoral Scanners for Evaluating Peri-Implant Tissues and Implant-Supported Prostheses: A Cross-Sectional Study,” published in the Journal of Esthetic and Restorative Dentistry in 2025. They found that the imaging method is not the variable. The rater is.
The Data Anchor
Forty adult patients with single-tooth implant-supported prostheses in the maxillary aesthetic zone were enrolled at Bern between November 2023 and March 2024. Each was photographed with a Canon EOS 80D and scanned with a Trios 5 wireless intraoral scanner (3Shape); PLY true-colour files were aligned to matching orientation before scoring.
The 90 images (45 per modality, which is consistent with 40 originals plus 5 duplicates, since 12.5% of images were repeated for intra-rater assessment) were distributed via REDCap to 20 calibrated evaluators (five periodontists, five prosthodontists, five oral surgeons, five undergraduates), who scored each image against the modified PES (max 14) and WES (max 10). The exercise generated 12,600 individual PES measures and 9,000 WES measures. ICCs were classified by Cicchetti’s guidelines: < 0.40 poor, 0.40–0.59 fair, 0.60–0.74 good, ≥ 0.75 excellent.
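The band cut-offs and measure counts above can be checked with a short sketch. The item counts (seven PES items and five WES items, each scored 0 to 2) are an assumption inferred from the stated maxima of 14 and 10; the function and constant names are illustrative and do not come from the paper.

```python
def cicchetti_band(icc: float) -> str:
    """Map an ICC value to the qualitative band quoted in the text.

    Values falling between the printed ranges (e.g. 0.595) are assigned
    to the lower band here; that tie-break is an assumption.
    """
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


IMAGES = 90      # 45 per modality, as stated in the text
RATERS = 20      # evaluators, as stated in the text
PES_ITEMS = 7    # assumption: 14-point maximum at 0-2 points per item
WES_ITEMS = 5    # assumption: 10-point maximum at 0-2 points per item

pes_measures = IMAGES * RATERS * PES_ITEMS
wes_measures = IMAGES * RATERS * WES_ITEMS

if __name__ == "__main__":
    print(pes_measures, wes_measures)
    for value in (0.41, 0.61, 0.69, 0.94, -0.56):
        print(value, cicchetti_band(value))
```

Under these assumptions the products reproduce the 12,600 PES and 9,000 WES measures reported, and the quoted inter-rater ranges (0.41 to 0.61 and 0.42 to 0.69) span the fair and good bands.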
Key Findings
- Overall inter-rater agreement was only fair to good: PES ICCs ranged from 0.41 to 0.61; WES ICCs from 0.42 to 0.69 — a ceiling that gives pause for a measure embedded across two decades of research.
- The imaging method had no significant effect on PES scores (p = 0.51), and its interaction with clinical background was not significant either (p = 0.42). For WES the method effect fell just short of conventional significance (p = 0.06), but the interaction with clinical background was significant (p = 0.02); mean values nonetheless remained comparable.
- Periodontists scored consistently lower than all other groups for both PES and WES (periodontist PES means: 7.4 camera, 7.3 scanner; WES: 5.0 and 5.7). All comparisons reached p < 0.0001, with estimated effects beyond −1 for PES, that is, more than one point lower.
- Intra-rater reliability varied enormously within each group. Best-case ICCs reached 0.94–0.98 (excellent); worst-case values dropped to fair, poor, or negative. One undergraduate’s worst PES ICC was −0.56, meaning that rater’s repeat scores of the same images showed no usable consistency.
- Oral surgeons and undergraduates produced the highest mean scores, reversing the pattern in two earlier studies where periodontists ranked first.
- Limitation: static 2D screenshots of PLY files were scored rather than interactive 3D scans; single-centre design and narrow inclusion criteria (no peri-implantitis, no mucosal discolouration) limit generalisability.
The intraoral scanner performed comparably to the digital camera, which is genuinely useful. But that finding is quietly upstaged by the rater data. If the same images produce a mean PES of 7.4 under one specialist’s gaze and 9.5 under another’s, the number being reported is partly a biography of the examiner.
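To see why a systematic offset between raters caps agreement, consider a minimal sketch. It assumes a two-way random-effects, absolute-agreement, single-rater ICC (ICC(2,1) in Shrout and Fleiss's notation); this review does not state which ICC form the study used, and the scores below are invented for illustration, not study data.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): rows are images, columns are raters (assumed ICC form)."""
    n = len(scores)          # number of images
    k = len(scores[0])       # number of raters
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-image mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical PES totals for four images scored by two raters.
identical = [[6, 6], [8, 8], [10, 10], [12, 12]]
# Same ranking, but the second rater scores two points lower throughout.
offset = [[6, 4], [8, 6], [10, 8], [12, 10]]

if __name__ == "__main__":
    print(round(icc_2_1(identical), 3))
    print(round(icc_2_1(offset), 3))
```

In this invented example the second rater orders every image identically but scores two points lower, and the ICC falls from 1.0 to about 0.77 with no random disagreement at all. A consistent between-group offset of the kind reported for periodontists would, under this ICC form, lower agreement in the same way.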
💡 The Clinical Bottom Line
The practical headline: intraoral scanners substitute competently for a DSLR when scoring PES and WES, removing a friction point for practices already running digital workflows. But the more consequential finding is the fair-to-good inter-rater ceiling and the periodontist effect. Any implant aesthetic study that pools PES or WES scores across examiners from different specialty backgrounds is, to some degree, pooling different measurement instruments and calling them the same thing.
When reading a PES-based comparative study, it is worth asking who scored the images and whether specialty mix was reported. Twenty years of accumulated PES data may be rather less comparable across studies than the shared label implies.
Dr Samuel Rosehill is a general dentist with a prosthodontic focus, practising at Ethical Dental in Coffs Harbour, NSW. He holds a BDSc (Hons) from the University of Queensland, an MBA, an MMktg, and an MClinDent in Fixed & Removable Prosthodontics (Distinction) from King’s College London.
Clinical Relevance
PES and WES inter-rater agreement is only fair to good (ICC 0.41–0.69) regardless of whether images come from a digital camera or intraoral scanner. Periodontists score consistently and significantly lower than the other rater groups. Any multi-examiner or cross-specialty comparison of PES/WES data should account for rater identity as a systematic variable, not background noise.
Disclosure: The author has no financial conflicts of interest related to the products or topics discussed in this review. This is an independent summary prepared for educational purposes.