How we measure the 67% claim.
Pearl claims its AI marking platform, SAM, cuts assessor marking time by 67% on Ofqual-regulated qualifications. This page explains how that number is measured, what counts as marking time, the sample it is drawn from, and the caveats. It is written for FE procurement teams who want to verify the maths before signing.
The headline
Across AY 2024-25, SAM-assisted marking reduced assessor active marking time by a mean of 67% per submission, with a median of 64% and an interquartile range of 58% to 73%. The sample covered Ofqual-regulated qualifications at Levels 2, 3 and 4, including BTECs, NVQs, Functional Skills and Access to HE units.
What we mean by marking time
Marking time is the active time an assessor spends:
- Opening a submission and reading the work.
- Applying the rubric, criterion by criterion.
- Writing assessor feedback and developmental comments.
- Confirming the final grade.
Marking time excludes:
- IQA sampling and verification time.
- Learner-facing communication and resubmission discussions.
- General administration, planning and CPD.
- One-off rubric training time when SAM is first deployed on a new qualification.
How the baseline is captured
Before SAM is enabled on a qualification, the assessor marks a cohort of submissions in the standard Pearl interface. Active time is measured from submission-open to grade-confirmed, with idle-tab detection pausing the timer after 90 seconds of inactivity. Assessor self-report time logs are reconciled against platform timestamps. The baseline is the mean active marking time per submission across that cohort, computed per assessor per qualification.
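As an illustration of the idle-tab rule, here is a minimal Python sketch of how active time could be derived from a list of activity timestamps. It is not Pearl's implementation. The function name `active_seconds`, the event list and the interpretation of the pause are assumptions: the sketch treats the timer as running for at most 90 seconds after the last activity and then pausing until the next event.

```python
from datetime import datetime, timedelta

# From the text: the timer pauses after 90 seconds of inactivity.
IDLE_LIMIT = timedelta(seconds=90)


def active_seconds(events: list[datetime]) -> float:
    """Sum active time between consecutive activity events.

    Assumption: each gap between events counts in full up to IDLE_LIMIT,
    and anything beyond IDLE_LIMIT is treated as paused time.
    """
    ordered = sorted(events)
    total = timedelta()
    for previous, current in zip(ordered, ordered[1:]):
        total += min(current - previous, IDLE_LIMIT)
    return total.total_seconds()


# Hypothetical event stream for one submission.
opened = datetime(2024, 10, 1, 9, 0, 0)
events = [
    opened,                           # submission opened
    opened + timedelta(seconds=60),   # reading and scrolling
    opened + timedelta(seconds=100),  # feedback typed
    opened + timedelta(seconds=700),  # assessor returns after a 10-minute break
    opened + timedelta(seconds=760),  # grade confirmed
]
print(active_seconds(events))  # 60 + 40 + 90 (capped) + 60 = 250.0
```

Under this reading, the 10-minute break contributes only 90 seconds, so wall-clock time of 760 seconds becomes 250 seconds of active marking time.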
How the SAM-assisted time is captured
Once SAM is enabled on the qualification, every submission is processed by SAM on upload. The assessor sees a proposed grade, criterion-level rationale, and suggested feedback. The assessor reviews, edits where needed, and confirms the final mark. Active time is measured on the same submission-open to grade-confirmed basis, with the same idle-tab detection. The SAM-assisted time is the mean active time per submission across the same qualification cohort.
For each assessor and qualification, the percentage of time saved per submission is computed from the two means as:
Time saved % = (Baseline time − SAM-assisted time) / Baseline time × 100
Per-assessor figures are then aggregated to the qualification level, the provider level, and finally the cross-provider mean.
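The formula and the roll-up can be sketched in Python as follows. This is an illustration only. The row layout, provider and assessor names and timings are hypothetical, and the sketch assumes an unweighted mean at each level, which this page does not specify.

```python
from collections import defaultdict
from statistics import mean


def time_saved_pct(baseline: float, assisted: float) -> float:
    """Time saved % = (baseline - SAM-assisted) / baseline x 100."""
    return (baseline - assisted) / baseline * 100


# Hypothetical rows: (provider, qualification, assessor,
#                     baseline mean minutes, SAM-assisted mean minutes)
rows = [
    ("College A", "BTEC L3", "assessor-1", 30.0, 9.0),   # 70% saved
    ("College A", "BTEC L3", "assessor-2", 40.0, 16.0),  # 60% saved
    ("College A", "NVQ L2", "assessor-3", 20.0, 5.0),    # 75% saved
    ("ITP B", "BTEC L3", "assessor-4", 25.0, 10.0),      # 60% saved
]


def roll_up(rows):
    """Aggregate assessor -> qualification -> provider -> cross-provider.

    Assumption: a plain (unweighted) mean is taken at every level.
    """
    by_qualification = defaultdict(list)
    for provider, qualification, _assessor, baseline, assisted in rows:
        by_qualification[(provider, qualification)].append(
            time_saved_pct(baseline, assisted)
        )

    by_provider = defaultdict(list)
    for (provider, _qualification), values in by_qualification.items():
        by_provider[provider].append(mean(values))

    provider_means = {p: mean(v) for p, v in by_provider.items()}
    return provider_means, mean(provider_means.values())


provider_means, cross_provider_mean = roll_up(rows)
print(provider_means, cross_provider_mean)
```

A mean of means gives each provider equal weight regardless of how many submissions it contributed. Weighting by submission count would give a different figure, so it is worth confirming which approach applies before comparing numbers.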
The sample
- Time period: AY 2024-25, September 2024 to July 2025.
- Providers: 4 UK FE providers (2 colleges, 2 ITPs).
- Assessors: 47, each with at least 30 baseline and 30 SAM-assisted submissions on the same qualification.
- Submissions: 8,412 marked submissions, paired across baseline and SAM-assisted conditions on matched qualifications.
- Qualification mix: Levels 2 to 4 across BTEC, NVQ, Functional Skills, Access to HE. Excluded: open portfolio assessments without a fixed rubric, group projects, and observed practical assessments.
Caveats and what we do not claim
Three caveats matter when you read the 67% figure:
- Rubric complexity drives variance. Time saved is strongest on structured written assessments with well-defined rubrics (mean 71%) and weakest on open-ended written work with high inter-criterion overlap (mean 54%). Open portfolio work without a fixed rubric was excluded from the sample.
- One-off setup cost is not netted off. Each qualification requires 4 to 12 hours of rubric configuration and calibration before SAM is reliable. This is borne by the lead assessor or IQA and is not counted in the per-submission marking time.
- Exercising assessor authority adds time on a minority of submissions. Around 8% of submissions in the sample involved an assessor materially disagreeing with SAM's proposed grade. On those submissions, the assessor spent on average 22% longer than they would have done marking from scratch, because they reviewed both SAM's proposal and their own working. These submissions are included in the SAM-assisted time average, so the headline 67% already accounts for them.
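To see what the disagreement caveat implies, the short Python sketch below backs out the saving on the remaining submissions from the three figures quoted above. It is rough arithmetic rather than a reported result: it assumes every submission has the same baseline time and treats "marking from scratch" as equal to the baseline. The function name is hypothetical.

```python
def implied_saving_on_agreed(headline_pct: float,
                             disagree_share: float,
                             disagree_overrun_pct: float) -> float:
    """Back out the mean % saved on submissions without material disagreement.

    Assumptions: equal baseline time for all submissions, and disagreement
    cases cost (1 + overrun) x baseline.
    """
    overall_ratio = 1 - headline_pct / 100          # assisted / baseline overall
    disagree_ratio = 1 + disagree_overrun_pct / 100  # assisted / baseline when disagreeing
    agreed_ratio = (
        overall_ratio - disagree_share * disagree_ratio
    ) / (1 - disagree_share)
    return (1 - agreed_ratio) * 100


# Figures from the caveat: 67% headline, 8% of submissions, 22% longer.
print(round(implied_saving_on_agreed(67, 0.08, 22), 1))  # about 74.7
```

Under these simplifying assumptions, the other 92% of submissions would need to save roughly 75% on average for the blended figure to land at 67%.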
We do not claim:
- That SAM marks without an assessor. The assessor confirms every grade and retains final mark authority.
- That 67% applies to every qualification. Levels 1 and 5 are not in the sample.
- That the figure will hold at every provider. Provider-level means in the sample ranged from 58% to 74%.
How to verify with your own data
Procurement teams can replicate the measurement in a paid pilot:
- Pick one qualification, one cohort.
- Run 4 weeks of baseline marking in the Pearl interface without SAM enabled.
- Enable SAM, then run a further 4 weeks on the same cohort.
- Pearl provides the raw timestamps, per-assessor active-time logs, and a reconciliation report. You compute time saved against your own definitions.
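As a starting point for your own calculation, the Python sketch below computes per-assessor time saved from pilot logs and summarises it as a mean, median and interquartile range, the same statistics quoted in the headline. The log layout, phase labels and timings are hypothetical, since the export format is not described on this page, and the quartile method is an assumption.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical log rows: (assessor id, phase, active seconds for one submission)
log = [
    ("a1", "baseline", 1800), ("a1", "baseline", 2200),
    ("a1", "sam", 600), ("a1", "sam", 800),
    ("a2", "baseline", 1000), ("a2", "baseline", 1000),
    ("a2", "sam", 300), ("a2", "sam", 300),
    ("a3", "baseline", 1500), ("a3", "sam", 600),
    ("a4", "baseline", 2000), ("a4", "sam", 500),
]


def per_assessor_saving(log):
    """Return {assessor: % time saved} from mean active time in each phase."""
    times = defaultdict(lambda: {"baseline": [], "sam": []})
    for assessor, phase, seconds in log:
        times[assessor][phase].append(seconds)
    return {
        assessor: (mean(t["baseline"]) - mean(t["sam"])) / mean(t["baseline"]) * 100
        for assessor, t in times.items()
    }


def summarise(values):
    """Mean, median and quartiles (inclusive method, an assumption)."""
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "median": median(values), "q1": q1, "q3": q3}


savings = per_assessor_saving(log)
summary = summarise(list(savings.values()))
print(savings)
print(summary)
```

If your definition of marking time differs, for example if you want to net off rubric configuration hours, adjust the seconds recorded per submission before running the summary.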
Pilot pricing and scope are agreed in advance. The pilot is paid because the rubric configuration and assessor training cost real time at our end.
FAQ
Who validated the methodology?
The measurement methodology was reviewed internally by Pearl's product and assessment leads, and externally by an IQA practitioner at one of the four sample providers. It has not been peer-reviewed in an academic journal. We are open to a third-party audit and will publish any subsequent revisions on this page.
Where can I see the raw data?
Anonymised per-assessor, per-qualification time logs are available under NDA for procurement teams in active evaluation. Email sales@epearl.co.uk to request access.
Does the 67% include marking quality?
No. This page only measures time. Mark quality is measured separately, against IQA agreement rates, inter-assessor reliability and learner outcome data. We will publish a separate methodology page on mark quality once the AY 2025-26 dataset is complete.
Run the 67% test on your own provision.
Book a 20-minute call. We will scope a paid pilot against one of your qualifications and agree the measurement protocol up front.