The GEM Report team has developed model-based estimates of time series of school completion rates from 1990 onwards using censuses and household surveys for 157 countries. The process consists of five steps: compilation, pre-processing, reconstruction, modelling, and post-processing.


The completion rate analysis relies on censuses and nationally representative household surveys: Demographic and Health Surveys (DHS), Multiple Indicator Cluster Surveys (MICS), European Union Statistics on Income and Living Conditions (EU-SILC), other country-specific household surveys, and population censuses. Census data have been retrieved from IPUMS in extracts of 1 million observations. The data from all sources are collected at the individual level and are aggregated in the pre-processing stage.


Completion rates and sampling error computation

Household survey and census data do not present completion rates directly. Rather, raw data tend to reflect the number of years of education completed for each individual in the sample. Using the schooling schedules for each country, the number of years of education completed is used to determine whether an individual has completed a given education level. For example, if primary school is 6 years, lower secondary school is 2 years, and upper secondary school is 4 years, an individual with 10 years of education completed is encoded as having completed primary and lower secondary school, but not upper secondary school.
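The encoding rule above can be sketched in a few lines. This is a minimal illustration, not the GEM Report team's code: the function name `completed_levels` and the dictionary layout of the schooling schedule are assumptions made for the example.

```python
from itertools import accumulate


def completed_levels(years_completed, level_durations):
    """Return {level: bool} indicating which levels count as completed.

    level_durations is an ordered mapping of level name -> duration in
    years (hypothetical layout chosen for this example).
    """
    # Cumulative years needed to finish each level: 6, 8, 12 in the example.
    thresholds = accumulate(level_durations.values())
    return {
        level: years_completed >= required
        for level, required in zip(level_durations, thresholds)
    }


# Schooling schedule from the example in the text.
schedule = {"primary": 6, "lower_secondary": 2, "upper_secondary": 4}
print(completed_levels(10, schedule))
# {'primary': True, 'lower_secondary': True, 'upper_secondary': False}
```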

The individual-level completion values are then aggregated by age and sex to produce country-level completion rates. Alongside the observed completion rates, sampling variances are estimated for each aggregated observation using the clustered jackknife approach broadly used in demographic surveys for other indicators.
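As an illustration of the aggregation and variance step, the sketch below computes a weighted completion rate for one age-and-sex cell together with a delete-one-cluster jackknife variance. The text does not specify the exact jackknife variant, so the absence of stratification, the use of survey weights, and the function name `jackknife_completion` are all assumptions made for this example.

```python
from collections import defaultdict


def jackknife_completion(records):
    """Weighted completion rate and delete-one-cluster jackknife variance.

    records: iterable of (cluster_id, weight, completed) tuples for a
    single age-and-sex cell (hypothetical input layout).
    """
    num = defaultdict(float)  # weighted number of completers per cluster
    den = defaultdict(float)  # weighted number of individuals per cluster
    for cluster, weight, completed in records:
        num[cluster] += weight * completed
        den[cluster] += weight

    total_num, total_den = sum(num.values()), sum(den.values())
    rate = total_num / total_den

    k = len(den)
    if k < 2:
        raise ValueError("the jackknife needs at least two clusters")

    # Re-estimate the rate with each cluster left out in turn and sum
    # the squared deviations from the full-sample estimate.
    squared_dev = 0.0
    for cluster in den:
        replicate = (total_num - num[cluster]) / (total_den - den[cluster])
        squared_dev += (replicate - rate) ** 2

    variance = (k - 1) / k * squared_dev
    return rate, variance


# Three clusters of two individuals each, with equal weights.
cell = [
    ("A", 1.0, True), ("A", 1.0, True),
    ("B", 1.0, True), ("B", 1.0, False),
    ("C", 1.0, False), ("C", 1.0, False),
]
rate, variance = jackknife_completion(cell)
print(rate, variance)
```

Centring the replicates on the full-sample estimate is one common convention; centring on the mean of the replicates is another and gives slightly different values.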

Data quality

Data are also assessed against quality standards and may be excluded for two reasons. First, if an age-and-sex cell in a survey contains 30 or fewer individuals, the aggregated observation is excluded to avoid small-sample concerns. Second, observations may be excluded on plausibility grounds, which are assessed through individual-level and survey-level checks.
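The small-sample rule can be written as a simple filter. In this sketch the cell layout (a dictionary with an `n` field holding the unweighted count) and the name `drop_small_cells` are assumptions made for illustration; the plausibility checks are not shown because the text does not describe them in detail.

```python
MIN_CELL_SIZE = 30  # cells with this many individuals or fewer are excluded


def drop_small_cells(cells):
    """Keep only age-and-sex cells with more than MIN_CELL_SIZE individuals."""
    return [cell for cell in cells if cell["n"] > MIN_CELL_SIZE]


# Illustrative cells only; the rates are made-up values.
cells = [
    {"age": 14, "sex": "F", "n": 30, "rate": 0.81},  # excluded: n <= 30
    {"age": 14, "sex": "M", "n": 31, "rate": 0.78},  # kept
]
kept = drop_small_cells(cells)
print(kept)
```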


Household surveys are generally conducted at infrequent intervals, so observations are available for only a limited number of years. Constructing a time series of completion rates, however, requires completion rates for a specific age group over a series of years. The reconstruction relies on following the same cohort: completion at age a in year y can be approximated by completion at age a+x in year y+x. For example, the primary school completion of 14-year-olds in 2010 can be approximated by the primary school completion of 17-year-olds in 2013. In this way, the completion rates generated by a single survey can be spread over many years as a time series for a selected age.

This correspondence may not hold for significantly older populations because of mortality and migration. As such, we restrict reconstructed completion rates to the most recent 20-year interval. Through this process, each survey can contribute a time series of up to 21 observations. This reconstructed dataset is then used to estimate the underlying completion rates.
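The reconstruction can be sketched as a mapping from the ages observed in one survey to earlier calendar years. The function name `reconstruct_series`, its parameters, and the rates in the example are hypothetical; only the cohort logic and the 20-year limit come from the description above.

```python
def reconstruct_series(survey_year, rates_by_age, reference_age, max_lag=20):
    """Back-cast a completion series for reference_age from a single survey.

    Someone aged reference_age + lag in survey_year was aged
    reference_age in survey_year - lag, so that cohort's observed rate
    is assigned to the earlier year.
    """
    series = {}
    for lag in range(max_lag + 1):  # lags 0..20 give at most 21 observations
        age = reference_age + lag
        if age in rates_by_age:
            series[survey_year - lag] = rates_by_age[age]
    return series


# Illustrative (made-up) primary completion rates by age from a 2013 survey.
rates_2013 = {14: 0.90, 15: 0.88, 16: 0.87, 17: 0.85}
series = reconstruct_series(2013, rates_2013, reference_age=14)
print(series)
# {2013: 0.9, 2012: 0.88, 2011: 0.87, 2010: 0.85}
```

Here the value for 2010 is the rate observed for 17-year-olds in 2013, matching the example in the text.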


Estimating ‘timely’ completion rates raises several data challenges. First, many countries experience significant delays in completion beyond the three-year grace period inherent in the completion rate indicator, causing a mismatch between reconstructed completion rates and the true completion rates at the reference age. Second, age misreporting distorts observations at ages that are multiples of five. Finally, household surveys may be subject to potentially large survey bias and non-sampling errors. In response to these challenges, a model is used to estimate completion rates. The model extracts a smooth underlying trend in completion for the reference age population, while quantifying and adjusting for the challenges described above. For details on the Adjusted Bayesian Completion Rate model, see Dharamshi et al. (2022).


The model estimates completion rates for two age groups for each country-year:

  • those aged 3–5 years above the expected age of completion;
  • those up to 8 years above the expected age of completion.

In addition to estimates for individual countries, aggregates are computed for regions, income groups and other country groupings.