Guide to the Census of Population, 2016
Appendix 1.8 – Evaluation of the impact of updating 2016 Census language data

After being notified of anomalies in the 2016 Census language data released on August 2, 2017, for various regions in Quebec, Statistics Canada conducted an in-depth investigation to identify their source.

During the census, Statistics Canada follows up with households that partially completed the census questionnaire. For the 2016 Census, Statistics Canada developed a computer program to perform certain steps of this operation. An error was identified in this computer program that affected French-language questionnaires.

The census language questions are the only questions for which the response options differ between the French and English versions. The French version of the census questionnaire gives precedence to French in the wording of the questions and in the order of the response options. This affects only the census questions on mother tongue, language spoken at home and knowledge of official languages. The example below illustrates this difference with respect to language spoken most often at home. This distinction was not taken into account by the new computer program used for partial non-response follow-up in 2016.

Bilingual figure showing Question 8 a) from the 2016 Census of Population questionnaire

This bilingual figure shows Question 8 a) from the 2016 Census of Population questionnaire. The left portion of the figure displays the question in English, which asks "What language does this person speak most often at home?" The answer categories are "English; French; Other language – specify." The right portion of the figure displays the question in French, which asks "Quelle langue cette personne parle-t-elle le plus souvent à la maison?" The answer categories are "Français; Anglais; Autre langue – précisez."

The resulting error led to incorrect allocation of responses to the census language questions for roughly 61,000 individuals, mainly in Quebec. It resulted in an overestimation of the growth of English in Quebec between 2011 and 2016, both as a mother tongue and as a language spoken at home.
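The source does not describe the internals of the follow-up program, so the following Python sketch is only a hypothetical illustration of how ignoring the order of response options could swap English and French. The names `OPTION_ORDER`, `decode_naive` and `decode_by_form` are assumptions introduced for this example.

```python
# Hypothetical illustration only: not the actual Statistics Canada program.
# Order of the response options for Question 8 a) on each questionnaire version.
OPTION_ORDER = {
    "en": ["English", "French", "Other"],
    "fr": ["French", "English", "Other"],
}


def decode_naive(position):
    """Decode a ticked box assuming the English option order on every form."""
    return OPTION_ORDER["en"][position]


def decode_by_form(position, form_language):
    """Decode a ticked box using the option order of the form actually used."""
    return OPTION_ORDER[form_language][position]


# A respondent ticks the first box ("Français") on a French questionnaire.
ticked = 0
print(decode_naive(ticked))          # English (a misallocated response)
print(decode_by_form(ticked, "fr"))  # French
```

Under this assumption, only the first two options are exchanged, which would be consistent with an error that inflates English counts among respondents using French questionnaires.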

After correcting these allocation errors, Statistics Canada conducted an in-depth review to ensure that no other census questions were affected by an error and that the computer programs did not affect other variables. Moreover, Statistics Canada extensively reviewed the many data editing and control stages.

The results of this analysis and the corrective steps taken are described below.

Evaluation of the effect of the change in the allocation of language variables on income estimates in the 2016 Census

Following the detection of the language allocation error, Statistics Canada conducted a review to determine if the error negatively impacted the quality of income estimates. After a thorough review of systems, programs and estimation procedures, Statistics Canada concluded that the impact on the income estimates was negligible.

For the 2016 Census estimates on income, high-quality administrative data (including tax data from the Canada Revenue Agency) were used for approximately 95% of Canadians aged 15 and over (Note 1). Therefore, the incorrect allocation of the language variable could potentially affect income estimates only for the remaining 5% of records. This could happen through the donor imputation process. In that process, income values for respondents who could not be matched to their tax data (the recipients) are copied from other respondents with similar characteristics (the donors). The incorrect allocation of the language variable could affect the income estimate for a record in one of two ways:

  1. A recipient record with a miscoded language is imputed with a donor. It could be that a donor with different income would have been chosen had the recipient's language been coded correctly.
  2. A recipient record with a correctly coded language is imputed with a donor. It could be that a donor was incorrectly chosen, based partly upon the donor having a miscoded language.

In this donor imputation process, donors are chosen by a score which reflects their similarity to the recipient. Usually, a donor is randomly chosen from among several similarly qualified donors. It is important to underscore that there is variability inherent in the donor imputation process, and any re-running of the donor imputation system would result in slightly different estimates due to this variability (Note 2).
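The donor selection described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Statistics Canada's imputation system: the scoring rule (a simple count of matching variables), the variable names and the sample records are all hypothetical.

```python
import random

# Assumed matching variables; the text names age, sex, geography and language.
MATCH_VARIABLES = ("age_group", "sex", "region", "language")


def match_score(recipient, donor):
    """Count how many matching variables agree (an assumed, simplified score)."""
    return sum(recipient[v] == donor[v] for v in MATCH_VARIABLES)


def choose_donor(recipient, donors, rng):
    """Pick one donor at random from those tied for the best score."""
    scores = [match_score(recipient, d) for d in donors]
    best = max(scores)
    candidates = [d for d, s in zip(donors, scores) if s == best]
    return rng.choice(candidates)


# Hypothetical records for illustration.
recipient = {"age_group": "35-44", "sex": "F", "region": "A", "language": "French"}
donors = [
    {"id": 1, "age_group": "35-44", "sex": "F", "region": "A", "language": "French", "income": 41000},
    {"id": 2, "age_group": "35-44", "sex": "F", "region": "A", "language": "English", "income": 43000},
    {"id": 3, "age_group": "35-44", "sex": "F", "region": "A", "language": "French", "income": 40500},
]

donor = choose_donor(recipient, donors, random.Random(0))
print(donor["id"], donor["income"])  # donor 1 or donor 3, chosen at random

# If the recipient's language were miscoded as English, donor 2 would be chosen.
miscoded = dict(recipient, language="English")
print(choose_donor(miscoded, donors, random.Random(0))["id"])
```

The random choice among tied candidates is what makes two runs of the same imputation differ slightly, and the miscoded case shows how a language error can change the donor pool.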

Statistics Canada conducted an analysis to determine whether the language error affected the results of the donor imputation process. To begin with, the potential effect of the language error would be considerably mitigated by the following factors:

  1. Very few imputed cases were affected by the language allocation error. Altogether, there were only 5,500 records across Canada which potentially used this miscoded language information in income imputation, either as a recipient or donor of information. These records were concentrated in Quebec (5,100), and represented only 0.06% of the Quebec population.
  2. These records were not concentrated in a particular municipality in Quebec (Census Subdivision or CSD); rather, they were distributed among many municipalities.
  3. Language is only one of several variables used in the donor imputation process. The donor selection variables include age, sex, geography and language. A recipient's characteristics are matched as closely as possible to a donor's characteristics across all of these dimensions. Because many dimensions are used, the importance of an error in any one dimension is significantly reduced.
  4. Donors that are a close match on other characteristics will also tend to have similar or equal income information, reducing the impact of using a different donor.
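As a rough check on the 0.06% figure in the first point, the share can be recomputed in Python. The Quebec population count used here is an assumption that does not appear in this appendix (the 2016 Census count for Quebec), and the denominator actually used by Statistics Canada may differ.

```python
# Assumption (not from this appendix): Quebec's 2016 Census population count.
QUEBEC_POPULATION_2016 = 8_164_361

# From the text: records in Quebec that potentially used miscoded language.
affected_quebec_records = 5_100

share = affected_quebec_records / QUEBEC_POPULATION_2016
print(f"{share:.2%}")  # 0.06%
```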

The analysis involved re-running certain steps of the income imputation process: first, to assess the amount of variability in estimates arising from the imputation process itself, and second, to assess the size of the change arising from re-imputing the cases affected by the language allocation error. If the second change was small relative to the first, then it could be concluded that the impact of the language allocation error on income estimates was negligible.

Table 1 illustrates the variability introduced to the estimates by the imputation system overall (Note 3). The table focusses on results from Quebec, where most of the affected cases were found. It shows results averaged across CSDs according to size, for median total income and median wages (Note 4). When the data are re-imputed for all CSDs, income estimates may rise or fall. For example, for CSDs in the population range of 20,000 to 99,999, when the median total income increased, it increased by an average of $16, and when it decreased, it decreased by an average of $15. This illustrates the small variability in the estimate that derives from donor imputation.

Table 2 shows the effect of correcting only those records whose language allocation changed (Note 5). For CSDs in the same size class (20,000 to 99,999), in cases where the median total income estimate rose, it rose by $3 on average, and in cases where it fell, it fell by $4 on average. Thus, the change in income estimates resulting from correcting the language error and re-imputing the results is small and falls within the variability inherent in the imputation process, and therefore has a negligible impact on the results. This was also true for the other CSD size classes and for estimates of wages (Note 6).
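The comparison above can be sketched in Python. The function below follows one plausible reading of the column definitions in Tables 1 and 2 (average change, average of positive changes, average of negative changes across CSDs); the per-CSD medians are hypothetical, and only the four averages quoted in the text are taken from the source.

```python
def _mean(values):
    """Average of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def summarize_effects(before, after):
    """Return (average change, average positive change, average negative change)."""
    diffs = [a - b for b, a in zip(before, after)]
    positive = [d for d in diffs if d > 0]
    negative = [d for d in diffs if d < 0]
    return _mean(diffs), _mean(positive), _mean(negative)


# Hypothetical median total incomes for four CSDs under two imputation runs.
run_1 = [36400, 36510, 36480, 36550]
run_2 = [36416, 36495, 36480, 36566]
print(summarize_effects(run_1, run_2))  # (4.25, 16.0, -15.0)

# Averages quoted in the text for CSDs of 20,000 to 99,999 (median total income).
imputation_noise = {"positive": 16, "negative": -15}   # Table 1
correction_change = {"positive": 3, "negative": -4}    # Table 2

within_noise = (
    correction_change["positive"] <= imputation_noise["positive"]
    and abs(correction_change["negative"]) <= abs(imputation_noise["negative"])
)
print(within_noise)  # True
```

The final check mirrors the reasoning in the text: the change from correcting the language error is smaller than the change seen between two ordinary imputation runs.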

Conclusion

The analysis of the potential impact of the language allocation error on income estimates from the 2016 Census concluded that the error had a negligible impact. This was to be expected, given that the vast majority (95%) of records are matched to their tax data, and very few respondents had income estimates that were affected by the language allocation error. Based upon these results, there was no statistical need to re-compute income estimates for the 2016 Census.

Table 1
Comparing income estimates generated through two imputation runs, total income and wages, census subdivisions (CSDs), Quebec
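Median total income ($)

| CSD population size | Values from one imputation | Values from another imputation | Average imputation effect | Average of positive effects | Average of negative effects |
| --- | --- | --- | --- | --- | --- |
| 250 to 9,999 | 30,128 | 30,129 | −3 | 69 | −67 |
| 10,000 to 19,999 | 36,958 | 36,952 | −10 | 20 | −22 |
| 20,000 to 99,999 | 36,478 | 36,477 | −5 | 16 | −15 |
| 100,000+ | 34,469 | 34,468 | 2 | 13 | −8 |

Median wages and salaries ($)

| CSD population size | Values from one imputation | Values from another imputation | Average imputation effect | Average of positive effects | Average of negative effects |
| --- | --- | --- | --- | --- | --- |
| 250 to 9,999 | 28,534 | 28,530 | 2 | 97 | −96 |
| 10,000 to 19,999 | 35,725 | 35,722 | −1 | 29 | −32 |
| 20,000 to 99,999 | 35,606 | 35,607 | −2 | 12 | −16 |
| 100,000+ | 33,957 | 33,955 | 1 | 6 | −5 |
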
Table 2
Comparing income estimates with expected estimates after re-imputing records with a language allocation error, total income and wages, census subdivisions (CSDs), Quebec
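Median total income ($)

| CSD population size | Expected value (without correcting language error) | Expected value (after correcting language error) | Average change | Average of positive changes | Average of negative changes |
| --- | --- | --- | --- | --- | --- |
| 250 to 9,999 | 30,133 | 30,129 | −3 | 24 | −31 |
| 10,000 to 19,999 | 36,957 | 36,955 | −2 | 3 | −6 |
| 20,000 to 99,999 | 36,477 | 36,477 | 0 | 3 | −4 |
| 100,000+ | 34,470 | 34,469 | 0 | 2 | −1 |

Median wages and salaries ($)

| CSD population size | Expected value (without correcting language error) | Expected value (after correcting language error) | Average change | Average of positive changes | Average of negative changes |
| --- | --- | --- | --- | --- | --- |
| 250 to 9,999 | 28,535 | 28,536 | 1 | 44 | −41 |
| 10,000 to 19,999 | 35,726 | 35,726 | 0 | 8 | −8 |
| 20,000 to 99,999 | 35,609 | 35,607 | −2 | 5 | −6 |
| 100,000+ | 33,955 | 33,956 | 0 | 3 | −2 |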

Notes
