Statistics Canada


2006 Census processing

Crunching the numbers

During the first two weeks of May 2006, 70% of Canadian households received census questionnaires through the mail, while the remaining 30% had their questionnaire delivered by an enumerator, as in past censuses. In 2006, respondents "counted themselves in" when they completed their questionnaire online or mailed it back.

The processing phase of the census began as early as May 2, since households could choose to complete their forms online as soon as they received their paper questionnaire. This phase began with the process of translating responses from approximately 12.7 million households into meaningful data. This part of the census cycle is divided into six main activities:

  • Receipt and registration
  • Imaging and data capture from paper questionnaires
  • Edits and failed edit follow-up
  • Automated coding
  • Edit and imputation
  • Weighting.

Glossary of terms

block canvass - a physical verification and validation field exercise to update address listings in most urban areas to produce a complete and reliable list of addresses (which are listed on the Block Canvass Register), allowing for the mail-out of census questionnaires in selected urban and urbanized areas. Block canvass covers about 70% of dwellings in Canada.

census - a statistical portrait of Canada on one particular day: May 16, 2006. The census reflects demographic, social and economic information about people, housing units and agricultural operations in Canada on that day.

collection unit (CU) - refers to all dwellings in a geographic unit. There are 50,000 collection units for the 2006 Census.

list/leave (L/L) - refers to the delivery (as opposed to mail-out) of questionnaires to 30% of dwellings in Canada by a census enumerator.

visitation record (VR) - a document used by enumerators in L/L areas to record summary information about each dwelling. There is one VR for each collection unit.

Receipt and registration - May to July 2006

Data Processing Centre staff in Gatineau, Quebec, were responsible for the registration of completed census questionnaires. Initial receipt, via the reading of barcodes through the see-through portion of the return envelope, was conducted by Canada Post.

For the 2006 Census, Canada Post delivered the mailed-back paper questionnaires to the Data Processing Centre while electronic questionnaires were transmitted directly. Electronic questionnaires were registered automatically and paper questionnaires were checked in by scanning the barcode on the front of the questionnaire.

Imaging - May to July 2006

The 2006 Census was the first census to capture data using automated capture technologies rather than manual keying.

Steps in imaging

  • Document preparation - mailed-back questionnaires were removed from envelopes and foreign objects such as clips and staples were removed in preparation for scanning. Forms that were in the booklet format were separated into single sheets.
  • Scanning - scanning, using 18 high-speed scanners, converted paper to digital images (pictures).
  • Automated image quality assurance - an automated system verified the quality of the scanning. Images failing this process were flagged for rescanning or keying from paper.
  • Automated data capture - Optical Mark Recognition and Intelligent Character Recognition technologies were used to extract respondents' data. Where the systems could not recognize the handwriting, data repair was done by an operator.
  • Check-out - as soon as the questionnaires had been processed through all of the above steps, the paper questionnaires were checked out of the system. Check-out is a quality assurance process that ensures the images and captured data are of sufficient quality that the paper questionnaires are no longer required.

Automated editing - May to July 2006

Some automated completion edits, simulating the edits an enumerator would have done in previous censuses, were performed at this stage to check for completeness, consistency and coverage. In some cases multiple responses were received for one household; these were flagged for subsequent interactive verification if an error was identified.

Failed edit follow-up - May to July 2006

When a missing or invalid response was uncovered, interactive verification could be used to resolve the failure by manually examining the captured data and scanned images (where available) to help determine the appropriate response. When necessary, failed household questionnaire data were transmitted to a regional Census Help Line site for failed edit follow-up, where an operator contacted the respondent and completed the information using a computer-assisted telephone interviewing application. The data were then transmitted back to the Data Processing Centre for reintegration into the system for subsequent processing.
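The routing logic described in this step can be sketched in a few lines of Python. This is a minimal illustration only: the field names, the age range of 0 to 120 and the `route` labels are assumptions made for the example, not the actual census edit rules.

```python
# Hypothetical subset of fields that every person record must contain.
REQUIRED_FIELDS = ("age", "sex", "marital_status")

def failed_edits(person: dict) -> list:
    """Return the names of fields that are missing or invalid."""
    # Completeness edit: a field fails if it is absent or blank.
    failures = [f for f in REQUIRED_FIELDS if person.get(f) in (None, "")]
    # Validity edit (assumed rule): age must fall in a plausible range.
    age = person.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        failures.append("age")
    return failures

def route(household: list) -> str:
    """Send a household to follow-up if any person record fails an edit."""
    return "follow_up" if any(failed_edits(p) for p in household) else "pass"

complete = [{"age": 34, "sex": "F", "marital_status": "married"}]
incomplete = [{"age": 34, "sex": "", "marital_status": "single"}]
print(route(complete), route(incomplete))
```

A household marked `follow_up` here corresponds to one that would go to interactive verification or, if needed, to telephone follow-up.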

Automated coding - May to October 2006

The 2B long-form questionnaire contained questions where answers could be checked off against a list, as well as questions where the respondent had to write in an answer in the boxes provided. These written responses had to be converted to numerical codes before they could be tabulated for release purposes. For the 2006 Census, all written responses on the long questionnaires underwent automated and computer-assisted coding to assign each one a numerical code using Statistics Canada reference files, code sets and standard classifications. Reference files were built using actual responses from past censuses for the automated match process. Specially trained coders and experts resolved cases that could not be assigned automatically.

The variables for which coding applied were:

  • Relationship to Person 1
  • Place of birth
  • Citizenship
  • Non-official languages
  • Home language
  • Mother tongue
  • Ethnic origin
  • Population group
  • Indian band/First Nation
  • Place of residence 1 year ago
  • Place of residence 5 years ago
  • Major field of study
  • Location of study
  • Place of birth of parents
  • Language at work
  • Industry
  • Occupation
  • Place of work

In 2006, it is expected that over 40 million write-ins will be coded, of which an average of about 75% will be coded automatically.
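As a rough illustration of the automated match process, the sketch below looks up each write-in in a reference file and sets aside anything it cannot match for a human coder. The reference entries, the numerical codes and the `normalize` rule are all invented for the example; the real reference files and classifications are not reproduced here.

```python
# Hypothetical reference file: normalized write-in text -> numerical code.
# Entries and codes are invented for illustration only.
REFERENCE_FILE = {
    "france": 101,
    "italy": 102,
    "united kingdom": 103,
}

def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())

def code_write_ins(write_ins):
    """Return (coded, unresolved): automated matches and cases left for coders."""
    coded, unresolved = [], []
    for raw in write_ins:
        code = REFERENCE_FILE.get(normalize(raw))
        if code is None:
            unresolved.append(raw)       # goes to a trained coder
        else:
            coded.append((raw, code))    # coded automatically
    return coded, unresolved

coded, unresolved = code_write_ins(["France", "  ITALY ", "Fr.", "United  Kingdom"])
auto_rate = len(coded) / (len(coded) + len(unresolved))
print(f"coded automatically: {auto_rate:.0%}")
```

In this toy input three of the four write-ins match the reference file. Applied to the figures quoted above, a 75% automated rate on 40 million write-ins would leave about 10 million for computer-assisted coding.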

Edit and imputation - September 2006 to September 2007

The data collected in any survey or census contain omissions or inconsistencies. These errors can be the result of respondents missing a question, or they can be due to errors generated during processing. For example, a respondent may be unwilling to answer a question, may fail to remember the right answer, or may misunderstand the question. Census staff may code responses incorrectly or may make other mistakes during processing.

After the capture, initial editing and corrections, and coding operations are complete, the data are processed through the final edit and imputation activity. The editing process detects errors and the imputation process corrects them. The edit and imputation phase is important because:

  • Consistent estimates are essential to users, particularly those counts that are used as official estimates for legislative and administrative purposes.
  • If invalid or missing responses are not adjusted, data users would have to tabulate incomplete data or develop their own estimates, which would not be consistent with other results.
  • Many data users do not wish to adjust or tabulate incomplete data.
  • Correct data are necessary for processing purposes. For example, family patterns are constructed based on information provided by respondents on age, sex, marital status, relationship to Person 1, etc. If these data are missing or inconsistent, family characteristics cannot be compiled.
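The split between detecting an error (edit) and correcting it (imputation) can be illustrated with a small sketch. The source does not say which imputation method was used, so the donor rule below, which copies a value from another record that agrees on selected fields, is an assumption chosen only because it is easy to follow. The field names are also hypothetical.

```python
def impute_from_donor(record, donors, match_keys, target):
    """Fill record[target] from the first donor that agrees on all match_keys.

    Returns a new dict; the input record is never modified.
    """
    if record.get(target) is not None:
        return dict(record)              # edit passed, nothing to impute
    for donor in donors:
        has_value = donor.get(target) is not None
        agrees = all(donor.get(k) == record.get(k) for k in match_keys)
        if has_value and agrees:
            fixed = dict(record)
            fixed[target] = donor[target]
            return fixed
    return dict(record)                  # no suitable donor; still unresolved

donors = [
    {"age_group": "15-19", "sex": "M", "marital_status": "single"},
    {"age_group": "40-44", "sex": "F", "marital_status": "married"},
]
incomplete = {"age_group": "40-44", "sex": "F", "marital_status": None}
fixed = impute_from_donor(incomplete, donors, ("age_group", "sex"), "marital_status")
print(fixed)
```

Once the missing value is filled, downstream steps such as building family patterns can run on a complete record.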

Weighting

Questions on age, sex, marital status, mother tongue and relationship to Person 1 are asked of 100% of the population. However, the bulk of census information is acquired on a 20% sample basis using the additional questions on the 2B questionnaire. "Weighting" is used to project the information gathered from the 20% sample to the entire population.

The weighting method produces estimates from the 20% sample that are representative of 100% of the population, and it maximizes the quality of those sample estimates.

For the 2006 Census, weighting will employ the same methodology used in the 2001 Census, known as calibration estimation. This begins with initial weights of approximately 5 (the inverse of the one-in-five sampling rate) and then adjusts them by the smallest possible amount needed to ensure closer agreement between the sample estimates (e.g., number of males, number of people aged 15 to 19) and the population counts for age, sex, marital status, common-law status and household.
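To make the idea concrete, the sketch below starts every sampled person at a weight of 5 and rescales the weights within each group so that the weighted sample count equals a known population count. This ratio adjustment on a single variable is only a simple special case of calibration, shown as an assumption-laden illustration; the actual census procedure calibrates on several characteristics at once and is not described in detail in the source. The function name and the toy counts are hypothetical.

```python
from collections import Counter

def calibrate(sample_groups, population_counts, initial_weight=5.0):
    """Scale initial weights so weighted sample totals match known group counts."""
    sample_counts = Counter(sample_groups)
    # Adjustment factor per group: known count / initial weighted estimate.
    factors = {
        g: population_counts[g] / (initial_weight * n)
        for g, n in sample_counts.items()
    }
    return [initial_weight * factors[g] for g in sample_groups]

# Toy example: 5 sampled people, known population of 11 males and 14 females.
sample = ["M", "M", "F", "F", "F"]
weights = calibrate(sample, {"M": 11, "F": 14})

male_total = sum(w for w, g in zip(weights, sample) if g == "M")
female_total = sum(w for w, g in zip(weights, sample) if g == "F")
print(weights, male_total, female_total)
```

With uncalibrated weights of 5 the sample would estimate 10 males and 15 females; after adjustment the weighted totals agree with the known counts while each weight stays close to 5.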