A New Approach for the Development of a Public Use Microdata File for Canada's 2011 National Household Survey
The special PUMF included 925,564 individuals from 370,192 households. Excluding the small noise added to age and income variables, 40% of individuals had at least one value perturbed (45% of individuals aged over 15), and about 15% had more than one variable perturbed. Around 74% of households with more than one member had at least one member whose values were perturbed beyond the "small noise". Some practices, such as top/bottom coding and the treatment of ethno-cultural variables, may have increased data homogeneity. Nevertheless, perturbations were carried out in ways intended to reduce their impact on existing relationships, particularly for more heavily perturbed variables such as OCC.
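The rates in the preceding paragraph are simple record-level and household-level shares. The sketch below shows one way to compute them. It assumes a hypothetical record layout of (household ID, person ID, set of perturbed variable names), with the small noise on age and income already excluded from each set. It also assumes unweighted record counts. The variable names and toy records are illustrative only and do not come from the NHS data.

```python
from collections import defaultdict


def perturbation_summary(records):
    """Summarize perturbation rates from (household_id, person_id, perturbed_vars) tuples.

    Hypothetical layout: perturbed_vars is a set of variable names whose values
    were perturbed, excluding the small noise on age and income. Counts are
    unweighted record counts.
    """
    n = len(records)
    # Individuals with at least one perturbed variable
    any_pert = sum(1 for _, _, v in records if v)
    # Individuals with more than one perturbed variable
    multi_pert = sum(1 for _, _, v in records if len(v) > 1)

    # Group per-person "was perturbed" flags by household
    households = defaultdict(list)
    for hh, _, v in records:
        households[hh].append(bool(v))

    # Restrict to households with more than one member
    multi_member = [flags for flags in households.values() if len(flags) > 1]
    hh_hit = sum(1 for flags in multi_member if any(flags))

    return {
        "share_any": any_pert / n,
        "share_multi": multi_pert / n,
        "share_multi_member_hh_hit": hh_hit / len(multi_member) if multi_member else 0.0,
    }


# Toy records (illustrative only; variable names are hypothetical)
records = [
    ("H1", 1, {"OCC"}),
    ("H1", 2, set()),
    ("H2", 1, set()),
    ("H2", 2, set()),
    ("H3", 1, {"OCC", "FOS"}),
]
summary = perturbation_summary(records)
print(summary)
```

Here H3 is a one-person household, so it is left out of the household-level share. Only H1 and H2 are considered, and one of them has a perturbed member.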
Univariate population estimates from the PUMF were compared with those from the 2011 NHS. Differences could be introduced during the subsampling, perturbation and/or calibration steps, although calibration generally improved results. Excluding the variables "Hours worked" and "Weeks worked", the PUMF had about 450 answer categories, and for three-quarters of them the difference from the 2011 NHS was within 1.25%. Only 23 categories had a difference of more than 3%. For three of these (age categories 79 and 84, and the Field of study category "Other"), the difference exceeded 5%. Bivariate relationships were affected more, especially among rarer characteristics. Subject-matter reviews of these relationships led to the improvements made to the perturbation of age and OCC.
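The category-level comparison described above can be sketched as follows. The text does not say how "difference" is defined. This sketch assumes it is the absolute relative difference, in percent, between the PUMF estimate and the NHS estimate for each answer category. An absolute difference in percentage points of proportions would be an equally plausible reading. The function name and toy estimates are hypothetical.

```python
def compare_estimates(pumf, nhs):
    """Compare per-category estimates and count categories by difference threshold.

    Assumption: "difference" is |pumf - nhs| / nhs * 100 (relative, in %).
    Categories with a zero reference estimate are skipped to avoid division by zero.
    """
    diffs = {}
    for cat, ref in nhs.items():
        if ref == 0:
            continue
        est = pumf.get(cat, 0.0)
        diffs[cat] = 100.0 * abs(est - ref) / ref

    summary = {
        "n_categories": len(diffs),
        "within_1_25": sum(d <= 1.25 for d in diffs.values()),
        "over_3": sum(d > 3.0 for d in diffs.values()),
        "over_5": sum(d > 5.0 for d in diffs.values()),
    }
    return diffs, summary


# Toy weighted estimates (illustrative only, not NHS results)
nhs = {"A": 1000.0, "B": 200.0, "C": 50.0}
pumf = {"A": 1010.0, "B": 207.0, "C": 53.0}
diffs, summary = compare_estimates(pumf, nhs)
print(summary)
```

In this toy case, the category differences are 1%, 3.5% and 6%. One category therefore falls within 1.25%, two exceed 3%, and one exceeds 5%.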
The creation of this PUMF using data perturbation techniques, a first for Statistics Canada, was in many ways a research and development project. Along the way, methods were devised to avoid overlap with other PUMFs, to adapt and apply risk measures to a multitude of personal and household characteristics, and to carry out perturbations of related characteristics. The many lessons learned in the process will benefit future PUMF development at Statistics Canada.