The following section describes the methods used to restrict the dissemination of census data of unacceptable quality.
Area suppression, when applied for data quality purposes, replaces all income characteristic data with zeroes for geographic areas whose population or number of households falls below a specified threshold.
If a census tabulation contains any data showing income characteristics for individuals, families or households, then the following rule applies. Income characteristic data are zeroed out for areas where the population is less than 250 or where the number of private households is less than 40. These thresholds are applied to 2006 Census data as well as all previous census data. The threshold of 40 private households is based upon the fact that weighted data are being used. With the weighting factor for each household being 5, setting a threshold of 40 ensures that there will be at least 8 households used in the calculation. The private household threshold does not apply for tabulations based on place of work geographies.
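The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the production suppression system: the function name and the `place_of_work` parameter are hypothetical, and the sketch assumes that the population threshold still applies to place of work geographies, which the text implies but does not state outright.

```python
POPULATION_THRESHOLD = 250          # from the rule above
PRIVATE_HOUSEHOLD_THRESHOLD = 40    # from the rule above
WEIGHTING_FACTOR = 5                # weight per household, as stated above


def suppress_income_data(population, private_households, place_of_work=False):
    """Return True when income characteristic data should be zeroed out.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if population < POPULATION_THRESHOLD:
        return True
    # The private household threshold is skipped for place of work geographies.
    if not place_of_work and private_households < PRIVATE_HOUSEHOLD_THRESHOLD:
        return True
    return False


# Fewest sampled households that can sit behind an unsuppressed estimate:
# 40 weighted households / weight of 5 = 8 sampled households.
min_sampled_households = PRIVATE_HOUSEHOLD_THRESHOLD // WEIGHTING_FACTOR
```

An area with a population of 300 and 39 private households would be zeroed out in a standard tabulation, but not in a place of work tabulation.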
Data quality indicators (commonly referred to as data quality flags) are attached to each standard geographic area disseminated. In the census database environments, the indicators are stored in a five-digit numeric field. On the database and in electronic products browsed via Beyond 20/20, they are displayed as a five-digit numeric code (example: 0 2 1 3 1). On the census website and in print publications, partially enumerated areas are flagged to end users with symbols. The specific symbols in use for the 2006 Census are documented as part of the print publication standards.
In the 2006 and previous censuses, some dissemination areas for Indian reserves were not enumerated due to non-participation/non-cooperation. Data quality rules require these non-enumerated areas to be identified and removed from products. As well, higher-level geographic areas containing non-enumerated areas must be identified in the products.
Although there is no census data collected for non-response areas, the areas themselves are included as part of the standard geographic hierarchies on the census databases. Retrieval and tabulation software will retrieve these areas but with no data.
Any geographic area that contains an incompletely enumerated area is considered a partially enumerated area. Partially enumerated areas are flagged to end users as containing incompletely enumerated areas.
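As a rough sketch of how this rule propagates up a geographic hierarchy, the following Python function assigns the incomplete enumeration flag values given in the flag legend later in this section (1 for an incompletely enumerated area, 2 for an area that contains one). The dictionary-based hierarchy, the function name, and the area identifiers are assumptions made for illustration and do not reflect the actual census database structures.

```python
def enumeration_flag(area, children, incompletely_enumerated):
    """Return the incomplete enumeration flag for one area.

    children maps an area to the areas directly below it (hypothetical layout).
    incompletely_enumerated is the set of areas that were not enumerated.
    """
    if area in incompletely_enumerated:
        return 1  # incompletely enumerated area (suppressed)
    # Walk every lower-level area contained in this one.
    stack = list(children.get(area, []))
    while stack:
        node = stack.pop()
        if node in incompletely_enumerated:
            return 2  # partially enumerated: contains such an area
        stack.extend(children.get(node, []))
    return 0  # default


# Illustrative hierarchy with made-up identifiers.
children = {"CD": ["CSD_A", "CSD_B"], "CSD_A": ["DA_1", "DA_2"]}
incomplete = {"DA_2"}
flags = {a: enumeration_flag(a, children, incomplete)
         for a in ["CD", "CSD_A", "CSD_B", "DA_1", "DA_2"]}
```

Here both `CSD_A` and `CD` would be flagged as partially enumerated because `DA_2` sits beneath them.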
Global response rates are determined for each census geographic area, and areas are flagged on the database according to their global non-response rate. Geographic areas with a global non-response rate of 25% or higher are suppressed from tabulations. Areas with a global non-response rate of at least 5% but lower than 25% are divided into two categories and flagged accordingly: at least 5% but lower than 10%, and at least 10% but lower than 25%. These areas are identified in tabulations but not suppressed.
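The thresholds above map directly onto the flag values 0 to 3 used in the flag legend later in this section. The sketch below shows that mapping; the function names are hypothetical, and the rate is assumed to be expressed in percent.

```python
def non_response_flag(rate_percent):
    """Map a global non-response rate (in percent) to a flag value 0-3."""
    if rate_percent >= 25:
        return 3  # 25% or higher: suppressed
    if rate_percent >= 10:
        return 2  # at least 10% but lower than 25%: flagged
    if rate_percent >= 5:
        return 1  # at least 5% but lower than 10%: flagged
    return 0      # default: no data quality action required


def is_suppressed(rate_percent):
    """Return True when the area is suppressed from tabulations."""
    return non_response_flag(rate_percent) == 3
```

The boundaries are inclusive at the lower end, so a rate of exactly 10% falls in the second category and a rate of exactly 25% triggers suppression.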
After the release of the population and dwelling counts, errors are occasionally uncovered in the data. It is not possible to make changes to the 2006 or the 2001 Census data presented. Users can, however, obtain the population and dwelling count amendments listed by census subdivisions and other levels of geography by visiting the 2006 or the 2001 Census portion of the Statistics Canada website.
Users wishing to compare 2006 Census data with those of other censuses should take into account that the boundaries of geographic areas may change from one census to another. In order to facilitate comparison, the 2001 counts are adjusted as needed to take into account boundary changes between the 2001 and 2006 censuses, and the adjusted counts are identified with the 2001 adjusted population flag. This flag is also used to refer to corrections to the 2001 counts and to identify areas that have been created since 2001, such as newly incorporated municipalities (census subdivisions) and new designated places. However, most of these flags are the result of boundary changes.
The following table describes the data quality indicator field and its contents. Note that a zero in any of the five digits is the default for the respective indicator and means that no data quality action is required.
|Digit||Flag||Value||Description|
|1st (0XXXX)||Incomplete enumeration flag||0||Default.|
|1st (0XXXX)||Incomplete enumeration flag||1||Incompletely enumerated Indian reserve or Indian settlement (suppressed).|
|1st (0XXXX)||Incomplete enumeration flag||2||Excludes census data for one or more incompletely enumerated Indian reserves or Indian settlements.|
|2nd (X0XXX)||100% data quality flag||0||Default.|
|2nd (X0XXX)||100% data quality flag||1||Data quality index showing, for the short census questionnaire (100% data), a global non-response rate higher than or equal to 5% but lower than 10%.|
|2nd (X0XXX)||100% data quality flag||2||Data quality index showing, for the short census questionnaire (100% data), a global non-response rate higher than or equal to 10% but lower than 25%.|
|2nd (X0XXX)||100% data quality flag||3||Data quality index showing, for the short census questionnaire (100% data), a global non-response rate higher than or equal to 25% (suppressed).|
|3rd (XX0XX)||Population and dwelling counts error flag||0||Default.|
|3rd (XX0XX)||Population and dwelling counts error flag||1||An error exists in the 2006 population and dwelling counts for this area. For further details, please refer to the population and dwelling counts data section of the ‘Notes’ file.|
|3rd (XX0XX)||Population and dwelling counts error flag||2||In 2001 the population and/or dwelling count for this Census Subdivision were found to be incorrect. Since it is not possible to make changes to the 2001 Census data presented in these tables, the 2001 data should be used with caution. For further details, please refer to the population and dwelling counts data section of the ‘Notes’ file.|
|3rd (XX0XX)||Population and dwelling counts error flag||3||Both the 2006 and 2001 population and/or dwelling counts for this area were found to be incorrect. Since it is not possible to make changes to the census data presented in these tables, these counts should be used with caution. For further details, please refer to the population and dwelling counts data section of the ‘Notes’ file.|
|4th (XXX0X)||20% sample data quality flag||0||Default.|
|4th (XXX0X)||20% sample data quality flag||1||Data quality index showing, for the long census questionnaire (20% sample data), a global non-response rate higher than or equal to 5% but lower than 10%.|
|4th (XXX0X)||20% sample data quality flag||2||Data quality index showing, for the long census questionnaire (20% sample data), a global non-response rate higher than or equal to 10% but lower than 25%.|
|4th (XXX0X)||20% sample data quality flag||3||Data quality index showing, for the long census questionnaire (20% sample data), a global non-response rate higher than or equal to 25% (suppressed).|
|5th (XXXX0)||2001 adjusted population flag||0||Default.|
|5th (XXXX0)||2001 adjusted population flag||1||2001 adjusted count; most of these are the result of boundary changes.|
Note: The 100% and 20% sample data quality flags do not apply to the population and dwelling counts. The flag legend for historical census years can be found in Appendix B.
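To make the legend above concrete, the following Python sketch splits a five-digit code such as the example 0 2 1 3 1 into its five indicators. The class name, field names, and parsing function are hypothetical and chosen only for readability; the sketch assumes the code arrives as text, with or without spaces between the digits.

```python
from typing import NamedTuple


class QualityFlags(NamedTuple):
    """The five data quality indicators, in digit order (names are illustrative)."""
    incomplete_enumeration: int      # 1st digit
    full_data_quality: int           # 2nd digit, 100% data
    count_error: int                 # 3rd digit, population and dwelling counts
    sample_data_quality: int         # 4th digit, 20% sample data
    adjusted_2001_population: int    # 5th digit


def parse_quality_flags(code):
    """Parse a code such as '0 2 1 3 1' or '02131' into QualityFlags."""
    compact = "".join(code.split())  # drop any whitespace between digits
    if len(compact) != 5 or not compact.isdecimal():
        raise ValueError(f"expected five digits, got {code!r}")
    return QualityFlags(*(int(ch) for ch in compact))


flags = parse_quality_flags("0 2 1 3 1")
# A value of 3 on either data quality flag means the area is suppressed.
sample_data_suppressed = flags.sample_data_quality == 3
```

Read against the legend, the example code describes an area with a 100% data non-response rate of at least 10% but lower than 25%, an error in its 2006 counts, suppressed 20% sample data, and an adjusted 2001 count.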
Place of work areas are suppressed for data quality reasons when the following three conditions are met:
The data quality indicator for place of work uses only the 4th digit of the five-digit numeric code. A value of 3 on this indicator for a place of work geography indicates the area is to be suppressed.
|4th (XXX0X)||20% sample data quality flag||0||Default.|
The methods of suppression mentioned to this point provide sufficient data quality suppression and identification for most census data products. However, in some products, the specifying area or production area may require that additional data quality suppression be performed. Examples of additional suppression could include increasing population thresholds or applying distribution or cell suppression. These are typically product-specific requirements and therefore are not part of the automated suppression systems. In all cases, some form of manual process is required.
The most common example of other methods of data quality suppression is distribution suppression. This occurs in selected standard income products where income distributions are suppressed when the total number of units (persons, families, households) within the income distribution is less than 250. A variation of this procedure is applied to standard income products that feature number and average employment or total income only.
Further, when there are indications that there is a high degree of variability among responses, and thus the possibility of extreme income values, the earnings and/or income statistics may also be suppressed for data quality purposes. Therefore, more specific rules are in place that account not only for population size but also for the likelihood of uncertainty in the estimates due to extreme values and sample variability.
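The basic distribution suppression rule described above (fewer than 250 units in the income distribution) can be sketched as follows. The function name and the dictionary layout are hypothetical, and returning `None` is merely one way to represent a suppressed distribution in this illustration. The variability-based rules are not specified in enough detail here to sketch, so they are left out.

```python
DISTRIBUTION_THRESHOLD = 250  # minimum total units, from the rule above


def apply_distribution_suppression(counts_by_income_band):
    """Return the distribution, or None when it must be suppressed.

    counts_by_income_band maps an income band label to a count of units
    (persons, families or households).
    """
    total_units = sum(counts_by_income_band.values())
    if total_units < DISTRIBUTION_THRESHOLD:
        return None  # assumption: None stands in for a suppressed distribution
    return dict(counts_by_income_band)


# Illustrative (made-up) distributions.
small = {"Under $20,000": 90, "$20,000 to $49,999": 100, "$50,000 and over": 55}
large = {"Under $20,000": 90, "$20,000 to $49,999": 100, "$50,000 and over": 60}
```

The first distribution totals 245 units and would be suppressed; the second totals exactly 250 and would be released.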
Medians and, more generally, quantiles are calculated using linear interpolation. The quantile interval (that is, the interval in which the value of the quantile is located) is determined by one of two methods, depending on the kind of values the statistical variable takes:
Variables that take values with decimals and any variables with dollar values.
The quantile interval is constructed to ensure that the relative error introduced by the linear interpolation is less than 0.78%. For example, if the true quantile is $30,000, the error made by using the built-in algorithm is less than $234.
Variables that take integer values that are not dollars.
For these variables, the quantile interval is always of size 1. For example, if the true quantile is 23.46, the interpolation is applied to the interval [23, 24].
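The following Python sketch illustrates both cases under stated assumptions. For integer-valued variables it uses the standard grouped-data interpolation formula on unit-width intervals; the source does not give the exact formula, so this is one plausible reading rather than the actual algorithm. For dollar values it only reproduces the error-bound arithmetic, since the way the interval is constructed is not described. All function names and the sample counts are hypothetical.

```python
RELATIVE_ERROR_BOUND = 0.0078  # 0.78%, from the text above


def max_interpolation_error(true_quantile):
    """Upper bound on the absolute interpolation error for a dollar value."""
    return RELATIVE_ERROR_BOUND * true_quantile


def interpolated_quantile(counts, p):
    """Estimate the p-quantile of an integer-valued variable.

    counts maps each integer value k to its frequency. Each k is treated as
    the unit-width interval [k, k + 1], and the quantile is located by linear
    interpolation inside the interval where the cumulative count reaches
    p * total.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    target = p * total
    cumulative = 0
    for value in sorted(counts):
        frequency = counts[value]
        if frequency > 0 and cumulative + frequency >= target:
            return value + (target - cumulative) / frequency
        cumulative += frequency
    raise ValueError("quantile interval not found")


# Made-up counts chosen so that the median falls in the interval [23, 24].
median = interpolated_quantile({22: 27, 23: 50, 24: 23}, 0.5)
```

With these counts, 27 of the 100 observations lie below 23 and 50 lie in [23, 24], so the median is 23 + (50 - 27) / 50, or approximately 23.46. For a true quantile of $30,000, `max_interpolation_error` gives a bound of about $234, matching the example in the text.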