
## Archived Content

# Data quality and confidentiality standards and guidelines (public): Confidentiality (non-disclosure) rules

## Area suppression for standard and non-standard geographic areas

## Population universes used for suppression routines

## Random rounding

## Disclosure avoidance for statistics

### Statistic suppression

### Special statistic calculations

### Outlier statistic suppression

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

The following describes the various suppression rules used to ensure confidentiality (or non-disclosure) of individual respondent identity and characteristics. All census data are subject to confidentiality suppression rules.

Area suppression is used to remove all characteristic data for geographic areas below a specified population size.

The specified population size for all standard areas or aggregations of standard areas is 40, except for blocks, block-faces or postal codes. Consequently, no characteristics or tabulated data are to be released for areas below a population size of 40.

The specified population size for six-character (FSA-LDU) postal codes, geocoded areas and custom areas built from the block, block-face or LDU levels is 100. Consequently, no characteristics or tabulated data are to be released for these area types below a population size of 100. Generally, blocks and individual urban block-faces (one side of a street between two intersections) are too small to meet these population thresholds. Where an aggregation of blocks or block-faces meets the applicable threshold, data can be retrieved through a custom tabulation. Additional area suppression is applied for data quality reasons if the census tabulation contains any data showing income characteristics for individuals, families or households.

These population thresholds apply to 2006 Census data and to data from all previous censuses.
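The two area thresholds above can be expressed as a small check. This is a minimal sketch, not the agency's routine: the function name and the area-type labels are hypothetical, and the additional income-related suppression is left out because the text gives no rule for it here.

```python
# Thresholds taken from the text.
STANDARD_THRESHOLD = 40
SMALL_AREA_THRESHOLD = 100

# Hypothetical labels for the area types that fall under the 100 threshold.
SMALL_AREA_TYPES = {"postal_code", "geocoded_area", "custom_area"}


def is_area_suppressed(area_type: str, population: int) -> bool:
    """Return True when all characteristic data for the area must be withheld."""
    if area_type in SMALL_AREA_TYPES:
        threshold = SMALL_AREA_THRESHOLD
    else:
        threshold = STANDARD_THRESHOLD
    # Data are released only at or above the threshold.
    return population < threshold
```

Under these assumptions, a standard area with a population of 39 is suppressed, while one with a population of 40 is not.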

The population under consideration for all 100% data tabulations is the total population.

For all other tabulations, except place of work data, the population under consideration is the lower of the 2A (100% data) or 2B (20% sample data) non-institutional population.

For place of work data, the population under consideration is the employed labour force having a usual place of work or worked at home.

2A (100% data) | 2B (20% sample data) | Place of work geographic areas |
---|---|---|
Total population | Lower of the 2A or 2B non-institutional population | Employed labour force having a usual place of work or worked at home |

For census tabulations based on place of work geographies or areas, all criteria are based on counts of the employed labour force having a usual place of work or worked at home. That is, the 40, 100 and 250 population thresholds refer to these employed labour force counts rather than to the population of the areas. Tabulations that contain both places of residence and places of work as geographic areas have the 40, 100 and 250 size limits applied to both the place of residence (population) and the place of work (employed labour force having a usual place of work or worked at home).
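The choice of population universe can be sketched as a simple selection. The function name, the tabulation labels and the parameter names below are illustrative assumptions; only the selection logic comes from the text.

```python
def population_under_consideration(
    tabulation: str,
    total_population: int,
    non_institutional_2a: int,
    non_institutional_2b: int,
    place_of_work_labour_force: int,
) -> int:
    """Pick the count that is compared against the suppression thresholds."""
    if tabulation == "2A":
        # 100% data tabulations use the total population.
        return total_population
    if tabulation == "place_of_work":
        # Employed labour force having a usual place of work or worked at home.
        return place_of_work_labour_force
    # All other tabulations: lower of the 2A and 2B non-institutional populations.
    return min(non_institutional_2a, non_institutional_2b)
```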

All counts in census tabulations are subjected to a process called random rounding. Random rounding transforms all raw counts to random rounded counts. This reduces the possibility of identifying individuals within the tabulations.

For 2A (100%) data, all counts are rounded to a base of 5. This means that all 2A counts will end in either 0 or 5. The random rounding algorithm employed controls the results and rounds the unit value of the count according to a pre-determined frequency. The table below shows those frequencies. Note that counts ending in 0 or 5 are not changed and remain as 0 or 5.

Unit value of count | Will round to count ending in 0 | Will round to count ending in 5 |
---|---|---|
1 | 4 times out of 5 | 1 time out of 5 |
2 | 3 times out of 5 | 2 times out of 5 |
3 | 2 times out of 5 | 3 times out of 5 |
4 | 1 time out of 5 | 4 times out of 5 |
5 | Never | Always |
6 | 1 time out of 5 | 4 times out of 5 |
7 | 2 times out of 5 | 3 times out of 5 |
8 | 3 times out of 5 | 2 times out of 5 |
9 | 4 times out of 5 | 1 time out of 5 |
0 | Always | Never |
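The frequencies in the table correspond to rounding up with a probability equal to the remainder divided by 5. The sketch below reproduces those probabilities with independent random draws. That is an assumption: the text says only that the algorithm follows a pre-determined frequency, and the function name is hypothetical.

```python
import random


def random_round_base5(count: int, rng: random.Random) -> int:
    """Randomly round a non-negative count to a multiple of 5."""
    remainder = count % 5
    if remainder == 0:
        # Counts already ending in 0 or 5 are left unchanged.
        return count
    lower = count - remainder
    # Round up with probability remainder/5, otherwise round down.
    if rng.random() < remainder / 5:
        return lower + 5
    return lower
```

For example, a count of 7 has a remainder of 2, so it becomes 10 about 2 times out of 5 and 5 about 3 times out of 5, matching the row for unit value 7.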

2B (20%) data require a slightly different random rounding algorithm. All counts greater than 10 are rounded to base 5, as is done for 2A data. Counts less than 10 are rounded to base 10. This means that any 2B counts less than 10 will always be changed to 0 or 10. The table below shows the effect of rounding on 2B counts with a value less than 10.

Count of | Will round to 0 | Will round to 10 |
---|---|---|
1 | 9 times out of 10 | 1 time out of 10 |
2 | 8 times out of 10 | 2 times out of 10 |
3 | 7 times out of 10 | 3 times out of 10 |
4 | 6 times out of 10 | 4 times out of 10 |
5 | 5 times out of 10 | 5 times out of 10 |
6 | 4 times out of 10 | 6 times out of 10 |
7 | 3 times out of 10 | 7 times out of 10 |
8 | 2 times out of 10 | 8 times out of 10 |
9 | 1 time out of 10 | 9 times out of 10 |
0 | Always | Never |
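The 2B rule can be sketched the same way, switching the base according to the size of the count. As before, the function name is hypothetical and the use of independent random draws is an assumption rather than the documented algorithm.

```python
import random


def random_round_2b(count: int, rng: random.Random) -> int:
    """Randomly round a non-negative 2B count: base 10 below 10, base 5 otherwise."""
    # A count of exactly 10 is a multiple of both bases, so it is unchanged either way.
    base = 10 if count < 10 else 5
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    # Round up with probability remainder/base, otherwise round down.
    if rng.random() < remainder / base:
        return lower + base
    return lower
```

With this sketch a count of 3 becomes 10 about 3 times out of 10 and 0 otherwise, while a count of 12 is rounded to either 10 or 15.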

The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same count in the same table being rounded up in one execution and rounded down in the next.

Statistics (such as mean, standard error, sum, median, percentile, ratio or percentage) are not subject to random rounding. However, when shown in tabulations accompanying the counts used to calculate the statistic, their presence can result in disclosure of individuals. To prevent this, we use statistic suppression methods or special statistic calculations.

The following three situations will result in the suppression of statistics:

- It is possible (mainly for cells with small counts) that quantitative values were imputed from a single donor record. For example, an income cell with three individual records may in fact contain only one actual income response, with the other two income amounts imputed from that record. When this occurs, the income characteristics of a single individual could be disclosed if the mean and standard error statistics are produced. To prevent this, and more generally to prevent disseminating statistics based on a narrow range of values, all statistics of a cell are suppressed if the relative difference between the minimum and the maximum is less than a specific percentage.

- For all quantitative variables, a statistic is suppressed if the number of actual records used in the calculation (not rounded or weighted) is less than a specific number.

Note: The number of records used in the calculation is not necessarily the number of records in the cell but, rather, the number of records that are applicable or available to the calculation of the statistic in the cell.

- For all quantitative variables, all statistics are suppressed if the sum of the weights is less than 10.
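The three situations above can be combined into one check, sketched below. The text does not publish the specific percentage or the specific number, so `min_relative_range` and `min_records` are hypothetical parameters that the caller must supply. The definition of relative difference used here, (max - min) divided by the larger absolute value, is also an assumption.

```python
from typing import Sequence


def suppress_statistics(
    values: Sequence[float],
    weights: Sequence[float],
    min_relative_range: float,
    min_records: int,
    min_weight_sum: float = 10.0,
) -> bool:
    """Return True when all statistics for a cell should be suppressed."""
    # Too few actual (unrounded, unweighted) records, or nothing to compute on.
    if not values or len(values) < min_records:
        return True
    # Sum of the weights below 10 (threshold stated in the text).
    if sum(weights) < min_weight_sum:
        return True
    # Narrow range of values. ASSUMPTION: relative difference is
    # (max - min) / max(|max|, |min|), and all-zero values count as zero range.
    low, high = min(values), max(values)
    scale = max(abs(low), abs(high))
    relative_range = 0.0 if scale == 0 else (high - low) / scale
    return relative_range < min_relative_range
```

A cell whose three values are identical, as in the single-donor example, has a relative range of 0 and is suppressed for any positive `min_relative_range`.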

1. The statistic value is never rounded, except for frequencies.
2. All statistics based on ranks (medians, percentiles) are calculated the usual way, that is, never rounded.
3. All dispersion statistics (standard error) are calculated the usual way, that is, never rounded.
4. When a sum is specified, if the program sums a dollar value, a number of weeks, a number of hours, or an age, then the program multiplies the unrounded average of the group in question by the rounded, weighted frequency. Otherwise, the program rounds the actual weighted sum.

When a division is specified (averages, percentages, ratios, etc.), the program must apply point 4 to both the numerator and the denominator before it proceeds with the division.

Note: Statistics based on ranks, such as medians and percentiles, are always calculated by linear interpolation. As a result, these statistics are not reliable for cells with low counts, which is why no additional confidentiality measures are applied to them.

Note: The average of a dollar value, a number of weeks, a number of hours or an age is not altered by the rounding, because the numerator is the product of the true average and the rounded frequency, and the denominator is that same rounded frequency. The two frequencies cancel each other, leaving the true average untouched.
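The sum rule and the cancellation described in this note can be illustrated with a short sketch. The function names are hypothetical, and the rounded frequency is passed in as an argument rather than computed, since the rounding itself is a separate step.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Unrounded weighted average of the group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def special_sum(
    values: Sequence[float], weights: Sequence[float], rounded_frequency: int
) -> float:
    """Sum for dollar values, weeks, hours or ages:
    unrounded average multiplied by the rounded, weighted frequency."""
    return weighted_mean(values, weights) * rounded_frequency


def special_average(
    values: Sequence[float], weights: Sequence[float], rounded_frequency: int
) -> float:
    """Average built from the special sum; the rounded frequency cancels out.
    Requires a non-zero rounded frequency."""
    return special_sum(values, weights, rounded_frequency) / rounded_frequency
```

For a group with a true weighted frequency of 13 that was rounded to 15, `special_average` still returns the true weighted mean, up to floating-point error.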

It is possible, though highly unlikely, that an outlier can be estimated accurately on the basis of an average. To reduce the risk of such a disclosure, all statistics for a cell will be suppressed if the ratio of the absolute value to the sum of the absolute values is greater than a specific percentage.
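One reading of this rule is a dominance check on the largest single contribution, sketched below. Two things here are assumptions: that "the absolute value" means the largest absolute value among the records in the cell, and that the values are unweighted. The threshold `max_share` stands in for the unpublished percentage, and the function name is hypothetical.

```python
from typing import Sequence


def has_dominant_value(values: Sequence[float], max_share: float) -> bool:
    """Return True when one value accounts for more than max_share
    of the sum of absolute values in the cell."""
    total = sum(abs(v) for v in values)
    if total == 0:
        # No values, or all values are zero: nothing can dominate.
        return False
    largest = max(abs(v) for v in values)
    return largest / total > max_share
```

Under these assumptions, a cell with values 1000, 10 and 10 is flagged at a 90% threshold because the largest value makes up about 98% of the total.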