## Archived Content

# 9. Sampling variance

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

Sampling error can be divided into two components: variance and bias. The variance
measures the variability of the estimate about its average value in hypothetical repetitions
of the survey process, while the bias is defined as the difference between the average value
of the estimate in hypothetical repetitions and the true value being estimated.
Chapter 6 presented results on sampling bias, describing the nature and extent
of bias in the census sample prior to weighting.
Even with a perfectly unbiased sampling method, the results would still be subject to variance,
simply because the estimates are based only on a sample. The variance may be estimated using
the data collected by the sample survey^{1}. The
sampling variance was studied to estimate the effect of the sampling and estimation procedures on
those census figures that are based on sample data.

On the basis of the 2B sample data, thousands of tables are produced by Statistics Canada. Conceptually, the estimated sampling variance is a measure of precision and can be associated with every estimate calculated in these tables. This measure takes into account both the sample design and the estimation method. In practice, however, it cannot be calculated for every census estimate because of high data processing costs. Sampling variance is thus estimated for only a subset of census estimates. From this subset, the combined effect of the sample design and the estimation method on the sampling variance can be estimated. Simple estimates of sampling variance, which are inexpensive to calculate, can then be adjusted for this effect to produce an estimate of sampling variance for any census estimate.

The square root of the sampling variance, known as the standard error, can be approximated using the data in Tables 9.1 and 9.2. Table 9.1 gives non-adjusted (simple) standard errors of census sample estimates. The figures in this table were obtained by assuming that 1-in-5 simple random sampling and simple weighting by 5 were used. The standard errors in Table 9.1 are expressed as a function of the size of both the census estimate and the geographic area. For example, for an estimate of 250 persons in a geographic area with a total of 1,000 persons, the non-adjusted standard error is 25.

Standard errors are given in Table 9.1 for only a limited number of values of the estimated total and of the total number of persons, households, dwellings or families in the area. The following formula may be used to calculate the non-adjusted standard error (NASE) for any estimated total in an area of any size:

$$\mathrm{NASE} = \sqrt{\frac{4E(N - E)}{N}}$$

where NASE is the non-adjusted standard error, E is the estimated total and N is the total number of persons, households, dwellings or families in the area. For example, for an estimated total of 750 persons in an area with a total of 9,000 persons, the non-adjusted standard error would be:

$$\mathrm{NASE} = \sqrt{\frac{4 \times 750 \times (9{,}000 - 750)}{9{,}000}} = \sqrt{2{,}750} \approx 52$$
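As a minimal sketch, the calculation can be written in Python. The function name `non_adjusted_standard_error` is illustrative and does not come from the report. The expression used is the usual simple random sampling variance of a weighted total under the assumptions stated for Table 9.1 (1-in-5 sampling, weight of 5); it reproduces the figure of 2,707 quoted in the worked example later in this chapter.

```python
import math

def non_adjusted_standard_error(estimate: float, area_total: float) -> float:
    """Non-adjusted standard error (NASE) of an estimated total.

    Assumes 1-in-5 simple random sampling without replacement and a
    simple weight of 5, which gives NASE = sqrt(4 * E * (N - E) / N).
    """
    if area_total <= 0 or not 0 <= estimate <= area_total:
        raise ValueError("require area_total > 0 and 0 <= estimate <= area_total")
    return math.sqrt(4.0 * estimate * (area_total - estimate) / area_total)

# Estimated total of 750 persons in an area with 9,000 persons.
print(round(non_adjusted_standard_error(750, 9000), 1))  # 52.4
```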

Table 9.2 provides adjustment factors^{2} by which the
non‑adjusted standard errors should be multiplied to adjust for the combined effect
of the sample design and the estimation procedure. To calculate these adjustment factors,
estimates of the sampling variances were calculated for regression estimates for different
categories of all of the characteristics^{3} given in
Table 9.2. This was done for each sampled weighting area (WA). The estimates of
sampling variance at the provincial and national levels were obtained by summing up the
WA-level estimates. The adjustment factors for each
characteristic in each category were calculated by dividing the square roots of these estimates
by the non-adjusted standard errors. Adjustment factors were calculated at the provincial and
national levels for each characteristic by averaging the adjustment factors for all of its
categories. For example, the adjustment factors for 'Sex' are the average of those for the
categories Male and Female. The majority of characteristics have their categories grouped based
on similar adjustment factors, and the factor from the appropriate group should be used for each
category. In cases where a table references multiple categories, the largest adjustment factor
involved should be used. For further information on how these adjustment factors were
calculated, see Hovington (2004).
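The calculation described in this paragraph can be summarized in symbols. The notation is introduced here for illustration only and does not appear in the report: $\hat{V}_{w}(c)$ denotes the estimated sampling variance of the regression estimate for category $c$ in sampled weighting area $w$, $\mathrm{NASE}(c)$ the non-adjusted standard error for the same category at the provincial or national level, and $K$ the number of categories of the characteristic.

$$
f(c) = \frac{\sqrt{\sum_{w} \hat{V}_{w}(c)}}{\mathrm{NASE}(c)},
\qquad
F = \frac{1}{K} \sum_{c=1}^{K} f(c)
$$

Here $f(c)$ is the adjustment factor for a single category and $F$ is the averaged factor for the characteristic. The square $f(c)^{2}$ corresponds to what is commonly called the design effect.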

To estimate the standard error for a given census sample estimate, the user should determine from Table 9.2 the adjustment factor applying to the characteristic and multiply this factor by the non-adjusted standard error selected in Table 9.1. If the characteristic is not identified in Table 9.2, the user should pick the adjustment factor of 1 shown for the 'All other' category.

For each characteristic in Table 9.2, adjustment factors are given at the national and provincial levels, as well as at the WA level. Unless the area is smaller than a province, the 'National or provincial factor' should be selected. Adjustment factors for different provinces are given in Table 9.2 only for cases where they differ significantly from those at the national level. This only occurred for some of the language characteristics. It should be noted that since no sampling occurred in Nunavut, the adjustment factors for all characteristics in this territory should be zero. Since sampling was done in the Yukon Territory and the Northwest Territories, the 'Other provinces' adjustment factor should be used, if available.

If an adjustment factor is needed for a census estimate associated with an area smaller than a province, then the percentiles of WA-level factors will provide a more accurate value. The percentiles give the spread of all the adjustment factors calculated in the study at the WA level for the different categories of a characteristic. N% of the adjustment factors at the WA level are below the Nth percentile and (100 – N)% are above the Nth percentile. For example, 90% of the adjustment factors at the WA level are below the 90th percentile and 10% are above it. The choice of which percentile to use will depend on how conservative the estimate of the standard error is desired to be. For example, using the 99th percentile would provide a very conservative estimate, while using the 75th percentile would provide a somewhat less conservative estimate.

The following rules should be followed when calculating adjusted standard errors:

- When determining the standard error of an estimate relating to families or
households, the number of families or households in the area, not the number of
persons, should be used for selecting the appropriate column in Table 9.1.

- Unless otherwise specified, family characteristics involving husband, wife,
lone‑parent or family reference person have the same adjustment factors as
population characteristics. For example, the adjustment factor for the characteristic
'Highest level of schooling of husband, wife, or lone parent of a census family' is
the same as the population characteristic 'Highest level of schooling.'

- For cross-classifications of two or more characteristics, the largest adjustment
factor for those characteristics should be used.

- All the standard error adjustment factors are for estimates of the number of persons, households, dwellings, or families, as opposed to, for example, dollar values. For example, the household income adjustment factors are for estimates of the number of households whose income falls in a certain dollar range, and not for estimates such as average household income.
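The selection rules above (largest factor for cross-classifications, factor of 1 for characteristics not listed) can be sketched as a small lookup. The function name, the dictionary keys and the dictionary itself are hypothetical; the only factor value used is the 1.67 quoted in this chapter for period of immigration after 1990, and a real lookup would be filled in from Table 9.2.

```python
def select_adjustment_factor(characteristics, factors, default=1.0):
    """Return the adjustment factor to apply to one or more characteristics.

    `factors` maps characteristic names to factors read from Table 9.2.
    Names missing from the mapping fall back to `default`, mirroring the
    factor of 1 given for the 'All other' category. For a
    cross-classification, the largest factor involved is returned.
    """
    if not characteristics:
        raise ValueError("at least one characteristic is required")
    return max(factors.get(name, default) for name in characteristics)

# Hypothetical key name; the value 1.67 is the one quoted in this chapter.
table_9_2 = {"Period of immigration (after 1990)": 1.67}

single = select_adjustment_factor(["Period of immigration (after 1990)"], table_9_2)
crossed = select_adjustment_factor(
    ["Period of immigration (after 1990)", "A characteristic not listed in Table 9.2"],
    table_9_2,
)
print(single, crossed)  # 1.67 1.67
```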

The following example illustrates how to calculate the adjusted standard errors. Suppose the estimate of interest is the number of persons who immigrated to Canada between 1996 and 2006. The 2006 Census estimate for this characteristic was 1,954,605. The 2006 Census count for the population of Canada for sampled variables was 31,241,030. Since neither number is very close to any of the values given in Table 9.1, the formula given to calculate the non-adjusted standard error should be used. In this case the result would be 2,707. From Table 9.2, the national-level adjustment factor for the characteristic 'period of immigration' after 1990 is 1.67. Consequently, the adjusted standard error for this estimate is 2,707 × 1.67 ≈ 4,520.
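The arithmetic of this example can be checked with a short Python sketch. The function names are illustrative, and the form of the non-adjusted standard error is the one implied by 1-in-5 simple random sampling with a weight of 5, which is an assumption consistent with the 2,707 quoted above.

```python
import math

def non_adjusted_standard_error(estimate, area_total):
    # Assumed form under 1-in-5 simple random sampling with a weight of 5.
    return math.sqrt(4.0 * estimate * (area_total - estimate) / area_total)

def adjusted_standard_error(estimate, area_total, adjustment_factor):
    # Multiply the non-adjusted standard error by the Table 9.2 factor.
    return adjustment_factor * non_adjusted_standard_error(estimate, area_total)

nase = non_adjusted_standard_error(1_954_605, 31_241_030)
ase = adjusted_standard_error(1_954_605, 31_241_030, 1.67)
print(round(nase))  # 2707
print(round(ase))   # 4521
```

The unrounded product is about 4,521, which is consistent with the 4,520 quoted in the text once rounded to the nearest ten.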

The sample estimate and its standard error may be used to construct an interval within which the unknown population value is expected to be contained with a prescribed confidence. The particular sample selected in this survey is one of a large number of possible samples of the same size that could have been selected using the same sample design. Estimates derived from the different samples would differ from each other. If intervals from two standard errors below the estimate to two standard errors above the estimate were constructed using each of the different possible estimates, then approximately 19 out of 20 of such intervals would include the value that would normally be obtained in a complete census. Such an interval is called a 95% (19 ÷ 20 = 95%) confidence interval. In order to guarantee 95% confidence however, these intervals must be calculated using the true standard errors of the sample estimates. The adjusted standard errors calculated from Tables 9.1 and 9.2 are only estimates of the true standard errors. For sample estimates at the provincial and national level, however, the adjusted standard errors should be close enough to the true standard errors to calculate approximate 95% confidence intervals of reasonable precision. Below the provincial level, the adjusted standard errors may not be accurate enough for this purpose.
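The "19 out of 20" interpretation can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not the census design: a hypothetical population of 1,000 units of which 250 have the characteristic, 1-in-5 simple random sampling, simple weighting by 5, and a standard error estimated from each sample with the simple random sampling form. All names and parameter values are illustrative.

```python
import random

def coverage_of_two_se_intervals(n_units=1000, n_with_trait=250, reps=2000, seed=0):
    """Share of 1-in-5 simple random samples whose +/- 2 SE interval
    contains the true total (hypothetical population, not census data)."""
    rng = random.Random(seed)
    population = [1] * n_with_trait + [0] * (n_units - n_with_trait)
    sample_size = n_units // 5
    covered = 0
    for _ in range(reps):
        sample = rng.sample(population, sample_size)
        estimate = 5 * sum(sample)  # simple weighting by 5
        # Standard error estimated from the sample itself (assumed SRS form).
        se = (4.0 * estimate * (n_units - estimate) / n_units) ** 0.5
        if estimate - 2 * se <= n_with_trait <= estimate + 2 * se:
            covered += 1
    return covered / reps

coverage = coverage_of_two_se_intervals()
print(coverage)
```

Under these assumptions the printed share should be close to 0.95; the exact value depends on the seed and the number of repetitions.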

Using the standard error calculated above, an approximate 95% confidence interval for the number of persons who immigrated to Canada between 1996 and 2006 would be 1,954,605 ± 2(4,520) or 1,954,605 ± 9,040.

It should be noted that estimates in small areas can be unreliable, as demonstrated with the following example. A community with a population of 500 persons that had an estimate of 50 for the number of persons who immigrated to Canada between 1996 and 2006 would have a standard error of 15 based on Table 9.1. Since this population is smaller than the provincial level, a WA-level adjustment factor must be selected from Table 9.2. Taking the most conservative figure from the 99th percentile would result in an adjusted standard error of 15 × 2.46 = 36.9. This would result in an approximate 95% confidence interval of 50 ± 2(36.9) or 50 ± 73.8. That is to say that the actual number of persons in this community who immigrated to Canada between 1996 and 2006 could be anywhere in the range from 0 to 123 with 95% confidence (the lower limit is set to 0 because a count cannot be negative). Even a somewhat less conservative figure using the 75th percentile adjustment factor (1.52) results in a 95% confidence interval that ranges from 5 to 95.
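The interval calculations for the national and small-area examples can be sketched as follows. The function name is illustrative, and truncating the lower limit at zero is an assumption made here because a count cannot be negative; it matches the range starting at 0 quoted in the text.

```python
def approximate_confidence_interval(estimate, standard_error, lower_limit=0.0):
    """Approximate 95% interval: estimate +/- 2 standard errors.

    The lower end is truncated at `lower_limit` (assumed to be 0 for counts).
    """
    margin = 2.0 * standard_error
    return max(lower_limit, estimate - margin), estimate + margin

# National example: adjusted standard error of 4,520.
print(approximate_confidence_interval(1_954_605, 4_520))  # (1945565.0, 1963645.0)

# Small-area example: Table 9.1 value of 15 with the 99th and 75th percentile factors.
low99, high99 = approximate_confidence_interval(50, 15 * 2.46)
low75, high75 = approximate_confidence_interval(50, 15 * 1.52)
print(round(low99, 1), round(high99, 1))  # 0.0 123.8
print(round(low75, 1), round(high75, 1))  # 4.4 95.6
```

The unrounded limits differ slightly from the whole numbers quoted in the text (0 to 123 and 5 to 95).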

Table 9.1 Non-adjusted estimates of standard errors of sample estimates

Table 9.2 Standard error adjustment factors at national or provincial and weighting area levels

**Notes:**

1. Unfortunately, the sampling variance does not provide any indication of the extent of non-sampling error.
2. The squares of the adjustment factors are commonly known as 'design effects.'
3. For example, '$10,000 to $19,999' was one of the categories for which estimates of sampling variance were calculated for the characteristic 'Number of persons in total income intervals.'