Using an Area Sampling Frame to calculate livestock statistics in the Gauteng Province, South Africa, within a GIS

By Craig von Hagen


In South Africa, there are no reliable statistics regarding animal numbers and distribution. The goal, therefore, of this research is to provide the framework and procedure for obtaining these statistics efficiently and accurately. Available sampling methods and sampling frames were investigated and it was decided to carry out a sample survey because the Gauteng Province consists of a large number of holdings (land parcels). In the Gauteng Province, where a complete list of farmers or landowners is not available, it was decided to use an area-sampling frame. Once the choice of sample design was made, the survey objectives were defined according to the clients' needs. The sampling frame was constructed using various land parcel layers. These land parcels were merged, using GIS software, into one continuous layer of land parcels. They were then stratified, using area of land parcel and land-cover, to reduce the variance of the variable under study (animals) over the entire area. The sample size was then calculated and the land parcels were selected randomly for survey purposes. The survey was conducted between September and December 1999 and the questionnaires were input into a database for the estimation procedures. The closed estimation procedure was used because it is the only possible option if the data surveyed are referenced to the land parcel (and not to a farm that includes several land parcels). The area frame sampling methodology worked well for cattle, sheep, horses, pigs and dogs/cats and to a lesser extent for goats, donkeys and game. The area frame method did not work well for poultry (because of extremely high values in a few land parcels), ostriches or mules (these are rare in the province). Spatial distributions and density distributions were then interpolated from the animal counts taken in the survey and they give a general idea of the location of animals. The distributions of cattle, sheep, horses, pigs and dogs/cats are reliable. The distributions of the rest are distorted due to extreme counts in a few land parcels but a general idea of concentrations can still be inferred. Considering that no historical data exists and that the overall goal of this research was to get an idea of animal numbers and the distribution of animals in Gauteng province, it can be considered successful, in that decision-makers now have a reliable source of information from which good decisions can be made.

Authors: CR von Hagen, DR J van der Meigden (GVS), DR G van der Zel (GVS), ARC - ISCW, Pretoria, South Africa
Keywords: area frame, estimation, livestock, sampling, spatial distribution


Censuses have always been the principal tools to acquire reliable statistical information. A census is an enormous undertaking and requires a lot of resources. The tabulation of questionnaires can take years. This is also true for specialised censuses such as those specifically addressed to agriculture. Censuses are also error prone because of omissions and the difficulty in quality control on a large mass of data.

At the moment very little information exists regarding agricultural statistics in communal and rural areas. Where it does exist, it is either outdated or very subjective and this is the case for the whole of South Africa. This implies that there is no baseline data that can be used as an input for a survey and consequently an estimation of animal numbers. Having this data available would have made the stratification of the frame more accurate and therefore the estimates as well. To obtain agricultural information by taking a complete enumeration of the population (e.g. livestock) would not be feasible in a developing country like South Africa, which cannot afford the expense of a census in order to obtain agricultural statistics.

In order to adapt and comply with the goals and regulations of the Government, Gauteng Veterinary Services (GVS) have identified the lack of spatially-linked livestock information as a major problem, especially with regard to the outbreak and subsequent control of diseases. GVS have to answer important questions regarding the geographical extent of a disease, the number of livestock that may be affected by a disease, the amount of vaccine that must be distributed to an affected area, other species that might be affected, how many farms might have to be quarantined and what short and long-term plans to put in place to deal with a specific situation.

Presently, these decisions are made based on how well the local government veterinarian knows the area. By GVS's own admission, these decisions can be very subjective. To prevent the outbreak of an epidemic, Gauteng Veterinary Services have realised that accurate, spatially-linked livestock information has to be available in order to take the most appropriate and cost-effective decisions.

The objective, therefore, of this research is to provide the framework and procedure for obtaining these statistics efficiently and accurately. From these results, estimations have been made and incorporated into a GIS.

Methods and procedures

The alternative to a complete enumeration of a population is a sample survey. A sample survey costs the taxpayer less, can be conducted more efficiently, can cover a greater scope of data and, in some cases, may even be more accurate (Cochran, 1977).

Area frame sampling is a widely used methodology in the USA and Europe and consists of the following steps:

  • Stratification - the distribution of livestock varies considerably over the province and magisterial districts. The precision of the survey estimates can be improved by dividing the land into homogeneous groups or strata and then optimally allocating the total sample to the strata.
  • Sampling - within each stratum, the land is divided into sampling units and a random sample of these units is then taken.
  • Analyses - several decisions are made that can have an impact on the statistical and cost efficiency, e.g. strata definitions, allocation of the sample to the strata and the method of selecting the sample.
  • Quality assurance - it must be ensured that no land is omitted from - or duplicated in - the frame and that the area has been properly stratified.

The advantages of the method are that it is versatile, statistically sound and cost-effective, that it provides an ideal framework for a GIS, that it can be implemented anywhere and that it has future value.

The disadvantages, however, are that it is less efficient than a complete list of the population, that it is inadequate for rare items and that the lack of sufficient boundaries and materials can be a problem.

Compiling the frame

The Gauteng area frame was compiled using existing land parcel data, which was obtained from the Surveyor General. Farm subdivisions, smallholdings and erf data were combined into one coverage of land parcels (or sampling units).


Stratification

The purpose of stratification is to reduce the variance of the variable under study in each stratum so as to obtain a lower variance over the entire study area, or to obtain the same variance as that of a non-stratified sample using a smaller sample (Deneufchatel & Porchier, 1993).

The Gauteng area frame was stratified using the 1:250 000 land-cover data and land parcel size. The 31 land-cover classes were reclassified into five classes so as to minimise the final number of strata. The reclassified classes were the result of meetings with Gauteng Veterinary Services. The classes were chosen by looking at each land-cover class and deciding if the livestock distribution would be significantly different from that of any other class. For example, cultivated land will have a different distribution than open vegetation or residential areas. The decision was based on the knowledge of livestock distribution from GVS, since no previous statistics were available. Each land parcel was then assigned a specific land-cover class according to the majority area covered. For example, a land parcel with two land-cover classes, vegetation (covering 50 ha) and cultivated land (covering 300 ha), is classified as cultivated. This is done to reduce the number of polygons in the data to a manageable level.

Land parcels were divided into three size classes, namely small, medium and large. These classes were also decided on in various meetings with GVS. Parcels smaller than 25 ha were classed as small, parcels between 25 ha and 400 ha as medium and parcels with an area greater than 400 ha as large. Again it must be emphasised that this division was subjective as no previous statistics exist.

Simple queries were run in ArcView GIS to assign a stratum value to each land parcel. An example of such a query would be "select all the land parcels between 25 ha and 400 ha with a vegetation land-cover." The parcels selected in this query would be assigned to the "vegetation with parcels 25 - 400 ha" stratum (Table 1).
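The stratum assignment described above can be sketched in a few lines of Python. This is only an illustration of the majority land-cover rule and the 25 ha and 400 ha size thresholds given in the text; it is not the ArcView query that was actually run, and the function names and the (land-cover, size class) stratum label are assumptions made for the example.

```python
def size_class(area_ha):
    """Classify a parcel by area, using the 25 ha and 400 ha thresholds from the text."""
    if area_ha < 25:
        return "small"
    if area_ha <= 400:
        return "medium"
    return "large"


def majority_land_cover(cover_areas):
    """Return the land-cover class that covers the largest area of the parcel."""
    return max(cover_areas, key=cover_areas.get)


def assign_stratum(area_ha, cover_areas):
    """Combine majority land-cover and size class into a (hypothetical) stratum label."""
    return (majority_land_cover(cover_areas), size_class(area_ha))


# The parcel from the text: 50 ha of vegetation and 300 ha of cultivated land.
parcel_cover = {"vegetation": 50.0, "cultivated": 300.0}
stratum = assign_stratum(sum(parcel_cover.values()), parcel_cover)
print(stratum)
```

How a parcel of exactly 25 ha or 400 ha was treated is not stated in the article; the sketch places both in the medium class.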

Sample size

There are a variety of ways by which the sample may be selected. For each, a rough estimate of the sample size can be made from the degree of desired precision specified by the client. The relative costs and time involved for each sample strategy must be compared before taking a decision (Cochran, 1977).

More than one item or characteristic is usually measured in a sample survey and sometimes the number is large. If a desired degree of precision is prescribed for each item, the calculations lead to a series of conflicting values for n, one for each item. Some method must be found to reconcile these values (Cochran, 1977). These conflicts were solved by using the animal priorities given by GVS (Table 2).

To determine the sample size for the Gauteng area frame, two assumptions were considered. Firstly, if it was assumed that nothing was known about the population distribution and that money was no problem, over-sampling (sample size of 25 - 50%) could be decided on. Secondly, if it were assumed that GVS have some idea of the population distribution and that a limited budget was available, then a much smaller sample size would be selected. In this study the latter assumption was the most appropriate and the sample extraction definition was built accordingly. The statistical procedure used in calculating the sample size for each stratum was taken from Cochran (1977).
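The article does not state which of Cochran's formulas was applied. As a hedged illustration only, the sketch below uses the textbook sample-size calculation for simple random sampling within one stratum: a first approximation n0 = (t x CV / r)^2, followed by the finite population correction n = n0 / (1 + n0 / N). The parameter names (a priori coefficient of variation `cv`, target relative error `rel_error`, normal deviate `t`) are assumptions introduced for the example.

```python
import math


def sample_size(population_size, cv, rel_error, t=1.96):
    """Textbook sample size for estimating a mean or total by simple random sampling.

    population_size: number of land parcels in the stratum (N)
    cv:              assumed a priori coefficient of variation of the item (S / mean)
    rel_error:       desired relative error, e.g. 0.10 for 10 %
    t:               normal deviate for the confidence level (1.96 for about 95 %)
    """
    n0 = (t * cv / rel_error) ** 2          # first approximation, ignoring N
    n = n0 / (1 + n0 / population_size)     # finite population correction
    return math.ceil(n)


# Hypothetical stratum of 1 000 parcels, a priori CV of 100 %, 10 % relative error.
n_parcels = sample_size(1000, cv=1.0, rel_error=0.10)
print(n_parcels)
```

Running the calculation once per item gives one value of n per item, which is the conflict the text describes; the priorities in Table 2 are one way of choosing among them.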

Extraction of sample

The land parcels in each stratum were selected by running a random selection script on each stratum individually. The script was run within ArcView GIS directly on the polygon land parcel theme. The script was written by ESRI (ArcView GIS developers) and it is included in ArcView as a sample script.
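The ESRI sample script itself is not reproduced in the article. A minimal equivalent, assuming each stratum is simply a list of parcel identifiers and that selection is without replacement with equal probability, might look as follows; the stratum names, sample sizes and seed are hypothetical.

```python
import random


def select_sample(parcels_by_stratum, sample_sizes, seed=None):
    """Draw a simple random sample (without replacement) from each stratum separately."""
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    selected = {}
    for stratum, parcel_ids in parcels_by_stratum.items():
        n = min(sample_sizes[stratum], len(parcel_ids))
        selected[stratum] = sorted(rng.sample(parcel_ids, n))
    return selected


# Hypothetical frame with two strata of 100 and 30 parcels.
frame = {
    "vegetation_25_400ha": list(range(1, 101)),
    "cultivated_gt_400ha": list(range(101, 131)),
}
sizes = {"vegetation_25_400ha": 10, "cultivated_gt_400ha": 5}
sample = select_sample(frame, sizes, seed=1999)
print(sample)
```

Because every parcel in a stratum has the same chance of selection, parcel area plays no role in the draw.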

Cartographic preparation

The survey maps were prepared using ArcView GIS. After the random selection was made, the centroid coordinate of each parcel was calculated and included on the survey form. This coordinate was used to locate each land parcel. Each of the selected land parcels was then labelled using a unique number so as to keep track when the survey forms came back and were entered into the database. The 1:50 000 grid was then overlaid on the selected parcels. A3 maps were then printed per 1:50 000 map sheet, per stratum for easy reference. 1:50 000 road data were also included on each map sheet for orientation purposes. A total of 280 A3 maps were printed for survey purposes.
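The centroids were calculated inside ArcView GIS and the exact routine is not described. For readers without that software, the standard area-weighted centroid of a simple polygon can be sketched as below, assuming the parcel boundary is given as a list of (x, y) vertices in a projected coordinate system. Note that for strongly concave parcels the centroid may fall outside the parcel.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    vertices: list of (x, y) tuples in order around the boundary; the ring
    does not need to be closed explicitly.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0   # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical rectangular parcel, 4 units wide and 2 units high.
centroid = polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)])
print(centroid)
```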

Survey questionnaire

The questionnaire for the survey was developed using FAO (1996) guidelines in conjunction with GVS. The self-enumeration method was preferred because there is no link between landowner or land user and the actual land parcel. It would therefore be less time-consuming for the enumerator to visit the land parcel using the given coordinates and A3 survey maps, than to find out who, or where, the landowner is located. The questionnaire was kept as simple as possible so as to ensure that the enumerator spends the least possible time on each land parcel. The survey was conducted by GVS between September and November 1999. The database for the collection and input of the survey questionnaires was constructed in Microsoft Access.


Mr Francesco Luccarrini, a statistician from Italian firm Aquater, developed the estimation procedure in MS Access. The sampling unit total, sample size and data from the survey questionnaires are all incorporated into the estimator. There are three types of estimators: closed, open and weighted. A closed estimator was used, as it was best suited to the survey.

The following example from Ford et al. (1986) illustrates the difference between the three types:

Suppose the following situation occurs for a specific farm: land parcel acres = 10, farm acres = 100, hogs on the land parcel = 20, and hogs on the farm = 40. The closed segment value of number of hogs would be 20; the weighted segment value would be 40 × (10/100) = 4; and the open segment value would be 40 (if the headquarters is in the land parcel) or 0 (if the headquarters is not in the land parcel).
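The three segment values in the Ford et al. example can be written out as small functions. This is a sketch of the definitions as given in the example above, with function and argument names chosen for illustration.

```python
def closed_value(animals_on_parcel):
    """Closed estimator: count only the animals physically on the sampled parcel."""
    return animals_on_parcel


def weighted_value(animals_on_farm, parcel_area, farm_area):
    """Weighted estimator: farm total scaled by the share of the farm inside the parcel."""
    return animals_on_farm * parcel_area / farm_area


def open_value(animals_on_farm, headquarters_in_parcel):
    """Open estimator: whole farm total if the farm headquarters lies in the parcel, else 0."""
    return animals_on_farm if headquarters_in_parcel else 0


# Figures from the example: 10-acre parcel, 100-acre farm, 20 hogs on the parcel, 40 on the farm.
print(closed_value(20))             # closed segment value
print(weighted_value(40, 10, 100))  # weighted segment value
print(open_value(40, True), open_value(40, False))  # open segment values
```

Only the closed value can be computed from what an enumerator sees on the parcel itself, which is why it suits a survey in which the link between parcel and farm is unknown.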

In the previous section, estimations of population totals were made and accuracy for each of the items was given. These are only population totals and they give no indication of the distribution of items. Points of observation at specific locations are, however, available from the survey and these can be used to estimate the values of the unknown areas from the measured points in the survey. This is known as the interpolation process and it results in spatial distribution maps.


Estimates were made for each stratum. Some strata give more accurate results because of variations in sample rate. Here only the combined estimate for the province is given. Terms that are used in the discussion below will now be explained.

The coefficient of variation (CV) is one major tool for evaluating the quality of the estimates. It measures the precision of an estimate, but not the accuracy. A low CV shows that the estimator has very little variation relative to the point estimate and is precise. Conversely, a high CV means that the estimator has a wide confidence interval and that the estimated value could change greatly given a different sample (Garibay et al., 1996).
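The formulas implemented in the MS Access estimator are not given in the article. To show how a provincial total and its CV can be obtained from parcel counts, the sketch below uses the standard expansion estimator for stratified simple random sampling (Cochran, 1977): each stratum total is N_h times the sample mean, its variance is N_h^2 (1 - n_h/N_h) s_h^2 / n_h, and the CV is the standard error of the combined total divided by the total. The data and function names are hypothetical.

```python
import math
from statistics import mean, variance


def stratum_total(frame_size, counts):
    """Expansion estimate of a stratum total and its variance.

    frame_size: number of land parcels in the stratum (N_h)
    counts:     animal counts on the sampled parcels (at least two values)
    """
    n = len(counts)
    total = frame_size * mean(counts)
    var = frame_size ** 2 * (1 - n / frame_size) * variance(counts) / n
    return total, var


def combined_estimate(strata):
    """Sum stratum totals and variances; return (total, CV in percent).

    Assumes the combined total is greater than zero.
    """
    results = [stratum_total(frame_size, counts) for frame_size, counts in strata]
    total = sum(t for t, _ in results)
    std_error = math.sqrt(sum(v for _, v in results))
    return total, 100.0 * std_error / total


# Two hypothetical strata: (parcels in frame, counts observed on sampled parcels).
strata = [(100, [0, 10, 20, 30]), (50, [2, 4, 6])]
total, cv_percent = combined_estimate(strata)
print(round(total), round(cv_percent, 1))
```

A single very large count inflates the sample variance of its stratum, and with it the CV of the combined estimate.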

According to GUS/NASS Project (1995), estimates should have CVs less than 10 % for items of major importance. Budget limitations on the survey, and therefore the sampling rate, may produce higher than desired CVs. The fact that no historical data exists will also lead to higher CVs.

On the advice of Mr Luccarrini (pers. comm., 2000), two distinct estimates have been produced for each stratum: one including outliers and one excluding outliers. This is because extremely high outliers (e.g. poultry) lead to a distorted estimate and a high CV. The outliers were identified by selecting values exceeding the mean by more than five times the standard deviation. The usual limit is three times the standard deviation (Luccarrini, pers. comm., 2000). The difference in the two estimates comes from only a few land parcels with extremely high values and sometimes this difference is very high, as in the case of ostriches and poultry.
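The outlier rule can be illustrated as follows. The sketch assumes that the mean and the sample standard deviation are computed from all sampled counts of one item, including the candidate outliers themselves; whether the original procedure worked per stratum or over the whole sample is not stated in the article.

```python
from statistics import mean, stdev


def split_outliers(counts, k=5.0):
    """Separate counts above mean + k * standard deviation from the rest.

    k=5 follows the cut-off used in the survey; k=3 is the more usual limit.
    Requires at least two counts.
    """
    cutoff = mean(counts) + k * stdev(counts)
    kept = [c for c in counts if c <= cutoff]
    outliers = [c for c in counts if c > cutoff]
    return kept, outliers, cutoff


# Hypothetical poultry counts: 99 parcels with 10 birds and one battery with 10 000.
counts = [10] * 99 + [10000]
kept, outliers, cutoff = split_outliers(counts)
print(len(kept), outliers, round(cutoff, 1))
```

With small samples a single extreme value inflates the standard deviation so much that it can never exceed a five standard deviation cut-off, so the rule only bites in reasonably large samples.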

Land parcel area is not directly included in the estimator because the selection of the sample was completely random (same probability for each land parcel) and not proportional to land parcel area. According to Mr Luccarrini (pers. comm., 2000), land parcel area could be included a posteriori in a weighted estimator but the correlation between number of animals and land parcel area is not that strong, so the results could be misleading.

Stratum 6 (urban: residential) was excluded from the global estimate because of the extremely low sampling rate, which leads to unreliable expansions.

The total of all animals (for all strata) actually counted is given in Table 3, while the estimation results discussed can be viewed in Table 4 and Table 5.

Cattle were counted in all strata. The cattle estimates with (921 880) and without (766 930) outliers differ, but not drastically. The CV for both estimates is relatively low, indicating a reliable estimate for cattle. The a priori figure given to calculate the sample size is well below the estimates (Table 2).

Sheep were counted in all strata. The sheep estimates (395 194 with outliers and 377 186 without outliers) are very close. The CV for both estimates is low, indicating a reliable estimate for sheep. The a priori information given for the combination of sheep and goats is below the estimate (Table 2).

Goats were counted in all strata. The estimates with (111 970) and without (55 203) outliers are significantly different. The difference occurs because a few land parcels have a high goat count. The estimate without outliers has an acceptable CV.

Horses were counted in all strata. The estimates with (31 798) and without (29 263) outliers are almost the same. This is because the outliers lie very close to the cut-off of five standard deviations above the mean. The CV for both estimates is low, indicating a reliable estimate for horses. The a priori value given for a combination of horses, mules and donkeys is not too far off, but the difference is still a significant one (Table 2).

Donkeys were not counted in strata 6 or 7. No outliers occurred for this item so the estimate stays the same (3 448). The CV of around 20 % is acceptable for this item.

Mules were not counted in strata 4, 5 and 6. There were no outliers occurring for mules so the estimate remains unchanged (1 110). The CV is too high to provide a reliable estimate. This is because mules are a rare item in the province. A much higher sampling rate over the whole province would be needed to provide a reliable estimate for this item.

Pigs were not counted in stratum 6. The pig estimates are relatively close with (133 698) and without (126 692) outliers. The low CV for both estimates indicates a reliable estimate. It is interesting to note that the a priori figure given is slightly higher than both estimates (Table 2).

Poultry were counted in all strata. There is an extremely large variation between the estimate including outliers (approximately 20 million) and the estimate excluding outliers (approximately 12 million). The CVs for both estimates are extremely large and the estimate is unreliable. The huge difference between estimates is due to a few land parcels having very high counts of 10 000 or greater. To get a more reliable estimate, all chicken farms and batteries will have to be identified and included in a stratum on their own. The rest of the province will have to be sampled more intensively as well. The estimates are far higher than the a priori information given to calculate sample size (Table 2).

The combination of dogs and cats was counted in all strata. The estimates with (226 041) and without (221 701) outliers are very similar, indicating the outliers lie very close to the cut-off of five standard deviations. The CV for both estimates is very low, indicating quite a precise estimate. This is the only item for which the estimates for stratum 6 and the rest of the strata can be combined because both have a relatively low CV and the expansion for stratum 6 was not distorted. The two can be combined using the formula:

[Global estimate without stratum 6] + [stratum 6 estimate]

The result including outliers is 2 461 706 and excluding outliers is 2 457 366. The global CV for dogs and cats for the whole province is 7.22 %, indicating a reliable estimate. The estimate is significantly lower than the a priori value given (Table 2).
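As an arithmetic check on the figures quoted above, the stratum 6 estimate implied by the formula can be recovered by subtraction. The stratum 6 value itself is not quoted in the text, so the number below is derived, not reported.

```python
# Figures quoted in the text for dogs and cats.
global_without_stratum_6 = {"with_outliers": 226_041, "without_outliers": 221_701}
provincial_total = {"with_outliers": 2_461_706, "without_outliers": 2_457_366}

# Rearranging [global without stratum 6] + [stratum 6] = [provincial total].
implied_stratum_6 = {
    key: provincial_total[key] - global_without_stratum_6[key]
    for key in provincial_total
}
print(implied_stratum_6)
```

Both cases give the same implied value, which is consistent with the outlier exclusion changing only the strata other than stratum 6, although the article does not say so explicitly.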

No game was counted in stratum 6. No outliers occur for this item so the estimate remains unchanged (98 331). The CV for game is acceptable given the priority set by GVS for this item (Table 2). The estimate differs greatly from the a priori information given in Table 2.

Overall it can be said that the estimates for dogs/cats, cattle, sheep, horses and pigs are reliable and accurate. The estimates for goats, donkeys, ostriches and game are acceptable while the estimates for mules and especially poultry are unreliable.

Spatial distribution

There are many advantages in taking spatial data beyond a purely descriptive display method, such as thematic mapping of points using colours or proportionally sized symbols. Modelling and interpolation software, such as ArcView Spatial Analyst, provide the means necessary to process and display data in a new derivative form (Wyatt, 2000).

It was decided to use the IDW interpolation technique for the following reasons:

  • Irregularly spaced sample points
  • Potentially large variation in livestock numbers and livestock density
  • Surfaces go through known points

Maps showing spatial distributions and densities are available for each species from ARC - ISCW.
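The surfaces in this study were produced with ArcView Spatial Analyst, and the power and neighbourhood settings used are not reported. The basic inverse distance weighted (IDW) calculation can nevertheless be sketched in plain Python, assuming a power of 2 and the use of all sample points; both are assumptions made for the example.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (sx, sy, value) tuples, e.g. parcel centroids with animal counts.
    Returns the sample value itself when (x, y) coincides with a sample point,
    so the interpolated surface passes through the known points.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical survey points 10 units apart with counts of 10 and 30.
points = [(0.0, 0.0, 10), (10.0, 0.0, 30)]
print(idw(0.0, 0.0, points), idw(5.0, 0.0, points), idw(2.0, 0.0, points))
```

Because every estimate is a weighted average of the observed counts, a few extreme counts pull up a wide neighbourhood around them, which matches the distortion reported for poultry.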

Density is calculated when the major concern is the relative geographical crowding or sparseness of discrete phenomena, such as the number of persons or cattle per area unit (Robinson et al., 1984). The density is computed by

D = N / A

where N is the total number of phenomena occurring in an enumeration unit and A is the area of the unit. Density is more closely related to the land than other averages and ratios, and the significant element in the relationship is area. Thus, for example, 5000 items in an area of 100 hectares is a density of 50 per hectare. When working with density, the size of the land parcels limits the detail that can be presented. Generally, the larger the units, the less will be the differences among the values (Robinson et al., 1984).
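The density formula and the effect of unit size can be shown with a short calculation. The first figure is the example from the text; the two parcels that follow are hypothetical and only illustrate how merging units smooths out differences.

```python
def density(count, area_ha):
    """Animals per hectare: D = N / A."""
    if area_ha <= 0:
        raise ValueError("area must be positive")
    return count / area_ha


# Example from the text: 5000 items on 100 ha.
text_example = density(5000, 100)

# Hypothetical neighbouring parcels: 100 cattle on 10 ha and none on 90 ha.
small_parcel = density(100, 10)
large_parcel = density(0, 90)
merged_unit = density(100 + 0, 10 + 90)
print(text_example, small_parcel, large_parcel, merged_unit)
```

Taken separately the two parcels have very different densities, while the merged unit shows a single low value.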

Examples of the cattle and the dog and cat distributions, as well as their density distributions, are shown in Figures 1 - 4. These two items, given as priorities 1 and 3 by GVS, will be briefly described.


Cattle

Cattle distribution (Fig. 1) mainly corresponds to the land parcels with an area greater than 25 ha and areas with a vegetation land-cover. The urban and residential areas of Johannesburg, Pretoria and the East and West Rand show little or no concentration of cattle. Relatively high concentrations occur in the districts of Nigel, Heidelberg, Krugersdorp, Bronkhorstspruit, Cullinan, Wonderboom and Vanderbijlpark. These districts occur on the outskirts of the urban commercial and residential centres. Vanderbijlpark is dominated by cultivated land but still shows quite a high concentration of cattle. This might be an indication that farmers graze cattle on fallow land in this district. The interpolation shows a high concentration of cattle in the extreme south of the Johannesburg district. This could be associated with cattle grazing on communal land or on mining land. According to Mr van der Zel of GVS (pers. comm., 2000), the high values in Bronkhorstspruit and Heidelberg could be associated with feedlots. In Cullinan, state land is being given to developing farmers and this could account for the high cattle concentration there (van der Zel, pers. comm., 2000). The high values in Nigel and Krugersdorp need to be investigated.

The cattle density map (Fig. 2) shows some slight differences. Krugersdorp, Nigel and Heidelberg still stand out as areas of high cattle concentration. As mentioned in the previous paragraph, the Krugersdorp and Nigel areas need some investigation, while the Heidelberg values might be associated with feedlots. Even though the land parcels in these districts are mostly greater than 25 ha, there is a sufficient number of cattle to show a high cattle density. Districts such as Brakpan, Germiston, Wonderboom and Vereeniging have few cattle, but they still show high cattle density. These areas correspond with smallholdings, which means the land parcels in these areas are smaller and therefore indicate a high density. The high densities in Soshanguve and Johannesburg might be associated with communal grazing land, where all owners graze their animals on pieces of open land. The high density in the west of Johannesburg corresponds well with the location of Soweto so this reinforces the association with communal land.

Dogs and cats

Most of the Gauteng land parcels have between three and ten dogs and cats (Fig. 3). The exceptions are the urban commercial and industrial areas as well as the mining areas. Higher concentrations occur in Nigel, Vanderbijlpark, Krugersdorp, Wonderboom and Springs. The very high concentrations in Nigel and Springs will need to be investigated.

High densities (Fig. 4) occur in most of the urban residential areas because of the small erf land parcel sizes. Relatively high densities also occur in Soshanguve, indicating that communal areas also have a high dog and cat count.


According to Cochran (1977), the more information we have initially about a population, the easier it is to devise a sample that will give accurate estimates. Any completed sample is potentially a guide to improved future sampling, in the data that it supplies about the means, standard deviations and the nature of the variability of the principal measurements and the costs involved in getting the data.

The precision of the sampling procedure for this survey can only really be judged by examining the frequency distribution generated for the estimate if the procedure is applied again and again to the same population.

Considering that no historical data exists and that the overall goals of this research were to get an idea of animal numbers and the distribution of animals in the Gauteng province, it can be considered that it has succeeded in that decision-makers now have a reliable source of information from which good decisions can be made. A reliable estimate for a number of different animal species is now available and the figures are being used to update national livestock statistics. A reliable distribution of livestock, as well as density distributions, is now available and being used by GVS to identify areas where they can concentrate resources in an emergency situation (disease outbreak).

Area sampling frame development is a major undertaking that must be considered a long-term investment of time, money and other resources. The efficiency of the frame over time depends on the strata definitions and quality of the stratification. When land-use does not change, restratification is not necessary and the work completed during the first year will be valid for many years (Garibay et al., 1996). This is also valid for Gauteng. However, using the land-cover data from the "Urban Eye" project, which has recently been completed by the CSIR, could improve the stratification and therefore the estimates. This project has mapped the Gauteng land-cover at a 1:50 000 scale. This has resulted in a much more accurate (and more recent) land-cover assessment of the province. It has provided a more detailed breakdown of the urban land-cover classes and most likely will include classes such as communal land and/or informal settlements. This data was not available at the start of this project.

In light of the survey results, strata could also be combined in future surveys, e.g. the vegetation with parcel size greater than 400 ha stratum and the vegetation with parcel size between 25 and 400 ha stratum, as the results for these two strata were similar. Further research and consultations with area frame experts would have to be undertaken.

More survey points would have to be added in future surveys for rare items (e.g. ostriches and donkeys) so as to get a more accurate estimate and distribution.

The area frame could also be broken down to district level and a more detailed idea of the numbers and distribution of animals could be calculated. This could be done in conjunction with the work that is done by the veterinary technicians in each district. Every time a farm or land parcel is visited (for whatever reason), a complete count of animals is taken. It is imperative, in these cases, to record the date of the count so as to ensure that animals are not counted twice, due to animal movement.

GVS have disease occurrence points, where outbreaks of diseases have been recorded. A surface per disease could be created and then compared with the animal distribution and animal density surfaces. A relationship might exist between high numbers or densities and disease occurrence. Another relationship might exist between soil type and disease occurrence and, therefore, animal numbers and densities. These possibilities are being investigated by GVS and this will lead to GVS being able to make more informed decisions regarding disease control.

A way to incorporate the statistics from this survey and the statistics from the routine work of technicians is being found by GVS. This will create historical data that can be used in future to generate more accurate estimates and distribution surfaces. The results of this survey are also currently being used as one of the inputs to create an agricultural potential atlas for the Gauteng province.


References

Cochran, W.G. 1977: Sampling Techniques (3rd ed.). John Wiley & Sons. New York.

Deneufchatel, D. & J.C. Porchier. 1993: Area Frame Surveys and Remote Sensing: Guide to Surveying by Segments and Points. Study Group on Food and Agricultural Statistics in Europe, 5 - 9 July 1993. FAO. Geneva.

FAO Statistical Development Series. 1996: Conducting Agricultural Censuses and Surveys. Food and Agriculture Organisation of the United Nations. Rome.

Ford et al. 1986: Area Frame Estimators in Agricultural Surveys: Sampling versus Non-sampling Errors. Agricultural Economics Research, Vol. 38, No. 2, pp. 1 - 9. Statistical Reporting Service - United States Department of Agriculture. Washington, D.C.

Garibay et al. 1996: Area Frame Point Sampling: An Exploratory Study to Measure Nicaragua Agriculture Production. National Agricultural Statistics Service - United States Department of Agriculture. Washington, D.C.

GUS/NASS Project. 1995: Pilot Point Sample Area Frame - Poland. Survey Specifications. Draft 3.

Robinson, A.H., et al. 1984: Elements of Cartography (5th ed.). John Wiley. New York.

Wyatt, P. 2000: The Interpolation Process. Directions Magazine. /article.asp?ArticleID=52.

List of Figures

Fig. 1. Cattle distribution in the Gauteng province
Fig. 2. Cattle density distribution in the Gauteng province
Fig. 3. Dog and cat distribution in the Gauteng province
Fig. 4. Dog and cat density distribution in the Gauteng province



Acknowledgements

The Gauteng Veterinary Services and ARC - Onderstepoort Veterinary Institute for funding.

Mr G. Narciso for guidance about area frame technology.

Mr A.B. Potgieter for general guidance and assistance.

Gauteng Veterinary Services for providing manpower to carry out the survey.

Dr F. van der Vyfer, Dr F. van der Meigden and Dr G. van der Zel from Gauteng Veterinary Services for their critical inputs.

Mr F. Luccarrini from Aquater (Italy) for assisting with the estimation procedure.

Contact information:

CR von Hagen, DR J van der Meigden (GVS), DR G van der Zel (GVS)

600 Belvedere Street

Private Bag X79
0001

Published Wednesday, August 21st, 2002


© 2016 Directions Media. All Rights Reserved.