Training site selection
The statistical validity of the classification results depends on two characteristics of the training inputs: the size and the representativeness of the sample. To obtain suitable training site data, the guidelines described below were followed.
According to the literature (Lillesand and Kiefer 1994), the number of pixels used to train the classifier should be between 10n and 30n for each class, where n is the number of spectral bands. In this study, six spatial and two principal component bands (Section 4.0) were input into the classifier, giving n = 8 and a training sample of between 80 and 240 pixels per class, or an area of between 8,000 and 24,000 m2 (10 m x 10 m pixels).
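The arithmetic behind the 10n to 30n guideline can be sketched as follows. This is an illustrative calculation only; the function name and its parameters are hypothetical and are not part of the project workflow.

```python
def training_pixel_range(n_bands, pixel_size_m=10.0, low=10, high=30):
    """Return the recommended training sample per class under the 10n-30n rule.

    n_bands      -- number of bands input into the classifier
    pixel_size_m -- side length of a square pixel in metres (10 m in this study)
    low, high    -- multipliers from the guideline (10n and 30n)
    """
    min_px = low * n_bands
    max_px = high * n_bands
    pixel_area_m2 = pixel_size_m ** 2
    return {
        "min_pixels": min_px,
        "max_pixels": max_px,
        "min_area_m2": min_px * pixel_area_m2,
        "max_area_m2": max_px * pixel_area_m2,
    }


# Six bands plus two principal component bands, as stated in the text.
result = training_pixel_range(6 + 2)
print(result)
```

With eight input bands and 10 m pixels, this reproduces the range of 80 to 240 pixels, or 8,000 to 24,000 m2, per class.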
As the spectral properties of a pixel may not be entirely independent of those of neighbouring pixels due to spatial autocorrelation, training sites from more than one location were selected for each class. A target of four sites per class was set, and in only one case were three training sites used.
To give the classifier the greatest probability of accurately assigning pixels to a class, sites were selected so that their spectral properties were as homogeneous as possible (Congalton 1991). For example, a building in the centre of a target vegetation type within a training area will increase the likelihood of misclassification.
For classes that were readily identifiable in the aerial photos, training sites were delineated by on-screen digitising using the aerial photos as a backdrop. Where it was not possible to positively identify training sites from the desktop, field visits were conducted. Areas to be visited were located by examining the aerial photos and consulting ecological specialists. The training site log sheet summary is provided in Table A1 of Annex A. The co-ordinate locations of the training sites are presented in Table A2 of Annex A.
Reference site selection
Reference sites are similar to training sites in that they are locations where the surface features are known, identified either by field survey or by examination of other data sources. Indeed, surplus training sites not used in the classification are suitable for use as reference sites. Reference sites are used in accuracy assessment, where sites of known surface features are compared to the classified image, often in the form of an error matrix or contingency table (Section 0). The reference site log sheet summary is provided in Table A3 of Annex A.
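As an illustration of how reference sites feed into an error matrix, the following minimal sketch tabulates classified labels against reference labels and computes overall accuracy. The class names and the sample labels are hypothetical and are not results from this project; rows are taken as the classified image and columns as the reference data, which is one common convention.

```python
from collections import Counter


def error_matrix(reference, classified):
    """Build an error matrix: rows = classified class, columns = reference class."""
    classes = sorted(set(reference) | set(classified))
    counts = Counter(zip(classified, reference))
    matrix = [[counts[(c, r)] for r in classes] for c in classes]
    return classes, matrix


def overall_accuracy(matrix):
    """Proportion of reference sites on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical labels for five reference sites.
reference = ["Forest", "Forest", "Grass", "Grass", "Water"]
classified = ["Forest", "Grass", "Grass", "Grass", "Water"]

classes, matrix = error_matrix(reference, classified)
accuracy = overall_accuracy(matrix)
print(classes)
for row in matrix:
    print(row)
print(accuracy)
```

Off-diagonal cells show which classes are confused with one another; in this made-up example one Forest reference site was classified as Grass.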
In this project, reference sites derived from training sites were supplemented by stratified random sampling of sites in the initial classification. The sampling was stratified so that classes that occurred infrequently were adequately represented in the total sample. The random sampling procedure was performed in ArcInfo (Section 4.1.5) and involved randomly selecting 30 polygons with an area greater than 400 m2 for each class. Any randomly selected site that coincided with a training site location was discarded. The remaining sites were then overlaid onto the aerial photo and the class was verified by ecological specialists. Where reference sites of particular classes were difficult to verify from the aerial photos, field visits were conducted.
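The stratified random selection described above can be sketched in plain Python. This is not the ArcInfo procedure used in the project; the polygon records, field names (`id`, `cls`, `area_m2`) and the fixed random seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(polygons, training_ids, per_class=30,
                      min_area_m2=400.0, seed=0):
    """Draw up to `per_class` polygons per class, then drop training sites.

    polygons     -- list of dicts with hypothetical keys "id", "cls", "area_m2"
    training_ids -- ids of polygons that coincide with training sites
    """
    rng = random.Random(seed)

    # Stratify: group polygons larger than the minimum area by class.
    by_class = defaultdict(list)
    for p in polygons:
        if p["area_m2"] > min_area_m2:
            by_class[p["cls"]].append(p)

    selected = {}
    for cls, candidates in by_class.items():
        k = min(per_class, len(candidates))
        drawn = rng.sample(candidates, k)
        # Discard any drawn site that coincides with a training site.
        selected[cls] = [p for p in drawn if p["id"] not in training_ids]
    return selected


# Hypothetical classified polygons: a common class and an infrequent one.
polygons = (
    [{"id": i, "cls": "Forest", "area_m2": 500.0} for i in range(50)]
    + [{"id": 50 + i, "cls": "Grass", "area_m2": 450.0} for i in range(10)]
    + [{"id": 60 + i, "cls": "Grass", "area_m2": 300.0} for i in range(5)]
)
sample = stratified_sample(polygons, training_ids={0, 1})
for cls, sites in sample.items():
    print(cls, len(sites))
```

Sampling each class separately is what keeps an infrequent class represented; a single random draw over all polygons would be dominated by the common classes.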
For reasons of practicality, random selection of sites that were to be visited in the field was achieved using two additional parameters:
The distance from road/track condition was implemented for access and visibility reasons. The proximity of reference sites to roads also raised the possibility of human disturbance, so disturbed sites were avoided where possible. Both conditions were implemented using ArcView (Section 4.1.5): the first by buffering roads and tracks by the specified distance, and the second by digitising a boundary defining the limits of the area to be visited on a particular day.
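The two spatial conditions amount to a buffer test and a point-in-polygon test. The sketch below shows the geometry in plain Python rather than ArcView; the coordinates, the 30 m buffer distance and all function names are hypothetical, since the report does not state the distance used here.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_road_buffer(p, road, buffer_m):
    """True if p lies within buffer_m of any segment of the road polyline."""
    return any(
        dist_point_segment(p, road[i], road[i + 1]) <= buffer_m
        for i in range(len(road) - 1)
    )


def inside_polygon(p, polygon):
    """Ray-casting point-in-polygon test for a simple polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical road, daily survey boundary and candidate sites (metres).
road = [(0.0, 0.0), (100.0, 0.0)]
boundary = [(0.0, -50.0), (60.0, -50.0), (60.0, 50.0), (0.0, 50.0)]
sites = [(10.0, 20.0), (80.0, 10.0), (30.0, 45.0)]

eligible = [
    s for s in sites
    if within_road_buffer(s, road, buffer_m=30.0) and inside_polygon(s, boundary)
]
print(eligible)
```

A site is kept only if it passes both tests: close enough to a road or track to be reached and seen, and inside the area scheduled for that day's visit.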
Fifteen reference sites were collected for each of the classes that were to be assessed, with the exception of the Forest class, for which sixteen sites were used. A standard area of 10 m2 was used so that each site was equally represented in the total sample.