4.0 MATERIALS AND METHODS
4.1 Study Design
The time series approach was adopted in this study. This design specifically caters for matched daily series of exposure and outcome data (Schwartz et al., 1996). It aims to quantify adverse short term effects of the current levels of air pollutants on health. The "health outcome" time series data were daily counts of hospital admissions from certain causes (total respiratory and cardiovascular diseases and individual diseases, namely, bronchial asthma and acute myocardial infarction). The "exposure" time series data were daily measurements of the following "criteria pollutants": nitrogen dioxide, respirable suspended particulates (with aerodynamic diameter less than 10 µm), sulphur dioxide and ozone. Statistical modelling was then performed, taking into consideration the characteristic (approximately Poisson) distribution, overdispersion and positive autocorrelation of the outcome data. Daily meteorological variables (temperature and humidity) and others (seasonal changes, holidays, day of the week, time trends) were included as confounding variables in the statistical model. The health effects of individual pollutants were then examined.
The time series approach allows the best use to be made of routinely collected air quality and hospital data currently available in Hong Kong to address questions concerning the short-term health effects of ambient pollution levels. It addresses the question of whether variations in measured levels of ambient air pollution are statistically associated with variations in health outcomes (in the context of this study, acute hospital admissions for selected diseases).
The general approach of the study is modelled after the protocol of the APHEA project (a European approach using epidemiological time series data), developed within the framework of the EC Environment 1991 - 94 Programme (Katsouyanni et al., 1996). The rationale of using this standard approach is that it represents the best compromise between rigour and feasibility. It caters for the use of aggregated time series data on air quality and health outcomes that were not originally collected for the purpose of epidemiological investigation. It also allows the specific methods to be adapted to suit local data and conditions, within a standardized framework that has established guidelines and quality control criteria. Finally, it allows the results to be compared to other epidemiological time series studies within a meta-analysis framework (Briggs et al., 1996).
The databases and statistical methods employed will be elaborated further in the following section.
4.2 Databases
4.2.1 Overview of Hospital Admissions Data
The first part of this project involves the collection of health "outcome" data based on daily hospital admissions for selected respiratory and cardiovascular illnesses in 12 hospitals under the Hospital Authority. By design, the study included those hospitals with Accident and Emergency (A & E) Departments (ten in number, including eight major hospitals in the Territory), where patients with acute health conditions under study would be directed 24 hours of the day, and where computerized medical records using the International Classification of Diseases (ICD) coding of diagnosis were available. One (Ruttonjee Hospital) which served as a referral base for emergency patients from the A & E Department of a neighbouring hospital (Tang Shiu Kin Hospital), and another (Our Lady of Maryknoll Hospital) which provided a 24 hour Outpatient Department were also included.
Private hospitals, hospitals without Accident and Emergency Departments and specialist hospitals (e.g., psychiatric hospitals) were excluded from the study. The primary reason for excluding private hospitals was that computerized medical data were not available. Also, none of them had Accident and Emergency Departments. As patients admitted for acute respiratory and cardiovascular symptoms would normally present at an A & E Department, we have excluded those hospitals without this facility. The contribution of private hospitals to the total number of hospital beds in Hong Kong was actually quite small (less than 10 %), and it could be safely assumed that their contribution to the total number of admissions for acute respiratory and circulatory diseases was relatively small. Specialist hospitals in Hong Kong do not routinely admit patients with the acute conditions covered by this study.
The inclusion of Yan Chai Hospital, despite its lack of an A & E Department until late 1994, was due to its strategic location in the District of Tsuen Wan, which was relatively highly polluted and had no other suitable hospital available for this study. One district hospital under the Hospital Authority (Caritas Medical Centre) was excluded because of the lack of computerized medical records for the year 1994. The possible effects of its exclusion will be discussed in Section 8.
For the 12 hospitals included in the study, computerized data were not available until 1993/94. These data were stored within two separate medical record systems - the Integrated Patient Administration System (IPAS) and the Medical Record Abstracting System (MRAS). For most hospitals, the IPAS system was being gradually phased out by the Hospital Authority to be replaced by the MRAS system. For this study, data from both systems were extracted (hospital by hospital) and modified into a common computer format for statistical analysis. The types of patient information contained within these databases which were pertinent to the study were as follows: Dates of admission and discharge, personal information (age, gender, marital status, ethnic group, district of residence, patient code), hospital code, admission source, diagnosis code and discharge status.
The following diseases, based on the Ninth Revision of the ICD (WHO, 1977) were selected for this study:
a. Diseases of the respiratory system (ICD460 - 519):
- The following disease groups were covered:
- Acute respiratory infections (ICD460-466)
- Other diseases of upper respiratory tract (ICD471-478)
- Pneumonia and influenza (ICD480-487)
- Chronic obstructive pulmonary disease (ICD490-496)
- Pneumoconioses and other lung diseases due to external agents (ICD500-508)
Asthma (ICD493) was also analyzed separately.
b. Diseases of the circulatory system:
- Hypertensive disease (ICD401-405)
- Ischaemic heart disease (ICD410-414)
- Diseases of pulmonary circulation (ICD415-417)
- Other forms of heart disease (ICD420-429)
- Cerebrovascular disease (ICD430-438)
- Diseases of arteries (ICD440-444)
Acute myocardial infarction (ICD410) was also analyzed separately.
Table 1 shows the time of migration from the Integrated Patient Administration System (IPAS) to the Medical Record Abstracting System (MRAS). The migration to the latter system resulted in an almost complete coding of hospital discharges, whereas up to 25% of discharges (in some hospitals) were uncoded within the former system. Most hospitals migrated to the MRAS by early 1995.
Table 1: Hospitals by time of migration from the Integrated Patient Administration System (IPAS) to the Medical Record Abstracting System (MRAS).
Name of hospital | Time of migration
United Christian Hospital (UCH) | February 1993
Queen Elizabeth Hospital (QEH) | February 1993
Pamela Youde Nethersole Eastern Hospital (PYN) | December 1993
Tuen Mun Hospital (TMH) | October 1994
Prince of Wales Hospital (PWH) | December 1994
Ruttonjee Hospital (RH) | December 1994
Kwong Wah Hospital (KWH) | January 1995
Queen Mary Hospital (QMH) | April 1995
Princess Margaret Hospital (PMH) # | April 1995
Yan Chai Hospital (YCH) | November 1995
Our Lady of Maryknoll Hospital (OLM) * | -
Pok Oi Hospital (POH) * | -
# Official migration date provided by Hospital Authority only. Most medical records were still coded in IPAS throughout 1995.
* Still using IPAS.
The total numbers of admissions due to respiratory and cardiovascular diseases by hospital in 1994, 1995 and the first half of 1996 are shown in Table 2. A 10-30% increase in admissions can be observed for most of these hospitals in 1995 compared to 1994, with the following exceptions. In Kwong Wah Hospital and Our Lady of Maryknoll Hospital, a decrease is apparent in 1995 and an increase in 1996 (when extrapolated for the whole year). By contrast, a dramatic increase in admissions (about fourfold, from 1,743 to 7,011; Table 2) was recorded in Yan Chai Hospital (YCH) in 1995, which stabilized in 1996. Pamela Youde Nethersole Eastern Hospital (PYN) recorded a 60% increase in 1995 and 40% in 1996. A 48% increase in 1995 and 29% in 1996 was found in Ruttonjee Hospital (RH). Tuen Mun Hospital (TMH) had a smaller increase of 35% and 29% in these years.** Overall, the mean number of daily admissions (respiratory and cardiovascular diseases, all 12 hospitals) increased by 18.2% in 1995 and 16.8% in 1996. These systematic differences were adjusted for in the statistical modelling procedures by the introduction of a linear time trend variable (t), a quadratic time trend variable (t²) and a 'year effect' indicator.
** The increase was likely due to the commissioning of additional beds in PYN, RH and TMH during the study period and the opening of the A & E Department in YCH in late 1994.
Table 2: Number of admissions due to respiratory and cardiovascular diseases by hospital in 1994, 1995, and the first half-year of 1996
Hospital | 1994 | 1995 | 1996 (first half)
KWH | 12,282 | 9,789 | 5,697
OLM | 831 | 729 | 491
PMH | 9,529 | 10,775 | 5,712
POH | 2,920 | 3,031 | 1,591
PWH | 9,608 | 10,908 | 6,187
PYN | 3,796 | 6,077 | 4,244
QEH | 10,631 | 13,052 | 7,654
QMH | 9,959 | 10,355 | 5,533
RH | 2,815 | 4,170 | 2,691
TMH | 6,058 | 8,194 | 5,282
UCH | 8,709 | 9,185 | 5,790
YCH | 1,743 | 7,011 | 3,460
Total | 78,881 | 93,276 | 54,332
Mean daily admissions | 216.11 | 255.55 | 298.53
4.2.2 Overview of Air Quality Data
The exposure time series data which were analyzed include daily measures of the following air pollutants: sulphur dioxide, nitrogen dioxide, ozone and respirable suspended particulates (RSP, measured by Tapered Element Oscillating Microbalance - TEOM). Various daily measures (mean and maximum levels) of the above pollutants were monitored at air quality monitoring stations of the Environmental Protection Department (EPD) and those available for the study period were provided in a computerized format (Table 3). The following monitoring sites: Central and West, Kwai Chung, Kwun Tong, Sham Shui Po, Shatin, Tai Po, Yuen Long and Tsuen Wan are located on low roof tops (four to six storeys) in various urban, industrial and new development areas. Data collected at these stations represent 'population background exposure' levels of ambient air pollution. Data from Mongkok station were not comparable as they were collected at street level and were therefore excluded.
Table 3: Summary Description of Air Quality Parameters
Parameter | Measurement units and method | Sub-parameters
Sulphur dioxide (SO2) | µg.m-3; pulsed fluorescence | SO2 - 24 hr mean; SO2 - max 1 hr
Nitrogen dioxide (NO2) | µg.m-3; gas-phase chemiluminescence | NO2 - 24 hr mean; NO2 - max 1 hr
Respirable suspended particulates (RSP) (diameter < 10 µm) | µg.m-3; tapered element oscillating microbalance (TEOM) | RSP - 24 hr mean; RSP - max 1 hr
Ozone (O3) | µg.m-3; ultraviolet absorption | O3 - 8 hr (9am-5pm) mean; O3 - max 1 hr
A rigorous quality control programme has been implemented by the EPD (EPD, 1994). Measuring instruments are routinely calibrated and spurious data caused by extrinsic factors are screened out to produce a valid, if not complete, data set. For the hourly data to be accepted, two thirds of the 5-minute readings for that hour must be available and valid. The same "two thirds" criterion applies to the daily values which summarize the hourly readings. This study, however, adopted a more rigorous "75% criterion" in order to conform to the APHEA protocol. Also, for each pollutant, monitoring stations with more than 25% of valid daily measurements missing for the entire study period were excluded. As particulates have been shown to exert significant effects on health in many studies (Schwartz & Dockery, 1992; Dockery & Pope, 1994; Schwartz et al., 1995; Schwartz, 1996; Samet et al., 1995), an exception was made for RSP (measured by TEOM). In this case, three stations which were missing more than 25% of the data series (but less than 33%) were included (Table 4). For a station with less than 25% of daily values missing, the missing values were estimated from the available measurements at the other monitoring sites for the same day. Each missing daily value was replaced by the mean daily level of the remaining stations multiplied by a correction factor, which was the ratio of the seasonal (three-month) mean for the station with the missing value to the corresponding seasonal mean for the remaining stations. The detailed APHEA methods for preparing the air quality data for time series analysis, including the imputation of missing data, are presented in Appendix I.
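The imputation rule described above can be illustrated with a minimal Python sketch. The function name `impute_missing`, the dictionary layout and the toy values are hypothetical, and the seasonal mean of the remaining stations is taken here as a pooled mean over all their valid readings, which is an assumption; the exact procedure is the one given in Appendix I.

```python
from statistics import mean

def valid(values):
    """Drop missing readings (represented here as None)."""
    return [v for v in values if v is not None]

def impute_missing(series_by_station, target, day, season_days):
    """Estimate a missing daily value for station `target` on `day`.

    Estimate = (mean of the other stations on that day)
               x (seasonal mean of target / seasonal mean of the others).
    `season_days` lists the day indices of the three-month season.
    """
    others = [s for s in series_by_station if s != target]
    # Mean level at the remaining stations on the missing day.
    day_mean_others = mean(valid(series_by_station[s][day] for s in others))
    # Seasonal means (assumption: pooled over all valid readings).
    target_season = mean(valid(series_by_station[target][d] for d in season_days))
    others_season = mean(
        valid(series_by_station[s][d] for s in others for d in season_days)
    )
    return day_mean_others * target_season / others_season

# Toy data: station A is missing its reading on day 2.
data = {
    "A": [10.0, 20.0, None, 30.0],
    "B": [20.0, 40.0, 60.0, 40.0],
    "C": [20.0, 40.0, 60.0, 40.0],
}
estimate = impute_missing(data, "A", day=2, season_days=[0, 1, 2, 3])
```

In the toy data the other stations average 60 on the missing day and station A runs at half their seasonal level, so the estimate is 30.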
Table 4 shows the degree of completeness of the air quality data based on the above criteria. Data from Yuen Long could not be included for any of the air pollutants due to the extent of missing data (more than 80% in 1994). Data for NO2 were complete for all remaining seven stations, SO2 for six stations, RSP for five and O3 for two. Data for the first half-year of 1996, especially for RSP, were much more complete than in 1994-95.
Table 4: Percentage of valid daily measures of air pollutants by station available for the study period (1994 - 95 and first half of 1996)
Pollutant | Period | Central Western | Kwai Chung | Kwun Tong | Sham Shui Po | Shatin | Tai Po | Tsuen Wan
NO2 | 1994 | 90.41 | 92.33 | 95.34 | 86.58 | 65.75 | 100.00 | 93.70
NO2 | 1995 | 92.60 | 92.33 | 95.89 | 89.04 | 97.26 | 96.71 | 95.34
NO2 | 1994-95 | 91.51 | 92.33 | 95.62 | 87.81 | 81.51 | 98.36 | 94.52
NO2 | 1996 | 98.52 | 99.01 | 99.38 | 97.41 | 99.63 | 98.89 | 98.03
O3 | 1994 | 89.32 | 93.42 | - | - | - | - | -
O3 | 1995 | 96.71 | 96.44 | - | - | - | - | -
O3 | 1994-95 | 93.01 | 94.93 | - | - | - | - | -
O3 | 1996 | 98.77 | 99.26 | - | - | - | - | -
SO2 | 1994 | 100.00 | 94.79 | 97.53 | 93.15 | 97.81 | - | 93.70
SO2 | 1995 | 98.08 | 96.71 | 97.26 | 93.15 | 99.45 | - | 95.07
SO2 | 1994-95 | 99.04 | 95.75 | 97.40 | 93.15 | 98.63 | - | 94.38
SO2 | 1996 | 99.26 | 98.77 | 99.51 | 95.05 | 99.88 | - | 99.14
RSP (by TEOM) | 1994 | 53.42 | 45.75 | 56.99 | - | 76.44 | - | 91.23
RSP (by TEOM) | 1995 | 85.21 | 94.79 | 79.18 | - | 87.12 | - | 94.25
RSP (by TEOM) | 1994-95 | 69.32 | 70.27 | 68.08 | - | 81.78 | - | 92.74
RSP (by TEOM) | 1996 | 100.00 | 100.00 | 98.40 | - | 96.67 | - | 97.17
4.2.3 Overview of Meteorological Data
Meteorological data, namely daily mean, maximum and minimum temperature and relative humidity, were obtained from the Royal Observatory for the study period. There were seven stations (King's Park, Lau Fau Shan, Wong Chuk Hang, Shatin, Tuen Mun, Ta Kwu Ling and Tseung Kwan O). The series of daily values was complete for all stations except Ta Kwu Ling, where data were missing for only three days. Mean temperature and humidity were treated as confounding variables as they vary with time and have been shown to be correlated with both air quality and health outcome variables (Schwartz et al., 1996). Appropriate adjustments were made for their effects in the statistical modelling.
4.3 Statistical Modelling
The statistical modelling followed the guidelines proposed by the APHEA protocol, which established that hospital admissions data are generally best represented by a Poisson distribution (Schwartz, Spix, Touloumi, et al., 1996). This is because, on any given day, only a small proportion of the population is admitted to hospital and large numbers of admissions are relatively rare. Also, the numbers of admissions represent counts which are non-negative integers. It has also been observed that admissions data are usually overdispersed (that is, the variance is larger than the mean), and positively auto-correlated. This is in contrast to the characteristics of a Poisson distribution, in which the variance is equal to the mean.
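As a rough illustration of the two properties mentioned above, the following Python sketch computes a variance-to-mean ratio (values above 1 suggest overdispersion relative to a Poisson distribution) and a lag-1 autocorrelation for a series of daily counts. The function names are hypothetical and these crude summaries are offered only as a sketch; they are not the diagnostics specified by the APHEA protocol.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by the mean; equals 1 in expectation for Poisson data."""
    return variance(counts) / mean(counts)

def lag1_autocorrelation(counts):
    """Correlation between each day's count and the previous day's count."""
    m = mean(counts)
    # Cross-products of deviations for consecutive days.
    num = sum((a - m) * (b - m) for a, b in zip(counts[:-1], counts[1:]))
    den = sum((c - m) ** 2 for c in counts)
    return num / den

# Toy series of daily counts (illustrative values only).
toy_counts = [1, 2, 3, 4, 5]
ratio = dispersion_ratio(toy_counts)
r1 = lag1_autocorrelation(toy_counts)
```

Applied to raw admission counts, both summaries also pick up trend and seasonality, so they are more informative when computed on the residuals of a fitted model.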
In a Poisson process, which is a relative risk model, a homogeneous risk to the underlying population on a given day is assumed (Schwartz et al., 1996). Given that underlying risk, the expected number of admissions on any day is $\lambda$. The probability of $y$ admissions occurring on a given day is given by:
$$P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
The Poisson regression model assumes that $\lambda$ varies with time-varying predictor variables $X_1, X_2, \ldots, X_n$:
$$\log \lambda = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n$$
where $X_1, \ldots, X_n$ are the predictors of daily admissions / mortalities and $\beta_1, \ldots, \beta_n$ are the regression coefficients for these predictors. The relative risk of the $i$th predictor is given by $e^{\beta_i}$.
The presence of overdispersion and serial correlation necessitates some statistical adjustment to the Poisson model. To address these problems, a number of methods have been reported in the literature. An iterative method called Generalized Estimating Equations (GEE), which is an extension of the Poisson regression, was used by Zeger (1988). In this method, the vector of residuals was weighted by an estimate of the inverse of the covariance matrix, and the weighted residuals were filtered with an autoregressive filter.
Brannas and Johansson (1994) extended the Poisson regression model by correcting the covariance matrix. The significance of the predictors was then assessed by the $\chi^2$ test using the corrected estimates of the variances, allowing valid inferences to be made from the regression coefficients. This is a much simpler procedure than the computer-intensive method of Zeger.
When applying Brannas and Johansson's method in this study, we found that the deviance remained quite large despite using different transformations of the independent variables specified in the APHEA protocol. Williams (1982) suggested that a large residual variation might be due either to the intrinsic (overdispersed) nature of the data or to some overlooked explanatory variables. He proposed a method for correcting overdispersion by multiplying the variance by an estimate of the dispersion parameter.* This method was adapted to the Poisson model by Breslow (1984) by taking appropriate limits in Williams' formulae. After testing all the potential confounding variables recommended by the APHEA protocol, we accepted that the data were overdispersed and adopted Williams' method of correction. Williams' method is supported by the SAS statistical software (SAS, 1996).#
* Suppose that the data consist of n binomial observations. The variance of the response probability is given by: $V(P_i) = \phi\, p_i(1 - p_i)$. An estimate of $\phi$, a non-negative but otherwise unknown dispersion parameter, was made by equating the value of Pearson's chi-square statistic ($X^2$) for the full model to its approximate expected value. After a weighted fit of the model, the parameter estimates and $X^2$ were recalculated, and a revised estimate of $\phi$ was calculated. The iterative procedure was repeated until $X^2$ was very close to its degrees of freedom.
# We used PROC LOGISTIC of SAS to run the Poisson regression model. The option scale=Williams in PROC LOGISTIC was then chosen.
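To make the idea of a dispersion parameter concrete, the sketch below shows the simplest one-step version of the correction: the dispersion is estimated as Pearson's chi-square divided by its degrees of freedom, and standard errors are inflated by its square root. This is a simplified stand-in and not Williams' iterative reweighting as implemented in SAS; the function names and toy values are hypothetical.

```python
import math

def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    For a Poisson model the variance equals the fitted mean, so each
    squared residual is scaled by the fitted value.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)

def scaled_standard_error(se, phi):
    """Inflate a model-based standard error by the square root of the dispersion."""
    return se * math.sqrt(phi)

# Toy example: four days of observed and fitted counts, two fitted parameters.
phi_hat = pearson_dispersion([2, 8, 6, 12], [4, 4, 9, 9], n_params=2)
se_adjusted = scaled_standard_error(0.01, 4.0)
```

A value of `phi_hat` well above 1 indicates that the unadjusted Poisson standard errors are too small, which is the situation the correction is meant to address.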
The APHEA guidelines (Katsouyanni et al., 1996) were also followed in the construction of the models. This procedure started with the construction of a "core" model in which the potential confounders of the short term relationship of air pollutants and daily hospital admissions for respiratory and cardiovascular diseases in 1994 and 1995 were investigated. In a time series model, the response variables (hospital admissions and deaths) show both a long term trend and shorter term periodic variations. These have to be adjusted for in order to identify the effects attributable to the pollutants. The APHEA guidelines specify that the core model (without pollutant variables) should include variables to account for the following - long term trends (time trend), medium term variations (season, using sine and cosine terms to control for seasonal and other cyclical patterns), short term systematic (day of the week, holidays, day after holiday) and short term, less systematic (meteorological) variations. The inclusion of these variables removes much of the "noise" in the model. Based on the APHEA protocol and the goodness-of-fit of the models, the following variables were included:
- Linear time trend, t: (Day 1, 2, ..., 730)
- Quadratic time trend, t²: (1, 4, 9, ...)
- Year-effect indicator, Y: (1994 and 1995)
- Day of the week: I1 to I6 (six dummy variables)
- Holiday: H1
- Day after holiday: H2
- Seasonality, S1 - S4, CS1 - CS4: sin(2kπt/365) and cos(2kπt/365), where k = 1, 2, 3, 4 (periods of one year, six months, four months and three months)
- Daily mean temperature
- Daily relative humidity
Time lags for temperature and humidity, and the interaction between temperature and relative humidity were found to be insignificant when entered in a stepwise multiple linear regression model. These were excluded from the core model on this basis.
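The calendar-based covariates of the core model listed above can be sketched in Python as follows. The function name `core_covariates`, the 0/1 coding of the year indicator and the choice of Monday as the reference day of the week are assumptions made for illustration; temperature and humidity would be appended from the meteorological series.

```python
import math
from datetime import date, timedelta

def core_covariates(t, day, holidays):
    """Covariates for study day `t` (1-based) falling on calendar date `day`.

    `holidays` is a set of dates treated as public holidays.
    """
    row = {"t": t, "t2": t * t, "Y": int(day.year == 1995)}
    # Six day-of-week dummies; Monday (weekday 0) is the reference (assumption).
    for k in range(1, 7):
        row[f"I{k}"] = int(day.weekday() == k)
    row["H1"] = int(day in holidays)                      # holiday
    row["H2"] = int(day - timedelta(days=1) in holidays)  # day after holiday
    # Four harmonics with periods of 12, 6, 4 and 3 months.
    for k in range(1, 5):
        angle = 2 * k * math.pi * t / 365
        row[f"S{k}"] = math.sin(angle)
        row[f"CS{k}"] = math.cos(angle)
    return row

# Example: the first two study days, with 1 January 1994 as a holiday.
toy_holidays = {date(1994, 1, 1)}
row1 = core_covariates(1, date(1994, 1, 1), toy_holidays)
row2 = core_covariates(2, date(1994, 1, 2), toy_holidays)
```

Each returned dictionary corresponds to one row of the design matrix, to which a pollutant term is later added.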
Verhoeff, Hoek, Schwartz & van Wijnen (1995) observed in the Amsterdam study of air pollution and daily mortalities that, after controlling for season and trend, the magnitude of the serial correlation in hospital admission data was low and the estimates were only slightly changed by incorporating serial correlation. However, considerable overdispersion was noted in Hong Kong's hospital admissions dataset for 1994 and 1995. Accordingly, adjustments for the overdispersion of the daily hospital admissions were made using Williams' method (1982), described above, and the results compared with the simple Poisson model.
Delayed effects of air pollutants were explored using single day lags and cumulative lags up to five days for ozone and three days for the other air pollutants. Owing to the high correlation coefficients between individual pollutants, the 'single pollutant model' was used to determine the effect of each individual pollutant on hospital admissions. In this approach, each air pollutant was separately entered into the "core model" to obtain its respective partial regression coefficient. It is recognized that adverse health outcomes may be due to the combined exposure to more than one pollutant. However, certain pollutant variables were highly correlated and the APHEA protocol recommends the construction of single-pollutant models as a starting point. Based on the partial regression coefficients (β) of the individual pollutants in the single pollutant model, relative risks of hospital admissions and deaths due to respiratory and cardiovascular diseases associated with a 100 µg.m-3 increase in air pollutant concentrations were derived.
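The lag construction and the conversion of a coefficient into a relative risk can be sketched as follows. The helper names are hypothetical, the cumulative lag is taken here as the mean of the same-day value and the preceding days (an assumption about the definition), and the 95% interval uses a normal approximation that is not stated in the text.

```python
import math

def lagged(series, lag):
    """Pollutant level `lag` days earlier; None where no earlier value exists."""
    return [None] * lag + series[:len(series) - lag]

def cumulative_lag(series, max_lag):
    """Mean of the same-day level and the preceding `max_lag` days (assumed definition)."""
    out = []
    for i in range(len(series)):
        if i < max_lag:
            out.append(None)  # not enough history at the start of the series
        else:
            window = series[i - max_lag:i + 1]
            out.append(sum(window) / len(window))
    return out

def relative_risk(beta, se, increment=100.0):
    """RR for an `increment` rise in concentration, with an approximate 95% interval."""
    rr = math.exp(beta * increment)
    lower = math.exp((beta - 1.96 * se) * increment)
    upper = math.exp((beta + 1.96 * se) * increment)
    return rr, lower, upper

# Illustrative values only: beta is per 1 µg.m-3, so RR refers to 100 µg.m-3.
rr, lower, upper = relative_risk(beta=0.0005, se=0.0001)
```

With these illustrative inputs the relative risk is exp(0.05), roughly 1.05 per 100 µg.m-3.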
The effects of more than one pollutant (including their interactions) were then explored using a 'multiple pollutant model'. In this multiple pollutant model*, relative risks of individual pollutants, adjusted for the effects of the others, were obtained. The final model was constructed by the following steps:
1. Initially, all 4 pollutants (main effects) and all 2-way interactions were included. Stepwise selection was employed to select the significant interaction(s).
2. Insignificant pollutants (except those involved in the significant interactions) were then removed from the model.
3. Williams' method was applied to the model obtained in Step 2.
4. Any insignificant interaction(s) and main effect(s) were removed.
5. The final model consisted of significant main effect(s) and significant interaction(s) as well as the corresponding main effect(s) in those significant interaction terms (even though the main effects by themselves were insignificant).
In the choice of pollutant parameters, the 'best' lags or cumulative lags which had been selected in the single pollutant models were used in the construction of the multiple pollutant model. When significant interactions were observed, the relative risks of one pollutant at different levels of the interacting pollutant were calculated. To assess the effect of multicollinearity, results were compared with and without ridge regression (Schaefer, 1986).
* To address the problem of collinearity between the pollutants, the technique of ridge estimation for collinear data in logistic regression (Schaefer, 1986) was used, but the results were similar to those obtained without this method.
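When two pollutants interact, the relative risk of one depends on the level of the other. The sketch below assumes a log-linear model containing a main effect for pollutant A, a main effect for pollutant B and their product term; the function name, argument names and coefficient values are hypothetical and serve only to illustrate the calculation.

```python
import math

def rr_at_level(beta_main, beta_interaction, level_other, increment=100.0):
    """RR for an `increment` rise in pollutant A when pollutant B is held at `level_other`.

    Under log(lambda) = ... + b_A*A + b_B*B + b_AB*A*B, raising A by `increment`
    changes log(lambda) by increment * (b_A + b_AB * B).
    """
    return math.exp(increment * (beta_main + beta_interaction * level_other))

# Illustrative coefficients only; levels of the interacting pollutant in µg.m-3.
rr_low = rr_at_level(0.001, 0.00001, level_other=0.0)
rr_high = rr_at_level(0.001, 0.00001, level_other=100.0)
```

With a positive interaction coefficient, as in this toy case, the relative risk of pollutant A is larger on days when pollutant B is high.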
4.4 Validation of Model
The model based on data from 1994 to 1995 was then validated using data for the first half-year of 1996. Model fit was assessed visually by plotting the observed and predicted daily admissions on the same graph for each pollutant and looking for discrepancies.