Semiparametric Regression for Clustered Data Using Generalized Estimating Equations (Paper)

2001 · Journal of the American Statistical Association · Cited by 281
Statistical Methods and Inference; Statistical Methods and Bayesian Inference; Bayesian Methods and Mixture Models

Abstract

We consider estimation in a semiparametric generalized linear model for clustered data using estimating equations. Our results apply to the case where the number of observations per cluster is finite, while the number of clusters is large. The mean $\mu$ of the outcome variable is of the form $g(\mu) = X^T\beta + \theta(T)$, where $g(\cdot)$ is a link function, $X$ and $T$ are covariates, $\beta$ is an unknown parameter vector, and $\theta(t)$ is an unknown smooth function. Kernel estimating equations proposed previously in the literature are used to estimate the infinite-dimensional nonparametric function $\theta(t)$, and a profile-based estimating equation is used to estimate the finite-dimensional parameter vector $\beta$. We show that for clustered data this conventional profile/kernel method often fails to yield a $\sqrt{n}$-consistent estimator of $\beta$ along with appropriate inference unless working independence is assumed or $\theta(t)$ is artificially undersmoothed, in which case asymptotic inference is possible. To gain insight into these results, we derive the semiparametric efficient score of $\beta$, which is found to have a complicated form, and show that, unlike for independent data, the profile/kernel method does not yield a score function asymptotically equivalent to the semiparametric efficient score of $\beta$, even when the true correlation is assumed and $\theta(t)$ is undersmoothed. We illustrate the methods with an application to infectious disease data and evaluate their finite-sample performance through a simulation study.
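
As a rough illustration of the profile/kernel idea under working independence, the sketch below simulates clustered data from a partially linear model with an identity link and estimates the parametric coefficient by first smoothing out the dependence on T. It is not the authors' implementation. The scalar covariate, the sine-shaped smooth function, the exchangeable error structure, the bandwidth, and the function names are all assumptions made for the example, and a Nadaraya-Watson smoother stands in for the kernel estimating equations described in the abstract.

```python
import numpy as np


def simulate_clustered(n_clusters=300, m=3, beta=1.0, rho=0.5, seed=0):
    """Simulate Y_ij = beta * X_ij + theta(T_ij) + e_ij with exchangeable errors.

    All settings here are hypothetical choices for illustration only.
    """
    rng = np.random.default_rng(seed)
    T = rng.uniform(0.0, 1.0, size=(n_clusters, m))
    # X depends on T, so ignoring theta(T) would bias the estimate of beta.
    X = T + rng.normal(0.0, 1.0, size=(n_clusters, m))
    theta = np.sin(2.0 * np.pi * T)  # assumed smooth function theta(t)
    # Exchangeable within-cluster errors: shared cluster effect plus noise.
    shared = rng.normal(0.0, np.sqrt(rho), size=(n_clusters, 1))
    noise = rng.normal(0.0, np.sqrt(1.0 - rho), size=(n_clusters, m))
    Y = beta * X + theta + shared + noise
    return X.ravel(), T.ravel(), Y.ravel()


def smoother_matrix(T, h):
    """Nadaraya-Watson smoother matrix with a Gaussian kernel and bandwidth h."""
    diff = (T[:, None] - T[None, :]) / h
    K = np.exp(-0.5 * diff ** 2)
    return K / K.sum(axis=1, keepdims=True)


def profile_kernel_beta(X, T, Y, h):
    """Working-independence profile/kernel estimate for a scalar beta.

    Step 1: smooth X and Y on T, ignoring within-cluster correlation.
    Step 2: regress the Y residuals on the X residuals to obtain beta_hat.
    Step 3: smooth Y - X * beta_hat on T to obtain theta_hat at the observed T.
    """
    S = smoother_matrix(T, h)
    X_res = X - S @ X
    Y_res = Y - S @ Y
    beta_hat = (X_res @ Y_res) / (X_res @ X_res)
    theta_hat = S @ (Y - X * beta_hat)
    return beta_hat, theta_hat


X, T, Y = simulate_clustered()
beta_hat, theta_hat = profile_kernel_beta(X, T, Y, h=0.05)
print(f"beta_hat = {beta_hat:.3f}")
```

The sketch returns only point estimates. Valid standard errors for clustered data would need a cluster-robust (sandwich) variance, which is omitted here. Swapping the working-independence smoother for one that uses the within-cluster correlation is exactly the setting in which the abstract reports that the conventional method can fail.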