# Large Sample Properties of Generalized Method of Moments Estimators

Lars Peter Hansen. *Econometrica*, Vol. 50, No. 4 (Jul., 1982), pp. 1029-1054. Published by The Econometric Society. Stable URL: http://www.jstor.org/stable/1912775

then the GMM estimator $\{b_N : N \geq 1\}$ exists and converges almost surely to $\beta_0$.

In examining Theorem 2.2, let us first consider the case in which $\chi(\beta) = \beta$. Condition (i) is easily verified for $(S, \sigma) = (R^q, |\cdot|)$. The function $h_0$ is given by

$$h_0(\beta) = a_0 E[c_0(x_1)] + a_0 E[c_1(x_1)]\beta.$$

Suppose that $a_0$ and $E[c_1(x_1)]$ are both of full rank. Furthermore, we assume that $\beta_0$ is a zero of $h_0$. This is sufficient to imply that for any $\rho > 0$,

$$\inf\{|h_0(\beta)| : \beta \in R^q,\ |\beta - \beta_0| \geq \rho\} > 0,$$

and consequently condition (iii) is met. Thus, for models that are linear in the parameters, Theorem 2.2 requires no additional assumptions on the parameter space in order to achieve consistency. Of course, this result could be demonstrated easily by explicitly solving for the estimator. Nonetheless, a consistency result for linear models in which the underlying stochastic process is stationary and ergodic is embedded in Theorem 2.2.

The more interesting aspect of Theorem 2.2 is that it provides a consistency result for models that are nonlinear in the parameters and does not explicitly employ a compactness requirement. For this reason, we will examine conditions (ii) and (iii) in more depth. Condition (ii) requires that the mapping defined by the inverse of $\chi$ be continuous at the true parameter vector. In interpreting condition (iii), it is fruitful to view $R^m$ as the unrestricted parameter space. The function $\chi$ is used to indicate the elements of that space which satisfy the restrictions generated by the model.
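For the linear-in-parameters case, the estimator can indeed be solved for explicitly. A minimal sketch (the function name, toy dimensions, and data below are illustrative, not from the paper): minimizing $|a_N g_N(\beta)|^2$ with $g_N(\beta) = \bar c_0 + \bar c_1 \beta$ is an ordinary linear least-squares problem.

```python
import numpy as np

def linear_gmm(c0_bar, c1_bar, a_N):
    """Explicit GMM estimate when the moment function is linear in beta.

    Minimizes |a_N (c0_bar + c1_bar @ beta)|^2, the sample analogue of
    h0(beta) = a0 E[c0(x1)] + a0 E[c1(x1)] beta = 0.
    """
    lhs = a_N @ c1_bar            # (r, q): weighted "design" matrix
    rhs = -a_N @ c0_bar           # (r,):   weighted "target"
    beta_hat, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return beta_hat

# Toy check: build population moments from a known beta0 and recover it.
rng = np.random.default_rng(0)
q, r = 2, 4
c1_bar = rng.normal(size=(r, q))              # full rank with probability one
beta0 = np.array([1.5, -0.5])
c0_bar = -c1_bar @ beta0                      # so that h0(beta0) = 0
print(linear_gmm(c0_bar, c1_bar, np.eye(r)))  # recovers beta0
```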
In particular, we let $P = \{\theta \in R^m : \chi(\beta) = \theta$ for some $\beta \in S\}$; i.e., $P$ is the image of $\chi$ over the set $S$, and the set $S$ indexes elements of $P$ that satisfy the restrictions. We define another set $Q$ as

$$Q = \{\theta \in R^m : a_0 E[c_0(x_1)] + a_0 E[c_1(x_1)]\theta = 0\}.$$

The hyperplane $Q$ consists of all the elements of $R^m$ that satisfy the population orthogonality conditions used in estimation. From conditions (ii) and (iii), we are guaranteed that $Q \cap P = \{\theta_0\}$ where $\theta_0 = \chi(\beta_0)$. We now endeavor to construct sufficient conditions for condition (iii) to hold.

We define a function $\xi$ by

$$\xi(\rho) = \inf\{|\theta - \eta| : \eta \in Q,\ \theta \in P,\ |\theta - \theta_0| \geq \rho\}.$$

The following lemma supplies some sufficient conditions for (iii) of Theorem 2.2 to hold.

LEMMA 2.3: Suppose Assumptions 2.4 and 2.5 are satisfied. If (i) for any $\rho > 0$, $\xi(\rho) > 0$; (ii) $\liminf_{\rho \to \infty} \xi(\rho)/\rho > 0$; then condition (iii) of Theorem 2.2 is satisfied.

Condition (i) of Lemma 2.3 says that it is not possible for elements in $P$ outside of a neighborhood of $\theta_0$ to get arbitrarily close to elements in $Q$. Condition (ii) of Lemma 2.3 says that the distance between $P$ and $Q$ outside a neighborhood of radius $\rho$ of $\theta_0$ eventually grows at least proportionately with $\rho$.¹² A condition like (ii) is needed because, although the set $P$ is specified a priori, the set $Q$ is not known, and $a_0$, $E[c_0(x_1)]$, and $E[c_1(x_1)]$ have to be estimated. Using estimators of these matrices, define the random set

$$Q_N = \left\{\theta \in R^m : a_N \frac{1}{N}\sum_{n=1}^{N} c_0(x_n) + a_N \frac{1}{N}\sum_{n=1}^{N} c_1(x_n)\,\theta = 0\right\}.$$

Even very small errors in estimating $a_0$, $E[c_0(x_1)]$, and $E[c_1(x_1)]$ get magnified in terms of the distance between $Q_N$ and $\theta \in Q$ as $\theta$ becomes large in absolute value. To insure that the GMM estimator is consistent, we have to rule out the possibility that $Q_N \cap P$ contains an element far away from $\theta_0$ for sufficiently large sample size $N$. If $S$ is compact and $P \cap Q = \{\theta_0\}$, then conditions (i) and (ii) of Lemma 2.3 are trivially met. However, Lemma 2.3 and Theorem 2.2 can be applied in situations in which $S$ is not compact.
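Conditions (i) and (ii) of Lemma 2.3 can be checked numerically in simple settings. A hypothetical sketch (the sets $P$ and $Q$ below are invented for illustration): take $\chi(\beta) = (\beta, \beta^2)$, so $P$ is a parabola in $R^2$, and let $Q$ be its tangent line $\{\theta : \theta_2 = 0\}$; then $Q \cap P = \{\theta_0\}$ with $\theta_0 = (0,0)$, and $\xi(\rho)$ can be approximated by a grid search.

```python
import numpy as np

def xi(rho, betas):
    """Grid approximation of xi(rho) = inf{|theta - eta| : eta in Q, theta in P,
    |theta - theta0| >= rho} for P = {(b, b^2)} and Q = {theta : theta_2 = 0}."""
    theta = np.column_stack([betas, betas ** 2])   # points of P
    far = np.linalg.norm(theta, axis=1) >= rho     # theta0 = (0, 0)
    # Distance from (t1, t2) to the line theta_2 = 0 is simply |t2|.
    return np.abs(theta[far, 1]).min()

betas = np.linspace(-100.0, 100.0, 400_001)
# Condition (i): xi(rho) > 0 for every rho > 0.
# Condition (ii): xi(rho)/rho stays bounded away from zero as rho grows,
# since far from the origin |theta| is dominated by b^2 = |theta_2|.
print(xi(1.0, betas), xi(100.0, betas) / 100.0)
```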
In fact, an important special case occurs when $Q = \{\theta_0\}$. This means that the unrestricted parameter vector $\theta_0$ is uniquely determined by the population orthogonality conditions used in estimation. When $Q = \{\theta_0\}$, conditions (i) and (ii) of Lemma 2.3 are easily verified.¹³

The consistency Theorems 2.1 and 2.2 illustrate the potential tradeoff between assumptions on the function $f$ and assumptions on the parameter space $S$ in order to obtain strong consistency of the GMM estimator. Theorem 2.1 most closely resembles other consistency theorems in the literature for nonlinear instrumental variables, where the parameter space is assumed to be compact [1, 24]. In contrast to those theorems, Theorem 2.1 does not assume that disturbances are serially independent. Theorem 2.2 relaxes the compactness assumption at the cost of being more restrictive about the specification of $f$.

¹²A requirement equivalent to condition (ii) of Lemma 2.3 can be formulated in terms of asymptotic cones. If we let $\mathrm{As}(P)$ and $\mathrm{As}(Q)$ denote the asymptotic cones of $P$ and $Q$, respectively, then condition (ii) is equivalent to requiring that $\mathrm{As}(P) \cap \mathrm{As}(Q) = \{0\}$.

¹³In the example considered in Hansen and Sargent [18, pp. 33-36], $Q \neq \{\theta_0\}$. Malinvaud [27, p. 350] has proved a theorem for minimum distance estimators similar to Theorem 2.2 in cases in which $Q = \{\theta_0\}$. Huber [23] has a general treatment of consistency in cases in which the observation vector is independent and identically distributed.

Before concluding this discussion of consistency, one additional theorem is considered. Suppose elements in $R^q$ are partitioned into two subvectors, i.e., $\beta' = (\beta_1', \beta_2')$. Furthermore, suppose the metric $\sigma$ is

$$\sigma(\beta, \gamma) = \max\{\sigma_1(\beta_1, \gamma_1),\ \sigma_2(\beta_2, \gamma_2)\}$$

where $\sigma_1$ is a metric defined on $S_1 = \{\beta_1 : (\beta_1', \beta_2')' \in S$ for some $\beta_2\}$ and $\sigma_2$ is a metric defined on $S_2 = \{\beta_2 : (\beta_1', \beta_2')' \in S$ for some $\beta_1\}$.
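Such a partition lends itself to recursive estimation: estimate $\beta_1$ first from a subset of the moment conditions, then estimate $\beta_2$ with the first-stage value plugged in. A toy numerical sketch under an invented two-moment model (the model $E[x_n] = \beta_1$, $E[x_n^2] = \beta_1^2 + \beta_2$ is ours, chosen only because both steps have closed forms):

```python
import numpy as np

def recursive_gmm(x):
    """Two-step (recursive) GMM for the toy model E[x] = b1, E[x^2] = b1^2 + b2.

    Step 1 uses only the first orthogonality condition to get b1; step 2
    minimizes the squared second sample moment condition in b2 with b1 fixed.
    """
    b1 = x.mean()                     # solves (1/N) sum (x_n - b1) = 0
    b2 = (x ** 2).mean() - b1 ** 2    # minimizes |(1/N) sum (x_n^2 - b1^2 - b2)|^2
    return b1, b2

b1, b2 = recursive_gmm(np.array([1.0, 2.0, 3.0, 4.0]))
print(b1, b2)  # 2.5 1.25
```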
In some circumstances it may be computationally convenient to construct a strongly consistent estimator $\{b_{1,N} : N \geq 1\}$ of $\beta_{1,0}$ by using a subset of the orthogonality conditions provided by the model. In particular, Theorem 2.1 or 2.2 could be used to establish this consistency. After obtaining this estimator of $\beta_{1,0}$, we can construct an estimator of $\beta_{2,0}$ by minimizing $|h_N[\omega, b_{1,N}(\omega), \beta_2]|^2$ with respect to $\beta_2$ such that $(\beta_1', \beta_2')' \in S$, where $\beta_1 = b_{1,N}(\omega)$. Theorem 2.3 establishes the consistency of this recursive estimator.

THEOREM 2.3: Suppose Assumptions 2.1-2.5 are satisfied. If (i) the conditions of Theorem 2.1 are satisfied; (ii) $\{b_{1,N} : N \geq 1\}$ converges almost surely to $\beta_{1,0}$; (iii) for any sequence $\{\gamma_{1,j} : j \geq 1\}$ in $S_1$ such that $\{\gamma_{1,j} : j \geq 1\}$ converges to $\beta_{1,0}$, there exists a sequence $\{\gamma_{2,j} : j \geq 1\}$ such that $\{(\gamma_{1,j}', \gamma_{2,j}')' : j \geq 1\}$ is a sequence in $S$ that converges to $\beta_0$; then a GMM estimator $\{b_N : N \geq 1\}$ exists and converges almost surely to $\beta_0$.¹⁴

A couple of comments are in order about Theorem 2.3. First, condition (iii) imposes an extra requirement on the parameter space $S$. If $\sigma_1$ and $\sigma_2$ are defined by the absolute value norm, then a sufficient condition for (iii) to hold is that $S$ be convex. However, condition (iii) can be satisfied in the absence of convexity. Second, some of the coordinate functions of $h_N[\omega, b_{1,N}(\omega), \cdot]$ may not actually depend on $\beta_2$. If this is the case, computation of the criterion function for the second step of this recursive procedure can be simplified by ignoring these coordinate functions.

¹⁴A version of Theorem 2.3 also can be established using the assumptions of Theorem 2.2 and conditions (ii) and (iii) of Theorem 2.3.

## 3. THE ASYMPTOTIC DISTRIBUTION OF THE GMM ESTIMATOR

In this section we establish the asymptotic normality of a generic GMM estimator. Our discussion adopts a different but closely related formulation of GMM estimation to that in Section 2.
The first-order conditions of the minimization problem used to define a GMM estimator in Section 2 have the interpretation of setting $q$ linear combinations of the $r$ sample orthogonality conditions to zero, where $q$ is the dimensionality of the parameter space. It turns out that estimators obtained by minimizing or maximizing other criterion functions, e.g., quasi-maximum likelihood or least squares estimators, oftentimes can be interpreted in the same manner by examining the corresponding first-order conditions.¹⁵ Our approach in this section is to adopt a generic form of the first-order conditions and to assume that consistency has already been established. For estimators not included in the discussion of Section 2, consistency might be established by appealing to other treatments of those estimators or by appropriately modifying the proof strategy employed in Section 2.¹⁶

We begin our asymptotic distribution discussion by describing the underlying assumptions which we make. We extend the index set of the stochastic process containing the observable variables from the nonnegative integers to include all of the integers. For studying probabilistic properties, Doob [7, p. 456] argues that this extension is innocuous.

ASSUMPTION 3.1: $\{x_n : -\infty < n < +\infty\}$ is stationary and ergodic.

Assumptions 3.1 and 3.3 imply that $\{w_n : -\infty < n < +\infty\}$ is a martingale difference sequence.

ASSUMPTION 3.5: $E[w_0 w_0']$ exists and is finite, $E[w_j \mid w_0, w_{-1}, w_{-2}, \ldots]$ converges in mean square to zero, and $\sum_{j=0}^{\infty} E[v_j' v_j]^{1/2}$ is finite.

Among other things, Assumption 3.5 implies that $E[f(x_n, \beta_0)] = 0$ for $-\infty < n < +\infty$.¹⁷ We consider a sequence of weighting matrices $\{a_N^* : N \geq 1\}$ and make the following assumption:

ASSUMPTION 3.6: $\{a_N^* : N \geq 1\}$ converges in probability to a constant matrix $a^*$ which has full rank.

¹⁶Abbreviated versions of the proofs of some of the results in this section and in Section 4 are provided in the Appendix. More detailed versions of the proofs can be obtained from the author on request.
¹⁷This implication can be seen by employing an iterated expectations argument and noting that $E[w_0] = E[f(x_n, \beta_0)]$.

We require that a GMM estimator $\{b_N^* : N \geq 1\}$ asymptotically satisfy the set of equations $a^* E[f(x_n, \beta)] = 0$ in the sense of Definition 3.1.

DEFINITION 3.1: The GMM estimator $\{b_N^* : N \geq 1\}$ is a sequence of random vectors that converges in probability to $\beta_0$ for which $\{\sqrt{N}\, a_N^* g_N(b_N^*) : N \geq 1\}$ converges in probability to zero.

Before showing that this GMM estimator is asymptotically normal and displaying the dependence of its asymptotic covariance matrix on the limiting weighting matrix $a^*$, we discuss the link between this estimator and the GMM estimator of Definition 2.1. Note that

$$|h_N(\beta)|^2 = |a_N g_N(\beta)|^2 = g_N(\beta)'\, a_N'\, a_N\, g_N(\beta).$$

Assuming that the first-order conditions for the problem of minimizing $|h_N|^2$ are satisfied by $b_N$, then

$$(6)\qquad \frac{\partial g_N}{\partial \beta}(b_N)'\, a_N'\, a_N\, g_N(b_N) = 0.$$

Let $a_N^*$ be the $q$ by $r$ matrix

$$(7)\qquad a_N^* = \frac{\partial g_N}{\partial \beta}(b_N)'\, a_N'\, a_N.$$

Substituting (7) into (6), we obtain $a_N^* g_N(b_N) = 0$, which trivially satisfies one of the key requirements of Definition 3.1. Once we establish the strong consistency of the estimator of Definition 2.1, require that the first-order conditions (6) be satisfied, and demonstrate that the sequence $\{a_N^* : N \geq 1\}$ converges in probability to a constant matrix, then we obtain a GMM estimator of Definition 3.1. Lemma 3.2 supplies sufficient conditions for $\{a_N^* : N \geq 1\}$ as defined by (7) to converge in probability to a constant matrix.

LEMMA 3.2: Suppose Assumptions 3.1-3.4 are satisfied. If (i) $\{b_N : N \geq 1\}$ converges in probability to $\beta_0$; (ii) $\{a_N : N \geq 1\}$ converges in probability to $a_0$; then $\{(\partial g_N/\partial \beta)(b_N) : N \geq 1\}$ converges in probability to $d_0$ and $\{a_N^* : N \geq 1\}$ given by (7) converges in probability to $a^* = d_0' a_0' a_0$.

While the above discussion shows how the estimators of Definition 2.1 can be viewed as GMM estimators under Definition 3.1, our asymptotic distribution theory is not limited to estimators of this form.
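The construction in (6) and (7) is easy to verify numerically. A sketch for a moment function that is linear in $\beta$ (an invented example, so the gradient $\partial g_N/\partial \beta$ is just a constant matrix): at the minimizer of $|a_N g_N|^2$, the $q \times r$ matrix $a_N^*$ annihilates $g_N(b_N)$.

```python
import numpy as np

rng = np.random.default_rng(1)
q, r = 2, 5
C1 = rng.normal(size=(r, q))       # gradient of g_N for a linear moment function
C0 = rng.normal(size=r)            # intercept: g_N(beta) = C0 + C1 @ beta
a_N = rng.normal(size=(r, r))      # sample weighting matrix

# b_N minimizes |a_N g_N(beta)|^2 -- a least-squares problem for linear g_N.
b_N, *_ = np.linalg.lstsq(a_N @ C1, -a_N @ C0, rcond=None)

g_bN = C0 + C1 @ b_N
a_star_N = C1.T @ a_N.T @ a_N      # equation (7): (dg_N/dbeta)' a_N' a_N
print(a_star_N @ g_bN)             # equation (6): numerically zero
```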
Any consistent estimators which minimize or maximize criterion functions with first-order conditions that can be represented as

$$a_N^* g_N(b_N^*) + H_N(b_N^*) = 0,$$

where $\{\sqrt{N}\, H_N(b_N^*) : N \geq 1\}$ converges in probability to zero for an appropriate choice of $a_N^*$, $f$, and $H_N$, can be viewed as special cases of the generic GMM estimator of Definition 3.1.¹⁸ Thus, various forms of least squares and quasi-maximum likelihood along with nonlinear instrumental variables estimators are included in our asymptotic distribution discussion.

In preparation for our asymptotic distribution theorem, we let $R_w(j) = E[w_0 w_{-j}']$. Assumptions 3.1 and 3.5 insure that $R_w(j)$ is finite and that the matrix

$$S_w = \sum_{j=-\infty}^{+\infty} R_w(j)$$

is well defined and finite.¹⁹ Theorem 3.1 displays the asymptotic distribution of the GMM estimator.

THEOREM 3.1: Suppose Assumptions 3.1-3.6 are satisfied. Then $\{\sqrt{N}(b_N^* - \beta_0) : N \geq 1\}$ converges in distribution to a normally distributed random vector with mean zero and covariance matrix $(a^* d_0)^{-1}\, a^* S_w a^{*\prime}\, [(a^* d_0)']^{-1}$.²⁰

Since $S_w$ plays a prominent role in the expression for the asymptotic covariance matrix, we shall examine Assumption 3.5 in conjunction with the computation of $S_w$. We focus on situations in which

$$(10)\qquad f(x_n, \beta_0) = u_n \otimes z_n$$

where we view $z_n$ as a vector of the instrumental variables and $u_n$ as a vector of the disturbance terms from the econometric model. Let $R_u(j) = E[u_n u_{n-j}']$ and $R_z(j) = E[z_n z_{n-j}']$, and assume that $R_u(0)$ and $R_z(0)$ exist and are finite. It is instructive for us to examine five special cases.

¹⁸The minimax estimator of Sargan [30] can be interpreted as a GMM estimator with a nontrivial $H_N$.

¹⁹Under Assumptions 3.1 and 3.5 it can be shown that the elements in the autocovariance function for $\{w_n : -\infty < n < +\infty\}$ are absolutely summable.

CASE (i): $E[u_n \mid z_n, u_{n-1}, z_{n-1}, u_{n-2}, \ldots] = 0$ and $E[u_n u_n' \mid z_n, u_{n-1}, z_{n-1}, u_{n-2}, \ldots] = E[u_n u_n']$.

These restrictions imply that Assumption 3.5 is satisfied. Also, it can be demonstrated that $R_w(j) = 0$ for $j \neq 0$ and $S_w = R_w(0) = R_u(0) \otimes R_z(0)$. Thus, $S_w$ can be computed from the second moments of $z_n$ and $u_n$.

CASE (ii): $E[u_n \mid z_n, u_{n-1}, z_{n-1}, u_{n-2}, \ldots] = 0$.
This case differs from Case (i) in that we no longer assume that the conditional covariance matrix for $u_n$ is independent of the conditioning set. This allows for a particular form of conditional heteroskedasticity. The stationarity assumption, however, restricts us to circumstances in which the unconditional variances of $\{u_n : -\infty < n < +\infty\}$ are constant.

In this case $v_j = 0$ for $j \geq k$. This means that Assumption 3.5 is satisfied. Also, $R_w(j) = 0$ for $|j| \geq k$ and

$$S_w = \sum_{j=-k+1}^{k-1} R_w(j).$$

Computation of $S_w$ entails only the determination of a finite number of the autocovariances of $\{w_n : -\infty < n < +\infty\}$.

This additional assumption can be used to simplify the expressions obtained in (8) and (9).

The five special cases discussed above illustrate how auxiliary assumptions imply alternative formulas for calculating $S_w$. These auxiliary assumptions also can be used to obtain formulas for models with orthogonality conditions that have representations other than (10). Assumption 3.5, however, accommodates models that do not necessarily satisfy the defining assumptions of any of the five special cases discussed above. Some of the models examined by Hansen and Sargent [19] are not included in these cases, nor are models whose orthogonality conditions emerge because certain equations define best linear predictors but not conditional expectations. Theorem 3.1 can be applied to these models as well.

In order to make asymptotically valid inferences and construct asymptotically correct confidence regions, it is necessary to have consistent estimators of $a^*$, $d_0$, and $S_w$.
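Putting the pieces together, a sketch that estimates $S_w$ by the truncated autocovariance sum above and assembles the Theorem 3.1 covariance matrix (the estimator is the obvious sample analogue of the truncated sum, not a formula from the paper; the toy inputs are invented):

```python
import numpy as np

def s_w_truncated(w, k):
    """Estimate S_w = sum_{j=-k+1}^{k-1} R_w(j) from an (N, r) array of w_n,
    valid when R_w(j) = 0 for |j| >= k (finite-order serial correlation).
    Note: a raw truncated estimate need not be positive semidefinite in
    small samples."""
    N, _ = w.shape
    S = w.T @ w / N                        # sample R_w(0)
    for j in range(1, k):
        R_j = w[j:].T @ w[:N - j] / N      # sample R_w(j) = E[w_n w_{n-j}']
        S += R_j + R_j.T                   # since R_w(-j) = R_w(j)'
    return S

def gmm_asymptotic_cov(a_star, d0, S_w):
    """Covariance matrix (a* d0)^{-1} a* S_w a*' [(a* d0)']^{-1} of Theorem 3.1.
    Under Case (i), S_w could instead be computed as np.kron(Ru0, Rz0)."""
    A = np.linalg.inv(a_star @ d0)
    return A @ a_star @ S_w @ a_star.T @ A.T

# Deterministic toy check with a single moment condition (r = q = 1),
# so with a* = d0 = 1 the covariance reduces to S_w itself.
w = np.array([[1.0], [1.0], [-1.0], [-1.0]])
S_w = s_w_truncated(w, 2)                  # R_w(0) + 2 R_w(1) = 1 + 0.5
V = gmm_asymptotic_cov(np.eye(1), np.eye(1), S_w)
print(V)  # [[1.5]]
```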