Comparing Different Methods of Estimating the Variance of Propensity Score Matching Estimator

Kamalian, Alireza; Tayebi, Seyed Komail; Sharifi, Alimorad; Amiri, Hadi

doi:10.22099/ijes.2021.38054.1692

Document Type: Research Paper

Authors

¹ Department of Economics, University of Isfahan, Isfahan, Iran

² Department of Economics, University of Isfahan, Isfahan, Iran


Abstract

Propensity score matching is widely used to estimate the effects of programs and policy interventions from observational data. The method compares treatment and control groups to draw statistical inferences about the significance of these policies' effects on target variables. Obtaining the standard error of the estimated treatment effect is therefore an important issue when using propensity score matching: precise estimates of the variance and standard deviation allow more efficient statistical tests and more accurate confidence intervals. However, the literature has not agreed on how the standard error should be estimated; some methods rely on resampling, while others do not. This study compares these methods through Monte Carlo simulation by calculating the mean squared error (MSE) of each estimator. Our results indicate that the jackknife and standard methods are more accurate than the Abadie and Imbens (2006), bootstrap, and subsampling methods. Finally, a re-examination of Tayyebi et al. (2019) shows that different methods of estimating the variance of the matching estimator lead to different statistical inferences regarding statistical significance.
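The abstract does not spell out the simulation design, so the following is only a minimal sketch of how a Monte Carlo comparison of this kind can be set up, not the authors' procedure. Everything in it is an assumption for illustration: the data-generating process, matching on the true (known) propensity score rather than an estimated one, the ATT as the target, one-to-one nearest-neighbour matching with replacement, the subsample size m = n/2, and all function names (`simulate`, `att_nn`, `var_bootstrap`, `var_jackknife`, `var_subsample`, `monte_carlo`). The standard and Abadie and Imbens (2006) variance estimators are not implemented. Each variance estimator is scored by its MSE against the Monte Carlo variance of the ATT estimates across replications, i.e. the mean over replications r of (V̂_r − V_MC)².

```python
import numpy as np

def simulate(n, rng, tau=1.0):
    """Hypothetical DGP: one covariate, logistic treatment assignment, constant effect tau."""
    x = rng.normal(size=n)
    ps = 1.0 / (1.0 + np.exp(-0.5 * x))          # true propensity score (assumed known)
    d = rng.random(n) < ps                         # boolean treatment indicator
    y = 1.0 + x + tau * d + rng.normal(size=n)
    return ps, d, y

def att_nn(ps, d, y):
    """ATT by 1-nearest-neighbour matching with replacement on the propensity score."""
    t, c = np.flatnonzero(d), np.flatnonzero(~d)
    if t.size == 0 or c.size == 0:
        raise ValueError("need at least one treated and one control unit")
    # For each treated unit, the control whose score is closest.
    nearest = c[np.abs(ps[t][:, None] - ps[c][None, :]).argmin(axis=1)]
    return np.mean(y[t] - y[nearest])

def var_bootstrap(ps, d, y, rng, B):
    """Naive nonparametric bootstrap: resample units with replacement."""
    n = len(y)
    reps = []
    for _ in range(B):
        i = rng.integers(0, n, size=n)
        reps.append(att_nn(ps[i], d[i], y[i]))
    return np.var(reps, ddof=1)

def var_jackknife(ps, d, y):
    """Delete-one jackknife: (n-1)/n * sum of squared deviations of leave-one-out estimates."""
    n = len(y)
    keep = np.ones(n, dtype=bool)
    reps = np.empty(n)
    for j in range(n):
        keep[j] = False                            # leave unit j out
        reps[j] = att_nn(ps[keep], d[keep], y[keep])
        keep[j] = True
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

def var_subsample(ps, d, y, rng, B, m):
    """Subsampling without replacement, rescaled from size m to size n."""
    n = len(y)
    reps = []
    for _ in range(B):
        i = rng.choice(n, size=m, replace=False)
        reps.append(att_nn(ps[i], d[i], y[i]))
    # For a root-n estimator Var(theta_m) ~ 1/m, so Var(theta_n) ~ (m/n) * Var(theta_m).
    return (m / n) * np.var(reps, ddof=1)

def monte_carlo(R=50, n=200, B=50, seed=0):
    """Return the Monte Carlo variance of the ATT and the MSE of each variance estimator."""
    rng = np.random.default_rng(seed)
    est = []
    v = {"bootstrap": [], "jackknife": [], "subsampling": []}
    for _ in range(R):
        ps, d, y = simulate(n, rng)
        est.append(att_nn(ps, d, y))
        v["bootstrap"].append(var_bootstrap(ps, d, y, rng, B))
        v["jackknife"].append(var_jackknife(ps, d, y))
        v["subsampling"].append(var_subsample(ps, d, y, rng, B, m=n // 2))
    v_mc = np.var(est, ddof=1)                     # benchmark: sampling variance across replications
    mse = {k: np.mean((np.array(vals) - v_mc) ** 2) for k, vals in v.items()}
    return v_mc, mse

if __name__ == "__main__":
    v_mc, mse = monte_carlo()
    print(f"Monte Carlo variance of ATT: {v_mc:.4f}")
    for name, value in mse.items():
        print(f"MSE of {name} variance estimator: {value:.6f}")
```

With so few replications the MSE figures are noisy; a real study would use far more replications and several sample sizes. Note also that Abadie and Imbens (2008) show the standard bootstrap is generally not valid for nearest-neighbour matching estimators with a fixed number of matches, which is one reason resampling-based and analytical variance estimators can disagree.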

Keywords

Main Subjects

Econometrics

Article Title [Persian]

Comparison of Different Methods of Estimating the Variance in the Propensity Score Matching Estimator

Authors [Persian]

علیرضا کمالیان ¹
سید کمیل طیبی ¹
علیمراد شریفی ¹
هادی امیری ²

¹ Faculty of Economics, University of Isfahan, Isfahan, Iran

² Faculty of Economics, University of Isfahan, Isfahan, Iran

Abstract [Persian]

Propensity score matching has been used extensively to estimate the effects of programs and policy interventions for observational data. By comparing the treatment and control groups, the method draws statistical inferences about the significance of these policies' effects on target variables; for this reason, one of the important issues when using propensity score matching is estimating the standard deviation of the estimated treatment effect. Accurate estimation of the variance and standard deviation makes more efficient statistical tests and more accurate confidence intervals possible. However, there is considerable disagreement in the literature about how to estimate the standard deviation. Some of these methods are based on resampling and others are independent of it. In this study, these methods are compared using Monte Carlo simulation and by computing the mean squared error (MSE) of these estimators. The simulation results indicate that the jackknife and standard methods outperform the Abadie-Imbens, bootstrap, and subsampling methods. Finally, an examination of the paper by Tayyebi et al. showed that different methods of estimating the variance of the matching estimator led to different statistical inferences regarding the significance of the statistics.

Keywords [Persian]

Matching
Propensity score
Monte Carlo
Resampling

References

Abadie, A., & Imbens, G. W. (2009). Matching on the estimated propensity score.
NBER Working Paper No. w15301. Cambridge, MA: National Bureau of Economic Research.
Abadie, A., & Imbens, G. W. (2006). Large sample properties of matching
estimators for average treatment effects. Econometrica, 74(1), 235-267.
Abadie, A., & Imbens, G. W. (2008). On the failure of the bootstrap for matching
estimators. Econometrica, 76(6), 1537-1557.
Abadie, A., & Imbens, G. W. (2011). Bias-corrected matching estimators for
average treatment effects. Journal of Business & Economic Statistics, 29(1),
1-11.
Abadie, A., & Imbens, G. W. (2012). A martingale representation for matching
estimators. Journal of the American Statistical Association, 107(498), 833-
843.
Agodini, R., & Dynarski, M. (2004). Are experiments the only option? A look at
dropout prevention programs. Review of Economics and Statistics, 86(1),
180-194.
Althauser, R. P., & Rubin, D. (1970). The computerized construction of a matched
sample. American Journal of Sociology, 76(2), 325-346.
Austin, P. C., & Cafri, G. (2020). Variance estimation when using propensity‐
score matching with replacement with survival or time‐to‐event
outcomes. Statistics in Medicine, 39(11), 1623-1640.
Austin, P. C. (2009). Using the standardized difference to compare the prevalence
of a binary variable between two groups in observational
research. Communications in Statistics-Simulation and Computation, 38(6),
1228-1234.
Austin, P. C., & Small, D. S. (2014). The use of bootstrapping when using
propensity‐score matching without replacement: A simulation
study. Statistics in Medicine, 33(24), 4306-4319.
Bang, H., & Robins, J. M. (2005). Doubly robust estimation in missing data and
causal inference models. Biometrics, 61(4), 962-973.
Becker, S. O., & Ichino, A. (2002). Estimation of average treatment effects based
on propensity scores. The Stata Journal, 2(4), 358-377.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and
applications. Cambridge University Press.
Chapin, F. S. (1947). Experimental designs in sociological research. New York:
Harper & Row.
Cochran, W. G., & Rubin, D. B. (1973). Controlling bias in observational studies:
A review. Sankhyā: The Indian Journal of Statistics, Series A, 417-446.
Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in
removing bias in observational studies. Biometrics, 295-313.
Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B.,
& Wynder, E. L. (1959). Smoking and lung cancer: Recent evidence and a
discussion of some questions. Journal of the National Cancer
Institute, 22(1), 173-203.
Kamalian et al., Iranian Journal of Economic Studies, 9(1) 2020, 181-212
Dehejia, R. H., & Wahba, S. (2002). Propensity score matching methods for
nonexperimental causal studies. Review of Economics and Statistics, 84(1),
151-161.
Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal
effects: A general multivariate matching method for achieving balance in
observational studies. Review of Economics and Statistics, 95(3), 932-945.
Efron, B. (1992). Bootstrap methods: Another look at the jackknife.
In Breakthroughs in Statistics (pp. 569-593). Springer, New York, NY.
Fabra, N., von der Fehr, N. H., & Harbord, D. (2002). Designing electricity
auctions: Uniform, discriminatory, and Vickrey. Preprint.
Federico, G., & Rahman, D. (2003). Bidding in an electricity pay-as-bid
auction. Journal of Regulatory Economics, 24(2), 175-211.
Greenland, S., Robins, J. M., & Pearl, J. (1999). Confounding and collapsibility
in causal inference. Statistical Science, 29-46.
Greenwood, E. (1945). Experimental sociology: A study in method. King's Crown
Press.
Hansen, B. B. (2008). The essential role of balance tests in propensity-matched
observational studies: Comments on ‘A critical appraisal of propensity-score
matching in the medical literature between 1996 and 2003’ by Peter Austin,
Statistics in Medicine.
Hansen, B. B. (2008). The prognostic analogue of the propensity
score. Biometrika, 95(2), 481-488.
Harder, V. S., Stuart, E. A., & Anthony, J. C. (2010). Propensity score techniques
and the assessment of measured covariate balance to test causal associations
in psychological research. Psychological Methods, 15(3), 234.
Heckman, J. J., Ichimura, H., & Todd, P. (1998). Matching as an econometric
evaluation estimator. The Review of Economic Studies, 65(2), 261-294.
Heckman, J. J., Ichimura, H., & Todd, P. E. (1997). Matching as an econometric
evaluation estimator: Evidence from evaluating a job training
programme. The Review of Economic Studies, 64(4), 605-654.
Hill, J. L., Rubin, D. B., & Thomas, N. (2000). The design of the New York school
choice scholarship program evaluation. Validity and Social Experimentation:
Donald Campbell’s legacy, 1, 155-180.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as nonparametric
preprocessing for reducing model dependence in parametric causal
inference. Political Analysis, 15(3), 199-236.
Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between
experimentalists and observationalists about causal inference. Journal of the
Royal Statistical Society: Series A (Statistics in Society), 171(2), 481-502.
Imbens, G. W. (2004). Nonparametric estimation of average treatment effects
under exogeneity: A review. Review of Economics and Statistics, 86(1), 4-
29.
Keshavarz Haddad, Gh.R. (2018). Micro econometrics and policy evaluation.
Ney.
Lechner, M. (2002). Some practical issues in the evaluation of heterogeneous
labour market programmes by matching methods. Journal of the Royal
Statistical Society: Series A (Statistics in Society), 165(1), 59-82.
Lee, M. J. (2005). Micro-econometrics for policy, program, and treatment effects.
Oxford University Press on Demand.
Mammen, E. (1993). Bootstrap and wild bootstrap for high-dimensional linear
models. The Annals of Statistics, 255-285.
Otsu, T., & Rai, Y. (2017). Bootstrap inference of matching estimators for average
treatment effects. Journal of the American Statistical Association, 112(520),
1720-1732.
Pingel, R. (2018). Estimating the variance of a propensity score matching
estimator for the average treatment effect. Observational Studies, 4, 71-96.
Politis, D. N., & Romano, J. P. (1994). Large sample confidence regions based on
subsamples under minimal assumptions. The Annals of Statistics, 2031-2050.
Ren, Y. (2001). A comparison of pool cost and consumer payment minimization
in electricity markets (Doctoral dissertation, McGill University Libraries).
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score
in observational studies for causal effects. Biometrika, 70(1), 41-55.
Rubin, D. B. (2007). The design versus the analysis of observational studies for
causal effects: Parallels with the design of randomized trials. Statistics in
Medicine, 26(1), 20-36.
Rubin, D. B., & Stuart, E. A. (2006). Affinely invariant matching methods with
discriminant mixtures of proportional ellipsoidally symmetric
distributions. The Annals of Statistics, 34(4), 1814-1826.
Rubin, D. B., & Thomas, N. (1992). Affinely invariant matching methods with
ellipsoidal distributions. The Annals of Statistics, 1079-1093.
Schafer, J. L., & Kang, J. (2008). Average causal effects from nonrandomized
studies: A practical guide and simulated example. Psychological
Methods, 13(4), 279.
Sekhon, J. S. (2007). Multivariate and propensity score matching software for
causal inference.
Song, J., Belin, T. R., Lee, M. B., Gao, X., & Rotheram-Borus, M. J. (2001).
Handling baseline differences and missing items in a longitudinal study of
HIV risk among runaway youths. Health Services and Outcomes Research
Methodology, 2(3-4), 317-329.
Stuart, E. A., & Green, K. M. (2008). Using full matching to estimate causal
effects in nonexperimental studies: Examining the relationship between
adolescent marijuana use and adult outcomes. Developmental
Psychology, 44(2), 395.
Stuart, E. A. (2008). Developing practical recommendations for the use of
propensity scores: Discussion of ‘A critical appraisal of propensity score
matching in the medical literature between 1996 and 2003’ by Peter Austin,
Statistics in Medicine. Statistics in Medicine, 27(12), 2062-2065.
Stuart, E. A., & Ialongo, N. S. (2010). Matching methods for the selection of
participants for follow-up. Multivariate Behavioral Research, 45(4), 746-
765.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look
forward. Statistical Science: A Review Journal of the Institute of
Mathematical Statistics, 25(1), 1.
Tahmasbi, R., & Rezaei, S. (2012). Statistical simulation. Professor Hesabi.
Tayyebi, S. K., Kamalian, A. R., Sarkhosh Sara, A., & Mobini-Dehkordi, M.
(2019). Analyzing the effects of globalization on the government budget
deficit: The matching approach. Economics and Modelling, 10(1), 65-96.
Tibshirani, R. J., & Efron, B. (1993). An introduction to the
bootstrap. Monographs on Statistics and Applied Probability, 57, 1-436.
Wacholder, S., & Weinberg, C. R. (1982). Paired versus two-sample design for a
clinical trial of treatments with the dichotomous outcome: Power
considerations. Biometrics, 801-812.
Weitzen, S., Lapane, K. L., Toledano, A. Y., Hume, A. L., & Mor, V. (2004).
Principles for modeling propensity scores in medical research: A systematic
literature review. Pharmacoepidemiology and Drug Safety, 13(12), 841-853.
Wu, C. F. J. (1986). Jackknife, bootstrap, and other resampling methods in
regression analysis. The Annals of Statistics, 14(4), 1261-1295.
Yu, C. H. (2002). Resampling methods: Concepts, applications, and
justification. Practical Assessment, Research, and Evaluation, 8(1), 19.
Zhao, Z. (2004). Using matching to estimate treatment effects: Data requirements,
matching metrics, and Monte Carlo evidence. Review of Economics and
Statistics, 86(1), 91-107.


Volume 9, Issue 1 - Serial Number 17, March 2020, Pages 181-212