Statistics Surveys, Vol. 3 (2009)


Causal inference in statistics: An overview

Judea Pearl, UCLA, Computer Science Department


Abstract
This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called "causal effects" or "policy evaluation"); (2) queries about probabilities of counterfactuals (including assessment of "regret," "attribution," or "causes of effects"); and (3) queries about direct and indirect effects (also known as "mediation"). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.
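As a minimal illustration of the first type of query, the sketch below contrasts "seeing" with "doing" in a hypothetical three-variable model in which a confounder Z affects both a treatment X and an outcome Y (Z → X, Z → Y, X → Y). All probabilities are made-up values chosen for illustration; none come from the paper. The code computes the observational quantity P(Y=1 | X=1), the interventional quantity P(Y=1 | do(X=1)) obtained by replacing the equation for X with the constant 1, and the back-door adjustment Σ_z P(Y=1 | X=1, Z=z) P(Z=z), which recovers the interventional quantity from observational probabilities alone when Z blocks every back-door path from X to Y.

```python
from itertools import product

# Hypothetical model parameters (illustrative only, not taken from the paper).
P_Z = 0.5                                   # P(Z=1): confounder
P_X_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X=1 | Z=z): treatment
P_Y_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y=1 | X=x, Z=z): outcome
                (1, 0): 0.3, (1, 1): 0.7}


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes the value v (0 or 1)."""
    return p if v == 1 else 1.0 - p


def joint(do_x=None):
    """Return the joint distribution {(z, x, y): prob}.

    If do_x is given, the mechanism for X is replaced by the constant do_x
    (the 'mutilated' model); the mechanisms for Z and Y are left unchanged.
    """
    dist = {}
    for z, x, y in product((0, 1), repeat=3):
        if do_x is None:
            px = bern(P_X_GIVEN_Z[z], x)
        else:
            px = 1.0 if x == do_x else 0.0
        dist[(z, x, y)] = bern(P_Z, z) * px * bern(P_Y_GIVEN_XZ[(x, z)], y)
    return dist


def marginal(dist, **fixed):
    """Sum the probabilities of all outcomes matching the fixed values."""
    total = 0.0
    for (z, x, y), p in dist.items():
        values = {"z": z, "x": x, "y": y}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


obs = joint()

# Seeing: condition on X=1 in the observational distribution.
p_see = marginal(obs, x=1, y=1) / marginal(obs, x=1)

# Doing: set X=1 by intervention and read off P(Y=1).
p_do = marginal(joint(do_x=1), y=1)

# Back-door adjustment: uses observational probabilities only.
p_adj = sum(
    marginal(obs, z=z, x=1, y=1) / marginal(obs, z=z, x=1) * marginal(obs, z=z)
    for z in (0, 1)
)

print(f"P(Y=1 | X=1)     = {p_see:.3f}")
print(f"P(Y=1 | do(X=1)) = {p_do:.3f}")
print(f"adjustment       = {p_adj:.3f}")
```

Running it prints 0.620 for the conditional probability and 0.500 for both the interventional probability and the adjustment formula. Conditioning on X=1 overweights the stratum Z=1 (where treatment and outcome are both more likely), whereas the intervention leaves the distribution of Z untouched.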

Keywords: Structural equation models, confounding, graphical methods, counterfactuals, causal effects, potential-outcome, mediation, policy evaluation, causes of effects.




Pearl, Judea, Causal inference in statistics: An overview, Statistics Surveys, 3, (2009), 96-146 (electronic). DOI: 10.1214/09-SS057.

References

   Angrist, J. and Imbens, G. (1991). Sources of identifying information in evaluation models. Tech. Rep. Discussion Paper 1568, Department of Economics, Harvard University, Cambridge, MA.

   Angrist, J., Imbens, G. and Rubin, D. (1996). Identification of causal effects using instrumental variables (with comments). Journal of the American Statistical Association 91 444–472.

   Arah, O. (2008). The role of causal reasoning in understanding Simpson’s paradox, Lord’s paradox, and the suppression effect: Covariate selection in the analysis of observational studies. Emerging Themes in Epidemiology 5 doi:10.1186/1742-7622-5-5. Online at http://www.ete-online.com/content/5/1/5.

   Arjas, E. and Parner, J. (2004). Causal reasoning from longitudinal data. Scandinavian Journal of Statistics 31 171–187. MR2066247

   Avin, C., Shpitser, I. and Pearl, J. (2005). Identifiability of path-specific effects. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence IJCAI-05. Morgan-Kaufmann Publishers, Edinburgh, UK. MR2192340

   Balke, A. and Pearl, J. (1995). Counterfactuals and policy analysis in structural models. In Uncertainty in Artificial Intelligence 11 (P. Besnard and S. Hanks, eds.). Morgan Kaufmann, San Francisco, 11–18. MR1615008

   Balke, A. and Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association 92 1172–1176.

   Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin 2 47–53.

   Bishop, Y., Fienberg, S. and Holland, P. (1975). Discrete multivariate analysis: theory and practice. MIT Press, Cambridge, MA. MR0381130

   Blyth, C. (1972). On Simpson’s paradox and the sure-thing principle. Journal of the American Statistical Association 67 364–366. MR0314156

   Bollen, K. (1989). Structural Equations with Latent Variables. John Wiley, New York. MR0996025

   Bonet, B. (2001). Instrumentality tests revisited. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, 48–55.

   Bowden, R. and Turkington, D. (1984). Instrumental Variables. Cambridge University Press, Cambridge, England. MR0798790

   Brent, R. and Lok, L. (2005). A fishing buddy for hypothesis generators. Science 308 523–529.

   Cai, Z. and Kuroki, M. (2006). Variance estimators for three ‘probabilities of causation’. Risk Analysis 25 1611–1620.

   Chalak, K. and White, H. (2006). An extended class of instrumental variables for the estimation of causal effects. Tech. Rep. Discussion Paper, UCSD, Department of Economics.

   Chen, A., Bengtsson, T. and Ho, T. (2009). A regression paradox for linear models: Sufficient conditions and relation to Simpson’s paradox. The American Statistician 63 218–225.

   Chickering, D. and Pearl, J. (1997). A clinician’s tool for analyzing non-compliance. Computing Science and Statistics 29 424–431. MR1601275

   Cole, P. (1997). Causality in epidemiology, health policy, and law. Environmental Law Reporter 27 10279–10285.

   Cole, S. and Hernán, M. (2002). Fallibility in estimating direct effects. International Journal of Epidemiology 31 163–165.

   Cox, D. (1958). The Planning of Experiments. John Wiley and Sons, NY. MR0095561

   Cox, D. and Wermuth, N. (2003). A general condition for avoiding effect reversal after marginalization. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 65 937–941. MR2017879

   Cox, D. and Wermuth, N. (2004). Causality: A statistical view. International Statistical Review 72 285–305.

   Dawid, A. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B 41 1–31. MR0535541

   Dawid, A. (2000). Causal inference without counterfactuals (with comments and rejoinder). Journal of the American Statistical Association 95 407–448. MR1803167

   Dawid, A. (2002). Influence diagrams for causal modelling and inference. International Statistical Review 70 161–189.

   DeFinetti, B. (1974). Theory of Probability: A Critical Introductory Treatment. Wiley, London. 2 volumes. Translated by A. Machi and A. Smith.

   Duncan, O. (1975). Introduction to Structural Equation Models. Academic Press, New York. MR0398558

   Eells, E. (1991). Probabilistic Causality. Cambridge University Press, Cambridge, England. MR1120269

   Frangakis, C. and Rubin, D. (2002). Principal stratification in causal inference. Biometrics 58 21–29. MR1891039

   Glymour, M. and Greenland, S. (2008). Causal diagrams. In Modern Epidemiology (K. Rothman, S. Greenland and T. Lash, eds.), 3rd ed. Lippincott Williams & Wilkins, Philadelphia, PA, 183–209.

   Goldberger, A. (1972). Structural equation models in the social sciences. Econometrica: Journal of the Econometric Society 40 979–1001. MR0327267

   Goldberger, A. (1973). Structural equation models: An overview. In Structural Equation Models in the Social Sciences (A. Goldberger and O. Duncan, eds.). Seminar Press, New York, NY, 1–18.

   Good, I. and Mittal, Y. (1987). The amalgamation and geometry of two-by-two contingency tables. The Annals of Statistics 15 694–711. MR0888434

   Greenland, S. (1999). Relation of probability of causation, relative risk, and doubling dose: A methodologic error that has become a social problem. American Journal of Public Health 89 1166–1169.

   Greenland, S., Pearl, J. and Robins, J. (1999). Causal diagrams for epidemiologic research. Epidemiology 10 37–48.

   Greenland, S. and Robins, J. (1986). Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology 15 413–419.

   Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica 11 1–12. Reprinted in D.F. Hendry and M.S. Morgan (Eds.), The Foundations of Econometric Analysis, Cambridge University Press, 477–490, 1995. MR0007954

   Hafeman, D. and Schwartz, S. (2009). Opening the black box: A motivation for the assessment of mediation. International Journal of Epidemiology 38 838–845.

   Heckman, J. (1992). Randomization and social policy evaluation. In Evaluating Welfare and Training Programs (C. Manski and I. Garfinkel, eds.). Harvard University Press, Cambridge, MA, 201–230.

   Heckman, J. (2008). Econometric causality. International Statistical Review 76 1–27.

   Heckman, J. and Navarro-Lozano, S. (2004). Using matching, instrumental variables, and control functions to estimate economic choice models. The Review of Economics and Statistics 86 30–57.

   Heckman, J. and Vytlacil, E. (2005). Structural equations, treatment effects and econometric policy evaluation. Econometrica 73 669–738. MR2135141

   Holland, P. (1988). Causal inference, path analysis, and recursive structural equations models. In Sociological Methodology (C. Clogg, ed.). American Sociological Association, Washington, D.C., 449–484.

   Hurwicz, L. (1950). Generalization of the concept of identification. In Statistical Inference in Dynamic Economic Models (T. Koopmans, ed.). Cowles Commission, Monograph 10, Wiley, New York, 245–257. MR0038640

   Imai, K., Keele, L. and Yamamoto, T. (2008). Identification, inference, and sensitivity analysis for causal mediation effects. Tech. rep., Department of Politics, Princeton University.

   Imbens, G. and Wooldridge, J. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47.

   Kiiveri, H., Speed, T. and Carlin, J. (1984). Recursive causal models. Journal of the Australian Mathematical Society 36 30–52. MR0719999

   Koopmans, T. (1953). Identification problems in econometric model construction. In Studies in Econometric Method (W. Hood and T. Koopmans, eds.). Wiley, New York, 27–48. MR0061358

   Kuroki, M. and Miyakawa, M. (1999). Identifiability criteria for causal effects of joint interventions. Journal of the Japan Statistical Society 29 105–117. MR1765187

   Lauritzen, S. (1996). Graphical Models. Clarendon Press, Oxford. MR1419991

   Lauritzen, S. (2001). Causal inference from graphical models. In Complex Stochastic Systems (D. Cox and C. Klüppelberg, eds.). Chapman and Hall/CRC Press, Boca Raton, FL, 63–107. MR1893411

   Lindley, D. (2002). Seeing and doing: The concept of causation. International Statistical Review 70 191–214.

   Lindley, D. and Novick, M. (1981). The role of exchangeability in inference. The Annals of Statistics 9 45–58. MR0600531

   MacKinnon, D., Fairchild, A. and Fritz, M. (2007). Mediation analysis. Annual Review of Psychology 58 593–614.

   Manski, C. (1990). Nonparametric bounds on treatment effects. American Economic Review, Papers and Proceedings 80 319–323.

   Marschak, J. (1950). Statistical inference in economics. In Statistical Inference in Dynamic Economic Models (T. Koopmans, ed.). Wiley, New York, 1–50. Cowles Commission for Research in Economics, Monograph 10.

   Meek, C. and Glymour, C. (1994). Conditioning and intervening. British Journal for the Philosophy of Science 45 1001–1021.

   Miettinen, O. (1974). Proportion of disease caused or prevented by a given exposure, trait, or intervention. American Journal of Epidemiology 99 325–332.

   Morgan, S. and Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY.

   Mortensen, L., Diderichsen, F., Smith, G. and Andersen, A. (2009). The social gradient in birthweight at term: quantification of the mediating role of maternal smoking and body mass index. Human Reproduction. To appear, doi:10.1093/humrep/dep211.

   Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science 5 465–480. MR1092986

   Pavlides, M. and Perlman, M. (2009). How likely is Simpson’s paradox? The American Statistician 63 226–233.

   Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA. MR0965765

   Pearl, J. (1993a). Comment: Graphical models, causality, and intervention. Statistical Science 8 266–269.

   Pearl, J. (1993b). Mediating instrumental variables. Tech. Rep. TR-210, http://ftp.cs.ucla.edu/pub/stat_ser/R210.pdf, Department of Computer Science, University of California, Los Angeles.

   Pearl, J. (1995a). Causal diagrams for empirical research. Biometrika 82 669–710. MR1380809

   Pearl, J. (1995b). On the testability of causal models with latent and instrumental variables. In Uncertainty in Artificial Intelligence 11 (P. Besnard and S. Hanks, eds.). Morgan Kaufmann, San Francisco, CA, 435–443. MR1615027

   Pearl, J. (1998). Graphs, causality, and structural equation models. Sociological Methods and Research 27 226–284.

   Pearl, J. (2000a). Causality: Models, Reasoning, and Inference. Cambridge University Press, New York. 2nd edition, 2009. MR1744773

   Pearl, J. (2000b). Comment on A.P. Dawid’s, Causal inference without counterfactuals. Journal of the American Statistical Association 95 428–431.

   Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, 411–420.

   Pearl, J. (2003). Statistics and causal inference: A review. Test Journal 12 281–345. MR2044313

   Pearl, J. (2005). Direct and indirect effects. In Proceedings of the American Statistical Association, Joint Statistical Meetings. MIRA Digital Publishing, Minn., MN, 1572–1581.

   Pearl, J. (2009a). Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press, New York. MR1744773

   Pearl, J. (2009b). Letter to the editor: Remarks on the method of propensity scores. Statistics in Medicine 28 1415–1416. http://ftp.cs.ucla.edu/pub/stat_ser/r345-sim.pdf.

   Pearl, J. (2009c). Myth, confusion, and science in causal analysis. Tech. Rep. R-348, University of California, Los Angeles, CA. http://ftp.cs.ucla.edu/pub/stat_ser/r348.pdf.

   Pearl, J. and Paz, A. (2009). Confounding equivalence in observational studies. Tech. Rep. TR-343, University of California, Los Angeles, CA. http://ftp.cs.ucla.edu/pub/stat_ser/r343.pdf.

   Pearl, J. and Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In Uncertainty in Artificial Intelligence 11 (P. Besnard and S. Hanks, eds.). Morgan Kaufmann, San Francisco, 444–453. MR1615028

   Pearl, J. and Verma, T. (1991). A theory of inferred causation. In Principles of Knowledge Representation and Reasoning: Proceedings of the Second International Conference (J. Allen, R. Fikes and E. Sandewall, eds.). Morgan Kaufmann, San Mateo, CA, 441–452. MR1142173

   Pearson, K., Lee, A. and Bramley-Moore, L. (1899). Genetic (reproductive) selection: Inheritance of fertility in man. Philosophical Transactions of the Royal Society A 73 534–539.

   Petersen, M., Sinisi, S. and van der Laan, M. (2006). Estimation of direct causal effects. Epidemiology 17 276–284.

   Robertson, D. (1997). The common sense of cause in fact. Texas Law Review 75 1765–1800.

   Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period – applications to control of the healthy workers survivor effect. Mathematical Modeling 7 1393–1512. MR0877758

   Robins, J. (1987). A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases 40 139S–161S.

   Robins, J. (1989). The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In Health Service Research Methodology: A Focus on AIDS (L. Sechrest, H. Freeman and A. Mulley, eds.). NCHSR, U.S. Public Health Service, Washington, D.C., 113–159.

   Robins, J. (1999). Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In Computation, Causation, and Discovery (C. Glymour and G. Cooper, eds.). AAAI/MIT Press, Cambridge, MA, 349–405. MR1696459

   Robins, J. (2001). Data, design, and background knowledge in etiologic inference. Epidemiology 12 313–320.

   Robins, J. and Greenland, S. (1989a). The probability of causation under a stochastic model for individual risk. Biometrics 45 1125–1138. MR1040629

   Robins, J. and Greenland, S. (1989b). Estimability and estimation of excess and etiologic fractions. Statistics in Medicine 8 845–859.

   Robins, J. and Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology 3 143–155.

   Rosenbaum, P. (2002). Observational Studies. 2nd ed. Springer-Verlag, New York. MR1899138

   Rosenbaum, P. and Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41–55. MR0742974

   Rothman, K. (1976). Causes. American Journal of Epidemiology 104 587–592.

   Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66 688–701.

   Rubin, D. (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics 31 161–170. MR2066246

   Rubin, D. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association 100 322–331. MR2166071

   Rubin, D. (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Statistics in Medicine 26 20–36. MR2312697

   Rubin, D. (2009). Author’s reply: Should observational studies be designed to allow lack of balance in covariate distributions across treatment group? Statistics in Medicine 28 1420–1423.

   Shpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (R. Dechter and T. Richardson, eds.). AUAI Press, Corvallis, OR, 437–444.

   Shpitser, I. and Pearl, J. (2007). What counterfactuals can be tested. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence. AUAI Press, Vancouver, BC, Canada, 352–359. Also, Journal of Machine Learning Research, 9:1941–1979, 2008. MR2447308

   Shpitser, I. and Pearl, J. (2008). Dormant independence. In Proceedings of the Twenty-Third Conference on Artificial Intelligence. AAAI Press, Menlo Park, CA, 1081–1087.

   Shpitser, I. and Pearl, J. (2009). Effects of treatment on the treated: Identification and generalization. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, Montreal, Quebec.

   Shrier, I. (2009). Letter to the editor: Propensity scores. Statistics in Medicine 28 1317–1318. See also Pearl 2009 http://ftp.cs.ucla.edu/pub/stat_ser/r348.pdf.

   Shrout, P. and Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods 7 422–445.

   Simon, H. (1953). Causal ordering and identifiability. In Studies in Econometric Method (W. C. Hood and T. Koopmans, eds.). Wiley and Sons, Inc., New York, NY, 49–74. MR0061358

   Simon, H. and Rescher, N. (1966). Cause and counterfactual. Philosophy of Science 33 323–340.

   Simpson, E. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Series B 13 238–241. MR0051472

   Sobel, M. (1998). Causal inference in statistical models of the process of socioeconomic achievement. Sociological Methods & Research 27 318–348.

   Sobel, M. (2008). Identification of causal parameters in randomized studies with mediating variables. Journal of Educational and Behavioral Statistics 33 230–231.

   Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction, and Search. Springer-Verlag, New York. MR1227558

   Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search. 2nd ed. MIT Press, Cambridge, MA. MR1815675

   Stock, J. and Watson, M. (2003). Introduction to Econometrics. Addison Wesley, New York.

   Strotz, R. and Wold, H. (1960). Recursive versus nonrecursive systems: An attempt at synthesis. Econometrica 28 417–427. MR0120034

   Suppes, P. (1970). A Probabilistic Theory of Causality. North-Holland Publishing Co., Amsterdam. MR0465774

   Tian, J., Paz, A. and Pearl, J. (1998). Finding minimal separating sets. Tech. Rep. R-254, University of California, Los Angeles, CA.

   Tian, J. and Pearl, J. (2000). Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence 28 287–313. MR1797625

   Tian, J. and Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence. AAAI Press/The MIT Press, Menlo Park, CA, 567–573.

   VanderWeele, T. (2009). Marginal structural models for the estimation of direct and indirect effects. Epidemiology 20 18–26.

   VanderWeele, T. and Robins, J. (2007). Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology 18 561–568.

   Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer Science+Business Media, Inc., New York, NY. MR2055670

   Wermuth, N. (1992). On block-recursive regression equations. Brazilian Journal of Probability and Statistics (with discussion) 6 1–56. MR1220428

   Wermuth, N. and Cox, D. (1993). Linear dependencies represented by chain graphs. Statistical Science 8 204–218. MR1243593

   Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. John Wiley, Chichester, England. MR1112133

   Woodward, J. (2003). Making Things Happen. Oxford University Press, New York, NY.

   Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge and London.

   Wooldridge, J. (2009). Should instrumental variables be used as matching variables? Tech. Rep. https://www.msu.edu/ ec/faculty/wooldridge/current%20research/treat1r6.pdf, Michigan State University, MI.

   Wright, S. (1921). Correlation and causation. Journal of Agricultural Research 20 557–585.

   Yule, G. (1903). Notes on the theory of association of attributes in statistics. Biometrika 2 121–134.





Statistics Surveys. ISSN: 1935-7516