Exploring a Health Expenditures Dataset

Stat 203 Lecture 26


Dr. Janssen


In this activity, you’ll explore a dataset from the Medical Expenditure Panel Survey (MEPS), conducted by the U.S. Agency for Healthcare Research and Quality (AHRQ). You can read about the sample and data here.

You can download the data here.

We would like to develop a model that predicts whether a given person is likely to incur health expenditures in a given year. The response variables of interest are:

  • EXPENDIP: amount of expenditures for inpatient visits
  • EXPENDOP: amount of expenditures for outpatient visits

Note: these variables are not binary! Thus, you should create a new response variable for which a binomial GLM is appropriate.
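One possible way to build such a response is to record whether each amount is positive. The sketch below is only an illustration: the column names ANY_IP and ANY_OP are hypothetical, the amounts are made up rather than taken from the MEPS file, and it assumes there are no missing values. You may prefer a single combined indicator instead.

```python
# Toy records with made-up amounts (NOT taken from the MEPS file).
records = [
    {"EXPENDIP": 0.0, "EXPENDOP": 0.0},
    {"EXPENDIP": 0.0, "EXPENDOP": 125.5},
    {"EXPENDIP": 8400.0, "EXPENDOP": 0.0},
    {"EXPENDIP": 310.0, "EXPENDOP": 42.0},
]


def to_indicator(amount):
    """Return 1 if any expenditure was incurred, else 0."""
    return 1 if amount > 0 else 0


# ANY_IP and ANY_OP are hypothetical names for the new binary responses.
for row in records:
    row["ANY_IP"] = to_indicator(row["EXPENDIP"])
    row["ANY_OP"] = to_indicator(row["EXPENDOP"])

print([(row["ANY_IP"], row["ANY_OP"]) for row in records])
```

Each indicator takes only the values 0 and 1, so it can serve as the response in a binomial GLM.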

Using what you know about binomial GLMs, perform an exploratory data analysis, fit at least two models that use different link functions, and compare them. Be sure to state which model is best, how you know, and which factors and covariates are significant.
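As a reminder of what "different link functions" means, the sketch below writes out the inverses of three common binomial links (logit, probit, and complementary log-log), together with the Bernoulli log-likelihood and AIC that could be used to compare fitted models. It is a minimal illustration using only the Python standard library, not a model-fitting routine; the function names are my own, and in practice your statistical software reports these quantities for you.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used by the probit link


def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: the standard normal CDF."""
    return _std_normal.cdf(eta)


def inv_cloglog(eta):
    """Inverse complementary log-log link (asymmetric about 0)."""
    return 1.0 - math.exp(-math.exp(eta))


def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 responses y given fitted probabilities p."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )


def aic(loglik, n_params):
    """Akaike information criterion: -2 * loglik + 2 * n_params."""
    return -2.0 * loglik + 2.0 * n_params


# Compare the three links at a few values of the linear predictor.
for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

For models fitted to the same response, a smaller AIC indicates the preferred model. Note that the logit and probit links both give probability 0.5 at a linear predictor of 0, while the complementary log-log link does not, which is one reason the fitted models can differ.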

Be prepared to share your work with the rest of the class.