This project gives you the opportunity to use statistics to investigate the relationship between global health and global wealth. Although, anecdotally, wealthier countries are expected to have healthier populations, there is significant variability in that relationship. The attached data set lets you investigate the relationship for yourself. It contains 22 randomly selected countries from around the world, each with a variety of economic and healthcare indicators (see the data dictionary for a definition of each variable). No single factor explains overall health in a given location; we would expect many different variables to contribute to the general health of a population. For this assignment, however, you are to determine the indicator that you believe best predicts general health, as measured by life expectancy. You must use the data given to you.
1. Investigation Results – Using the given data, investigate the possible relationships between Life Expectancy as the response variable (y) and each of the three explanatory variables (x). For each x variable, you will complete the following steps. (Note: you will do parts a – d three times!) (60 points)
a. Construct a scatter diagram displaying the relationship between x and y. You can use graph paper and draw the diagram by hand, use your calculator and take a picture of the screen, or use the online site: https://www.desmos.com/, which will allow you to print or save your diagram. Whatever tool you use, be sure to choose an appropriate scale for the axes so that the relationship between the variables is easy to see.
b. Calculate and state the sample correlation coefficient r.
c. Describe the type of correlation, if any, and interpret the correlation in the context of the data.
d. Determine if the correlation is significant. Use α = 0.05 and show all steps of this process.
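As an illustration of steps b and d, the following Python sketch computes r and then runs a t-test for the significance of the correlation. It is only one possible approach: your course may expect a calculator routine or a table of critical values for r instead. The function names and the five data points are made up for illustration and are not taken from the assignment data.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up illustration values, NOT the assignment data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
t = t_statistic(r, len(x))

# Two-tailed critical value for alpha = 0.05 and df = 3, read from a t-table.
# For the assignment's 22 countries (df = 20) the table value is 2.086.
t_critical = 3.182
significant = abs(t) > t_critical

print(f"r = {r:.4f}, t = {t:.4f}, significant: {significant}")
```

For these made-up values the script prints r = 0.7746 and t = 2.1213, which is below the critical value, so the correlation would not be judged significant at α = 0.05.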
2. Inferences – Write at least one paragraph for each of the following questions. Explain your answers as though you were talking to someone who is not in a statistics class. In other words, give details. A paragraph is 3-5 well-developed sentences. (40 points)
a. Now that you have investigated the relationships between y and each of the three different x variables, which explanatory variable (x) is the best predictor for Life Expectancy (y)? To defend your choice, discuss your investigation results for each (x, y) pairing from part 1, including correlation coefficients and their significance.
b. Calculate the regression equation for the relationship you determined to be the “best” in part a. What is the “rate of change” of your equation? Using this rate of change, describe the behavior of Life Expectancy (y) as a result of changes in the explanatory variable (x) that you chose.
c. Discuss how well your regression line fits the data set, both by graphing the line on your scatter plot and by considering the correlation coefficient. Would you say that the explanatory variable you chose is reasonably predictive of Life Expectancy? Why or why not?
d. You may have noticed that the United States was not included in the data set you were given. Below are the relevant statistics for the United States.
|Variable|Value|
|---|---|
|Life Expectancy (y)|78.5|
|GDP per capita (x)|59531.7|
|% Spending versus GDP (x)|17.17|
|Inverted Corruptions Score (x)|24|
Using the regression model that you calculated in part b, plug in the appropriate x-value for the United States to predict its Life Expectancy. Also, calculate the residual (the actual value minus your predicted value). Does the United States seem to fit your model, or is it an outlier? Give the reason(s) for your answer.
e. Summarize your findings. If you were part of a United Nations team tasked with researching and making recommendations to member countries to improve life expectancy for their citizens, what would be your next steps?
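The calculations in parts b and d can be sketched in Python as follows. This is a minimal least-squares example, not a required method. The function names, the data points, and the new observation are all made up for illustration; substitute the assignment data and the United States values yourself.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x_value):
    """Predicted y for a given x on the regression line."""
    return intercept + slope * x_value

# Made-up illustration values, NOT the assignment data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)

# A hypothetical new observation that was left out of the data set.
new_x, actual_y = 6, 5.0
predicted_y = predict(slope, intercept, new_x)

# Residual = actual value minus predicted value.
residual = actual_y - predicted_y

print(f"y = {intercept:.2f} + {slope:.2f}x")
print(f"predicted = {predicted_y:.2f}, residual = {residual:.2f}")
```

Here the slope of 0.60 is the rate of change: each one-unit increase in x goes with a 0.60-unit increase in the predicted y. The negative residual of -0.80 means the line over-predicts the new observation.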