Your task is to produce a report that uses data visualizations effectively to make your points. Imagine it is 2001 and you are a consultant for the VA lottery. They want to understand lottery sales across the counties of VA: how much sales vary from county to county, what that variation looks like, and what factors could explain the differences. In the initial meeting, someone mentions a concern about the negative revenue impact that may result if North Carolina adopts a lottery (currently no lottery sales occur in North Carolina).
For this initial report, address the following:
1. Describe the data you have assembled and what the key general descriptive statistics are for the data (highlight what, in your opinion, are the most important ones – you do not have to include everything).
2. Describe the differences in lottery sales between counties. What are the key insights that emerge from the data?
3. Potentially, what could be the important county factors that contribute to more or less lottery sales?
4. Are there any initial indications that the concern about North Carolina moving to a lottery is justified?
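As a starting point for tasks 1 and 2, here is a minimal Python sketch of the descriptive statistics one might report for a single variable such as Totcap. The function name `describe` and the five input values are hypothetical placeholders, not figures from the real data. The coefficient of variation (standard deviation divided by the mean) is one possible way to express how much sales vary from county to county.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": len(values),
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        "min": min(values),
        "max": max(values),
        "cv": stdev / mean,  # coefficient of variation: spread relative to the mean
    }


# Hypothetical per-adult sales figures in dollars, NOT taken from the Excel file.
example_totcap = [100.0, 150.0, 200.0, 250.0, 300.0]
summary = describe(example_totcap)

for name, value in summary.items():
    print(f"{name}: {value}")
```

With the real data, the list would hold one value per county (135 values), read from the relevant column of the Excel file.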
The raw data is in the Excel file.
Data description for the Excel file
The data consists of information on 135 Virginia counties for the year 2000 (hence there are 135 rows of data). I collected this data with the primary purpose of understanding the differences in the sales of lottery tickets between counties in VA. As such, I have assembled data (information) on each county that may help in understanding variations in lottery sales between the counties. This dataset is an example of cross-sectional data: I have data across many counties for one time period (the year 2000).
Below is a list of all the variable definitions (in order):
County: Name of county
Population: Census population of county, number of people
%under18: Percent of county population that is under the age of 18
%white: Percent of the county population that is white
%hsgrad: Percent of the population aged 25 or older that has a high school diploma
%coll: Percent of the population aged 25 or older that has a college diploma
Traveltime: average travel time to work for residents in that county, in minutes
Percapinc: average individual income in the county, in dollars
Metarea: equals 1 if the county is defined as a metro area, 0 otherwise
Pop18: number of residents in county 18 years of age or older
%Nonwhite: percent of county population that is not white
Retail: Number of retail establishments that sell lottery tickets
Retailpercap: (Retail/Pop18)*1000
Scratcher: Total sales in dollars of scratcher lottery for county
Kicker: Total sales in dollars of kicker lottery for county
Lotto: Total sales in dollars of Lotto lottery for county
Biggame: Total sales in dollars of Big Game lottery for county
Pick 3: Total sales in dollars of Pick 3 lottery for county
Pick 4: Total sales in dollars of Pick 4 lottery for county
Cash 5: Total sales in dollars of Cash 5 lottery for county
Totalsales: Total lottery sales for county in dollars
Totcap: Totalsales/Pop18
Scrrcap: Scratcher/Pop18
Kickcap: Kicker/Pop18
Lottcap: Lotto/Pop18
Bigcap: Biggame/Pop18
P3cap: Pick 3/Pop18
P4cap: Pick 4/Pop18
C5cap: Cash 5/Pop18
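The derived per-capita variables above can be recomputed from the raw columns, which is a useful check when loading the data. The sketch below uses only the Python standard library. The helper name `add_per_capita` and all numbers in the example county are hypothetical. Setting Totalsales to the sum of the seven games is an assumption made for the example, since the description does not state this explicitly.

```python
# Map each game column to its per-capita column, named as in the variable list.
GAMES = {
    "Scratcher": "Scrrcap",
    "Kicker": "Kickcap",
    "Lotto": "Lottcap",
    "Biggame": "Bigcap",
    "Pick 3": "P3cap",
    "Pick 4": "P4cap",
    "Cash 5": "C5cap",
}


def add_per_capita(row):
    """Return a copy of one county row with the derived columns added."""
    pop18 = row["Pop18"]
    out = dict(row)
    out["Retailpercap"] = row["Retail"] / pop18 * 1000  # retailers per 1,000 adults
    out["Totcap"] = row["Totalsales"] / pop18           # total sales per adult
    for game, percap in GAMES.items():
        out[percap] = row[game] / pop18                 # game sales per adult
    return out


# Hypothetical county: these values are placeholders, NOT real data.
county = {
    "Pop18": 10000, "Retail": 25,
    "Scratcher": 400000, "Kicker": 20000, "Lotto": 150000, "Biggame": 80000,
    "Pick 3": 200000, "Pick 4": 100000, "Cash 5": 50000,
}
# Assumption for this example only: total sales equal the sum of the seven games.
county["Totalsales"] = sum(county[game] for game in GAMES)

derived = add_per_capita(county)
print(derived["Totcap"], derived["Retailpercap"], derived["Scrrcap"])
```

Dividing by Pop18 rather than Population matches the definitions above, so the per-capita figures are per adult resident.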
Ten: equals 1 if the county borders Tennessee, 0 otherwise
Ken: equals 1 if the county borders Kentucky, 0 otherwise
WV: equals 1 if the county borders West Virginia, 0 otherwise
NC: equals 1 if the county borders North Carolina, 0 otherwise
MY: equals 1 if the county borders Maryland, 0 otherwise
DC: equals 1 if the county borders DC, 0 otherwise
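The border indicators make simple group comparisons possible, for example comparing average per-adult sales in counties that border North Carolina with all other counties. The sketch below shows one way to do this with the standard library. The function name `mean_by_flag` and the five rows are hypothetical placeholders, not real data. A gap between the two groups would only be an initial indication, since border counties may also differ in income, population, or other factors.

```python
import statistics


def mean_by_flag(rows, flag, value="Totcap"):
    """Average `value` separately for rows where `flag` is 0 and where it is 1."""
    groups = {0: [], 1: []}
    for row in rows:
        groups[row[flag]].append(row[value])
    # Skip a group with no counties so that mean() is never called on an empty list.
    return {key: statistics.mean(vals) for key, vals in groups.items() if vals}


# Hypothetical rows: placeholder values, NOT taken from the Excel file.
rows = [
    {"County": "A", "NC": 1, "Totcap": 300.0},
    {"County": "B", "NC": 1, "Totcap": 340.0},
    {"County": "C", "NC": 0, "Totcap": 200.0},
    {"County": "D", "NC": 0, "Totcap": 220.0},
    {"County": "E", "NC": 0, "Totcap": 240.0},
]

result = mean_by_flag(rows, "NC")
print("Borders NC:", result[1])
print("Does not border NC:", result[0])
```

The same function works for the other indicators (Ten, Ken, WV, MY, DC) by changing the `flag` argument.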