To the extent that the sample size allows it, we will try to limit our matches to students within the same school. This will be done to control for school characteristics such as school climate, principal, and even neighborhood.

Question 2: Is there an effect of the “Coding for the Future” program on girls’ self-esteem and self-efficacy in middle school?

For this question, we have several measurement occasions for our outcome variables, which correspond to outcome 2. Although our treatment and control groups remain the same, we will employ a difference-in-differences approach. This is a stronger approach than PSM because it allows us to control for unobservable characteristics that are stable over time. We assume our cohorts are similar, but even if they differ in unobservable characteristics that could drive differences in our outcome variables, we expect their trends to be similar over time. Therefore, a change in trends would be attributable to the treatment, in this case the “Coding for the Future” program. The equation we will estimate is the following:
$$y_i = \beta_0 + \beta_1 t + \beta_2\,\mathit{cff} + \beta_3 (t \times \mathit{cff}) + BX' + \epsilon_i$$

where $y_i$ represents either the self-efficacy or the self-esteem outcome for each participant, $\beta_0$ is the intercept, and $t$ is time, which takes the value 0 if the measures come from 4th grade and 1 if they come from 6th grade. $\mathit{cff}$ is the treatment variable, a dummy indicating participation in the program, and $\beta_3$ is our parameter of interest. $X'$ is a matrix of the control variables mentioned in the previous section, with coefficients $B$, and $\epsilon_i$ is the error term. A positive and statistically significant value of $\beta_3$ would mean that the program had a positive effect on the measures of self-efficacy and self-esteem for participating girls in the year the program took place. It is important to note that although our control and treatment groups answer the questionnaires in different years, we are not incorporating the year of response into the analysis. Rather, we use the grade they were in as the common characteristic that makes them comparable.
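As a minimal sketch of what the interaction coefficient captures: when the controls are left out, the coefficient on the time-by-treatment interaction equals the change in the treated group's mean outcome minus the change in the control group's mean outcome. The records, scores, and function names below are hypothetical and serve only to illustrate the arithmetic; they are not the study's data or code.

```python
from statistics import mean

# Hypothetical records: (t, treated, score).
# t = 0 for 4th-grade measures, 1 for 6th-grade measures;
# treated = 1 for program participants, 0 otherwise.
# All scores are made up for illustration.
records = [
    (0, 0, 3.0), (0, 0, 3.2),
    (1, 0, 3.3), (1, 0, 3.5),
    (0, 1, 3.1), (0, 1, 3.3),
    (1, 1, 3.9), (1, 1, 4.1),
]


def cell_mean(rows, t, treated):
    """Mean outcome for one (time, treatment) cell."""
    return mean(y for (ti, di, y) in rows if ti == t and di == treated)


def did_estimate(rows):
    """Difference-in-differences of cell means (no covariates)."""
    change_treated = cell_mean(rows, 1, 1) - cell_mean(rows, 0, 1)
    change_control = cell_mean(rows, 1, 0) - cell_mean(rows, 0, 0)
    return change_treated - change_control


beta3 = did_estimate(records)
print(round(beta3, 3))
```

With controls or fixed effects added, the estimate would no longer reduce to this simple difference of means and would be obtained from a regression instead.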
Robustness check – fixed effects at the school level. In order to control for school characteristics such as school climate, principal, and even neighborhood, we will also estimate the same equation including a fixed effects term $\lambda_s$ for schools:
$$y_i = \beta_0 + \beta_1 t + \beta_2\,\mathit{cff} + \beta_3 (t \times \mathit{cff}) + BX' + \lambda_s + \epsilon_i$$

Question 3: To what extent does the relationship with self-esteem and self-efficacy persist across time?

For this question, we will follow a strategy similar to the one used for the previous question. The only difference lies in the time variable: it will still take the value 0 for measures from 4th grade, but the value 1 will now correspond to measures from the freshman year of college rather than from 6th grade. This is because we want to look at the long-term effects of the program. We will also conduct the robustness check using fixed effects at the school level.

Question 4: How do these relationships vary by race?

For choice of STEM degree. Since PSM is a non-parametric estimation, we will conduct the same analysis within subsamples, specifically for White, Black, Hispanic, and Asian women. We will first filter the data to keep all girls who mainly identify with each race, and then conduct the PSM analysis for that group. This way, we will compare Black women who participated in the program with Black women who did not participate (and likewise for each racial group) without averaging out the effect.

For self-efficacy and self-esteem. Although we could also filter the sample and use the same equations for the analysis, we can instead include interaction terms with race dummies in the estimation equations to account for race. This would make the equations look like this:
$$y_i = \beta_0 + \beta_1 t + \beta_2\,\mathit{cff} + \beta_3 (t \times \mathit{cff}) + \beta_4 (\mathit{cff} \times \mathit{Black}) + \beta_5 (\mathit{cff} \times \mathit{Hispanic}) + BX' + \lambda_s + \epsilon_i$$

The interactions of treatment with the race dummies will allow us to interpret the isolated effects for each race on our non-cognitive outcomes, both in the short and the long term.
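To illustrate how the treatment-by-race interactions are read, the sketch below fits a stripped-down version of this specification by ordinary least squares on made-up, noiseless data. It is an illustration under stated assumptions, not the study's implementation: it omits the controls and the school fixed effects, uses only the Python standard library (in practice a statistics package would be used), and every variable name and coefficient value is hypothetical.

```python
from itertools import product


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Hypothetical coefficients used only to generate illustrative data, in order:
# intercept, t, treated, t*treated, treated*Black, treated*Hispanic.
true_beta = [3.0, 0.2, 0.1, 0.5, 0.3, -0.1]


def design_row(t, treated, race):
    """One row of the design matrix; the reference race has no dummy."""
    return [1.0, t, treated, t * treated,
            treated * (race == "Black"), treated * (race == "Hispanic")]


# One observation per (time, treatment, race) combination, without noise.
X = [design_row(t, d, race)
     for t, d, race in product([0, 1], [0, 1], ["White", "Black", "Hispanic"])]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta_hat = ols(X, y)
names = ["const", "t", "treated", "t:treated",
         "treated:Black", "treated:Hispanic"]
for name, b in zip(names, beta_hat):
    print(f"{name:>18s} {b: .3f}")
```

In this parameterization, the coefficient on each treatment-by-race term measures how the gap between participants and non-participants for that group differs from the gap for the omitted reference group.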