Final Models for CPP 528

Overview

This last step in your project will walk you through adding data from two federal programs designed to help low-income communities: the New Markets Tax Credit (NMTC) and the Low-Income Housing Tax Credit (LIHTC) programs.

About the NMTC Program

The NMTC Program enables economically distressed communities to leverage private investment capital by providing investors with a federal tax credit. All NMTC investments must meet statutory qualifications for their investors to be able to claim the tax credit. The vast majority of NMTC investments are made within statutorily defined “Low-Income Communities.” Low-Income Communities are census tracts with a poverty rate of 20 percent or greater, or a median family income at or below 80 percent of the applicable area median family income. In addition to investments located in Low-Income Communities, investments can qualify for NMTCs by using other statutory provisions designed to target certain areas or populations, including provisions for Rural Counties, and Low-Income Targeted Populations.

Through the first 15 application rounds of the NMTC Program, the CDFI Fund has made 1,178 awards, allocating a total of $57.5 billion in tax credit authority to CDEs through a competitive application process.

data download website

About the LIHTC Program

The Low-Income Housing Tax Credit (LIHTC) is the most important resource for creating affordable housing in the United States today. The LIHTC database, created by HUD and available to the public since 1997, contains information on 47,511 projects and 3.13 million housing units placed in service between 1987 and 2017.

data download website

Evaluating Program Impact

Difference-in-Differences Model:

Returning to the diff-in-diff option: it turns out there is a relatively easy fix for the challenge of estimating growth rates from only two points in time. It is possible because of the magic of log functions.

In the regression context, logs change our interpretation of slopes from a one-unit change in X being associated with a B-unit change in Y, to a one-unit change in X being associated with a growth rate of B for Y.

So back to the home value problem. Home A is worth $200k and home B $100k. If both grow at the same 10 percent rate, home A increases in value by $20k and home B by $10k.

Once logged, however, note the important approximation:

log( A[t=2] ) - log( A[t=1] ) is approximately equal to ( (A2-A1)/A1 ) or the growth rate.
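A quick numeric check of this approximation, using the two hypothetical home values from above (both growing 10 percent):

```r
# Hypothetical home values: A = $200k, B = $100k, both growing 10%
A1 <- 200000; A2 <- 220000
B1 <- 100000; B2 <- 110000

# Exact growth rates
( A2 - A1 ) / A1          # 0.10
( B2 - B1 ) / B1          # 0.10

# Log-difference approximation: log(A2) - log(A1) ~ growth rate
log( A2 ) - log( A1 )     # about 0.0953, close to 0.10
log( B2 ) - log( B1 )     # identical, since both homes grow at the same rate
```

Note that the log difference is the same for both homes even though the dollar changes differ, which is exactly why logging the outcome puts homes of different values on a comparable growth-rate scale.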

So if we log the home value in the diff-in-diff models then we can calculate growth rates as:

log(C2) - log(C1) = growth rate of the comparison group (the secular market trend)

log(T2) - log(T1) = growth rate of the treatment group

With the model y = B0 + B1*treat + B2*post + B3*treat*post, the four group-period means are:

log(C1) = B0

log(C2) = B0 + B2

log(T1) = B0 + B1

log(T2) = B0 + B1 + B2 + B3

secular growth rate = log(C2) - log(C1) = (B0 + B2) - B0 = B2

B2 represents the default growth rate of home values for the comparison group.

The important coefficient, B3, then represents the growth rate of the treatment group above and beyond the secular growth rate, i.e. the extra growth needed to arrive at a value of T2 starting from T1.
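This mapping between group-period means and coefficients can be sketched numerically. The log means below are hypothetical, chosen to roughly echo the coefficients in the first model output that follows:

```r
# Illustrative log mean outcomes for each group/period (hypothetical values)
logC1 <- 3.45   # comparison group, pre period
logC2 <- 3.49   # comparison group, post period
logT1 <- 3.16   # treatment group, pre period
logT2 <- 3.30   # treatment group, post period

# Recover the diff-in-diff coefficients by hand, named after the lm() terms
b0            <- logC1                                        # intercept
b.treat       <- logT1 - logC1                                # pre-period gap between groups
b.post        <- logC2 - logC1                                # secular growth rate
b.interaction <- ( logT2 - logT1 ) - ( logC2 - logC1 )        # treatment growth above the trend

c( intercept=b0, treat=b.treat, post=b.post, interaction=b.interaction )
```

The interaction term is literally a difference of differences: the treatment group's growth minus the comparison group's growth.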

# log of the 2000 and 2010 outcomes (log1p guards against zeroes)
y1 <- log1p( d$p.prof.00 )
y2 <- log1p( d$p.prof.10 )

# treatment = tract received at least one NMTC project
treat <- as.numeric( d$num.nmtc > 0 )

# stack the two periods into one long data frame
d1 <- data.frame( y=y1, treat=treat, post=0 )
d2 <- data.frame( y=y2, treat=treat, post=1 )

d3 <- rbind( d1, d2 )

# diff-in-diff model: treat*post expands to treat + post + treat:post
m <- lm( y ~ treat + post + treat*post, data=d3 )

summary( m ) 
## 
## Call:
## lm(formula = y ~ treat + post + treat * post, data = d3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4909 -0.2765  0.0445  0.3454  1.4522 
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)  3.453182   0.002004 1723.440  < 2e-16 ***
## treat       -0.290276   0.013533  -21.450  < 2e-16 ***
## post         0.037723   0.002836   13.302  < 2e-16 ***
## treat:post   0.101500   0.019153    5.299 1.16e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4835 on 118873 degrees of freedom
##   (567 observations deleted due to missingness)
## Multiple R-squared:  0.007158,   Adjusted R-squared:  0.007133 
## F-statistic: 285.7 on 3 and 118873 DF,  p-value: < 2.2e-16
# same diff-in-diff setup, with the unemployment rate as the outcome
y1 <- log1p( d$p.unemp.00 )
y2 <- log1p( d$p.unemp.10 )
treat <- as.numeric( d$num.nmtc > 0 )

d1 <- data.frame( y=y1, treat=treat, post=0 )
d2 <- data.frame( y=y2, treat=treat, post=1 )

d3 <- rbind( d1, d2 )

m <- lm( y ~ treat + post + treat*post, data=d3 )

summary( m ) 
## 
## Call:
## lm(formula = y ~ treat + post + treat * post, data = d3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58937 -0.36946 -0.02365  0.35673  2.84588 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.769236   0.002350  753.02   <2e-16 ***
## treat        0.649582   0.015870   40.93   <2e-16 ***
## post         0.489733   0.003325  147.28   <2e-16 ***
## treat:post  -0.319184   0.022460  -14.21   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.567 on 118887 degrees of freedom
##   (553 observations deleted due to missingness)
## Multiple R-squared:  0.1659, Adjusted R-squared:  0.1659 
## F-statistic:  7885 on 3 and 118887 DF,  p-value: < 2.2e-16
# same diff-in-diff setup, with proportion college educated as the outcome
y1 <- log1p( d$p.col.edu.00 )
y2 <- log1p( d$p.col.edu.10 )
treat <- as.numeric( d$num.nmtc > 0 )

d1 <- data.frame( y=y1, treat=treat, post=0 )
d2 <- data.frame( y=y2, treat=treat, post=1 )

d3 <- rbind( d1, d2 )

m <- lm( y ~ treat + post + treat*post, data=d3 )

summary( m ) 
## 
## Call:
## lm(formula = y ~ treat + post + treat * post, data = d3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2034 -0.4699  0.0584  0.5530  2.0031 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.055909   0.002998 1019.37  < 2e-16 ***
## treat       -0.443917   0.020246  -21.93  < 2e-16 ***
## post         0.147524   0.004242   34.78  < 2e-16 ***
## treat:post   0.147002   0.028654    5.13  2.9e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7236 on 119006 degrees of freedom
##   (434 observations deleted due to missingness)
## Multiple R-squared:  0.01642,    Adjusted R-squared:  0.01639 
## F-statistic: 662.2 on 3 and 119006 DF,  p-value: < 2.2e-16
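Because the outcomes are logged, the coefficients above are only approximately percentage changes; the approximation is good for small coefficients but degrades as they grow. A small helper (a sketch, not part of the original analysis) converts a log-scale coefficient to an exact percent change, applied to the three `post` coefficients reported above:

```r
# Convert a log-scale regression coefficient to an exact percent change
pct.change <- function( b ) { 100 * ( exp( b ) - 1 ) }

# Secular trends (post coefficients from the three models above)
pct.change( 0.0377 )   # professionals: ~3.8%, close to the raw coefficient
pct.change( 0.4897 )   # unemployment: ~63%, where the approximation breaks down
pct.change( 0.1475 )   # college educated: ~15.9%
```

Since the outcomes were transformed with log1p, these are strictly percent changes in (1 + p) rather than in p itself, but the qualitative interpretation is the same.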

Report and interpret the main results from the models. Are the programs effective at catalyzing neighborhood improvement? We are assuming median home value captures the increased market demand that reflects the desirability of improvements.

ANSWER: Proportion professional: the secular trend was roughly 3.8% growth (post = 0.038). The treatment group grew about 10.2 percentage points more than the comparison group (treat:post = 0.102).

Unemployment rate: the secular trend was roughly 49% growth (post = 0.490). Unemployment in the treatment group grew about 31.9 percentage points more slowly than in the comparison group (treat:post = -0.319).

Proportion college educated: the secular trend was roughly 14.8% growth (post = 0.148). The treatment group grew about 14.7 percentage points faster than the comparison group (treat:post = 0.147).

REFLECTION:

How can we test the parallel lines assumption in this model? We know that growth rates change significantly between periods. The market for urban homes from 1990-2000 looks very different from the market in 2000 to 2010.

I would say that we would have to look at the population we are examining. Are we looking at the same community over time? We could tell if the number of long-time residents decreased, or if the racial makeup changed. Essentially, we would be looking to see whether the program benefited the community members originally in the area, rather than displacing them through gentrification.



Analysis Created By: Ricky Duran
For: CPP 528 - Data Sciences for Public Service III
Created On: April 27, 2020