
The Critical Need to Address Missing Data in HMDA

The single best tool we have for assessing access to home mortgage credit is in danger. Nearly one in every four home loan records now has no borrower demographic data attached to it.

For policymakers, researchers, and civil rights enforcers, this makes it difficult to know for sure whether the patterns they see in the data are really there.

The share of home purchase loans going to Black and Hispanic borrowers has risen slightly since 2020, as noted in NCRC’s preliminary analysis of 2021 Home Mortgage Disclosure Act (HMDA) data. But because the share of loans lacking demographic data has also increased, it is difficult to tell whether these gains are real or whether the declining quality of the dataset is creating a false impression of progress.

In 2020, as White and Asian borrowers surged into the market to refinance their homes at low interest rates, the share of loan applications[1] with a Black or Hispanic applicant fell sharply. In 2021 those losses were generally reversed. Since 2018, Black and Hispanic applicants have seen only marginal changes in their share of the housing market, and they remain severely underserved by the mortgage industry relative to their share of the population.

However, loan records reported without demographic information have been creeping upward at the same time, and this rising share of “No Data” loans since 2018 may be creating a false impression. Lenders do not have to report demographic information on loans initiated online unless the applicant provides it. Lenders are also allowed to delete ethnicity, race, sex, age, and income information from loan records that they purchase from other institutions, such as correspondent mortgage lenders. If correspondent or online mortgage lending favors specific racial groups, HMDA data would not necessarily reflect that skew. These “No Data” loan records could make it appear that underserved groups are doing better than they actually are.
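To see how unreported records can distort a group’s apparent market share, here is a minimal sketch with made-up numbers. It assumes we know how many loans a group received among records that do carry demographic data, plus the total number of records and the number of No Data records. Dropping the No Data records gives one observed share; the group’s true share of all loans lies somewhere between two bounds, depending on how many of the No Data loans actually went to that group. The function names are hypothetical, chosen for illustration.

```python
def share_among_reported(group_count, total, no_data):
    """Group's share when No Data records are simply left out of the denominator."""
    reported = total - no_data
    if reported <= 0:
        raise ValueError("no records with demographic data")
    return group_count / reported


def true_share_bounds(group_count, total, no_data):
    """Lowest and highest possible share of *all* loans for the group.

    Low bound: none of the No Data loans went to the group.
    High bound: every No Data loan went to the group.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return group_count / total, (group_count + no_data) / total


# Hypothetical illustration, not HMDA figures: 1,000 loans in total,
# 250 of them with no demographic data, and 60 reported as going to the group.
observed = share_among_reported(60, 1000, 250)  # 60 / 750
low, high = true_share_bounds(60, 1000, 250)    # 60 / 1000 and 310 / 1000
```

In this example the observed share is 8%. If the No Data loans went to the group at that same 8% rate, the observed figure would be accurate; if they went disproportionately to other groups, the group’s true share would be closer to the 6% low bound, which is exactly the kind of false impression of progress described above.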

This is no mere blemish or scuff in our national records. Some 5.2 million loan records in the newest 2021 HMDA dataset lack demographic information, 23% of the entire dataset. The demographic blind spot in HMDA now obscures nearly one in every four loan records nationwide. Nor is this lack of data a merely academic concern: HMDA is a critical dataset that underpins a broad array of housing policy and legislative efforts.
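Measuring the size of the blind spot is a simple proportion: records without demographic data divided by all records considered. The sketch below assumes a simplified record layout in which missing race and ethnicity are stored as `None`; the field names `race` and `ethnicity`, and the rule that a record is “No Data” only when both are missing, are illustrative assumptions rather than the exact columns or definitions behind the figures in this post. It does use HMDA’s `action_taken` code 6 (purchased loan), so the same function can measure the share among purchased loans separately.

```python
PURCHASED_LOAN = 6  # HMDA action_taken code for a purchased loan


def is_no_data(record):
    """Assumed rule: a record is 'No Data' when race and ethnicity are both missing."""
    return record.get("race") is None and record.get("ethnicity") is None


def no_data_share(records, where=None):
    """Share of records (optionally filtered by `where`) lacking demographic data."""
    selected = [r for r in records if where is None or where(r)]
    if not selected:
        raise ValueError("no records match the filter")
    return sum(is_no_data(r) for r in selected) / len(selected)


def rec(action_taken, race, ethnicity):
    """Build a hypothetical, simplified loan record."""
    return {"action_taken": action_taken, "race": race, "ethnicity": ethnicity}


# Hypothetical records, not real HMDA data.
records = [
    rec(1, "White", "Not Hispanic or Latino"),
    rec(1, "Black or African American", "Not Hispanic or Latino"),
    rec(1, "White", "Hispanic or Latino"),
    rec(1, None, None),
    rec(PURCHASED_LOAN, None, None),
    rec(PURCHASED_LOAN, None, None),
    rec(PURCHASED_LOAN, None, None),
    rec(PURCHASED_LOAN, "Asian", "Not Hispanic or Latino"),
]

overall = no_data_share(records)  # 4 of 8
purchased = no_data_share(records, where=lambda r: r["action_taken"] == PURCHASED_LOAN)  # 3 of 4
originated = no_data_share(records, where=lambda r: r["action_taken"] != PURCHASED_LOAN)  # 1 of 4
```

Splitting the share by `action_taken` in this way is one way to see how much of the overall gap comes from purchased loans rather than from loans the reporting lender originated itself.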

This is also not the first time missing data has been flagged as a problem. As far back as 2001, Jason Dietrich, now Chief of the CFPB’s Compliance Analytics and Policy team, wrote while at the OCC that missing demographic data in HMDA was “…correlated with income, loan amount, and action taken on the loan.”

“Given these concerns, what can be done? Above all, regulators must recognize and address the problem.” – Jason Dietrich, OCC, 2001

A more detailed statistical analysis is required, but the descriptive data already point to some notable trends.

On most measures, applications lacking race and ethnicity data track very closely to the weighted average of applications with such data, suggesting that the racial composition of No Data loans is similar to that of all loan records.
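The comparison described here can be sketched as a count-weighted average: each identified group’s mean for some characteristic is weighted by its number of applications, and the No Data group’s mean is then compared against that benchmark. This is a minimal sketch under stated assumptions: the `GroupSummary` type and both functions are hypothetical helpers, the group summaries are placeholders rather than HMDA figures, and the characteristic (loan amount, income, and so on) is left open.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Summary of one applicant group for a single characteristic."""
    name: str
    count: int         # number of applications in the group
    mean_value: float  # group mean of the characteristic (e.g. loan amount)


def weighted_average(groups):
    """Average of group means, weighted by each group's application count."""
    total = sum(g.count for g in groups)
    if total == 0:
        raise ValueError("no applications to average")
    return sum(g.count * g.mean_value for g in groups) / total


def gap_from_weighted_average(no_data, identified):
    """How far the No Data group's mean sits from the identified-group benchmark."""
    return no_data.mean_value - weighted_average(identified)


# Hypothetical summaries, not HMDA figures.
identified = [
    GroupSummary("Group A", count=300, mean_value=100.0),
    GroupSummary("Group B", count=100, mean_value=200.0),
]
no_data = GroupSummary("No Data", count=50, mean_value=130.0)

benchmark = weighted_average(identified)              # (300*100 + 100*200) / 400
gap = gap_from_weighted_average(no_data, identified)  # 130.0 - benchmark
```

A small gap on a given characteristic is consistent with No Data records resembling the overall mix of identified applicants on that measure, but on its own it does not establish their racial composition.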

But loans in HMDA’s blind spot do show one glaring difference from the rest of the dataset: borrower income reported on these applications is by far the highest. At $180,000, the borrower income on loans without racial data exceeds that of the second-highest group, Asian borrowers, at $157,000. There is also a modest difference in the rate spread of No Data loans. Rate spread is the difference between a loan’s annual percentage rate and the average prime offer rate for a comparable loan on the date the interest rate was set. For No Data applications it was 34 basis points, compared with an average of 43 basis points. The rate spread and income differences may indicate that No Data loans in fact include a higher share of White and Asian borrowers than the rest of the loan records. If so, our interpretation of the market share of Black and Hispanic applicants may be incorrect.
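Under Regulation C, HMDA’s rate spread is the difference between a loan’s annual percentage rate (APR) and the average prime offer rate (APOR) for a comparable transaction as of the date the interest rate was set. The sketch below converts that difference to basis points and computes per-group means of income and rate spread, the two descriptive comparisons made in the paragraph above. The record fields (`group`, `income`, `rate_spread`), the helper names, and all values are hypothetical; records with no reported rate spread are represented as `None` and skipped, which is an assumption about how such gaps would be handled.

```python
from statistics import mean


def rate_spread_bps(apr_pct, apor_pct):
    """Rate spread in basis points; both rates are given in percent (e.g. 3.45)."""
    return round((apr_pct - apor_pct) * 100, 2)


def group_means(records, field):
    """Mean of `field` for each group, skipping records where the field is missing."""
    values = {}
    for r in records:
        if r.get(field) is not None:
            values.setdefault(r["group"], []).append(r[field])
    return {group: mean(vals) for group, vals in values.items()}


# Hypothetical records (income in thousands of dollars), not real HMDA data.
records = [
    {"group": "No Data", "income": 120, "rate_spread": rate_spread_bps(3.40, 3.11)},
    {"group": "No Data", "income": 160, "rate_spread": None},
    {"group": "Group A", "income": 90, "rate_spread": rate_spread_bps(3.60, 3.11)},
    {"group": "Group A", "income": 110, "rate_spread": rate_spread_bps(3.50, 3.11)},
]

income_by_group = group_means(records, "income")
spread_by_group = group_means(records, "rate_spread")
```

Rounding the basis-point value to two decimals keeps floating-point noise from the subtraction (for example, 3.40 minus 3.11) out of the reported figure.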

Compared with historical data from the OCC, current data shows some dramatic changes.

Because of changes in reporting practices and regulations, HMDA data from 1999 cannot be compared directly with current lending, but the comparison does suggest some possible trends. The share of No Data loans among conventional home purchase loans has remained stable. Home improvement and refinance loans, where banks and credit unions tend to play larger roles, have improved greatly. FHA and VA home purchase loans, however, have seen their share of No Data applications rise by 16.2 and 11.7 percentage points, respectively.

Purchased loans, which a lender buys from another lender rather than originating itself, lacked demographic data on over 80% of the records submitted in 2021.

A review of individual lenders makes clear that some outperform their peers at collecting demographic information.

While a certain percentage of borrowers will choose to withhold this information, lenders are required in that case to record the data themselves on the “basis of visual observation or surname.” The exceptions to this rule include loans initiated online and loans the lender purchased from another lender. In the latter case, lenders are permitted to remove this data, and virtually all of the major loan purchasers choose to discard it each year.

Meanwhile, a 2018 revision to HMDA that allows Hispanic, Asian, and Native Hawaiian or Other Pacific Islander applicants to report their country or place of origin more precisely has proven popular. Among eligible 2021 applicants, 65% used these optional fields to specify their identity, and the resulting dataset offers a more nuanced and complicated view of how these different communities fare in the mortgage market today. This shows that most applicants are willing to share this information when given the chance. It suggests that when lenders fail to collect demographic data on a large share of their applicants, the cause is more likely flaws in how they ask for such data than an unwillingness on the part of applicants to share it.

The CFPB should expand and enforce the collection of disaggregated racial and ethnic data. The CFPB should also consider further expanding the kinds of demographic data collected to reflect modern understandings of the forms that discrimination can take in the loan process. To that end, the agency should require lenders to collect Sexual Orientation and Gender Identity (SOGI) data, as our work has shown that these characteristics may also affect access to credit. Without this kind of data, it is difficult to understand how an applicant’s sexual orientation or gender identity may affect their ability to become a homeowner.

HMDA is a critical tool used by policymakers, stakeholders, community groups, local and state governments, and the public. Information on borrowers’ race and ethnicity is essential for informing community leaders, as well as regulators and policymakers focused on issues such as redlining, gentrification, and the racial wealth gap. The CFPB should take action now to reverse this damaging trend of missing data and convene relevant stakeholders to discuss best practices for collecting this data, before the share of US home lending that escapes all demographic scrutiny climbs past one in four loans.

[1] Unless otherwise noted, this blog looks at forward loans on site-built, owner-occupied, 1–4 unit homes.

Jason Richardson, Senior Director of Research, NCRC
Fabio Balbi / Adobe Stock

