Rebuttal to personal privacy of HMDA in a world of big data

Anthony Yezer’s recent white paper, Personal Privacy of HMDA in a World of Big Data, is part of the banking industry’s continued effort to limit public access to data collected by federal agencies.[1] Yezer’s paper, which was funded by the Mortgage Bankers Association, fundamentally challenges data collection and reporting under the Home Mortgage Disclosure Act (HMDA). The paper attempts to do so by claiming that borrower privacy can be compromised by linking HMDA data with other publicly and commercially available datasets. The arguments in Yezer’s paper, however, rest on groundless assumptions and can be easily refuted.

The statutory purposes of HMDA include assessing whether lenders are meeting housing and credit needs and enforcing fair lending and anti-discrimination laws. Furthermore, HMDA data provides a means of evaluating access to credit in neighborhoods that were redlined by government and private finance and continue to be underserved. Redlining leads to long-term disinvestment in communities and creates structural barriers to credit access in neighborhoods. The burden of these barriers affects not just the families that live in redlined neighborhoods but also local jurisdictions that must devote a portion of their tax base to revitalizing those neighborhoods.

Dr. Yezer asserts that U.S. Census Bureau (USCB) standards should be utilized to assure anonymization of HMDA data, but he acknowledges that these standards are difficult to achieve and would also delete data at the neighborhood level. As a result, he recommends that HMDA data collection be abandoned in favor of commercially available datasets, which he asserts provide a more comprehensive view of the housing market. This recommendation, however, is flawed because it undermines HMDA’s statutory purposes.

Dr. Yezer fails to address how basic demographic data regarding borrowers would be collected, or how the data would be made publicly available. In contrast to HMDA, commercial databases do not collect race, ethnicity, gender, and income of borrowers. If commercial databases replaced HMDA, the statutory purposes of HMDA would not be achieved since the data could no longer be used to enforce fair lending laws and assess whether credit needs are being met for various groups of borrowers and neighborhoods. Dr. Yezer’s paper also ignores reality: As far as we are aware, HMDA data has never been used to breach someone’s privacy in the manner he describes. In contrast, commercial data providers and vendors suffer security breaches with alarming frequency. It would appear that despite Dr. Yezer’s arguments, actual identity thieves find HMDA to be of little interest.

Dr. Yezer’s recommendations would make data less available to the public and would make markets more opaque. At the same time, they would help corporations consolidate a monopoly on the availability of mortgage data. This would ultimately harm consumers and communities.

Dr. Yezer’s paper can be condensed to five major charges against current HMDA data reporting and its expansion:

Charge 1 – The current HMDA dataset can be readily linked with other datasets to achieve high levels of borrower identification or re-identification.

Charge 2 – Expansion of HMDA data to include indicators of creditworthiness will make the credit scoring of individuals publicly available, exposing them to cyber-crimes and identity theft.

Charge 3 – New HMDA data disclosures release information about borrowers that is commonly suppressed in other government datasets such as those of the U.S. Census Bureau (USCB).

Charge 4 – HMDA as it exists is not only prone to misuse, but is inaccurate and obsolete, providing misleading information and could/should be replaced by commercially available data.

Charge 5 – HMDA is a clear invasion of privacy and consumers must be given an explicit opt-out if HMDA data collection is continued

These charges are exaggerated and inaccurate, but since the lending industry is actively promoting Dr. Yezer’s paper, the charges require a rebuttal. The Mortgage Bankers Association and other trade associations have used the paper as part of their comments to the Consumer Financial Protection Bureau (CFPB) regarding the CFPB’s proposed policy guidance on the public dissemination of HMDA data.

Charge 1 – The current HMDA dataset can be readily linked with other datasets to achieve high levels of borrower identification or re-identification.

The first charge is that the agency established to protect consumers, the CFPB, is actually increasing their vulnerability to identity theft and other cyber-crimes through insufficient anonymization of the HMDA dataset. A scenario for the re-identification of borrowers is presented which relies on the linking of publicly available information from local tax assessment records and records of purchases and deeds.

Rebuttal: The issues of privacy protection and decreasing anonymity in the “Age of Big Data” are considerable problems due to identity theft and the criminal abuse of technology. In 2015, a data breach of the systems of the Office of Personnel Management (OPM) resulted in the theft of 21.5 million records of individuals who had undergone government background investigations. In 2017, a data breach of the credit reporting agency Equifax resulted in the release of 145.5 million credit records with sensitive personal information including Social Security numbers. Compared to these detailed sources of individual data, the HMDA data would be a poor choice for hackers seeking information for the purposes of fraud.

The financial services industry commonly exploits commercial datasets for marketing purposes. These reveal greater detail about home purchasers and borrowers than HMDA does. Additionally, HMDA data is released with a delay of at least nine months, which makes HMDA an unlikely source of exploitation when other, more current commercial datasets are readily available for misuse.

Misrepresentation of academic studies: Dr. Yezer cites academic and government agency research to establish that HMDA can be matched with databases such as county deed records to reveal borrower identities. These examples are misleading. First, this academic and government research was not focused on identifying borrowers. Second, the precise methodology of the studies is not discussed in detail. In particular, Dr. Yezer discusses a report conducted by economists at the Federal Reserve Bank of Boston.[2] These economists likely had information on the date of application. The date of application greatly facilitates matching HMDA data with public deed records. But the Federal Reserve System deletes that data from the publicly available HMDA data. Dr. Yezer also states that property addresses and the identification number of the originator facilitate identifying borrowers in HMDA data.[3] Again, this discussion is highly misleading because the CFPB will not publicly disclose property addresses or the identification number for the originator in the HMDA data.

Time and effort required to match HMDA and deed records: Dr. Yezer uses an example of Montgomery County, MD to assert a match rate of HMDA and public deed records of 72 percent. He says this county should be among the most difficult in which to match HMDA and deed records because of the diversity of its population. However, Dr. Yezer’s computations to make matches are complex and involved. In most cases, adversaries would want to use databases that provide information without the extensive and time-consuming manipulations involved in matching HMDA and public deed records.[4]

Dr. Yezer admits that the deed records vary in their format and ease of use, meaning that many of them are not easily searchable or “scrapable”.[5] In fact, Dr. Yezer claims that the online Washington DC tax records are “very user friendly.” Yet the Washington DC tax records cannot be downloaded. The user accesses a few records at a time, which would make the matching exercise Dr. Yezer describes in his paper quite cumbersome and time-consuming. The United States contains more than 3,000 counties. These counties are likely to have deed records that vary in their ease of use and their accessibility. A project that sought to match HMDA data and public deed records for the purpose of identifying borrowers would encounter considerable inefficiencies due to the wide variety of public deed records.

Dr. Yezer asserts that key variables used for linking HMDA and public deed records are loan amount, census tract location, and lender name. The CFPB is sensitive to this possible use of HMDA data and has proposed to mask the loan amount in HMDA data in order to make it more difficult to match HMDA data to deed records. The CFPB is proposing to report the midpoint of a $10,000 range for loan amount, which is likely to confuse efforts to match HMDA data and public deed records, particularly in cases with borrowers with the same lender and similar loan amounts.
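The effect of midpoint masking can be illustrated with a minimal sketch. It assumes that each $10,000 range starts at a multiple of $10,000 and includes its lower bound; the exact range boundaries in the CFPB proposal are not stated here, and the function name and loan amounts below are hypothetical.

```python
from collections import Counter


def mask_loan_amount(amount: int, width: int = 10_000) -> int:
    """Return the midpoint of the width-dollar range containing amount.

    Assumption: ranges are [k * width, (k + 1) * width), which may differ
    from the boundaries the CFPB actually adopts.
    """
    lower = (amount // width) * width
    return lower + width // 2


# Hypothetical loans from the same lender in the same census tract.
exact_amounts = [231_500, 234_900, 238_250, 240_000]
masked_amounts = [mask_loan_amount(a) for a in exact_amounts]

# Count how many loans share each masked value.
shared = Counter(masked_amounts)
print(masked_amounts)
print(shared)
```

In this sketch the first three loans all report the same masked amount, so an exact loan amount taken from a deed record no longer points to a single HMDA row.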

Readily Available Commercial Databases

In contrast to HMDA data, a number of readily available databases from the private sector can be used more effectively to identify and market to specific individuals on a large scale. Although we do not condone the use of this data for marketing or related purposes, the fact is that the data is already widely available. Some examples of these databases include:

Realtytrac

Realtytrac is a private sector real estate data service that offers a wide variety of data on neighborhood housing stock and price trends.[6] Users can view data for individual homes or summary data on a zip code level for free, for a limited period trial, or by purchasing data from Realtytrac. NCRC obtained sample data from Realtytrac in Excel format that costs about 8 cents per record. The data includes detailed information about homeowners including name, street address, loan characteristics, and property characteristics. This is far more detail than available in HMDA data.

The full list of variables available in Realtytrac for each homeowner is the following:

Name of borrower
Whether property is in foreclosure
Which bank owns the foreclosed property
Street address, city, state, zip code, county
# Bedrooms, # bathrooms, square footage
Lot size
Year of construction
Purchase date
Purchase amount
Number of open loans
Total amount of open loans
Estimated market value
Loan to value ratio
Equity
Loan date
Loan amount
Loan type – conventional or non-conventional
Loan interest rate
Lender name
Lender name for loan two
Bankruptcy

Adversaries can easily obtain Realtytrac data that includes real-time information on mortgage lending with several loan term and property fields. Variables about foreclosure status, bankruptcy, loan-to-value ratio, amount of total loans (first and subordinate liens) can enable marketers to readily identify borrowers in distress and target them for abusive refinance loans or loan modification or foreclosure scams. Realtytrac data exceeds information about properties provided in HMDA data and is therefore more useful to marketers.

Credit Bureau Prescreening and Marketing Reports

The nationwide consumer reporting agencies (credit bureaus) sell information derived from people’s credit and other consumer reports. Credit bureaus collect and hold data on most adults in the United States, and have aggressively marketed their data products to lenders, insurance companies, and even employers.

Prescreening reports are detailed marketing lists meant for use by lenders and insurance companies that intend to make firm offers of credit or insurance to the people listed in the report. These reports include individuals’ names and addresses and are available in lists broken out by credit score, other credit history data, including delinquencies or negative information, and/or personal characteristics. Equifax advertises that their prescreening reports can be customized using 1,500 credit-related data points.[7] Although prescreening reports are available only to lenders and insurance companies making firm offers of credit or insurance, many predatory lenders and scam companies fall into that category and can easily purchase information on individuals segmented by credit score or credit history for marketing.

Credit bureaus also sell summary information on household, census block, census tract and zip code levels that can enable predatory marketing. Equifax, for example, sells aggregated FICO scores on a zip + 4 code level that enables financial institutions to market products targeted to customers with a specified credit risk profile. Zip + 4 codes are small areas consisting of approximately seven to ten households.[8] Credit score data for zip + 4 units is available for sale for less than two cents per record, and data sellers will match the aggregate credit score data with names and addresses. People living in zip codes where residents have low aggregate FICO scores can be easily targeted by marketers of risky and abusive products.

The credit bureaus advertise that these zip + 4 products are not subject to the Fair Credit Reporting Act (FCRA) because they use aggregate rather than individual data, but we believe that these products should be subject to FCRA and treated as prescreening reports. Even if, however, access to zip + 4 data products is restricted to lenders and insurers eligible to purchase prescreening reports, subprime lenders and other finance companies seeking to target people with poor credit for high-cost loans would still have full access to this low-cost and easily available data.

Data Brokers

Financial institutions and other marketers can also purchase the services of data brokers that obtain detailed data from a variety of sources, including combing the internet for information on people’s purchasing habits and hobbies. The Federal Trade Commission (FTC), in a report titled Data Brokers: A Call for Transparency and Accountability, documents that sophisticated brokers can help their clients target people for marketing pitches. Data brokers provide their clients with detailed information on customers including age, gender, net worth, real property attributes, household income, and credit card usage. The brokers conduct data analysis to segment customers into groups with names like “Underbanked” or “Financially Challenged.” These names suggest that the marketing pitches may take advantage of people’s financial vulnerabilities. The data brokers can also group consumers into buckets based on estimates of their creditworthiness or utilization of bankcards. Finally, the FTC suggests that the data brokers do not monitor their clients’ use of the data carefully, merely asking their clients to read the terms and conditions on their websites.[9]

Charge 2 – Expansion of HMDA data to include indicators of creditworthiness will make the credit scoring of individuals publicly available, exposing them to predatory marketing, cyber-crimes, and identity theft.

Rebuttal: Dr. Yezer asserts that expanding HMDA to include creditworthiness indicators may violate the Fair Credit Reporting Act (FCRA) and the Right to Financial Privacy Act (RFPA). He challenges the disclosure of variables related to the race, ethnicity, nationality, and gender of borrowers in connection with variables such as credit scores, points and fees, loan-to-value ratio, interest rates, and discount points. While Dr. Yezer erroneously complains about FCRA and RFPA violations associated with the HMDA data, the improved HMDA data is designed to further protect borrowers through its ability to more easily track predatory lending, which would enable public agencies, lenders, and community groups to take steps to stop the abusive lending before it causes another financial crisis.

The Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 expanded HMDA data to include additional borrower characteristics and loan terms and conditions in order to improve its ability to aid in fair lending enforcement and the detection of abusive lending. Without solid data regarding loan terms and conditions, HMDA data would be ineffective in spotting and helping to deter predatory lending before it becomes widespread (a lesson we learned too well with the pre-Dodd Frank HMDA data in the years leading up to the financial crisis).

The data on creditworthiness would also enable researchers and government agencies to evaluate whether the denial of access to credit for borrowers was related to a risk-based evaluation of the lending or whether denials of creditworthy applicants are possibly discriminatory. We believe that data on creditworthiness is amenable to anonymization through the “bucketing” of values, that is, the assignment of values to intervals that would be sufficient to allow for inferences without compromising the identity of the borrower. Furthermore, as it currently stands, Dr. Yezer’s complaint is meaningless because, under the CFPB’s proposal, credit scores would not be released to the public.

At other points in the paper, Dr. Yezer suggests that the new Dodd-Frank data elements of debt-to-income ratio, loan-to-value ratio, and pricing information can “allow imputation of the approximate credit score of the borrower.”[10] This suggests that Dr. Yezer is privy to the algorithms of credit score models, which is doubtful. Therefore, it is unlikely that adversaries can impute FICO scores or credit scores of other models using the Dodd-Frank data.

Dr. Yezer suggests that older adults taking out reverse mortgages might be especially vulnerable to predatory marketing since adversaries could identify older adults in the HMDA data.[11] The adversaries would then approach the older adult borrowers via email or some other means and appear to be a trusted advisor since they know details of their loans and financial condition. This scenario assumes that it is easy to identify borrowers in the HMDA data. For the reasons outlined above, this is not the case.

Also, Dodd-Frank added the enhanced information about reverse mortgages and loan terms and conditions because opacity in the marketplace in the years before the financial crisis exposed older adults to scams. The additional information is designed to help regulatory agencies and members of the public spot worrisome lending practices targeted to the elderly and curb them before they cause widespread harm and equity stripping. Contrary to Dr. Yezer’s assertion, data on borrower age, reverse mortgages, and loan terms and conditions will help protect older adults rather than expose them to scams.

Dr. Yezer is particularly focused on asserting that minorities are the borrowers most at risk of harm from the new HMDA disclosures. He contends that since minorities receive relatively few loans, adversaries will have the easiest time matching HMDA data to public deed records and identifying minorities. While this is possible in the case of unusual applicants such as the single female Japanese applicant for which Dr. Yezer professes much concern, we do not believe this is possible across the board given the difficulty and complexity of combining HMDA and public deed records for the purposes of identifying borrowers.[12]

Dr. Yezer’s fixation on the violation of privacy for minorities is perverse since a major purpose of collecting the demographic data in HMDA is to detect and remedy inequitable and discriminatory lending practices. This false paradox, that privacy is being sacrificed for the achievement of equity and fairness in lending, is intended to blunt a public policy of the past four decades that has endeavored to achieve fair markets and the absence of discrimination. It is notable that despite the apparent alarm over privacy violation, Dr. Yezer fails to cite one instance in which misuse of the HMDA dataset has resulted in harm. To our knowledge, no federal agency has testified before Congress or issued a report highlighting that an invasion of citizen privacy is due to HMDA. As noted earlier, there have been numerous data breaches of commercial datasets which have severely compromised the privacy of tens of millions of citizens.

Last, the CFPB’s proposed policy guidance carefully discusses the benefits and costs of HMDA disclosure, particularly the new enhanced data that includes the elements required by the Dodd-Frank Act. The CFPB discounts the possibility of identity theft since HMDA data lacks personally identifying information like Social Security numbers. The CFPB acknowledges possibilities like the single female Japanese borrower in largely white neighborhoods being identified. To diminish the possibilities of these occurrences, the CFPB proposed to modify the disclosure of certain variables and to publicly release ranges or buckets rather than precise numbers. As discussed previously, the key matching variable of loan amount will be disclosed as a midpoint of a range in order to make matching more difficult.

The CFPB is careful not to categorically deny the possibility of identifying borrowers but concludes that the benefits of disclosing enhanced data outweigh the costs. The possibility of using the HMDA data for widespread predatory marketing is remote, as shown above. Other means of predatory marketing were employed before the HMDA enhancements and would likely still be the preferred methods. Instead of harm, the Dodd-Frank HMDA data would more likely aid policymakers and stakeholders in identifying and curbing abusive practices and bolstering fairness and transparency in the marketplace.

Dr. Yezer’s paper is dated October 2017. The CFPB proposed its policy guidance for the release of Dodd-Frank HMDA data on September 24. The CFPB solicited public comments through November 24, 2017. The proposed policy guidance included a lengthy discussion of the benefits and costs of HMDA data. It does not blindly disclose all the Dodd-Frank data elements but withholds particularly sensitive elements like credit scores from public dissemination and modifies the disclosure of other elements.

Dr. Yezer’s paper acts as though the proposed policy guidance does not even exist. He maintains that the CFPB has not conducted a privacy impact assessment (PIA), even though he or his funder, the Mortgage Bankers Association, must have known about the proposed policy guidance.[13] To act as though the CFPB has not even conducted a PIA or an analysis similar to a PIA is simply disingenuous. And then to imply falsehoods regarding the disclosure of sensitive data elements like date of application, property address, or credit score when the CFPB will not disclose these elements casts further doubt on the honesty and impartiality of the paper. One suspects that the paper is intended to stir up opposition to HMDA disclosure rather than to carefully consider the benefits and costs with the objective of maximizing benefits while minimizing costs.

Charge 3 – New HMDA data disclosures release information about borrowers that is commonly suppressed in other government datasets, such as those of the U.S. Census Bureau (USCB).

Rebuttal: Dr. Yezer suggests that standards imposed by the USCB should be a model for HMDA data dissemination. The geographical area in census data tends to be states or metropolitan and rural portions of states. The Current Population Survey (CPS) for example, contains detail on households but the geographical information is limited to reports on demographic data in “principal cities,” other metropolitan areas, and non-metropolitan areas of states. Likewise, the American Housing Survey (AHS) contains information on mortgages and house prices but only differentiates between center cities and suburbs. AHS does not identify lenders.[14]

If HMDA mimicked the release of Census data, it would not be able to achieve its statutory purposes. Lending patterns in cities, suburbs, and rural areas could be analyzed and provide some useful information. But HMDA is intended to reveal whether lenders are meeting the credit needs of borrowers and communities. Communities are neighborhoods at the census tract level that contain about 4,000 people on average. If HMDA data did not contain information at the census tract level, it could not reveal whether neighborhoods differing by the proportion of minorities or modest income people were receiving credit at relatively equal amounts or whether significant disparities were occurring, suggesting that lenders were not effectively serving credit needs of all neighborhoods. HMDA’s statutory purposes suggest a pragmatic, problem-solving aspect to the data, such as how can stakeholders improve access to credit to neighborhoods receiving relatively few loans. If HMDA data does not contain information at the neighborhood level, this problem-solving purpose is defeated to the detriment of stakeholders trying to build wealth and revitalize communities.

Charge 4 – HMDA as it exists is not only prone to misuse, but is inaccurate and obsolete, providing misleading information and could/should be replaced by commercially available data.

Rebuttal: Dr. Yezer asserts that commercial datasets provide better identification and analysis of credit flows to neighborhoods than HMDA does. It is true that the current methods of HMDA data collection were established prior to the proliferation of alternative credit and mortgage financing provided by some internet-based non-depository institutions and transactions not using a financial institution. Many home purchase transactions are cash-only sales (about 30 percent of all sales according to Dr. Yezer) that are unreported to the CFPB.[15] The inclusion of cash sales in this discussion is misplaced. The purpose of HMDA is not to track property sales and transactions; those are already in the public domain in most counties. HMDA was always intended as a database of loan applications and as such should not be faulted for not including cash sales. By tracking the communities that access relatively little credit, HMDA allows us to better understand the persistence of cash sales in those communities, which makes Dr. Yezer’s point moot and underscores the need for HMDA.

While some commercial databases include loans not recorded by HMDA, they do not capture borrower race or ethnicity, leaving a gap in the reporting of demographic information on borrowers. Dr. Yezer asserts that techniques like identification of race/ethnicity by surname matching could be used to estimate the demographic information on borrowers with high confidence levels. However, the reliability of identification of demographic characteristics using this technique on a large scale for a database like HMDA with millions of loans is untested.

If HMDA were abandoned as Dr. Yezer suggests, should commercial sources be required to collect data on borrower/purchaser race and ethnicity, or would surname analysis and matching algorithms along with geographic identification provide sufficient estimations? Additionally, the datasets would have to be available to the public for researchers to make adequate inferences. If the data were withheld, it would only result in greater market opacity and monopolization by corporations of a critical data source. If a data source better than HMDA is available, the logical conclusion from Dr. Yezer’s recommendation would be to provide public access to the data, and allow researchers and community-based organizations to enhance the public interest in achieving greater equity and transparency of the home lending market. But how would we choose one commercial data source over another, and which commercial vendor would allow its data to become publicly available at no cost? Dr. Yezer’s recommendation appears to serve more as a justification for his rhetoric against HMDA and suggests a naivety regarding the for-profit and proprietary nature of commercial databases.

Dr. Yezer faults HMDA for missing significant amounts of housing purchases, yet this criticism does not acknowledge the statutory purpose of HMDA.[16] Congress enacted HMDA in 1975 because it was concerned that “some depository institutions have sometimes contributed to the decline of certain geographic areas by their failure pursuant to their chartering responsibilities to provide adequate home financing to qualified applicants on reasonable terms and conditions.”[17] In its definitions section, the law describes depository institutions but also non-depository institutions that were expected to report data. Congress was focused on assessing the extent to which financial institutions (depository and non-depository institutions) were serving credit and housing needs responsibly. HMDA’s statute was not aimed at covering cash sales, which are not conducted by entities incorporated as financial institutions. Dr. Yezer is criticizing HMDA for missing cash sales, a transaction that Congress did not intend for HMDA to cover.

Rather than being abandoned because it is incomplete, HMDA could be expanded to include cash sales to address Dr. Yezer’s concern that HMDA is currently providing a “biased and incomplete view of financing of residential property.”[18] While not covering all lenders because some are exempt due to loan threshold triggers, HMDA is a valuable data source since it provides robust coverage of most banks and a large number of non-depository institutions. Denigrating HMDA without offering a feasible alternative does not serve constructive policy objectives.

Charge 5 – HMDA is a clear invasion of privacy and consumers must be given an explicit opt-out if HMDA data collection is continued

Rebuttal: Dr. Yezer states that if HMDA continues to be collected and disseminated, “mortgage applicants should be told that there is no privacy in HMDA data disclosures and they have the ability to opt out of having any information regarding their income, assets, or credit score collected as part of HMDA.”[19] In addition to the sarcasm, the statement is blatantly inaccurate. As stated above, the CFPB will not be disclosing any credit score information in the publicly available data. Even if HMDA data for a particular borrower was matched with public deed records, it is highly unlikely that an adversary would be able to figure out someone’s assets and dollar value of wealth. The statement also suggests that the federal government, when implementing a statute designed to prevent discriminatory and abusive lending, will recklessly and carelessly invade borrowers’ privacy to their detriment.

If Dr. Yezer is so concerned about consumer privacy, would he advocate for similar protections for commercial databases? Do consumers receive similar warnings about credit reporting agencies and credit score models? Equifax and the other credit bureaus do not ask for consumers’ permission to collect credit history data and to generate credit scores which they sell at a profit to lenders and other corporations. If Equifax had asked for permission, perhaps far fewer than 145 million consumers would have had their private and personal information like Social Security numbers imperiled. If HMDA is to be an opt-in model with dire warnings issued to borrowers, then credit bureaus should be under a similar requirement. As anybody who has applied for any form of credit knows, within days one can expect to be inundated with offers from other credit providers. This suggests that borrower data is already sold freely on a secondary market, without knowledge or consent and without the nine-month delay of HMDA.

In fact, HMDA does have disclosures to home loan applicants. The disclosure states that demographic data collection is intended to help the federal government enforce anti-discrimination statutes. The applicant is informed that answering the questions about demographic data is voluntary. The federal government instructs lending institutions to use the following language on their data collection forms:

The following information is requested by the federal government for certain types of loans related to a dwelling in order to monitor the lender’s compliance with equal credit opportunity, fair housing, and home mortgage disclosure laws. You are not required to furnish this information, but are encouraged to do so.[20]

Conclusion

In conclusion, Dr. Yezer raises the issue of privacy as a pretext for an assault on current HMDA data and proposed enhancements to mortgage data collection. The perverse suggestion that the privacy of minorities will be most impinged by enhancement of HMDA data collection distracts from part of the purpose of HMDA, to assure the non-discriminatory access to mortgage credit regardless of race, ethnicity or gender. In calling for an expansion of existing privacy measures to USCB standards, the recommendations erode the necessary geographic basis for assessing access to credit at the neighborhood level. The paper acknowledges that the proposed standard would make certain types of data collection impossible at all but the state level, preventing the analysis of credit flows in specific neighborhoods or the detection of “redlining” activity.

Privacy invasions using HMDA data of the type Dr. Yezer suggests are highly unlikely, and Dr. Yezer himself is unable to find evidence that they have occurred or will occur. He asserts that linking HMDA data to publicly available county-level deed records is easily achievable. However, county-level data is collected in a decentralized and non-standardized manner that is not readily available for downloading and manipulation by adversaries. There are more than 3,100 counties in the United States. Creating a process for identifying borrowers across even a few counties would be daunting, let alone trying to do this regionally or nationally. In the more than forty years of HMDA’s existence, no federal agency has reported instances of privacy invasion due to HMDA.

The suggestions in Dr. Yezer’s paper represent an aggressive challenge to more than forty years of public policy efforts to increase data availability and transparency. If the suggestions were enacted, they would place community groups and the public at large at a severe disadvantage by rolling back data availability and consolidating it in commercial hands. However, if commercially available data regarding mortgages and home sales were made available free to researchers and public interest groups, it could aid in detecting inequities in credit and lending, but only if the data included demographic variables. In the final analysis, the suggestion to replace HMDA data with commercially available data is not serious but part of an aggressive effort to denigrate and diminish efforts to ensure a non-discriminatory and fair marketplace.

For more information, contact Josh Silver, Senior Advisor, NCRC at jsilver@ncrc.org. This letter represents the views of the following undersigned organizations.

Sincerely,

Association for Neighborhood and Housing Development

California Reinvestment Coalition

Calvin Bradford & Associates, Ltd.

Consumer Action

Empire Justice Center

Massachusetts Communities Action Network

National Association of Human Rights Workers

National Community Reinvestment Coalition

National Consumer Law Center, on behalf of our low-income clients

National Fair Housing Alliance

New Economy Project

Ohio Fair Lending

Reinvestment Partners

S J Adams Consulting

Western New York Law Center

Woodstock Institute

[1] Anthony Yezer, Personal Privacy of HMDA in a World of Big Data, Working Paper Series, Elliott School of International Affairs, The George Washington University, October 2017, http://www2.gwu.edu/~iiep/assets/docs/papers/2017WP/YezerIIEP2017-21.pdf

[2] Yezer, p. 12.

[3] Yezer, p. 24.

[4] The CFPB uses the term adversaries to describe those intent on using HMDA and other data for harmful purposes. This paper adopts the same terminology.

[5] Yezer, p. 27.

[6] http://www.realtytrac.com/

[7] https://www.equifax.com/business/prescreen

[8] See http://www.equifax.com/assets/USCIS/aggregated_FICO_scores_ps.pdf and https://www.teamupturn.com/static/files/Knowing_the_Score_Oct_2014_v1_1.pdf

[9] Federal Trade Commission, Data Brokers: A Call for Transparency and Accountability, May 2014, via https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf

[10] Yezer, p. 4.

[11] Ibid.

[12] Yezer, p. 23.

[13] Yezer, p. 36.

[14] Yezer, p. 35.

[15] Yezer, p. 6.

[16] Dr. Yezer is focusing on cash only sales, which he states is about 30 percent of all sales, p. 6.

[17] https://www.law.cornell.edu/uscode/text/12/2801

[18] Yezer, p. 9.

[19] Yezer, p. 9.

[20] https://www.ffiec.gov/hmda/pdf/regulationc2004.pdf
