Added on May 10, 2017
As of April 2017, the CFPB has updated its BISG methodology. Two things have changed, and RATA will handle the changes in two phases. The first and biggest change is that the CFPB now uses the 2010 Census name table instead of the 2000 Census name table it had used until now (see details of the counts below). This change increases the number of surnames in the table by about 10%, and in our test runs it appeared to reduce the Surname Not Found rate by about 10%, so it is a significant improvement. The new name table has been created and tested for Comply and will be released via Auto-Updates today. For clients not using Auto-Updates for system data, the file will also be available on the support site.
The second update to the BISG process is much less significant but will require a code change in the software. The CFPB now handles surnames that begin with O' or D', such as O'Donnell and D'Angelo, differently so that more of these names are matched. This modification should affect only a very small percentage of names and will be included in the Comply 17 release in September.
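The post does not spell out the new O'/D' rule, so the sketch below only illustrates why prefix handling affects the Surname Not Found rate. It assumes the surname table stores names in uppercase with punctuation removed, and it compares two hypothetical cleaning rules (`clean_split_last` and `clean_drop_apostrophe` are illustrative names, not CFPB or Comply functions). Rule A turns the apostrophe into a space and keeps the last word, so "O'Donnell" becomes "DONNELL". Rule B deletes the apostrophe first, so the name becomes "ODONNELL".

```python
import re
from typing import Callable, Iterable, Set

def clean_split_last(raw: str) -> str:
    """Hypothetical rule A: replace every non-letter with a space, keep the last word."""
    words = re.sub(r"[^A-Z]", " ", raw.upper()).split()
    return words[-1] if words else ""

def clean_drop_apostrophe(raw: str) -> str:
    """Hypothetical rule B: delete apostrophes first so O' and D' stay attached."""
    return clean_split_last(raw.replace("'", ""))

def match_rate(names: Iterable[str], table: Set[str], clean: Callable[[str], str]) -> float:
    """Fraction of names whose cleaned form appears in the surname table."""
    names = list(names)
    hits = sum(clean(name) in table for name in names)
    return hits / len(names) if names else 0.0

# Toy surname table (hypothetical), in the uppercase, punctuation-free form assumed above.
surname_table = {"ODONNELL", "DANGELO", "SMITH"}
names = ["O'Donnell", "D'Angelo", "Smith"]

print(match_rate(names, surname_table, clean_split_last))
print(match_rate(names, surname_table, clean_drop_apostrophe))
```

With this toy table, rule A finds only "SMITH" (one of three names), while rule B finds all three. On real data, only the small share of surnames with such prefixes would be affected.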
As always, let us know if you have any questions or comments.
From the CFPB's proxy-methodology page on GitHub: https://github.com/cfpb/proxy-methodology
In the summer 2014 edition of Supervisory Highlights,[3] the Bureau previously reported that examination teams use a Bayesian Improved Surname Geocoding (BISG) proxy methodology for race and ethnicity in their fair lending analysis of non-mortgage credit products. The BISG methodology relies on the distribution of race and ethnicity based on place-of-residence and surname, which are publicly available information from Census. The method involves constructing a probability of assignment to race and ethnicity based on demographic information associated with surname and then updating this probability using the demographic characteristics of the census block group associated with place of residence. The updating is performed through the application of a Bayesian algorithm, which yields an integrated probability that can be used to proxy for an individual's race and ethnicity.[4]
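The Bayesian update described above can be sketched with the standard BISG formula. It assumes surname and block group are independent once race/ethnicity is known, which gives P(r | s, g) = P(r | s) P(g | r) / Σ_r' P(r' | s) P(g | r'). Here P(r | s) comes from the surname list, and P(g | r) is the fraction of group r's population living in the applicant's block group. This is a minimal illustration, not the CFPB's code. The three categories and all numbers are hypothetical, and the fallback to the surname prior when no geographic information is available is an assumption made for the sketch.

```python
from typing import Dict

def bisg_posterior(
    p_race_given_surname: Dict[str, float],
    p_geo_given_race: Dict[str, float],
) -> Dict[str, float]:
    """Combine surname and geography information with Bayes' rule.

    p_race_given_surname: P(race | surname), shares from the surname list.
    p_geo_given_race: P(block group | race), the fraction of each group's
        population that lives in the applicant's block group.
    Returns P(race | surname, block group) for each category.
    """
    # Unnormalized posterior: surname prior times geographic likelihood.
    joint = {
        race: p_race_given_surname[race] * p_geo_given_race.get(race, 0.0)
        for race in p_race_given_surname
    }
    total = sum(joint.values())
    if total == 0:
        # Assumed fallback: no usable geography, so keep the surname prior.
        return dict(p_race_given_surname)
    return {race: value / total for race, value in joint.items()}

# Hypothetical inputs for one applicant.
prior = {"hispanic": 0.05, "white": 0.70, "black": 0.25}
geo = {"hispanic": 0.00002, "white": 0.00001, "black": 0.00004}

for race, p in bisg_posterior(prior, geo).items():
    print(f"{race}: {p:.3f}")
```

In this made-up example, the probability for "black" rises from 0.25 to roughly 0.56. The hypothetical block group holds a larger fraction of the Black population than of the other groups, so the update shifts weight toward that category. A real application would use all of the Census race and ethnicity categories.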
Through March of 2017, examination teams had relied on the surname list derived from the 2000 Decennial Census of the Population in their construction of the BISG proxy for race and ethnicity.[5] In December 2016, the U.S. Census Bureau released a list of the most frequently occurring surnames based on data derived from the 2010 Decennial Census of the Population. The updated 2010 list generally uses the same definitions and formats as the list based on the 2000 Census but includes updated values for total counts and race and ethnicity shares associated with each surname.[6] In total, the new surname list provides information on the 162,253 surnames that appear at least 100 times in the 2010 Census, covering approximately 90% of the population.[7] While 146,516 names appear on both the 2000 and 2010 surname lists, the 2010 list contains 15,737 names that do not appear on the 2000 list, whereas the 2000 list contains 5,155 names that do not appear on the 2010 list.[8]
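The overlap counts quoted above are consistent with each other: the size of each list equals the number of shared names plus the names unique to that list. The short check below confirms this with the published figures. It also shows the set logic for comparing two surname lists, using a hypothetical helper (`compare_lists`) and toy names.

```python
# Counts quoted in the CFPB text above.
BOTH = 146_516      # surnames on both the 2000 and 2010 lists
ONLY_2010 = 15_737  # surnames on the 2010 list only
ONLY_2000 = 5_155   # surnames on the 2000 list only

size_2010 = BOTH + ONLY_2010  # 162,253, the quoted size of the 2010 list
size_2000 = BOTH + ONLY_2000  # 151,671, the 2000 list size these counts imply

def compare_lists(list_2000: set, list_2010: set) -> dict:
    """Count surnames shared by both lists and unique to each list."""
    return {
        "both": len(list_2000 & list_2010),
        "only_2010": len(list_2010 - list_2000),
        "only_2000": len(list_2000 - list_2010),
    }

# Toy lists (hypothetical names) to show the set logic.
print(compare_lists({"SMITH", "JONES", "OLDNAME"}, {"SMITH", "JONES", "NEWNAME"}))
# {'both': 2, 'only_2010': 1, 'only_2000': 1}
```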
As of April 2017, examination teams are relying on an updated proxy methodology that reflects the newly available surname data from the Census Bureau. Our updated proxy methodology relies on the race and ethnicity shares for the 162,253 names that appear on the 2010 list and supplements this list with the race and ethnicity shares for the 5,155 names that appear on the 2000 list but not on the 2010 list, resulting in a list of 167,409 surnames in total.[9]
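The supplementing step can be sketched as a dictionary merge. Every surname on the 2010 list keeps its 2010 shares, and a 2000 entry is added only when the surname is missing from the 2010 list. The function name, the category labels, and all share values below are hypothetical. The sketch illustrates the merge rule only and does not reproduce how the CFPB or Comply builds its table.

```python
from typing import Dict

Shares = Dict[str, float]  # race/ethnicity category -> share of people with the surname

def build_combined_table(
    table_2010: Dict[str, Shares],
    table_2000: Dict[str, Shares],
) -> Dict[str, Shares]:
    """Prefer 2010 shares and fill in 2000 shares only for surnames absent from 2010."""
    combined = dict(table_2010)
    for surname, shares in table_2000.items():
        # setdefault leaves existing 2010 entries untouched.
        combined.setdefault(surname, shares)
    return combined

# Hypothetical entries for illustration only.
table_2010 = {
    "SMITH": {"white": 0.70, "black": 0.23, "other": 0.07},
    "NEWNAME": {"white": 0.50, "black": 0.30, "other": 0.20},
}
table_2000 = {
    "SMITH": {"white": 0.73, "black": 0.22, "other": 0.05},
    "OLDNAME": {"white": 0.60, "black": 0.10, "other": 0.30},
}

combined = build_combined_table(table_2010, table_2000)
print(sorted(combined))  # ['NEWNAME', 'OLDNAME', 'SMITH']
```

Because 2000 values never overwrite 2010 values, the combined table's size is the 2010 count plus the count of names found only on the 2000 list.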