Entity resolution is the process of linking and grouping the records that refer to the same entity. For an organization, the most important entities are people (customers, employees, contractors), assets (transferable, non-transferable), products (tangible, intangible) and geography (locations, maps, boundaries). Every instance of an entity has one or more identifying attributes. For example, a person can have many identifiers, such as Employee ID, Social Security Number, TIN, Email Address, Driver’s License Number, Passport Number or State ID Number. These are called Personally Identifiable Information (PII) and help uniquely identify a person. Other associated information, such as Phone Number, Merchandise Rewards Number or Login ID, is also important to manage, since it brings in other pertinent information about the person and helps build a better picture of that person’s involvement, persona and choices.

An entity resolution process merges multiple (and possibly duplicate) records of the same entity into a single record in such a way that the golden (or final) record inherits all associations from all contributing records. Records are matched based on the fields they have in common and on other adjustable parameters. Usually the common fields are chosen from the fields containing identification information (PII in the case of people). Sometimes the contributing records look disparate yet point to the same entity. In other cases they might point to the same entity, but there is no easy way to decide. Entity resolution applies techniques such as fuzzy matching, Soundex (phonetic) encoding, parsing and edit distance calculation to compare the common values and decide whether the records belong to the same entity. Once a proper match is found, the algorithm combines the matched records into a cluster and assigns a unique number to it. A golden record is then chosen for each cluster, gathering information from the accompanying fields of all the records in that cluster.
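
As a rough illustration of two of the techniques named above, the sketch below implements American Soundex and a Levenshtein edit distance in plain Python. The function names `soundex` and `edit_distance` are chosen here for illustration and do not come from any particular product; a real matching engine would typically combine several such scores with tunable thresholds.

```python
# Letter-to-digit table for American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,                # delete ca
                           row[j - 1] + 1,                 # insert cb
                           prev_row[j - 1] + (ca != cb)))  # substitute
        prev_row = row
    return prev_row[-1]


print(soundex("BROWN"), soundex("BRAUN"))      # B650 B650
print(edit_distance("CLEVELND", "CLEVELAND"))  # 1
```

Phonetic codes catch spelling variants that sound alike, while edit distance catches keying errors such as a dropped letter; neither is sufficient on its own.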

For example, a bank database might contain the following records, gathered from different source systems:

  1. ‘JAMES BROWN’ staying at ‘123 N Highland Street, Apt. 1002, Clevelnd, OH’
  2. ‘JAMES K. BROWN’ staying at ‘123 Highland Street, Apartment 1002, Cleveland, OH, 44101’ with phone number ‘123-456-7890’
  3. ‘J. BROWN’ staying at ‘123, North Highland Str., Cleveland, OH, 44101-2203’ with e-mail address ‘jmsb@mydomain.com’
  4. ‘Mr. BROWN’ staying at ‘Cleveland, OH’ with social security number ‘987-65-4321’

In this case, matching is performed on Name and Address, as they are the common fields. The first 3 records are clustered together because they have closely matching names and addresses, so they belong to the same person. The 4th record has no first name or street address to compare, so it does not match closely enough and stays separate. Below is the expected output once the golden records are formed from the 2 actual entities above.

  1. ‘JAMES K. BROWN’ staying at ‘123 N Highland Str, Apt 1002, Cleveland, OH, 44101-2203’ with phone number ‘123-456-7890’, e-mail address ‘jmsb@mydomain.com’
  2. ‘Mr. BROWN’ staying at ‘Cleveland, OH’ with social security number ‘987-65-4321’
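
The clustering and golden record steps for these four records can be sketched as follows. The matching rule used here (same surname, compatible first initial, same house number and first street-name word) and the survivorship rule (keep the longest value per field) are assumptions made for illustration, not the rules of any specific tool. In particular, longest-value survivorship keeps one source address whole; producing the blended address shown above would need field-level address parsing.

```python
import re

RECORDS = [
    {"id": 1, "name": "JAMES BROWN",
     "address": "123 N Highland Street, Apt. 1002, Clevelnd, OH"},
    {"id": 2, "name": "JAMES K. BROWN",
     "address": "123 Highland Street, Apartment 1002, Cleveland, OH, 44101",
     "phone": "123-456-7890"},
    {"id": 3, "name": "J. BROWN",
     "address": "123, North Highland Str., Cleveland, OH, 44101-2203",
     "email": "jmsb@mydomain.com"},
    {"id": 4, "name": "Mr. BROWN", "address": "Cleveland, OH",
     "ssn": "987-65-4321"},
]

TITLES = {"MR", "MRS", "MS", "DR"}
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}


def name_key(name):
    """Return (first initial or '', surname) after dropping titles."""
    tokens = [t for t in re.findall(r"[A-Z]+", name.upper()) if t not in TITLES]
    surname = tokens[-1] if tokens else ""
    initial = tokens[0][0] if len(tokens) > 1 else ""
    return initial, surname


def street_key(address):
    """Return (house number, first street word), or None without a house number."""
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    if not tokens or not tokens[0].isdigit():
        return None
    words = [t for t in tokens[1:] if t not in DIRECTIONS]
    return tokens[0], (words[0] if words else "")


def is_match(a, b):
    """Assumed rule: compatible names AND identical street keys."""
    (ia, sa), (ib, sb) = name_key(a["name"]), name_key(b["name"])
    names_ok = sa == sb and (ia == ib or not ia or not ib)
    ka, kb = street_key(a["address"]), street_key(b["address"])
    return names_ok and ka is not None and ka == kb


def cluster(records):
    """Group records transitively with a small union-find."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if is_match(a, b):
                parent[find(b["id"])] = find(a["id"])
    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r)
    return list(groups.values())


def golden_record(group):
    """Keep the longest value seen for each field; remember the source ids."""
    golden = {}
    for rec in group:
        for field, value in rec.items():
            if field != "id" and len(value) > len(golden.get(field, "")):
                golden[field] = value
    golden["source_ids"] = [r["id"] for r in group]
    return golden


for group in cluster(RECORDS):
    print(golden_record(group))
```

With these assumed rules, records 1 to 3 fall into one cluster whose golden record carries the name ‘JAMES K. BROWN’ plus the phone number and e-mail address, while record 4 forms its own cluster.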

 

Householding Process Explained

A household, in marketing terms, is a group of accounts. Householding is the process of grouping all the entities (individuals or organizations) that have similar family names and live at the same geographic location.

The example below illustrates a common householding scenario.

Account Number  Name                   Address                Zip Code    Social Security Number
1               Mary Johnson           123 Sun Street #2      34567-0900  123-45-6789
2               Mary and Bill Johnson  123 Sun Street, Apt 2  34567       123-45-6789
3               Katy Johnson           123 East Sun Street    34567       222-33-4444
4               Bill and Mary Johnson  123 Sun Street NE      34567-1234  123-45-6789
5               Mike Smith             123 Sun Street East    34567       345-45-6767
6               Katy Johnson           123 Sun Street East    98999       678-09-0000
7               Katy Johnson           386 Main Street        34567       119-33-4545

Accounts 1, 2, 3 and 4 will be combined into the same household because they all have the same last name and address. Because address information can be entered in different ways, the process ignores certain parts of the address, including the +4 portion of the zip code, apartment numbers and directions in street names.

The last three accounts do not belong to this household because:

  • Account 5 has a similar address, but a different last name.
  • Account 6 has the same name and address, but a different zip code.
  • Account 7 has the same name and zip code, but a different address and social security number.

Once the households are formed, the system can display all the accounts that belong to each household.
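
The grouping rule described above can be sketched as a household key built from the last name, a normalized street address and the 5-digit zip code. The helper names (`normalize_address`, `household_key`, `build_households`) and the lists of direction and unit words are assumptions for illustration; the sketch also uses exact last-name equality, whereas a production process would usually allow similar (fuzzy) family names.

```python
import re
from collections import defaultdict

# Accounts from the table above: (number, name, address, zip code).
ACCOUNTS = [
    (1, "Mary Johnson", "123 Sun Street #2", "34567-0900"),
    (2, "Mary and Bill Johnson", "123 Sun Street, Apt 2", "34567"),
    (3, "Katy Johnson", "123 East Sun Street", "34567"),
    (4, "Bill and Mary Johnson", "123 Sun Street NE", "34567-1234"),
    (5, "Mike Smith", "123 Sun Street East", "34567"),
    (6, "Katy Johnson", "123 Sun Street East", "98999"),
    (7, "Katy Johnson", "386 Main Street", "34567"),
]

# Assumed noise words; extend to suit the data at hand.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
              "NORTH", "SOUTH", "EAST", "WEST"}
UNIT_PATTERN = re.compile(r"(?:#|\b(?:APT|APARTMENT|UNIT|SUITE)\b\.?)\s*\w+")


def normalize_address(address):
    """Drop apartment/unit designators, punctuation and street directions."""
    text = UNIT_PATTERN.sub(" ", address.upper())
    tokens = re.findall(r"[A-Z0-9]+", text)
    return " ".join(t for t in tokens if t not in DIRECTIONS)


def household_key(name, address, zip_code):
    """Last name + normalized address + zip code without the +4 portion."""
    last_name = name.split()[-1].upper()
    return last_name, normalize_address(address), zip_code[:5]


def build_households(accounts):
    """Map each household key to the list of account numbers sharing it."""
    households = defaultdict(list)
    for number, name, address, zip_code in accounts:
        households[household_key(name, address, zip_code)].append(number)
    return dict(households)


for key, members in build_households(ACCOUNTS).items():
    print(key, members)
```

Run against the table, this puts accounts 1 to 4 under the key (JOHNSON, 123 SUN STREET, 34567) and leaves accounts 5, 6 and 7 in households of their own.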

 

Challenges in a Householding Process

Several problems can cause entity resolution to produce incorrect household groupings. Some of them are listed below:

  • Too little information available to perform a proper matching operation
  • Non-standard information
  • Too loose or too tight matching logic
  • Improper parsing while matching on common fields

Below are some best practices that can lead to better householding results:

  • Allow as many common fields as possible for matching, since that increases the chances of a match. Multiple alternative matching rules can be selected based on business requirements, for example:
    • Family Name and Address
    • Family Name, Address and Social Security Number
    • Family Name, Social Security Number and Club Membership
  • Match the values of the common fields only after cleaning, standardization and noise removal
  • Select the most effective matching parameter values
  • Tune parsing and matching techniques to the requirements so that they cover the most scenarios
  • Separate out company-specific words so that parsing and matching are more effective
  • Perform identification analysis to classify records as individuals or organizations, which improves matching
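
To make the idea of alternative matching rules concrete, here is a minimal sketch in which each rule is a tuple of field names and a rule fires only when every one of its fields is present and equal in both records. The field names (`family_name`, `address`, `ssn`, `club_membership`), the sample values and the helper `matched_rules` are hypothetical, and the values are assumed to have been cleaned and standardized already.

```python
# Alternative rules, mirroring the three examples in the list above.
RULES = [
    ("family_name", "address"),
    ("family_name", "address", "ssn"),
    ("family_name", "ssn", "club_membership"),
]


def rule_fires(rule, a, b):
    """True when every field of the rule is non-empty and equal in both records."""
    return all(a.get(field) and a.get(field) == b.get(field) for field in rule)


def matched_rules(a, b, rules=RULES):
    """Return the rules under which records a and b match (possibly none)."""
    return [rule for rule in rules if rule_fires(rule, a, b)]


# Illustrative records; the club membership value is made up.
with_address = {"family_name": "JOHNSON", "address": "123 SUN STREET",
                "ssn": "123-45-6789"}
with_club = {"family_name": "JOHNSON", "ssn": "123-45-6789",
             "club_membership": "C-100"}
with_club_too = {"family_name": "JOHNSON", "address": "",
                 "ssn": "123-45-6789", "club_membership": "C-100"}

print(matched_rules(with_address, with_address))  # both address-based rules
print(matched_rules(with_club, with_club_too))    # only the club membership rule
print(matched_rules(with_address, with_club))     # no rule fires
```

Reporting which rule fired, rather than a bare yes or no, makes it easier to judge later whether a rule is too loose or too tight.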

 

Conclusion

Entity resolution is a multi-step process consisting of the steps below. These steps are a good start, but they may not be exhaustive enough to fully reach the desired outcome.

  • Standardize all necessary fields
  • Decide matching and parsing rules
  • Generate matches
  • Cluster all the records
  • Review the related clusters
  • Perform golden record selection
  • Process clustered records
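
For the cluster review step, one simple check (an assumption of this sketch, not a prescribed method) is to flag any cluster whose records disagree on a strong identifier, so that a person can inspect it before a golden record is selected. The function name `clusters_to_review`, the cluster numbers and the choice of `ssn` as the identifying field are illustrative.

```python
def clusters_to_review(clusters, id_fields=("ssn",)):
    """Return (cluster id, field, conflicting values) for suspicious clusters."""
    flagged = []
    for cluster_id, records in clusters.items():
        for field in id_fields:
            # Collect the distinct non-empty values of this identifier.
            values = sorted({r[field] for r in records if r.get(field)})
            if len(values) > 1:
                flagged.append((cluster_id, field, values))
    return flagged


# Illustrative clusters keyed by made-up cluster numbers.
clusters = {
    101: [{"name": "MARY JOHNSON", "ssn": "123-45-6789"},
          {"name": "MARY JOHNSON", "ssn": "123-45-6789"},
          {"name": "MARY JOHNSON"}],
    102: [{"name": "KATY JOHNSON", "ssn": "222-33-4444"},
          {"name": "KATY JOHNSON", "ssn": "678-09-0000"}],
}

print(clusters_to_review(clusters))
# [(102, 'ssn', ['222-33-4444', '678-09-0000'])]
```

A conflict like this is acceptable inside a household, which groups several people, but inside a single-person cluster it usually signals matching logic that is too loose.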

Once a strong householding process is established, the marketing department can save on print costs (improved customer-per-household factor), increase campaign effectiveness (focused households), improve customer experience (household-level account summary), generate better customer insight (choices, inclinations) and simplify analytics processes (trends, future needs).