You know you have duplicates in your database. But after you merge them in Insycle, some duplicate records remain floating in your CRM, causing problems with your processes.
In this article, we’ll cover some simple tips that you can use to expand your deduplication templates and catch the maximum number of duplicates in your database.
When it comes to deduplicating your data, there may not be a single template that can identify and merge all duplicate records. Different fields may require different matching criteria, and a one-size-fits-all approach can often overlook edge cases or specific scenarios.
As a best practice, you should break your duplication issues into smaller problems and address them individually. Avoid trying to solve everything at once. Deduplication will typically require multiple passes and iterations.
When creating multiple templates, start with the easier, more straightforward fields first, such as names or email addresses. These fields typically have a higher likelihood of surfacing duplicate records. When setting up your templates, it's generally best to start with Exact Match. Once you've addressed the low-hanging fruit, you can then iterate through more complex fields or edge cases, where the Similar Match option might be helpful.
For example, when working with contact records, you could have several templates that all use the name, with each one adding different fields and parameters. You could have templates to deduplicate by:
- Similar name, same email
- Same name, same related company or account
- Similar name, same IP address
- Same name, same business phone or mobile phone number
When you have a set of templates that address your duplication issues for a record type, you can bundle them into a Recipe and run them together.
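The multi-pass idea above can be sketched outside of Insycle as a minimal illustration. It assumes each "template" is simply a list of fields compared with an exact, case-insensitive match; the function name `find_duplicate_groups`, the field names, and the sample contacts are all hypothetical and do not reflect Insycle's internals.

```python
from collections import defaultdict

def find_duplicate_groups(records, key_fields):
    """Group record ids that share identical, non-empty values in every key field."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(str(record.get(f, "")).strip().lower() for f in key_fields)
        if all(key):  # skip records where any key field is blank
            groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical "templates": each one is a narrow pass over a different field combination.
templates = [
    ["name", "email"],
    ["name", "company"],
    ["name", "phone"],
]

contacts = [
    {"id": 1, "name": "Ann Lee", "email": "ann@acme.com", "company": "Acme", "phone": "555-0100"},
    {"id": 2, "name": "Ann Lee", "email": "ann@acme.com", "company": "", "phone": ""},
    {"id": 3, "name": "Ann Lee", "email": "a.lee@acme.com", "company": "Acme", "phone": ""},
    {"id": 4, "name": "Bob Ray", "email": "bob@beta.io", "company": "Beta", "phone": "555-0199"},
]

# Run every pass and keep the groups each one surfaces.
results = {tuple(t): find_duplicate_groups(contacts, t) for t in templates}
for fields, groups in results.items():
    print(fields, groups)
```

In this sample, the name-and-email pass and the name-and-company pass each surface a pair that the other misses, which is why several narrow passes tend to catch more than one broad template.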
Every merge operation relies on matching fields to identify duplicate records in your database.
You want to use fields in Step 1 that contain unique information that is unlikely to be shared by any other record unless it is a duplicate. For example, a contact record that shares the same first name, last name, and phone number as another record is highly likely to be a duplicate. For companies, things like company names and website addresses are good unique identifier fields.
Reliable, Often Used Matching Fields:
- First and last name
- Company name
- Domain or URL
- Phone number
- Mailing address
- ID number
- External system ID
Tip: Avoid Overly Broad Matching Criteria
To avoid identifying non-duplicate records as duplicates, don't create templates that are too broad.
For example, if you used only "First Name" as your matching field, you could accidentally merge every person with a matching first name together, even though they work at different companies and are not the same person.
Make sure your matching fields are unique identifiers.
A unique identifier is data that is unlikely to be shared by any other record unless it represents the same underlying entity. Fields commonly used in deduplication include phone numbers, email addresses, mailing addresses, and ID numbers.
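One rough way to judge whether a field is unique enough to match on is to check how many of its non-empty values are distinct. The sketch below is an assumption-based illustration on exported data, not an Insycle feature; `uniqueness_ratio` and the sample records are hypothetical.

```python
def uniqueness_ratio(records, field):
    """Share of non-empty values in `field` that are distinct (1.0 = every value is unique)."""
    values = [str(r.get(field, "")).strip().lower() for r in records]
    values = [v for v in values if v]  # ignore blanks
    return len(set(values)) / len(values) if values else 0.0

people = [
    {"first_name": "John", "email": "john.a@acme.com"},
    {"first_name": "John", "email": "john.b@beta.io"},
    {"first_name": "John", "email": "jsmith@gamma.org"},
    {"first_name": "Mary", "email": "mary@acme.com"},
]

first_name_ratio = uniqueness_ratio(people, "first_name")
email_ratio = uniqueness_ratio(people, "email")
print(first_name_ratio, email_ratio)
```

A low ratio, as with first name here, suggests the field would group unrelated people if used on its own, so it should be combined with a more distinctive field.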
The Comparison Rule lets you define what kind of likeness to look for when deciding if field values should be considered a match.
Using Similar Match instead of Exact Match can be a great way to identify records that are only slightly different. It broadens the search to values that differ by a single character, such as a typo, an extra character, or a missing character. This is similar to how Google shows results for a slightly different term or asks, “Did you mean...”
For example, if the Company Name “Acme” is found, records with the Company Name values “Akme,” “acm,” or “Acma” could be included as matches.
However, it is very important that you consider the field you're using it on. Similar Match uses looser criteria that cast a wider net for what can be considered duplicates, so it's not appropriate for every field. For example, you wouldn't want to use Similar Match on a Phone Number field because people with similar (but different) phone numbers may be identified as duplicate records.
If using ID fields to identify duplicates, note that they will only work with Exact Match, not Similar Match.
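To make the one-character idea concrete, here is a minimal sketch that treats two values as similar when they are at most one edit apart (one substituted, added, or missing character), ignoring case. This is only one plausible reading of the description above; the actual algorithm behind Similar Match is not specified here, and `within_one_edit` is a hypothetical helper.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b are equal or differ by one substitution, insertion, or deletion."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) string
    # Skip the shared prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substituted character
    return a[i:] == b[i + 1:]          # one extra character in `b`

for candidate in ["Akme", "acm", "Acma", "Apex"]:
    print(candidate, within_one_edit("Acme", candidate))

# Two different phone numbers can also be one character apart,
# which is why loose matching is risky on phone fields.
print(within_one_edit("555-0100", "555-0101"))
```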
Insycle also allows you to ignore elements within your fields, so only relevant portions of the values are analyzed.
- Ignore Symbols and Whitespace when comparing phone numbers.
- Ignore HTTP, www, subdomain, or top-level domain (.com vs. .co.uk) when comparing websites or email domains. This is a great way to catch more advanced duplicates.
- Insycle comes preloaded with terms to ignore. If you select Common Terms, click the Terms button to view and edit this list on the Common Terms tab.
- If you select Text (substrings), click the Terms button, then the Ignored Text tab, and enter text to be ignored. Separate multiple substrings (or phrases) with a new line.
Note: If you’ve set up ignored terms or text, don’t forget to enable them as well. Select the Ignored > Common Terms or Text (substrings) checkbox.
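The effect of these ignore options can be approximated as normalizing values before comparing them. The sketch below is illustrative only: the helper names and the `IGNORED_TERMS` list are assumptions, and it covers just symbols and whitespace in phone numbers, the scheme and `www` prefix in websites, and a few company suffixes.

```python
import re
from urllib.parse import urlparse

IGNORED_TERMS = {"inc", "llc", "ltd"}  # hypothetical list, not Insycle's preloaded terms

def normalize_phone(value: str) -> str:
    """Drop symbols and whitespace so only digits are compared."""
    return re.sub(r"\D", "", value)

def normalize_domain(value: str) -> str:
    """Drop the scheme, path, and a leading 'www.' from a website value."""
    value = value.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat a bare domain as a host
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host

def strip_terms(name: str) -> str:
    """Drop punctuation and ignored terms from a company name."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in IGNORED_TERMS)

print(normalize_phone("(555) 010-0100") == normalize_phone("555.010.0100"))
print(normalize_domain("https://www.acme.com/about") == normalize_domain("acme.com"))
print(strip_terms("Acme, Inc.") == strip_terms("ACME LLC"))
```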
Setting Match Parts in Step 1 allows you to home in on specific portions of field values. If the beginning or ending portion of each value is what makes it unique, you can limit the comparison to that part.
For example, you can instruct Insycle to only look at:
- First X Words
- First X Characters
- Last X Words
- Last X Characters
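The four options above amount to slicing a value before it is compared. A minimal sketch, assuming words are separated by whitespace; `match_part` and its option names are hypothetical and only mirror the list above.

```python
def match_part(value: str, part: str, count: int) -> str:
    """Return only the portion of `value` that should take part in the comparison."""
    if count < 1:
        raise ValueError("count must be at least 1")
    words = value.split()
    if part == "first_words":
        return " ".join(words[:count])
    if part == "last_words":
        return " ".join(words[-count:])
    if part == "first_chars":
        return value[:count]
    if part == "last_chars":
        return value[-count:]
    raise ValueError(f"unknown part: {part}")

# Comparing only the first word treats these two company names as a match.
print(match_part("Acme Corporation of America", "first_words", 1))
print(match_part("Acme Holdings", "first_words", 1))
# Comparing only the last four characters of a phone number.
print(match_part("555-010-0100", "last_chars", 4))
```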
Sometimes, you might want to match duplicates using data in two separate fields. For example, you might want to compare your Phone Number field to a Mobile Phone Number field to identify duplicates.
Using the Related Fields feature, you can use two different fields (that contain similar data) as matching fields to catch more duplicates.
You can set up Related Fields in the Advanced tab.
Common Examples of Related Field Matching
Matching Field | Related Fields |
---|---|
Phone Number | Mobile Phone Number, Company Phone |
Email Domain | Website, Company Domain |
Email Address | Additional Email Addresses |
Mailing Address | Company Address |
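Conceptually, related-field matching means a value in one field can match the same value in another field on a different record. The sketch below illustrates that idea for phone numbers by pooling both fields into one lookup. It is an assumption-based illustration: `related_field_groups` and the field names are hypothetical, and it does not chain overlapping groups together.

```python
from collections import defaultdict

def related_field_groups(records, fields):
    """Group record ids that share a phone value in any of the given fields."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            # Compare digits only, so formatting differences do not matter.
            digits = "".join(ch for ch in str(record.get(field, "")) if ch.isdigit())
            if digits:
                index[digits].add(record["id"])
    return sorted(sorted(ids) for ids in index.values() if len(ids) > 1)

contacts = [
    {"id": 1, "phone": "555-010-0100", "mobile_phone": ""},
    {"id": 2, "phone": "", "mobile_phone": "(555) 010-0100"},
    {"id": 3, "phone": "555-010-0199", "mobile_phone": ""},
]

groups = related_field_groups(contacts, ["phone", "mobile_phone"])
print(groups)
```

Records 1 and 2 would never match on the Phone Number field alone, because record 2 stores the number only in its mobile field.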
The Conditions tab provides rules that one or more of the records in a duplicate group must meet. These options let you choose fields that are required, can be empty, or specify values that must be included.
- Value Required in All Records - Each record must contain a value in this field to be considered a duplicate.
- Empty Allowed in Any Record - A record can still be considered a duplicate if this field is blank. Allowing empty values requires using two or more fields to identify duplicates.
- At Least One Record With Non-Empty - At least one record in the duplicate group must contain a value.
- At Least One Record Match - At least one record in the duplicate group must match the specified value, and the other records cannot be blank. If none of the records have the specified value, the duplicate group will not be merged.
- Only One Record Match - If more than one record in a duplicate group contains the specified field value, the duplicate group is skipped (not merged).
- Within Timeframe - Set a time parameter that can find duplicates created or modified within a specific timeframe, such as the last 20 minutes.
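Several of these conditions can be read as simple checks on a candidate duplicate group. The sketch below is one interpretation of the descriptions above, not Insycle's implementation; the function names and sample data are hypothetical, and the timeframe condition is left out.

```python
def value_required_in_all(group, field):
    """Every record in the group must have a value in the field."""
    return all(r.get(field) for r in group)

def at_least_one_non_empty(group, field):
    """At least one record in the group must have a value in the field."""
    return any(r.get(field) for r in group)

def at_least_one_match(group, field, value):
    """One record must hold the specified value, and no record may be blank."""
    return all(r.get(field) for r in group) and any(r.get(field) == value for r in group)

def only_one_match(group, field, value):
    """Skip the group when more than one record holds the specified value.
    Assumption: the text does not say what happens with zero matches, so this allows it."""
    return sum(r.get(field) == value for r in group) <= 1

group_a = [
    {"id": 1, "email": "ann@acme.com", "stage": "customer"},
    {"id": 2, "email": "", "stage": "lead"},
]
group_b = [
    {"id": 3, "email": "bob@beta.io", "stage": "customer"},
    {"id": 4, "email": "bob@beta.io", "stage": "customer"},
]

print(value_required_in_all(group_a, "email"))         # blank email in record 2
print(at_least_one_match(group_a, "stage", "customer"))
print(only_one_match(group_b, "stage", "customer"))     # two customers in the group
```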
Insycle Recipes allow you to organize multiple templates into a multi-step data maintenance process for automation, training, and organization.
A Recipe is a collection of templates ordered into numbered steps that are run in sequence. You can add Insycle's built-in or your own custom templates to a Recipe.
Recipes can also be automated to run on a monthly, weekly, or daily basis. This ensures that your duplicates are identified and merged continuously, hands-free.