Deduplication Best Practices

 

Duplicate records inhibit your entire organization. Your marketing team can't effectively segment and personalize communications. Sales teams step on each other's toes and lack vital context in conversations. Support teams miss important information, and analysis and reporting are skewed.

Insycle helps you merge duplicate contacts, companies, deals, and other objects flexibly and powerfully with the Merge Duplicates module.

However, getting the best results from your deduplication efforts involves considering how duplicates are matched and how that data is merged.

Here are some best practices to ensure good results.

Select Fields with Unique Values to Identify Matching Records

Insycle has to use the existing data in your CRM to identify duplicate records. Matching duplicates requires unique identifiers—data that is unlikely to be shared by any other record unless it is a duplicate. If you don't use unique identifiers, you may identify unrelated records as duplicates.

For instance, many CRMs match first names, last names, and email addresses. If all of those match, or are similar, you can confidently determine that the record is a duplicate.

step-1-fields.png

Commonly used unique identifying fields include:

  • First and last name
  • Company name
  • Email address
  • Email or website domain
  • Phone number
  • Website URL
  • Various ID numbers
  • Mailing address

Strategically Expand Your Matching Criteria to Catch More Duplicates

Most CRMs include very basic duplicate detection systems. Often, they identify duplicates using name and email or company name. This is an effective way to find surface-level duplicates.

But by broadening the way that you match duplicates, you'll be able to identify more of them in your database.

First, you can use both exact and similar matching in Insycle. With similar matching, you can match duplicates even when the data has slight differences between the two duplicate records.

Using these options, Insycle's pre-built deduplication templates can help you match duplicates in a variety of ways.

You can require that field values are an Exact Match to be considered a duplicate:

step-1-company-name-domain.png

Or you can use the Similar Match Comparison Rule on one or more of the fields to account for close variations on a value:

step-1-salesforce-accounts.png

If you'd like to look at the data in two different fields (that contain similar data) as if it were one, you can set up Related Fields under the Advanced tab. For example, you might want to look at both the Phone Number and Mobile Phone Number fields for duplicate values:

step-1-advanced-related-field.png

The Conditions tab lets you set rules that one or more records in a duplicate group must meet:

  • Value Required in All Records - Each record must contain a value in this field to be considered a duplicate.
  • Empty Allowed in Any Record - A record can still be considered a duplicate if this field is blank. Allowing empty values requires using two or more fields to identify duplicates.
  • At Least One Record With Non-Empty - At least one record in the duplicate group must contain a value.
  • At Least One Record Match - At least one record in the duplicate group must match the specified value, and the other records cannot be blank. If none of the records have the specified value, the duplicate group will not be merged.
  • Only One Record Match - If more than one record in a duplicate group contains the specified field value, the duplicate group is skipped (not merged).
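
One way to read the group-level conditions above is as simple checks over the values a field takes across one duplicate group. The Python sketch below only illustrates that reading and is not Insycle's implementation. The function names are hypothetical, and treating whitespace-only values as empty is an assumption.

```python
from typing import Optional, Sequence


def is_empty(value: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings both count as empty.
    return value is None or value.strip() == ""


def value_required_in_all(values: Sequence[Optional[str]]) -> bool:
    # Value Required in All Records: every record must hold a value.
    return all(not is_empty(v) for v in values)


def at_least_one_non_empty(values: Sequence[Optional[str]]) -> bool:
    # At Least One Record With Non-Empty: one value in the group is enough.
    return any(not is_empty(v) for v in values)


def at_least_one_record_match(values: Sequence[Optional[str]], target: str) -> bool:
    # At Least One Record Match: some record holds the target value,
    # and none of the records is blank.
    return any(v == target for v in values) and all(not is_empty(v) for v in values)


def only_one_record_match(values: Sequence[Optional[str]], target: str) -> bool:
    # Only One Record Match: the group is skipped when more than one record
    # holds the target value. The text does not say what happens when no
    # record holds it; this sketch lets that case pass.
    return sum(1 for v in values if v == target) <= 1
```

Each function returns True when the group passes the condition for that field and False when the group would be left out.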

merge-duplicates-salesforce-contacts-step-1-conditions-all-5.png

Building multiple templates for deduplication allows you to catch and merge more duplicates that are gumming up your processes.

Learn more about detecting duplicates using advanced methods:
Hidden Duplicates: 11 Advanced Ways to Identify & Deduplicate Customer Data

Understand Similar Matching
Insycle's Merge Duplicates module gives you two options for matching fields—exact match and similar match.

The Similar Match Comparison Rule found in Step 1 will detect matches that are two keystroke deviations away from each other, like:

  • insertion: bar → barn
  • deletion: barn → bar
  • substitution: barn → bark

It looks for values that may be close but with a one-character difference (maybe a typo) and broadens the search.
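
Counting insertions, deletions, and substitutions between two values is commonly done with the Levenshtein edit distance. The Python sketch below assumes Similar Match behaves like an edit-distance threshold, which this article does not confirm. The threshold is left as a parameter rather than fixed, and ignoring letter case is also an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def is_similar(a: str, b: str, max_distance: int) -> bool:
    # Assumption: the comparison ignores letter case.
    return edit_distance(a.lower(), b.lower()) <= max_distance


print(edit_distance("bar", "barn"))   # 1 (insertion)
print(edit_distance("barn", "bar"))   # 1 (deletion)
print(edit_distance("barn", "bark"))  # 1 (substitution)
print(is_similar("hueyy@coahulldu.co", "huey@coahulldu.co", max_distance=1))  # True
```

The same check also accepts two different phone numbers that differ by a single digit, such as 555-0100 and 555-0101, which is why results from a loose comparison need review before merging.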

step-1-email-only.png

This search behaves the way Google does when it shows results for a slightly different term or asks, “Did you mean...” For example, if the Email value “hueyy@coahulldu.co” is found, records with the Email values “hue.y@coahulldu.co” or “huey@coahulldu.co” could be included as matches.

step-2-group-w-similar-match.png

Similar matching can be a great way to detect more duplicates, but you have to be thoughtful about how it is used, or you risk matching non-duplicate records and merging them together.

Similar Match uses looser criteria that cast a wider net for what can be considered duplicates. It is best to try Similar Match with very open and generic fields after trying everything else. When you do use it, make sure to carefully review the results to ensure the duplicates being identified are what you're expecting.

For example, if you used Similar Match on a phone number, other contacts who work at the same organization might be mistakenly identified as duplicates because they have similar numbers. 

Implement Automation

To save time and ensure that your duplicates are merged consistently, you can set up ongoing, automated deduplication using scheduled templates or Recipes. HubSpot users can also integrate Insycle Recipes with Workflows.

You can view all your scheduled automations at any time on the Operations > Automations page.

merge-duplicates-step-5-review-update-automate-daily.png

Considerations for Master Record Selection and Data Retention Rules

When you merge duplicates, you should consider how data is merged.

Rules to Automatically Choose the Master Record

Under Step 4 on the Records tab, you define how all of the matching duplicate groups should be merged at scale. To do this, you need to create a series of rules that tell Insycle how to select the record from each group to become the master. The master is the record that will remain after the merge.

For example, if you had four records representing the same company, they would make up one duplicate group with four records, all of which would be merged into one master record. The other three records would no longer exist.

The master record for each duplicate group is determined using rules via an elimination process. Rules are read in order, from top to bottom. As soon as a record meets one of the criteria, it becomes the master and the rest of the rules are skipped. So if a record meets the first criterion (in the example below, the first criterion is “highest number of marketing emails clicked”), it is chosen as the master.

step-4-master-select-record-rules.png

Configure Rules That Determine Values to Keep

The Merge Duplicates module allows you to control the values saved in the master record after the merge, regardless of the default merge behavior. By adding each field you want to control the data retention for and selecting a Condition, you can tell Insycle where the data for the field should be taken from and how to handle it.

Under Step 4, click the Fields tab. For each field you want to control the data retention for, you need to select a Field and tell it where the data for the field should be taken from. This is merged into the master. Any data that is not in the master or not copied to the master is deleted.

The Criteria dropdown gives you different options for choosing the data to keep:

merge-duplicates-hubspot-contacts-step-4-field-rules-9.png

  • From master record – Use the value that exists in the master record.
  • From master record (even empty) – If the field on the selected master record is blank, keep it that way. Don’t automatically fill it in with a value from the most recently updated record.
  • Most frequent value – If the same value appears in multiple records, use the one that appears most frequently.
  • From record where value – Select data from one of the records in the duplicate group based on the values. These options vary depending on the field type. For example, retain the data in the Email field that is using a professional domain rather than a free one.
  • From record based on other field value – Look at the value in a different field to decide which value from the duplicate group should be kept. The example above highlights how a Last Modified Date value can be used to determine which Lifecycle Stage value to use.
  • Combine and append all values – You can merge the values from the selected field for all records in the group. For example, if there is some type of Notes field, you could keep the notes from all of the records in the duplicate group.
  • Collect all values from other field – Select a destination field to copy and combine values into, then select what field the data should come from for each record in the duplicate group. For example, this could be used to keep the record Owner values of all duplicates in a group and combine them into a custom field.
  • Collect non-master values from other field - Aggregate the values of all the duplicates that are not the master and not the same as the master, meaning all instances of that value are excluded from collection. This can be especially helpful if you want a record of the object IDs that were removed, so you can also remove them from another system and keep the master. Select a destination field to copy and combine values into, then select what field the data should come from for each record in the duplicate group. 
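
To make a few of these options concrete, the Python sketch below shows one way "Most frequent value", "Combine and append all values", and "Collect non-master values from other field" could be computed for a single duplicate group. It is an illustration only, not Insycle's code. The function names, the "; " separator, and the choice to drop empty and repeated values are all assumptions.

```python
from collections import Counter


def most_frequent_value(values):
    # Ignore empty values; on a tie, Counter returns the value seen first.
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    return Counter(non_empty).most_common(1)[0][0]


def combine_all_values(values, separator="; "):
    # Keep the original order; skip empty values and repeats (assumption).
    kept = []
    for value in values:
        if value and value not in kept:
            kept.append(value)
    return separator.join(kept)


def collect_non_master_values(master, duplicates, source_field, separator="; "):
    # Gather the source field from every non-master record, leaving out
    # any value that equals the master's own value.
    master_value = master.get(source_field)
    collected = []
    for record in duplicates:
        value = record.get(source_field)
        if value and value != master_value and value not in collected:
            collected.append(value)
    return separator.join(collected)


# Hypothetical sample data.
master = {"Record ID": "101"}
others = [{"Record ID": "102"}, {"Record ID": "103"}]
print(collect_non_master_values(master, others, "Record ID"))  # 102; 103
```

The last call mirrors the use case of keeping the IDs of removed records so they can be cleaned up in another system.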

Learn more about master records:
Bulk Merge Duplicate People, Companies

Start in Preview Mode

When you click the Review button in Insycle, you'll get to choose from two modes: Preview Mode and Update Mode.

preview mode

Preview mode does not update your live CRM data, but instead produces a CSV showing you the results that the deduplication process would have produced.

When you first run a new deduplication template, it's always a good idea to run it in Preview Mode to ensure that it is working as you intended. Once you confirm that it is, you can run it in Update Mode or schedule it for automation.

Additional Tips

  • Begin with easy-to-find duplicates. Iterate through fields and rules you know will surface duplicates. Don’t expect to resolve all your duplicates by setting up and running this process once. You will need to run this process multiple times for different fields or nuanced variations.
  • Each time you get a merge process to run the way you want in your database, save it as a template. When you have a solid set of templates that reliably resolve most of your duplicates, you can put them together as a Recipe that can run on a regular, automated schedule.
  • You may also need to look for edge cases that fall outside your standard rules. These may be operations you run manually, so you can adjust based on what you find.
  • Do some experimentation. Use the Preview mode CSV report to analyze patterns in the duplicates. You may learn what is causing the duplicates and learn how to avoid having them in the first place. You may also want to explore your data in the Grid Edit module to understand what you have so you can design templates that catch all potential variations.

Advanced How-Tos

Step 1: Setting Up the Fields to Find Duplicates

Each row in your matching fields setup is cumulative, so records must meet all of the criteria. For example, looking for records that have the same First Name and Last Name and Phone Number returns only results where all three values are the same.
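
Because every row must match, exact matching on several fields behaves like grouping records by a combined key. The Python sketch below illustrates that idea with hypothetical sample records. It is not how Insycle is implemented, and the trimming and lowercasing of values before comparison are assumptions.

```python
from collections import defaultdict


def find_duplicate_groups(records, match_fields):
    """Group records whose values agree on every matching field."""
    groups = defaultdict(list)
    for record in records:
        # Assumption: values are compared after trimming and lowercasing.
        key = tuple((record.get(field) or "").strip().lower() for field in match_fields)
        if all(key):  # every matching field must hold a value
            groups[key].append(record)
    # Only keys shared by two or more records form a duplicate group.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data.
RECORDS = [
    {"id": 1, "First Name": "Marta", "Last Name": "Vaskovitch", "Phone Number": "555-0100"},
    {"id": 2, "First Name": "marta", "Last Name": "Vaskovitch", "Phone Number": "555-0100"},
    {"id": 3, "First Name": "Marta", "Last Name": "Vaskovitch", "Phone Number": "555-0199"},
]

by_name_and_phone = find_duplicate_groups(RECORDS, ["First Name", "Last Name", "Phone Number"])
by_name_only = find_duplicate_groups(RECORDS, ["First Name", "Last Name"])
print([r["id"] for r in by_name_and_phone[0]])  # [1, 2]
print([r["id"] for r in by_name_only[0]])       # [1, 2, 3]
```

Adding the phone number as a third field narrows the group from three records to two, which shows how each extra matching field tightens the result.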

  Matching values must be at least four characters long. Shorter values, such as "Joe" or "Ace", are disregarded.

Field Name Comparison Rule Ignored Match Parts

Pick a field that you think has some duplicate values.

Running a very simple match operation like just First and Last Name is okay for giving you an idea of what you have, but it is too broad to use for reliable analysis and deduplication. There may be legitimate duplicate names–different people with the same first and last name. You need additional, unique criteria to narrow it down.

Choosing Unique Identifiers

Matching duplicates requires unique identifiers—data that is unlikely to be shared by any other record unless it is a duplicate. If you don't use unique identifiers, you are likely to identify unrelated records as duplicates and may accidentally merge them.

Many CRMs match first names, last names, and email addresses. If all of those match, or are similar, you can confidently determine that the record is a duplicate.

Other unique identifying fields that are commonly used in deduplication include:

    • Phone number
    • Domain name
    • Mailing address
    • ID number

Step 1: Matching Using Two Different Fields

Sometimes, you might want to match duplicates using data in two separate fields. For example, you might want to compare your Phone Number field to a Mobile Phone Number field to identify duplicates.

Using the Related Fields feature, you can use two different fields (that contain similar data) as matching fields to catch more duplicates.

You can set up Related Fields in the Advanced tab.
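
Conceptually, related fields can be thought of as one pool of values per record, where two records match when their pools overlap. The Python sketch below illustrates this for phone fields. It is a simplified model rather than Insycle's logic, the function names are hypothetical, and comparing digits only is an assumption.

```python
def normalize_phone(value):
    # Assumption: only the digits matter when comparing phone numbers.
    return "".join(ch for ch in (value or "") if ch.isdigit())


def pooled_phone_values(record, fields):
    # Collect the non-empty values from all related fields into one set.
    values = {normalize_phone(record.get(field)) for field in fields}
    values.discard("")
    return values


def related_fields_match(a, b, fields):
    # Two records match when any pooled value appears in both.
    return bool(pooled_phone_values(a, fields) & pooled_phone_values(b, fields))


# Hypothetical sample data.
FIELDS = ["Phone Number", "Mobile Phone Number"]
first = {"Phone Number": "555-0100", "Mobile Phone Number": ""}
second = {"Phone Number": "", "Mobile Phone Number": "(555) 0100"}
third = {"Phone Number": "555-0199", "Mobile Phone Number": ""}

print(related_fields_match(first, second, FIELDS))  # True
print(related_fields_match(first, third, FIELDS))   # False
```

The first two records match even though the shared number sits in different fields, which a comparison of Phone Number alone would miss.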

bulk-merge_2.png

Common Examples of Related Field Matching

Matching Field | Related Fields
Phone Number | Mobile Phone Number, Company Phone
Email Domain | Website, Company Domain
Email | Additional Email Addresses
Address | Company Address

Step 1: Allowing Empty Values When Matching

When using two or more fields to identify duplicates, records can still be considered matches even if one of the field values is blank. You just need to specify which field(s) allow a blank value.

Under Step 1, configure your matching rules in the Simple tab, then click the Conditions tab.

step-1-conditions-tab-arrow.png

All the matching fields you included will automatically appear with the Value Required in All Records condition selected. Change the condition to Empty Allowed in Any Record to allow empty values for certain fields. You can also use the At Least One Record with Non-Empty condition to help you determine which is the master record. Make sure at least one field remains required and is a reliable unique identifier to ensure the records are really duplicates.

step-1-conditions-empty-not-empty.png

For example, on the Simple tab, you may have the matching fields First Name, Last Name, and Phone Number, but the Phone Number field may be empty on some of your records. With the condition "Empty Allowed in Any Record" or "At Least One Record with Non-Empty" set on Phone Number, records with the same name are considered duplicates whether they share the same phone number or have no phone number at all.
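
The effect of allowing empty values can be sketched as a pairwise comparison in which a blank counts as compatible for the fields you choose. The Python example below is a simplified illustration, not Insycle's matching engine. The function names are hypothetical, and trimming and lowercasing values is an assumption.

```python
def normalize(value):
    # Assumption: values are compared after trimming and lowercasing.
    return (value or "").strip().lower()


def records_match(a, b, match_fields, empty_allowed=()):
    """Return True when two records agree on every matching field,
    treating a blank as compatible only for fields in empty_allowed."""
    for field in match_fields:
        value_a, value_b = normalize(a.get(field)), normalize(b.get(field))
        if value_a and value_b:
            if value_a != value_b:
                return False  # both filled in, but different
        elif field not in empty_allowed:
            return False      # a blank where a value is required
    return True


# Hypothetical sample data.
FIELDS = ["First Name", "Last Name", "Phone Number"]
with_phone = {"First Name": "Marta", "Last Name": "Vaskovitch", "Phone Number": "555-0100"}
no_phone = {"First Name": "Marta", "Last Name": "Vaskovitch", "Phone Number": ""}
other_phone = {"First Name": "Marta", "Last Name": "Vaskovitch", "Phone Number": "555-0199"}

print(records_match(with_phone, no_phone, FIELDS))                                   # False
print(records_match(with_phone, no_phone, FIELDS, empty_allowed=["Phone Number"]))   # True
print(records_match(with_phone, other_phone, FIELDS, empty_allowed=["Phone Number"]))  # False
```

Records with two different phone numbers still do not match, so the name fields remain the required identifiers while the phone number only has to avoid a conflict.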

step-2-group-w-empty.png

Step 4: Selecting Priority Match vs Absolute Match

step-4-priority-match-no-arrow-2023-06-01.png

Priority Match: Looks through the master selection rules in order, one by one. As soon as a record meets one of the criteria, Insycle makes the master selection and skips the rest of the rules on the list. The vast majority of duplicate templates should use Priority Match.

Absolute Match: The master record must meet all of the rules listed in the Records tab in Step 4. If no record in a duplicate group matches every rule, no master record is identified for that group. Absolute Match is appropriate when master selection needs to be strict.

For example, if a company wanted to ensure the chosen master record is in their sales pipeline and already has a sales rep working the record, they can choose Absolute Match and set the Record rules:

  • Lifecycle Stage is lead
  • Contact Owner exists

Choosing Absolute Match can often result in no master record being identified since the record has to match every rule listed, so in most cases, you should select Priority Match.
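
The Absolute Match example above can be sketched as a filter that keeps only the records satisfying every rule. The Python code below is an illustration under assumptions: the function name, the rule representation as small functions, and the sample records are all hypothetical, and it does not model what Insycle does when several records pass.

```python
def absolute_match_candidates(group, rules):
    # Keep only the records that satisfy every rule in the list.
    return [record for record in group if all(rule(record) for rule in rules)]


# The two rules from the example: Lifecycle Stage is lead, Contact Owner exists.
RULES = [
    lambda record: record.get("Lifecycle Stage") == "lead",
    lambda record: bool(record.get("Contact Owner")),
]

# Hypothetical sample data.
GROUP = [
    {"id": 1, "Lifecycle Stage": "lead", "Contact Owner": ""},
    {"id": 2, "Lifecycle Stage": "subscriber", "Contact Owner": "Sales Rep A"},
    {"id": 3, "Lifecycle Stage": "lead", "Contact Owner": "Sales Rep B"},
]

print([r["id"] for r in absolute_match_candidates(GROUP, RULES)])      # [3]
print([r["id"] for r in absolute_match_candidates(GROUP[:2], RULES)])  # []
```

The second call returns an empty list because each record meets only one of the two rules, which is the situation where no master record is identified.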

Step 4: Understanding Master Record Selection

Let's say we have four records that represent the same person—Marta Vaskovitch. The Merge Duplicates module will identify this as one duplicate group consisting of four records.

Here is the data that we have for this duplicate group:

mceclip11.png

Here are the master selection rules we have set up:

step-4-record-mktg-eml-high-low-not-gmail.png

We haven't sent any emails to Marta yet, so when Insycle processes the first three rules—Marketing emails clicked, emails bounced, and emails opened—Insycle cannot eliminate any record because they all have the same value of zero.

In the next rule about contact owner, records 61301, 61201, and 61251 are eliminated since no contact owner exists for those records. Now, only one record remains, 61351, therefore that's the master record.

mceclip13.png
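
The elimination walk-through above can be sketched in Python as a list of rules that each narrow the remaining candidates. This is a minimal illustration, not Insycle's implementation. The helper names, the field names, the owner value, and the direction of each email rule (highest clicked, lowest bounced, highest opened) are assumptions; the record IDs come from the example.

```python
def highest(field):
    def rule(candidates):
        top = max(c[field] for c in candidates)
        return [c for c in candidates if c[field] == top]
    return rule


def lowest(field):
    def rule(candidates):
        bottom = min(c[field] for c in candidates)
        return [c for c in candidates if c[field] == bottom]
    return rule


def exists(field):
    def rule(candidates):
        return [c for c in candidates if c.get(field)]
    return rule


def select_master(group, rules):
    """Apply rules top to bottom, keeping only the best candidates each time."""
    candidates = list(group)
    for rule in rules:
        if len(candidates) == 1:
            break  # a single record is left, so the remaining rules are skipped
        narrowed = rule(candidates)
        if narrowed:  # a rule that no record satisfies eliminates nothing
            candidates = narrowed
    return candidates[0] if len(candidates) == 1 else None


GROUP = [
    {"id": 61301, "clicked": 0, "bounced": 0, "opened": 0, "owner": ""},
    {"id": 61201, "clicked": 0, "bounced": 0, "opened": 0, "owner": ""},
    {"id": 61251, "clicked": 0, "bounced": 0, "opened": 0, "owner": ""},
    {"id": 61351, "clicked": 0, "bounced": 0, "opened": 0, "owner": "Sales Rep A"},
]
EMAIL_RULES = [highest("clicked"), lowest("bounced"), highest("opened")]

print(select_master(GROUP, EMAIL_RULES + [exists("owner")])["id"])  # 61351
print(select_master(GROUP, EMAIL_RULES))                            # None
```

With only the three email rules, all four records stay tied and the sketch returns None, which corresponds to a group where no master can be determined until another rule is added.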

Step 4: Considerations When Picking a Master Record

For contacts, it's often useful to pick master records based on engagement, such as the highest number of email clicks or the most recent email opened. You can also use other statuses to pick a master record, such as the furthest along in your sales lifecycle or the most recently updated record.

merge-duplicates-hubspot-contacts-step-4-record-tab-7-rules.png

For companies, it's often useful to use associated records to determine the master record, such as the highest number of associated contacts or deals.

If you have a connected app, like Salesforce or an ERP system, pick the master record that is syncing with the other apps.

merge-duplicates-hubspot-companies-step-4-record-tab-5-rules.png

Step 4: Control What Field Data is Retained

Though it's possible that duplicate records may be exactly the same, often there is only partial data overlap between them. When data is split between two different records, both records may contain unique and important information about the customer you'd like to keep.

The Merge Duplicates module allows you to control the values saved in the master record after the merge, regardless of the default merge behavior. By adding each field you want to control the data retention for and selecting a Condition, you can tell Insycle where the data for the field should be taken from and how to handle it.

For example, if merging HubSpot companies, by default the HubSpot field “Merged Company IDs” would not be populated with the Record IDs of the duplicates that were merged into the master record. 

Say you want to save all of the Record IDs from records that are merged together and deleted. You can add a new custom field, “Insycle Merged Record IDs,” to your CRM.

Then in the Merge Duplicates module, under the Fields tab of Step 4, add a rule to override the default merge behavior. Select the "Insycle Merged Record IDs" field, the "Collect all values from other field" criteria, and "Record ID" as the other field. 

step-4-fields-collect-all-values-from-other-field.png

You can use the Preview to see how this will preserve the Record IDs of all the duplicates in each duplicate group.

step-4-collect-all-values-from-other-field-CSV-example.png

Step 4: Data Retention Setup Examples

The master record can use values from several different records from the duplicate group, based on the rules that you set in the Fields tab under Step 4. 

By default, any fields not specified here will use the master record values. However, if the master field is blank, the value from the most recently updated duplicate will be used.
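
The default behavior described above can be sketched as a small fallback function. The Python example below is illustrative only: the function name and sample data are hypothetical, and it assumes that duplicates with a blank value are skipped when looking for the most recently updated record, which the text does not state.

```python
from datetime import date


def default_field_value(master, duplicates, field, modified_field="Last Modified Date"):
    # Use the master's value whenever it has one.
    if master.get(field):
        return master[field]
    # Otherwise fall back to the most recently updated duplicate.
    # Assumption: duplicates whose value is blank are not considered.
    with_value = [d for d in duplicates if d.get(field)]
    if not with_value:
        return None
    newest = max(with_value, key=lambda d: d[modified_field])
    return newest[field]


# Hypothetical sample data.
master = {"Phone": "", "City": "Boston", "Last Modified Date": date(2024, 1, 1)}
duplicates = [
    {"Phone": "555-0100", "City": "Austin", "Last Modified Date": date(2023, 5, 1)},
    {"Phone": "555-0199", "City": "Denver", "Last Modified Date": date(2024, 3, 1)},
]

print(default_field_value(master, duplicates, "Phone"))  # 555-0199
print(default_field_value(master, duplicates, "City"))   # Boston
```

A rule added on the Fields tab replaces this fallback for the field it names.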

In this first example, the Ownership value from the record with the most recent Modified Date will be kept, and all the Account Owner values from the records in the duplicate group will be saved to the Merged Owners custom field.

merge-duplicates-step-4-field-rules-owner.png

In this example, the most recent interaction data for several fields will be used in the merged record.

merge-duplicates-intercom-step-4-field-rules-latest.png

In the example below, six master field rules have been set up, including two different rules for the Lifecycle Stage. Insycle will look at the first of the two, and if it finds a record that matches the criteria, the second Lifecycle Stage rule will be ignored. In the example, if a record in the duplicate group with the "Lifecycle Stage" of "Customer" is found, then the next rule looking for the "Lifecycle Stage" of "SQL," would be ignored.

step-4-fields-two-lifecycle-stage-rules.png
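
Two rules on the same field behave like an ordered list of preferred values, where the first value found in the group wins. The Python sketch below illustrates that ordering with the "Customer" and "SQL" values from the example. The function name and sample records are hypothetical, and what happens when neither value is present is not modeled.

```python
def pick_first_matching_value(group, field, preferred_values):
    # Walk the preferred values in rule order and return the first one
    # that any record in the duplicate group holds.
    present = {record.get(field) for record in group}
    for value in preferred_values:
        if value in present:
            return value
    return None  # no rule matched


# Hypothetical sample data.
RULE_ORDER = ["Customer", "SQL"]
group_a = [{"Lifecycle Stage": "SQL"}, {"Lifecycle Stage": "Customer"}]
group_b = [{"Lifecycle Stage": "Lead"}, {"Lifecycle Stage": "SQL"}]

print(pick_first_matching_value(group_a, "Lifecycle Stage", RULE_ORDER))  # Customer
print(pick_first_matching_value(group_b, "Lifecycle Stage", RULE_ORDER))  # SQL
```

In the first group the "SQL" rule is never consulted because "Customer" is found first; in the second group it acts as the fallback.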

Step 4: Customizing Merge Logic

When no common rule identifies duplicates for all or some of your records, you may need more granular control over which records to include in or exclude from the process. In these cases, you can use CSV files to customize your bulk merging, designate master records, and exclude records from deduplication. Import the CSV with the Magical Import module, then use the Merge Duplicates module for complete control over the final merge operation. Learn how to customize merging duplicates in bulk using a CSV.

Frequently Asked Questions

How can I find duplicates when one field is empty?

When using two or more fields to identify duplicates, records can still be considered matches even if one of the field values is blank. You just need to specify which field(s) allow a blank value.

Under Step 1, configure your matching rules in the Simple tab, then click the Conditions tab.

step-1-allow-empty_1.png

All the matching fields you included will automatically appear with the Value Required in All Records condition selected. Change the condition to Empty Allowed in Any Record to allow empty values for certain fields. You can also use the At Least One Record with Non-Empty condition to help you determine which is the master record. Make sure at least one field remains required and is a reliable unique identifier to ensure the records are really duplicates.

step-1-conditions-empty-not-empty.png

For example, on the Simple tab, you may have the matching fields First Name, Last Name, and Phone Number, but the Phone Number field may be empty on some of your records. With the condition "Empty Allowed in Any Record" or "At Least One Record with Non-Empty" set on Phone Number, records with the same name are considered duplicates whether they share the same phone number or have no phone number at all.

step-1-allow-empty-review.png

Can I match duplicates using two different fields?

Yes. This can be done, for example, if you want to look at both the Phone Number field values and Mobile Phone Number field values as a single pool of values to compare between records and identify duplicates.

Using the Related Fields feature, you can use two different fields (that contain similar data) as matching fields to catch more duplicates. You can set up Related Fields in the Advanced tab.

merge-duplicates-step-1-advanced-related-phone-field.png

How do I ensure that I am not merging non-duplicate records together?

Currently, there are two ways to make sure that the records that you are merging are indeed duplicate records.

First, always run your deduplication templates in Preview Mode before running them in Update Mode. This produces a CSV that shows you how your records would have been merged. Then you can ensure that your Merge Duplicates template is working as expected and not merging non-duplicate records together.

Additionally, to ensure a smooth merge process, consider narrowing down the matching settings in Step 1. Try the Exact Match Comparison Rule instead of Similar Match. Then make sure that you are using actual uniquely identifying fields—first name, last name, email, and phone number are popular choices. The more tightly defined your filter is, the less likely you are to merge non-duplicate records.

Insycle is having trouble determining a master record. What could be causing this issue?

If the Result column of the CSV report displays this error:

Cannot determine master record because multiple records (#) satisfy the master selection rules. In ‘Master Selection’, change/add/reorder the rules such that only one record satisfies them (if cannot determine master based on field values, use ‘ID is lowest’ as the last rule).

This error means that, based on the master selection rules you set, Insycle could not determine which record should be the master.

Check Step 4 to ensure that you have Priority Match selected and not Absolute Match.

step-4-priority-match-w-arrow-2023-06-01.png

With Priority Match, the rules configured in the Records tab of Step 4 are processed in order and your master record only has to match one rule. Using Absolute Match, your master record would have to meet all of the rule criteria. The majority of the time it is best to select Priority Match.

If Priority Match was used, then none of the records meet any of the criteria on the list more than the others. In this case, you'll need to experiment with Step 4, reordering or adding additional rules for fields likely to have unique values.

I already have a list of duplicates, can Insycle bulk merge them?

Yes. You can use an existing CSV with duplicate record details. The file needs to include the record IDs and a "Deduplication Master" column specifying which records should be the master and kept after the merge. Next, create a custom field "Deduplication Master" in your CRM to facilitate the merging. Use the Magical Import module to import the edited CSV file into the CRM, populating the new custom field. Finally, use this custom field to merge the duplicate records in the Merge Duplicates module.

Learn more about customizing bulk deduplication from a CSV.

Can I select which data is retained in my master record on a field-by-field basis?

Yes, Insycle allows you to select which field data is retained in the master record using the Fields tab under Step 4. See the Bulk Merge Duplicate People, Companies article for more details.

step-4-field-salesf-accts-all-criteria-2023-06-01.png

Can I exclude some records from deduplication?

Yes. You can exclude records from deduplication by creating a CSV with a "Deduplication Exclude" field.

First, you'll export a Preview CSV from the Merge Duplicates module, add an exclude column, and specify which records should be excluded from the merge process. Next, create a custom field in your CRM to facilitate the merging. Use the Magical Import module to import the edited CSV file into the CRM, populating the new custom field. Finally, utilize this custom field to merge the remaining duplicate records in the Merge Duplicates module.

Learn how to customize bulk deduplication using a CSV.

My team needs to review and approve the master, can I accommodate that with Insycle?

Yes, there are several ways to share details and get approval before merging duplicates.

You can manually approve master records and mark them in a CSV, then use Insycle to bulk deduplicate down to those master records. See the Customize Bulk Deduplication Using Exclusions and Pre-Defined Masters article to learn more.

Or, you can run the Merge Duplicates module in Preview mode and then deliver the preview CSV that Insycle generates. The CSV report that Insycle generates includes your entire merge operation down to individual duplicate groups but does not update your live data. Then your team can approve the merge based on this report, before running Merge Duplicates in Update mode.

Additionally, team members can review duplicates and manually select the master for each record under Step 4. Review the Manually Merge Duplicates article for more detail.

step-4-manual-select.png

Do the field values I use to match need to be exactly the same?

No, the field values do not need to match exactly. The Similar Match Comparison Rule found in Step 1 looks for values that may be close but with a one-character difference (maybe a typo) and broadens the search.

step-1-email-only.png

This search behaves the way Google does when it shows results for a slightly different term or asks, “Did you mean...” For example, if a Company Name of “Acme” is found, records with the Company Name values “Akme”, “acm”, or “Acma” could be included as matches.

step-2-group-w-similar-match.png

You should be careful when using Similar Match as the looser criteria can incorrectly identify non-duplicates as duplicates. 

Review the Understanding Similar Matching best practices for more detail.

Why can I only process 50 duplicate groups at a time?

Insycle shows 50 records on the module screen as a preview; this isn't the entire list of records. To see everything, include All records when you view the Preview CSV report.

Insycle can process thousands of duplicate groups in one operation. Potentially, you could deduplicate your entire database in one operation. 

How many duplicates can I merge into one master record?

You can merge up to 100 duplicate records into a single master record. 

If you have duplicate groups that contain more than five records, you may want to change the value in Skip duplicate groups with more than 5 records per group under Step 3 to make sure you can get them all.

merge-duplicates-step-3-bulk.png

This is a precaution to ensure that if you use a duplicate matching filter that is too broad in Step 1, you do not accidentally merge many non-duplicate records together. If you are going to set this number at a high level, it is a good idea to run Preview Mode first to make sure your deduplication template is operating as you intend.
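
The effect of this setting can be pictured as a size filter applied to the duplicate groups before merging. The Python sketch below is only an illustration with hypothetical names and data; the default of 5 comes from the setting's label.

```python
def split_by_group_size(groups, max_records=5):
    # Groups with more than max_records records are skipped, not merged.
    to_merge = [group for group in groups if len(group) <= max_records]
    skipped = [group for group in groups if len(group) > max_records]
    return to_merge, skipped


# Hypothetical duplicate groups, shown as lists of record IDs.
groups = [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13]]

to_merge, skipped = split_by_group_size(groups)
print(len(to_merge), len(skipped))  # 2 1

to_merge, skipped = split_by_group_size(groups, max_records=10)
print(len(to_merge), len(skipped))  # 3 0
```

Reviewing the skipped groups in a preview before raising the limit helps confirm that a large group is a real set of duplicates and not the result of a filter that is too broad.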

Are there any limits on the number of records that can be identified and merged with my paid subscription?

All plans include unlimited usage, unlimited users, and unlimited operations. During the free trial, there is a cap of 500 records updated, cleansed, or merged. See the pricing page for more details. 
