Deduplication Best Practices

 

Duplicate records impede your entire organization. Your marketing team can't effectively segment and personalize communications. Sales teams step on each other's toes and lack vital context in conversations. Support teams miss important information, and analysis and reporting are skewed.

With the Merge Duplicates module, Insycle helps you merge duplicate contacts, companies, deals, and other objects flexibly and powerfully.

However, getting the best results from your deduplication efforts involves considering how duplicates are matched and how that data is merged.

Here are some best practices to ensure good results.

Select Fields with Unique Values to Identify Matching Records

Insycle uses the existing data in your CRM to identify duplicate records. Matching duplicates requires unique identifiers—data that is unlikely to be shared by any other record unless it is a duplicate. If you don't use unique identifiers, you may identify unrelated records as duplicates.

For instance, many CRMs capture first names, last names, and email addresses. If all of those match or are similar, you can confidently determine that the record is a duplicate.

merge-duplicates-hubspot-contacts-step-1-first-last-email-domain-exact.png

Define the type of likeness to look for when deciding whether field values should be considered a match.

Commonly used unique identifying fields include:

  • First and last name
  • Company name
  • Email address
  • Email or website domain
  • Phone number
  • Website URL
  • Various ID numbers
  • Mailing address
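
To make the idea of matching on a combination of identifying fields concrete, here is a minimal sketch of exact matching outside of Insycle. It is not Insycle's implementation: the record layout, the `find_exact_duplicates` helper, and the choice to ignore case and surrounding whitespace are all assumptions made for illustration.

```python
from collections import defaultdict

def normalize(value):
    """Trim and lowercase a value (an assumption; a CRM may compare differently)."""
    return (value or "").strip().lower()

def find_exact_duplicates(records, fields):
    """Group record IDs whose normalized values match on every listed field."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record.get(field)) for field in fields)
        if all(key):  # skip records with a blank identifying field
            groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample data
records = [
    {"id": 1, "first": "Maria", "last": "Lopez", "email": "maria@acmewidgets.com"},
    {"id": 2, "first": "Maria", "last": "Lopez", "email": "maria.lopez@example.com"},
    {"id": 3, "first": "maria ", "last": "LOPEZ", "email": "Maria@AcmeWidgets.com"},
    {"id": 4, "first": "Sam", "last": "Lee", "email": ""},
]

duplicate_groups = find_exact_duplicates(records, ["first", "last", "email"])
print(duplicate_groups)  # [[1, 3]]
```

Records 1 and 2 share a name but not an email address, so requiring all three fields keeps them apart. That is the point of combining identifiers.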

Strategically Expand Your Matching Criteria to Catch More Duplicates

Most CRMs include very basic duplicate detection systems. These systems often identify duplicates using name, email, or company name, which is an effective way to find surface-level duplicates.

But by broadening how you match duplicates, you'll be able to identify more of them in your database.

First, you can use both exact and similar matching in Insycle. With similar matching, you can match duplicates even when the data between two records differs slightly.

Using these options, Insycle's pre-built deduplication templates can help you match duplicates in various ways.

You can require that field values are an Exact Match to be considered a duplicate:

merge-duplicates-hubspot-companies-step-1-simple-tab-name-domain.png

Or you can use the Similar Match Comparison Rule on one or more of the fields to account for close variations on a value:

merge-duplicates-salesforce-accounts-step-1-simple-tab-name-website.png

If you'd like to look at the data in two different fields (that contain similar data) as if it were one, you can set up Related Fields under the Advanced tab. For example, you might want to look at both the Phone Number and Mobile Phone Number fields for duplicate values:

merge-duplicates-hubspot-contacts-step-1-advanced-tab-first-last-phone-related-field.png
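
One way to picture Related Fields is that the values from both fields are pooled for each record, and two records match when their pools overlap. The sketch below illustrates that idea only. The field names `phone` and `mobile_phone`, the helper names, and the digits-only normalization are assumptions, not Insycle's actual behavior.

```python
import re

def phone_values(record, fields=("phone", "mobile_phone")):
    """Collect the non-empty values from the related fields, reduced to digits."""
    values = set()
    for field in fields:
        digits = re.sub(r"\D", "", record.get(field) or "")
        if digits:
            values.add(digits)
    return values

def share_related_value(record_a, record_b):
    """True when a value from either field on one record appears on the other."""
    return bool(phone_values(record_a) & phone_values(record_b))

# Hypothetical sample data
a = {"phone": "(555) 010-1234", "mobile_phone": ""}
b = {"phone": "", "mobile_phone": "555-010-1234"}
c = {"phone": "555-010-9999", "mobile_phone": None}

print(share_related_value(a, b))  # True
print(share_related_value(a, c))  # False
```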

The Conditions tab provides rules that one or more of the records in a duplicate group will need to meet. See the Duplicate Identification Rules reference in the Module Overview for complete details.

merge-duplicates-salesforce-contacts-step-1-conditions-tab-7.png

Building multiple templates for deduplication lets you catch and merge more duplicates that gum up your processes.

Explore other advanced approaches to merging duplicates:

Understand Similar Matching

Insycle's Merge Duplicates module gives you two options for matching fields: Exact Match and the Similar Match Comparison Rule, both found on the Simple tab of Step 1.

Similar Match will detect matches that are close but differ by up to two keystroke deviations, often typos, like:

  • insertion: bar → barn
  • deletion: barn → bar
  • substitution: barn → bark

This is often called "fuzzy matching" and helps find records with minor differences. The behavior is similar to how Google suggests alternative spellings when you search.
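
A common way to measure this kind of keystroke difference is Levenshtein edit distance, which counts the insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a generic textbook implementation, not Insycle's matching algorithm. The `is_similar` helper and its default threshold of two edits are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

def is_similar(a, b, max_edits=2):
    """Hypothetical helper: treat values within max_edits of each other as similar."""
    return edit_distance(a.lower(), b.lower()) <= max_edits

print(edit_distance("bar", "barn"))   # 1 (insertion)
print(edit_distance("barn", "bar"))   # 1 (deletion)
print(edit_distance("barn", "bark"))  # 1 (substitution)
print(is_similar("kitten", "sitting"))  # False (three edits)
```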

merge-duplicates-salesforce-contacts-step-1-simple-tab-email-only-similar-match.png

For example, if an Email address of “hueyy@coahulldu.co” is found, it could match records with Email values such as “hue.y@coahulldu.co”, “huey@coahulldu.co”, etc.

step-2-group-w-similar-match.png

Similar matching can be an effective way to find more duplicates, but you need to be thoughtful about how it’s applied, or you might match non-duplicate records and merge them together.

Similar Match uses more lenient criteria that cast a wider net for what can be considered duplicates. On very open, generic fields, it is best to try Similar Match only after you have tried everything else.

When you do use it, make sure to carefully review the results to ensure the duplicates being identified are what you're expecting. For example, if you used Similar Match on a phone number, other contacts who work at the same organization might be mistakenly identified as duplicates because they have similar numbers. 

Master Record Selection and Data Retention Rules

When you merge duplicates, consider how the data will be merged.

Rules to Automatically Select the Master Record

Under Step 3 on the Master tab, you define how all of the matching duplicate groups should be merged at scale. To do this, you need to create a series of rules that tell Insycle how to select the record from each group to become the master. The master is the record that will remain after the merge.

For example, if you had four records representing the same company, they would make up one duplicate group with four records, all of which would be merged into one master record. The other three records would no longer exist.

The master record for each duplicate group is determined using rules via an elimination process. Rules are read in order, from top to bottom. As soon as a single record meets one of the criteria, that record becomes the master, and the rest of the rules are skipped. So if a record meets the first criterion (in the example below, the “highest number of marketing emails clicked”), it is chosen as the master.

merge-duplicates-hubspot-contacts-step-3-master-tab-7-rules-engagement-lifecycle.png

Rules That Determine Values to Keep

The Merge Duplicates module allows you to control the values saved in the master record after the merge, regardless of the default merge behavior. By adding each field you want to control the data retention for and selecting a Criteria and Condition, you can tell Insycle where the data for the field should be taken from and how to handle it.

Under Step 3, click the Fields tab. For each field whose data retention you want to control, select the Field and tell Insycle which record its value should be taken from. That value is merged into the master. Any data that is neither already in the master nor copied to it is deleted.

The Criteria dropdown gives you various options for choosing the data to keep, and the Group Fields let you keep values of multiple fields from the same record.

merge-duplicates-hubspot-contacts-step-3-fields-tab-10-rules-732px.png

Learn more about these options in the Master Selection and Field Data Retention Rules reference in the Module Overview article.

What happens to field data if I don’t create custom rules?

There is no need to create rules for every field in your CRM. Insycle automatically handles fields without specific rules using a "fill in the blanks" approach. When the master record has empty fields, Insycle copies values from the most recently updated record in the duplicate group where that data exists. For example, if the master record's Industry field is empty but another record in the duplicate group has an Industry value, that value will automatically be copied to the master record. This means you only need to create custom retention rules for the handful of fields that require special handling rather than setting up rules for all your fields.
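
The "fill in the blanks" behavior described above can be sketched as follows. This is an illustration of the idea only, not Insycle's code: the dictionary record layout, the `updated_at` field, and the `fill_in_blanks` helper are assumptions.

```python
def is_blank(value):
    return value is None or value == ""

def fill_in_blanks(master, others):
    """Fill the master's empty fields from the most recently updated record that has a value."""
    merged = dict(master)
    # ISO-formatted dates sort correctly as plain strings
    by_recency = sorted(others, key=lambda r: r["updated_at"], reverse=True)
    all_fields = {field for record in others for field in record}
    for field in all_fields:
        if is_blank(merged.get(field)):
            for record in by_recency:
                if not is_blank(record.get(field)):
                    merged[field] = record[field]
                    break
    return merged

# Hypothetical duplicate group
master = {"id": 1, "industry": "", "phone": "555-0100", "updated_at": "2024-01-10"}
others = [
    {"id": 2, "industry": "Software", "phone": "555-0199", "updated_at": "2024-03-01"},
    {"id": 3, "industry": "Retail", "phone": "", "updated_at": "2024-05-20"},
]

merged = fill_in_blanks(master, others)
print(merged["industry"])  # Retail (from the most recently updated record)
print(merged["phone"])     # 555-0100 (the master's own value is kept)
```

Values the master already holds are never overwritten in this sketch. Changing that for a specific field is what a custom retention rule is for.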

Start in Preview Mode

When you click the Review button under Step 4, you can choose between two modes: Preview Mode and Update Mode.

merge-duplicates-step-4-review-preview-mode-tab.png

Preview mode does not update your live CRM data, but instead produces a CSV showing you the results that the deduplication process would have produced.

When you first run a new deduplication template, it's always a good idea to run it in Preview Mode to ensure that it works as you intended. Once you confirm that it does, you can run it in Update Mode or schedule it for automation.

Implement Automation

To save time and ensure your duplicates merge consistently, you can set up automated deduplication using scheduled templates or Recipes.

Additionally, HubSpot users can integrate Insycle Recipes into HubSpot Workflows, and Salesforce users can use Recipes with Salesforce Flows. This enables event-based triggers, such as new-record creation or attribute changes, to ensure deduplication occurs in real time.

Automate a Single Template

First, return to the Template menu at the top of the page and click Copy to save your configurations as a new version of whatever template you started with. Then click the pencil to edit your new template name.

save-template-copy-and-rename.png

Under Step 4, click the Review button, and select Update mode.

On the When tab, select Automate, and configure the frequency you'd like the template to run. When finished, click Schedule.

merge-duplicates-step-4-review-update-automate-daily.png

You can view all your scheduled automations at any time on the Operations > Automations page.

Automate Multiple Templates in a Recipe

Recipes are collections of templates, organized into numbered steps, that can be executed in succession. You can add pre-built or custom templates to a Recipe.

Then, that Recipe can be automated to run all of those templates one after another, in order, on a set schedule. Recipes can be used to organize your processes, train your employees, or drive HubSpot Workflow automation.

Once you have a good set of templates, navigate to Operations > Recipes.

To create a new Recipe, click the +New button.

recipes-new-button-w-arrow.png

Click the + Template button to add Insycle's default templates or your own customized templates to a Recipe.

recipes-add-template-button-w-arrow.png

Every Recipe can handle only one object type. Once you choose a template, the record object (contact, company, deal) becomes the type for the entire Recipe. So, if you add a contact template, you can only add other contact templates to the Recipe.

When finished, click the Save button in the grey menu bar.

recipes-save-button-w-arrow.png

Next, you need to schedule the Recipe to run automatically. Click the Review button and select Update mode. 

On the When tab, select Automate and configure the frequency you'd like the Recipe to run. When finished, click Schedule.

recipe-review-update-automate-hubspot-daily.png

Additional Tips

  • Begin with easy-to-find duplicates. Iterate through fields and rules you know will surface duplicates. Don’t expect to resolve all your duplicates by setting up and running this process once. You will need to run this process multiple times for different fields or nuanced variations.
  • Save templates. Each time you get a Merge Duplicates process to run the way you want in your database, save it as a template. When you have a solid set of templates that reliably resolve most of your dupes, you can put them together as a Recipe that can run on a regular, automated schedule.
  • Look for edge cases that fall outside your standard rules. These may be templates that you run manually, allowing you to make adjustments based on your findings.
  • Do some experimentation. Use the Preview mode CSV report to analyze patterns in the duplicates. Add additional fields to the CSV by clicking the gear icon in Step 2 and including them in the Layout. You can learn what causes the duplicates and how to prevent them from occurring in the first place. You can also explore your data in the Grid Edit module to understand what you have, allowing you to design templates that capture all potential variations.

Advanced How-Tos

Backing Up Fields Before Merging

To back up select fields during your merge setup, include additional fields in the CSV report. This ensures you have the data later for undoing changes or for general review.

Under Step 2 in the Merge Duplicates module, click the gear icon in the header.

merge-duplicates-step-2-gear-arrow-646w.png

On the Layout tab, add any extra fields to the Visible Fields list.

merge-duplicates-hubspot-contacts-job-title-layout-646w.png

The fields will be included in the CSV report.

merge-duplicates-hubspot-contacts-job-title-csv-646w.png

Alternatively, to capture all field data before running a large merge operation, you can export records directly from your CRM to a CSV file. This preserves all of the record details, which can help later if you need to undo changes or simply want to review them.

Use a filter to work with a subset of your data

If your database is large or you're getting an overwhelming number of duplicate groups, use the Filter button in Step 1 to narrow down the records Insycle analyzes. Filtering to a subset — for example, contacts created in the last 30 days, or companies in a specific region — makes it easier to validate your configuration before running it across your full database. A filtered dataset also processes faster.

To add a filter, click the Filter button in Step 1, choose a field, select a condition, and set a value. The filter is applied before the matching step runs.
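
Conceptually, the filter shrinks the set of records before any duplicate matching happens. The sketch below mimics the "contacts created in the last 30 days" example. The `created_within` helper and the record layout are hypothetical, and in Insycle itself the filter is configured in the interface rather than in code.

```python
from datetime import date, timedelta

def created_within(records, days, today=None):
    """Keep only the records created within the last `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    return [r for r in records if r["created"] >= cutoff]

# Hypothetical sample data
contacts = [
    {"id": 1, "created": date(2024, 6, 1)},
    {"id": 2, "created": date(2024, 3, 15)},
    {"id": 3, "created": date(2024, 6, 20)},
]

recent = created_within(contacts, days=30, today=date(2024, 6, 25))
print([c["id"] for c in recent])  # [1, 3]
```

Only the two recent contacts would go on to the matching step, which is why a filtered run is quicker to validate.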

dup10.png

Most of the options in the Field dropdown match the fields that are found in your HubSpot records, and for contact records, there are three additional options related to the Email value: 

  • Email Username: The portion of the email address before the “@.” For example, if the email address were “maria@acmewidgets.com,” the username value would be “maria.” 
  • Free Email Provider Domain: Choose True to filter out records where the email domain is Gmail, Hotmail, Yahoo, or any of about 10,000 other free email providers. This filter helps confirm that records belong to real clients, or helps determine which record in a group is the legitimate one, because customer companies are unlikely to use free email accounts (though a contact may have emailed from one by accident at some point).
  • Email Top-Level Domain: The top-level domain (TLD) is everything that follows the final dot of a domain name. For example, in the domain name acmewidgets.com, '.com' is the TLD. Some other popular TLDs include '.org', '.uk', and '.edu'. 
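
The three email-derived values above can be illustrated with a short sketch. The `email_parts` helper is hypothetical, and the `FREE_PROVIDERS` set is a tiny stand-in for the much larger list of free providers mentioned above.

```python
FREE_PROVIDERS = {"gmail.com", "hotmail.com", "yahoo.com"}  # illustrative subset only

def email_parts(email):
    """Split an email address into the three derived values described above."""
    username, _, domain = email.strip().lower().rpartition("@")
    top_level_domain = "." + domain.rsplit(".", 1)[-1]
    return {
        "username": username,
        "free_provider": domain in FREE_PROVIDERS,
        "top_level_domain": top_level_domain,
    }

print(email_parts("maria@acmewidgets.com"))
# {'username': 'maria', 'free_provider': False, 'top_level_domain': '.com'}
print(email_parts("Sam.Lee@Gmail.com")["free_provider"])  # True
```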

Start narrow, then broaden

When setting up your matching fields for the first time, start with your highest-confidence criteria — fields that, together, are very unlikely to match unless the records are true duplicates. Exact Match on email address plus first and last name is a reliable starting point for contacts. Domain Name is a strong starting point for companies.

merge-duplicates-hubspot-companies-step-1-similar-name-domain-exact-phone.png

Once you're confident your configuration is catching true duplicates without pulling in false positives, you can broaden your criteria — for example, by adding Similar Match to catch typos and slight variations: 

merge-duplicates-hubspot-companies-step-1-comparison-rules-646w.png

Or, by using Related Fields to compare values across two fields that contain similar data (such as Email and Additional Email).

merge-duplicates-hubspot-contacts-step-1-advanced-first-last-email+additional-646w.png

Understanding how master selection works

Insycle evaluates your Master tab rules from Step 3 in order, eliminating records that don't match each rule until only one remains. If multiple records still match after all rules are evaluated, no master can be determined, and the group will show an error in the CSV.

For example, imagine having four duplicate records for the same contact. (In this image, we are examining the records in the Grid Edit module.)

mceclip11.png

In the Merge Duplicates module, you have configured the first three Master rules based on email engagement metrics, but all four records have identical values of zero, so no records are eliminated. Your fourth rule checks for an active Contact Owner — three records have no owner, so they're eliminated. The one remaining record becomes the master.

This is why rule order matters. Place your most reliably differentiating rules — like record owner, lifecycle stage, or engagement activity — where they're most likely to yield a clear winner. If you're frequently seeing errors in your CSV, revisit your rule order and consider adding a tiebreaker rule, such as earliest Create Date or latest Last Modified Date, as a final fallback.
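
The elimination process can be sketched in a few lines. This is a simplified model, not Insycle's implementation: the rule helpers, the field names, and the choice that a rule no record satisfies eliminates nobody are all assumptions made for illustration.

```python
def highest(field):
    """Rule: keep the records holding the highest value for `field`."""
    def rule(records):
        best = max(r[field] for r in records)
        return [r for r in records if r[field] == best]
    return rule

def has_value(field):
    """Rule: keep the records where `field` is populated."""
    def rule(records):
        return [r for r in records if r.get(field)]
    return rule

def select_master(group, rules):
    """Apply rules top to bottom, narrowing the candidates until one remains."""
    candidates = list(group)
    for rule in rules:
        survivors = rule(candidates)
        if survivors:  # assumption: a rule nobody satisfies eliminates nobody
            candidates = survivors
        if len(candidates) == 1:
            break
    if len(candidates) != 1:
        raise ValueError(f"multiple records ({len(candidates)}) meet the selection criteria")
    return candidates[0]

# Four duplicates with identical engagement; only one has an owner
group = [
    {"id": 11, "emails_clicked": 0, "emails_opened": 0, "owner": None},
    {"id": 12, "emails_clicked": 0, "emails_opened": 0, "owner": "dana"},
    {"id": 13, "emails_clicked": 0, "emails_opened": 0, "owner": None},
    {"id": 14, "emails_clicked": 0, "emails_opened": 0, "owner": None},
]
rules = [highest("emails_clicked"), highest("emails_opened"), has_value("owner")]

master = select_master(group, rules)
print(master["id"])  # 12
```

With only the two engagement rules, all four records tie and the function raises an error, which mirrors the CSV error described above. A final tiebreaker rule on a field that is unique per record guarantees a single winner.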

merge-duplicates-hubspot-contacts-step-3-master-tab-6-rules.png

In this CSV report example, you can see that the one record with an active owner was chosen as the master.

mceclip13.png

Granular Control for Picking Duplicate Records

For situations where there are no common rules you can apply for identifying duplicates for all or some of your records, you may need more granular control over which records are included or excluded from the process.

Bulk Solutions

There are two options for doing this in bulk. You can upload a CSV file of known duplicate record ID pairs directly into the Merge Duplicates module using the CSV tab in Step 1. This lets you bypass field-based duplicate detection entirely and work from a list of specific record pairs you've already identified. From there, you continue through Steps 2–4 to configure master selection rules and field retention settings, and then run the merge. 

For more complex scenarios — such as designating master records or excluding specific records from deduplication using custom attributes — you can also use the Magical Import module in combination with Merge Duplicates for complete control over the process. 

Learn how to customize merging duplicates in bulk using a CSV.

Single Record Solution

To do this one record at a time, you can use Manual mode of the Merge Duplicates module.

In Manual mode, you have complete control over which records are merged together by selecting them from the Record Viewer. Manual mode should be reserved for cases that require a careful, controlled process. Learn more about merging duplicates in Manual mode.

Frequently Asked Questions

How do I ensure the merged record maintains an active owner?

Currently, neither HubSpot nor Salesforce provides an automated way to prioritize active owners during the merge process. You'd need to verify owner status manually for each merge operation.

However, Insycle's Merge Duplicates module includes an option to prioritize an active owner. 

First, you could add a Master rule under Step 3 to tell Insycle to select the record from each group with an active owner as the master record.

Add a rule with the following parameters:

  • Field: Record owner
  • Condition: active user

merge-duplicates-hubspot-contacts-step-4-record-tab-active-owner-646x247.png

Second, you could create a Field rule to retain the owner who is an active user.

Add a rule with the following parameters:

  • Field: Owner
  • Criteria: From record where value
  • Condition: active user

merge-duplicates-hubspot-contacts-step-4-fields-tab-active-owner.png

Insycle is having trouble determining a master record. What could be causing this issue?

If the Message column of the CSV report displays this error:

Change rules in Step 3 'Master Selection'. Failed to pick master record because multiple records (X) meet the selection criteria. In 'Master Selection', change, add, or reorder the rules such that only one record matches (if cannot determine master based on field values, use 'Record ID is lowest' as the last rule).

This error means that none of the records meet more of the rules than the others do, so Insycle cannot pick a single master.

There are a couple of things you can try to resolve this:

  1. On the Master tab in Step 3, experiment with reordering or adding additional fields that are likely to have unique values.
  2. At the bottom of the Master tab in Step 3, ensure By Priority is selected, not Absolute.

    merge-duplicates-step-3-by-priority-match-w-arrow-646w.png

    With By Priority, your master record only has to match one rule. Using Absolute, your master record would have to meet all of the rule criteria. In most cases, it is best to select By Priority.
    If By Priority was used, then none of the records in the duplicate group meet any of the criteria on the list more than the others. In this case, you'll need to experiment, reordering or adding additional rules for fields likely to have unique values.
  3. As a last resort, you can add a rule on the Master tab in Step 3 that says Record ID is lowest, or Create Date is earliest.

    merge-duplicates-hubspot-contacts-step-3-master-tab-last-resort-rules.png

How do I ensure that I am not merging non-duplicate records together?

There are two ways to ensure the records you are merging are indeed duplicates.

First, always run your deduplication templates in Preview Mode before running them in Update Mode. This produces a CSV file showing how your records would have been merged. Then you can ensure that your Merge Duplicates template is working as expected and not merging non-duplicate records together.

Additionally, to ensure a smooth merge, consider narrowing the matching settings in Step 1. Try the Exact Match Comparison Rule instead of Similar Match. Then make sure that you are using actual, uniquely identifying fields—first name, last name, email, and phone number are popular choices. The more tightly defined your filter is, the less likely you are to merge non-duplicate records.

I have a list of duplicates I need to merge. Can I deduplicate them using Insycle?

Yes. You can use an existing list of duplicates with Insycle to deduplicate them in bulk, following these steps:

  1. Prepare a CSV file with columns for the record IDs and a "Merge Master" column. In the "Merge Master" column, mark which record should be kept after merging.
  2. Create a custom field called "Merge Master" in your CRM.
  3. Use the Magical Import module to import your CSV file into the CRM, updating the "Merge Master" field for the relevant records.
  4. Go to the Merge Duplicates module and set up a filter to select records based on the "Merge Master" field.
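
Step 1 above can be scripted if your list of duplicates already lives in another system. The sketch below builds such a file with Python's standard `csv` module. The `Record ID` column header, the `true` marker value, and the `build_merge_master_csv` helper are assumptions for illustration, so check the linked article for the exact format Insycle expects.

```python
import csv
import io

def build_merge_master_csv(groups):
    """Build CSV text from (master_id, [duplicate_ids]) pairs.

    The column names and the "true" marker are assumptions, not a documented format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Record ID", "Merge Master"])
    for master_id, duplicate_ids in groups:
        writer.writerow([master_id, "true"])  # the record to keep
        for duplicate_id in duplicate_ids:
            writer.writerow([duplicate_id, ""])  # records to merge into it
    return buffer.getvalue()

# Hypothetical record IDs
csv_text = build_merge_master_csv([(101, [102, 103]), (201, [202])])
print(csv_text)
```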

Learn more about deduplicating records using a CSV.

Can I select which data is retained in my master record on a field-by-field basis?

Yes, Insycle allows you to select which field data to retain in the master record using the Fields tab in Step 3. See the Merge Duplicates module field data retention rules reference for more details.

merge-duplicates-hubspot-contacts-step-3-field-tab-donated-owner-phone-lifecycle-IDs-646w.png

My team needs to review and approve the master. Can I accommodate that with Insycle?

Yes, there are several ways to share details and get approval before merging duplicates.

You can manually approve master records and mark them in a CSV file, then use Insycle to bulk deduplicate into those master records. See the Customize Bulk Deduplication Using Exclusions and Pre-Defined Masters article to learn more.

Or, you can run the Merge Duplicates module in Preview Mode and then deliver the preview CSV that Insycle generates. The CSV report includes your entire merge operation down to individual duplicate groups, but does not update your live data. Then your team can approve the merge based on this report, before running Merge Duplicates in Update Mode.

Additionally, team members can review duplicates and manually select the master for each record under Step 3 by selecting Manual mode. Review the Manually Merge Duplicates article for more details.

merge-duplicates-hubspot-contacts-step-3-manual-646w.png
