How to Merge Duplicates across Salesforce and HubSpot While the Sync Is Active
Your company is using both Salesforce and HubSpot and is running into problems with duplicate records in one CRM or the other, and sometimes both.
You have a data sync set up between the two systems but don't know how to deduplicate effectively to ensure the cleanup effort is consistent across CRMs. In addition, you can't deduplicate companies from within the HubSpot app while the sync is active.
With Insycle's Merge Duplicates module, you can merge duplicate people or companies into the same master record in both CRMs, in bulk and automatically, so that the records stay synced going forward. This works while the Salesforce-HubSpot sync is active, and the master records will continue syncing after the merge. You can also control the merge process by defining rules for picking the master record and for deciding which field values are retained.
Process Summary
- Set up your Salesforce-HubSpot sync settings.
- Create a custom field in each CRM to identify the master record.
- Deduplicate your Salesforce records.
- Deduplicate your HubSpot records.
Step-by-Step Instructions
First, make sure your Salesforce-HubSpot sync settings are configured correctly. This step is required for the process to work.
In HubSpot, navigate to Settings > Integrations > Connected Apps > Salesforce "Actions" Button > Go to settings.
On the app settings page, click the Object tab.
To ensure that duplicates are not automatically deleted when you merge in Salesforce, under Deleting [Objects], verify that the settings are as follows:
- When a Salesforce contact is deleted → Do nothing in HubSpot
- When a Salesforce lead is deleted → Do nothing in HubSpot
To label records that are deemed the master for each set of duplicates, you'll need to create a custom field in both platforms. In each CRM, Salesforce and HubSpot, add a custom field named “Deduplication Master Record.” This must be added to any synced record/object type you plan to deduplicate.
Insycle will automatically populate this field with the correct value. To prevent users from accidentally changing its value, you may want to hide this field from the default layout or make it non-editable from the view.
Add the Custom Field in Salesforce for Each Object Type
In Salesforce, navigate to Setup > Objects and Fields > Object Manager. Select the object type, click Fields & Relationships, then click the New button.
Enter the following properties:
- Data type: Checkbox
- Field Label: Deduplication Master Record
- Default Value: Unchecked
Repeat these steps to add the Deduplication Master Record field to each object type synced with HubSpot that you'll need to deduplicate.
Add the Custom Field in HubSpot for Each Object Type
In HubSpot, navigate to Settings > Objects > select the object type > Manage [object] properties, and click the Create property button.
Enter the following properties:
- Label: Deduplication Master Record
- API name: deduplication_master_record
- Data type: Single checkbox
Repeat these steps to add the Deduplication Master Record field to each object type synced with Salesforce that you'll need to deduplicate.
Set Property Mapping for Deduplication Master Record Field
Next, configure the object settings so that the custom field's value is copied from Salesforce into HubSpot (one-way).
In HubSpot, navigate to Settings > Integrations > Connected Apps > Salesforce, and select the object type tab. Click the [Object] property mappings tab.
Click the Add new field mapping button and use the dropdown menus to select the "Deduplication Master Record" property in HubSpot and the matching "Deduplication Master Record" field in Salesforce.
For the Sync Rule, select Always use Salesforce.
Follow the same process for each synced record/object type.
Now, you can start merging Salesforce duplicates with Insycle.
Navigate to Data Management > Merge Duplicates, select the Salesforce database and record type.
In Step 1, choose the Salesforce fields to compare and the criteria their values must meet for records to be considered duplicates.
For example, you might look for Salesforce contacts that have exactly the same First Name, Last Name, and Email Domain.
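To make "exact match on several fields" concrete, here is a minimal Python sketch that groups contacts whose First Name, Last Name, and Email Domain are all identical. It is an illustration under stated assumptions, not Insycle's implementation: records are plain dictionaries, and the field names `first_name`, `last_name`, and `email` are hypothetical.

```python
from collections import defaultdict

def email_domain(email: str) -> str:
    # Everything after the last "@", lowercased; empty if there is no "@".
    return email.rsplit("@", 1)[1].lower() if "@" in email else ""

def find_exact_duplicates(contacts):
    """Group contacts whose first name, last name, and email domain all match exactly."""
    groups = defaultdict(list)
    for contact in contacts:
        key = (contact["first_name"], contact["last_name"], email_domain(contact["email"]))
        groups[key].append(contact)
    # Only groups containing two or more records are duplicate sets.
    return [group for group in groups.values() if len(group) > 1]

contacts = [
    {"id": 1, "first_name": "Maria", "last_name": "Lopez", "email": "maria@acmewidgets.com"},
    {"id": 2, "first_name": "Maria", "last_name": "Lopez", "email": "m.lopez@acmewidgets.com"},
    {"id": 3, "first_name": "Maria", "last_name": "Lopez", "email": "maria@gmail.com"},
]
print([[c["id"] for c in g] for g in find_exact_duplicates(contacts)])  # [[1, 2]]
```

Record 3 shares the name but not the email domain, so it is not grouped with the others.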
Under Step 4, configure the rules that determine which record in each set of duplicates becomes the master: the record that remains after the merge and into which all the other duplicates are merged.
After Insycle has identified the master record, it uses the selection rules on the Fields tab to automatically pick which values from the duplicate group are kept in the master record.
For each field whose data retention you want to control, tell Insycle where the value for that field should come from; that value is then merged into the master. Any data that is not in the master, and is not copied to the master, will be lost when the records are merged.
As part of the merge process, Insycle will automatically populate the Deduplication Master Record field with the value “True” for the record that is chosen as the master.
For further instructions on configuring your deduplication, see the Bulk Merge Duplicate People, Companies article.
To finish deduplicating your Salesforce records, continue with Step 5: preview the deduplication changes, then apply the merge to your CRM, as described below.
After the merge in Salesforce, the Deduplication Master Record values will automatically sync from Salesforce to HubSpot. This field can then be used to identify the same record as the master in HubSpot.
Next, deduplicate HubSpot. In Merge Duplicates, select the HubSpot database and the same record type. Under Step 1, use the same criteria you used for the Salesforce deduplication to determine which HubSpot records are duplicates.
Under Step 4, on the Record tab, configure a single rule: records whose Deduplication Master Record value is "Yes" (or "True," depending on your setup) are selected as the master.
Because the sync copies the Deduplication Master Record value into HubSpot, this rule ensures that the master record in HubSpot is the same record as the master in Salesforce.
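Conceptually, the HubSpot side reduces master selection to "pick the record whose synced flag is set." The sketch below models that in Python, assuming each HubSpot record is a dictionary and the flag arrives as a boolean under the internal name `deduplication_master_record` (the API name created earlier); the record IDs are hypothetical.

```python
def pick_master_by_flag(group):
    """Return the one record whose Deduplication Master Record flag is set, else None."""
    flagged = [r for r in group if r.get("deduplication_master_record") is True]
    # Exactly one flagged record is expected per duplicate group; zero or
    # several flagged records means the master cannot be determined.
    return flagged[0] if len(flagged) == 1 else None

# Hypothetical HubSpot duplicate group after the Salesforce merge has synced.
hubspot_group = [
    {"id": "hs-101", "deduplication_master_record": False},
    {"id": "hs-102", "deduplication_master_record": True},
    {"id": "hs-103"},  # flag never set on this record
]
master = pick_master_by_flag(hubspot_group)
print(master["id"])  # hs-102
```

Returning `None` when zero or several records are flagged mirrors the idea that the flag should identify exactly one master per group.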
Preview Merges in CSV Report
After you have the deduplication rules set up for each CRM, you should preview the changes you are making to your data. That way, you can check to ensure your merge configuration is working as expected before those changes are pushed to your live database.
Under Step 5, click the Review button and select Preview mode.
Click the Next button to go to the Notify screen, where you can select recipients for the email report. You can also add additional context to the message.
On the When tab, click the Run Now tab, and select which records to apply the change to (in most cases this will be All), then click the Run Now button.
Insycle will generate a preview CSV and send it to your email. Open the CSV file from your email in a spreadsheet application.
The Duplicate Group ID indicates which records will be merged together.
The Status column indicates:
- Duplicate – The record is part of a duplicate group.
- Master – The master record chosen for the duplicate group based on default behavior and your Record rules. Review the selections in this row to determine whether the appropriate records are being chosen.
- Master (After) – This row appears only if at least one field has been specified on the Fields tab in Step 4. For each duplicate group, the Master (After) row shows the values the final record will contain, based on your Field rules and the default behavior.
- Error – If Insycle is not able to determine which record would be the master, an error message will appear here. See the Troubleshooting section below for more detail.
If everything looks good, return to Insycle and move forward with applying the changes.
Apply Changes to Your CRM Records
When you're satisfied with the results in your preview, you can merge the records in your CRM.
Under Step 5, click the Review button, and this time select Update mode.
On the When tab, you should use Run Now the first time you apply these changes to the CRM.
Save Templates and Set Up Automation to Keep Records Deduplicated
After you've seen the results in the CRM and you are satisfied with how the operation runs, you can set up ongoing automated deduplication for Salesforce and HubSpot records with Insycle templates.
With automation, you'll save time and ensure that Salesforce and HubSpot are consistently deduplicated while keeping the sync active.
Advanced How-Tos
Deduplicating across Salesforce and HubSpot while the sync is active is a bit tricky.
When you merge duplicates, you merge the records into a single master record. If you merge records in both platforms, you must ensure that the duplicates in each platform are merged into the same master record, the one that is synced between HubSpot and Salesforce. If the master records differ, the sync between the two platforms breaks.
By creating a custom field synchronized across both CRMs that effectively says, "This is the master record!" you ensure both CRMs use the same master record when the deduplication process is run in Insycle.
The CRM you deduplicate first sets the master, so for that CRM you'll configure complete master selection rules in Step 4 of the Merge Duplicates module. When you run this merge operation on the first CRM, Insycle sets the "Deduplication Master Record" field to "True/Yes" on the record identified as the master.
When you run the merge operation on the second CRM, you only need one master selection rule, the "Deduplication Master Record" value of "True/Yes," to select the same master record.
Use a filter to work with a segment, or smaller pool, of records. Insycle will then analyze only the records that pass the filter for duplicates. To add a filter, click the Filter button, choose the field to look at, select the condition, and set the value to look for. The filter is applied before the matching step runs.
You may want to use a filter if:
- You know you only want to work with a subset of your data. In this case, there’s no need to run the operation on your whole database.
- There are an overwhelming number of duplicate results. Add a filter to work with a reasonably sized subset while you work to get the configuration right.
- You want the operation to run faster. A refined segment can speed things up since there are fewer records to analyze.
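Here is a small sketch of why filtering first speeds things up: records outside the segment are never compared at all. The field names `company` and `country` and the helper `filter_then_match` are hypothetical, and the sketch assumes a simple exact match on one field.

```python
from collections import Counter

def filter_then_match(records, keep, key):
    """Apply the filter first, then find records that share a matching key."""
    segment = [r for r in records if keep(r)]          # filter step
    counts = Counter(key(r) for r in segment)          # matching step, segment only
    return [r for r in segment if counts[key(r)] > 1]  # records with a duplicate

records = [
    {"id": 1, "company": "Acme", "country": "US"},
    {"id": 2, "company": "Acme", "country": "US"},
    {"id": 3, "company": "Acme", "country": "CA"},
    {"id": 4, "company": "Globex", "country": "US"},
]
dupes = filter_then_match(records, keep=lambda r: r["country"] == "US", key=lambda r: r["company"])
print([r["id"] for r in dupes])  # [1, 2]
```

Record 3 is also an "Acme" record, but because it falls outside the filtered segment it is never analyzed.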
Most of the options in the Field dropdown correspond to fields in your CRM. For contact records, there are three additional options based on the Email value:
- Email Username: The portion of the email address before the “@.” For example, if the email address were “maria@acmewidgets.com,” the username value would be “maria.”
- Free Email Provider Domain: Choose True to filter out records whose email domain is Gmail, Hotmail, Yahoo, or one of about 10,000 other free email providers. This helps ensure you are working with real clients, and it can help determine which record is the legitimate one, because customer companies are unlikely to use free Gmail accounts (though a contact may have emailed you from one at some point).
- Email Top-Level Domain: The top-level domain (TLD) is everything that follows the final dot of a domain name. For example, in the domain name 'acmewidgets.com', '.com' is the TLD. Other popular TLDs include '.org', '.uk', and '.edu'.
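The three email-derived values can be approximated in Python as follows. This is a hedged sketch: `FREE_EMAIL_DOMAINS` is a tiny hypothetical stand-in for the much longer provider list, and the parsing rules (split at the first "@", TLD after the last dot) are assumptions based on the definitions above.

```python
# Hypothetical stand-in; the real provider list has about 10,000 domains.
FREE_EMAIL_DOMAINS = {"gmail.com", "hotmail.com", "yahoo.com"}

def email_parts(email: str) -> dict:
    """Derive the username, domain, TLD, and free-provider flag from an email."""
    username, _, domain = email.lower().partition("@")
    return {
        "username": username,
        "domain": domain,
        # Everything after the final dot of the domain name.
        "tld": "." + domain.rsplit(".", 1)[-1] if "." in domain else "",
        "free_provider": domain in FREE_EMAIL_DOMAINS,
    }

print(email_parts("maria@acmewidgets.com"))
# {'username': 'maria', 'domain': 'acmewidgets.com', 'tld': '.com', 'free_provider': False}
```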
Each row in your matching fields setup is cumulative, so records must meet all of the criteria. For example, looking for records that have the same First Name and Last Name and Phone Number returns only results where all three values are the same.
Matching values must be at least four characters long. Shorter values, such as "Joe" or "Ace," are disregarded.
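The two rules above, cumulative matching fields and a four-character minimum, can be modeled in a few lines. This sketch assumes that a value shorter than four characters never counts as a match, and the field names are hypothetical.

```python
MIN_LENGTH = 4  # matching values shorter than this are disregarded

def is_match(a: dict, b: dict, fields: list) -> bool:
    """Two records match only if every matching field is long enough and identical."""
    for field in fields:
        va, vb = a.get(field, ""), b.get(field, "")
        if len(va) < MIN_LENGTH or len(vb) < MIN_LENGTH:
            return False  # assumption: a too-short value never counts as a match
        if va != vb:
            return False  # rows are cumulative, so one mismatch rules the pair out
    return True

fields = ["first_name", "last_name", "phone"]
a = {"first_name": "Maria", "last_name": "Lopez", "phone": "5551234567"}
b = {"first_name": "Maria", "last_name": "Lopez", "phone": "5551234567"}
c = {"first_name": "Maria", "last_name": "Lopez", "phone": "5559876543"}
joe1 = {"first_name": "Joe", "last_name": "Smith", "phone": "5551112222"}
joe2 = dict(joe1)
print(is_match(a, b, fields), is_match(a, c, fields), is_match(joe1, joe2, fields))
# True False False
```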
Pick a field that you think has some duplicate values.
Running a very simple match operation, such as First Name and Last Name only, is fine for getting an idea of what you have, but it is too broad for reliable analysis and deduplication. Some names are legitimately shared: different people can have the same first and last name. You need additional, unique criteria to narrow the results.
Choosing Unique Identifiers
Matching duplicates requires unique identifiers—data that is unlikely to be shared by any other record unless it is a duplicate. If you don't use unique identifiers, you are likely to identify unrelated records as duplicates and may accidentally merge them.
Many CRMs match first names, last names, and email addresses. If all of those match, or are similar, you can confidently determine that the record is a duplicate.
Other unique identifying fields that are commonly used in deduplication include:
- Phone number
- Domain name
- Mailing address
- ID number
Define what kind of likeness to look for when deciding if field values should be considered a match.
It's a good idea to start with Exact Match and easy-to-find duplicates. Iterate through fields and rules you know will surface duplicates, then look for edge cases. Similar Match can be helpful for finding those.
- Exact Match looks for values that match exactly, with no differences from one record to the next. Any unique identifying fields should use Exact Match.
- Similar Match looks for values that are close but differ by one character (such as a typo, an extra character, or a missing character), which broadens the search. It behaves much like Google showing results for a slightly different term, or asking “Did you mean...?”
For example, if the Company Name “Acme” is found, records with Company Name values such as “Akme,” “acm,” or “Acma” could be included as matches.
Similar Match uses looser criteria that cast a wider net for what can be considered duplicates. It is best to try Similar Match with very open and generic fields after trying everything else. When you do use it, make sure to carefully review the results to ensure the duplicates being identified are what you're expecting.
If using ID fields to identify duplicates, note that they will only work with Exact Match, not Similar Match.
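One common way to express "a one-character difference" is an edit distance of at most one (a single inserted, deleted, or substituted character). The sketch below uses that definition and also ignores letter case, because the example treats "acm" as similar to "Acme". Both choices are assumptions about how Similar Match behaves, not a description of Insycle's algorithm.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one inserted, deleted, or substituted character."""
    a, b = a.lower(), b.lower()
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] != b[j]:
            edits += 1
            if edits > 1:
                return False
            if len(a) == len(b):
                i += 1  # substitution: advance in both strings
            # otherwise treat it as an extra character in b: advance only j
        else:
            i += 1
        j += 1
    # A leftover trailing character in b counts as one more edit.
    return edits + (len(b) - j) <= 1

print([within_one_edit("Acme", v) for v in ["Akme", "acm", "Acma", "Globex"]])
# [True, True, True, False]
```

Exact Match, by contrast, is simply `a == b`, which is why it is the safer choice for ID fields and other unique identifiers.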
Specify parts of a field value to ignore, such as specific text, whitespace, or characters. These will not be considered part of the matching process.
- Ignore Symbols and Whitespace when comparing phone numbers.
- Ignoring HTTP, www, subdomain, or top-level domain (.com vs co.uk) when comparing websites or email domains is a great way to catch more advanced duplicates.
- Insycle comes preloaded with terms to ignore. If you select Common Terms, click the Terms button to view and edit this list on the Common Terms tab.
- If you select Text (substrings), click the Terms button, then the Ignored Text tab, and enter text to be ignored. Separate multiple substrings (or phrases) with a new line.
Note: If you’ve set up Ignored terms or strings, don’t forget to also enable them. Select the Ignored > Common Terms or Text (substrings) checkbox.
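Ignoring parts of a value amounts to normalizing both values before comparing them. This Python sketch shows three such normalizations; the function names are hypothetical, and the exact rules Insycle applies (for example, which URL parts it strips) may differ.

```python
import re

def normalize_phone(value: str) -> str:
    # Ignore symbols and whitespace: keep only letters and digits.
    return re.sub(r"[^0-9A-Za-z]", "", value)

def normalize_website(value: str) -> str:
    # Ignore case, the http(s) scheme, a leading "www.", and a trailing slash.
    value = value.strip().lower()
    value = re.sub(r"^https?://", "", value)
    value = re.sub(r"^www\.", "", value)
    return value.rstrip("/")

def remove_ignored_text(value: str, ignored: list) -> str:
    # Text (substrings): delete each listed substring before comparing.
    for text in ignored:
        value = value.replace(text, "")
    return value.strip()

print(normalize_phone("(555) 123-4567") == normalize_phone("555.123.4567"))  # True
print(normalize_website("https://www.AcmeWidgets.com/"))  # acmewidgets.com
print(remove_ignored_text("Acme Widgets Inc", [" Inc", " LLC"]))  # Acme Widgets
```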
Define specific portions of the field value to compare.
Compare the entire value, the first word, any two words, just the first five characters, last nine characters, etc.
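Each Match Parts option selects a slice of the value before the comparison runs. A minimal sketch, with hypothetical option names (`entire`, `first_word`, `first_chars`, `last_chars`); the "any two words" option is omitted for brevity.

```python
def match_part(value: str, part: str, n: int = 0) -> str:
    """Return the portion of a value that should be compared."""
    if part == "entire":
        return value
    if part == "first_word":
        words = value.split()
        return words[0] if words else ""
    if part == "first_chars":
        return value[:n]
    if part == "last_chars":
        # Guard n == 0, because value[-0:] would return the whole string.
        return value[-n:] if n > 0 else ""
    raise ValueError(f"unknown match part: {part}")

print(match_part("Acme Widgets Inc", "first_word"))     # Acme
print(match_part("acmewidgets.com", "first_chars", 5))  # acmew
print(match_part("+1 555 123 4567", "last_chars", 4))   # 4567
```

Comparing only the first or last few characters also shortens very long values, which reduces the amount of text that has to be compared.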
Sometimes, you might want to match duplicates using data in two separate fields. For example, you might want to compare your Phone Number field to a Mobile Phone Number field to identify duplicates.
Using the Related Fields feature, you can use two different fields (that contain similar data) as matching fields to catch more duplicates.
You can set up Related Fields in the Advanced tab.
Common Examples of Related Field Matching
Matching Field | Related Fields |
---|---|
Business Phone | Mobile Phone, Other Phone |
Email Domain | Website, Company Domain |
Address | Company Address |
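One way to model Related Fields is to pool the values from the matching field and its related fields, then treat two records as matching if the pools share any non-empty value. This is a hypothetical sketch, including the field names, meant to show the idea rather than Insycle's exact comparison rules.

```python
def related_fields_match(a: dict, b: dict, fields: list) -> bool:
    """True if any non-empty value in a's related fields also appears in b's."""
    values_a = {a[f] for f in fields if a.get(f)}
    values_b = {b[f] for f in fields if b.get(f)}
    return bool(values_a & values_b)

phone_fields = ["business_phone", "mobile_phone", "other_phone"]
a = {"business_phone": "5551234567", "mobile_phone": ""}
b = {"mobile_phone": "5551234567"}
c = {"other_phone": "5550000000"}
print(related_fields_match(a, b, phone_fields))  # True
print(related_fields_match(a, c, phone_fields))  # False
```

Here record `a` stores the number as a business phone and record `b` stores it as a mobile phone, yet they still match.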
When using two or more fields to identify duplicates, records can still be considered matches even if one of the field values is blank. You just need to specify which field(s) allow a blank value.
Under Step 1, configure your matching rules in the Simple tab, then click the Conditions tab.
All the matching fields you included will automatically appear with the Value Required in All Records condition selected. Change the condition to Empty Allowed in Any Record to allow empty values for certain fields. You can also use the At Least One Record with Non-Empty condition to help you determine which is the master record. Make sure at least one field remains required and is a reliable unique identifier to ensure the records are really duplicates.
For example, on the Simple tab, your matching fields might be First Name, Last Name, and Phone Number, but on some of your records the Phone Number field is empty. If you set the Phone Number condition to Empty Allowed in Any Record or At Least One Record with Non-Empty, records with the same name are considered duplicates whether they have the same phone number or no phone number.
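The difference between Value Required in All Records and Empty Allowed in Any Record can be sketched as a pairwise check. This simplified Python model is an assumption: it covers only those two conditions (not At Least One Record with Non-Empty, which looks at the whole group), and the field names are hypothetical.

```python
def conditions_match(a: dict, b: dict, required: list, empty_allowed: list) -> bool:
    # Value Required in All Records: both values present and equal.
    for f in required:
        if not a.get(f) or not b.get(f) or a[f] != b[f]:
            return False
    # Empty Allowed in Any Record: a blank value is fine, but two
    # filled-in values must still agree.
    for f in empty_allowed:
        va, vb = a.get(f), b.get(f)
        if va and vb and va != vb:
            return False
    return True

name = {"first_name": "Maria", "last_name": "Lopez"}
with_phone = {**name, "phone": "5551234567"}
no_phone = {**name, "phone": ""}
other_phone = {**name, "phone": "5559876543"}
required, optional = ["first_name", "last_name"], ["phone"]
print(conditions_match(with_phone, no_phone, required, optional))     # True
print(conditions_match(with_phone, other_phone, required, optional))  # False
```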
When customers run into an issue while trying to make a transaction, they often seek help through one of your support channels. However, when a contact is created from a chat, such as Facebook Messenger or HubSpot Chat, very little information is captured, usually just a name and a timestamp. This makes it difficult to find other instances of the same contact, such as their customer record.
With the Merge Duplicates module, under Step 1, you can use the Conditions tab to match contacts with the same name that were created or modified within the same period of time.
First, select the fields in the Simple tab. Then, on the Conditions tab, select the Within Timeframe condition and set the Minutes, Hours, or Days criteria.
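The Within Timeframe condition pairs a name match with a time window. A hedged sketch, assuming a single hypothetical `timestamp` field stands in for the created or modified date:

```python
from datetime import datetime, timedelta

def same_person_in_window(a: dict, b: dict, window: timedelta) -> bool:
    """Same first and last name, and timestamps no further apart than the window."""
    same_name = (a["first_name"], a["last_name"]) == (b["first_name"], b["last_name"])
    return same_name and abs(a["timestamp"] - b["timestamp"]) <= window

chat_contact = {"first_name": "Maria", "last_name": "Lopez",
                "timestamp": datetime(2024, 5, 1, 10, 0)}
customer = {"first_name": "Maria", "last_name": "Lopez",
            "timestamp": datetime(2024, 5, 1, 11, 30)}
print(same_person_in_window(chat_contact, customer, timedelta(hours=2)))     # True
print(same_person_in_window(chat_contact, customer, timedelta(minutes=30)))  # False
```

The two records are 90 minutes apart, so they match with a two-hour window but not with a 30-minute one.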
When setting up your Salesforce deduplication process for contact records, it's often useful to pick master records based on engagement, such as the highest number of email clicks or the most recent email opened. You can also use other statuses to pick a master record, such as the stage furthest along in your sales lifecycle or the most recently updated record.
For accounts, it's often useful to use associated records to determine the master record, such as the highest number of associated contacts or deals.
When setting up your HubSpot deduplication, you'll use the Deduplication Master Record field to match the Salesforce master selection.
Priority Match: Looks through the master selection rules in order, one by one. As soon as a record meets one of the criteria, Insycle makes the master selection and skips the rest of the rules on the list. The vast majority of duplicate templates should use Priority Match.
Absolute Match: The master record must meet all of the listed rules in the Record tab in Step 4. If a record does not match every rule listed, no master record will be identified. Absolute Match is appropriate for less flexible master selection.
For example, if a company wanted to ensure the chosen master record is in their sales pipeline and already has a sales rep working the record, they can choose Absolute Match and set the Record rules:
- Customer Priority is High
- Contact Owner exists
Choosing Absolute Match can often result in no master record being identified since the record has to match every rule listed, so in most cases, you should select Priority Match.
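To contrast the two modes, here is a simplified Python model. Rules are hypothetical predicates that receive a record and its duplicate group. Carrying ties forward to the next rule under Priority Match is an assumption, made so the sketch can reproduce the "multiple records meet the selection criteria" failure described under Troubleshooting.

```python
def priority_match(group, rules):
    """Apply rules in order and stop as soon as one rule singles out one record."""
    candidates = list(group)
    for rule in rules:
        meeting = [r for r in candidates if rule(r, candidates)]
        if len(meeting) == 1:
            return meeting[0]
        if meeting:
            candidates = meeting  # assumption: ties carry over to the next rule
    # Still tied (or no rule matched): no master can be picked.
    return candidates[0] if len(candidates) == 1 else None

def absolute_match(group, rules):
    """The master must meet every rule, and exactly one record may do so."""
    meeting = [r for r in group if all(rule(r, group) for rule in rules)]
    return meeting[0] if len(meeting) == 1 else None

def high_priority(record, group):
    return record.get("customer_priority") == "High"

def has_owner(record, group):
    return bool(record.get("owner"))

def most_clicks(record, group):
    return record["email_clicks"] == max(r["email_clicks"] for r in group)

group = [
    {"id": 1, "customer_priority": "High", "owner": None, "email_clicks": 3},
    {"id": 2, "customer_priority": "Low", "owner": "Dana", "email_clicks": 7},
]
print(priority_match(group, [high_priority, has_owner])["id"])  # 1
print(absolute_match(group, [high_priority, has_owner]))        # None
print(priority_match(group, [most_clicks])["id"])               # 2
```

With the same two rules, Priority Match picks record 1 because it satisfies the first rule, while Absolute Match finds no master because neither record satisfies both.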
Though it's possible that duplicate records may be exactly the same, often there is only partial data overlap between them. When data is split between two different records, both records may contain unique and important information about the customer you'd like to keep.
By default, Insycle will keep the master record values; if the master field is blank, the value from the most recently updated duplicate will be used. The Merge Duplicates module allows you to control the values saved in the master record after the merge, regardless of the default merge behavior. By adding each field you want to control the data retention for in the Fields tab under Step 4 and selecting a Criteria, you can tell Insycle where the data for the field should be taken from and how to handle it.
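The default behavior can be sketched as follows: start from the master's values, then fill each blank field from the most recently updated duplicate. This model rests on assumptions. It skips duplicates whose value is also blank, and it uses a hypothetical `updated_at` field holding ISO date strings, which sort chronologically as text.

```python
def default_merge(master: dict, duplicates: list, fields: list) -> dict:
    merged = dict(master)
    # Newest first, so the first non-empty value found is the most recent one.
    newest_first = sorted(duplicates, key=lambda r: r["updated_at"], reverse=True)
    for field in fields:
        if merged.get(field) in (None, ""):
            for dup in newest_first:
                if dup.get(field) not in (None, ""):
                    merged[field] = dup[field]
                    break
    return merged

master = {"id": 1, "phone": "", "title": "CEO", "updated_at": "2024-01-10"}
duplicates = [
    {"id": 2, "phone": "555-0100", "title": "", "updated_at": "2024-03-01"},
    {"id": 3, "phone": "555-0199", "title": "Founder", "updated_at": "2024-02-01"},
]
merged = default_merge(master, duplicates, ["phone", "title"])
print(merged["phone"], merged["title"])  # 555-0100 CEO
```

The master's Title is kept because it is not blank. Its empty Phone is filled from record 2, the most recently updated duplicate.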
On the Fields tab under Step 4, the Criteria dropdown gives you various options for choosing the data to keep, and Group Fields lets you keep values of multiple fields from the same record:
- From master record – Use the value that exists in the master record.
- From master record (even empty) – If the field on the selected master record is blank, keep it that way. Don’t automatically fill it in with a value from the most recently updated record.
- Most frequent value* – If the same value appears in multiple records, use the value that appears most frequently.
- From record where value – Select data from one of the records in the duplicate group based on the values. These options vary depending on the field type. For example, retain the data from the Annual Revenue field that has the highest value.
- From record based on other field value – Look at the value in a different field to decide which value from the duplicate group should be kept. For example, a Last Modified Date value can be used to determine which Account Owner value to keep.
- Combine and append all values* – You can merge the values from the selected field for all records in the group. For example, if there is some type of Notes field, you could keep the notes from all of the records in the duplicate group.
- Collect all values from other field* – Select a destination field to copy and combine values into, then select what field the data should come from for each record in the duplicate group. For example, this could be used to keep the read-only Record ID values of all duplicates in a group and combine them into a custom field.
- Collect non-master values from other field* – Aggregate the values from all duplicates other than the master, excluding any value that is the same as the master's value (every instance of that value is excluded from the collection). This can be especially helpful if you want a record of the object IDs that were removed, so you can also remove them from another system. Select a destination field to copy and combine values into, then select the field the data should come from.
* Indicates Criteria options that do not allow you to Group Fields.
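Two of the criteria marked with an asterisk are easy to picture as functions over the values in a duplicate group. In this hypothetical sketch, empty values are skipped and the "; " separator is an assumption; Insycle may format combined values differently.

```python
from collections import Counter

def most_frequent_value(values):
    """Most common non-empty value; ties go to the value seen first."""
    non_empty = [v for v in values if v]
    return Counter(non_empty).most_common(1)[0][0] if non_empty else None

def combine_and_append(values, separator="; "):
    """Join every non-empty value from the group, in record order."""
    return separator.join(v for v in values if v)

industries = ["Retail", "Wholesale", "Retail", ""]
notes = ["Called on Monday", "", "Sent quote"]
print(most_frequent_value(industries))  # Retail
print(combine_and_append(notes))        # Called on Monday; Sent quote
```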
For example, when merging Salesforce accounts, you may want to save the Account IDs of all the records that are merged and deleted. To do this, add a new custom field, such as “Merged Account IDs,” to your CRM.
Then in the Merge Duplicates module under the Fields tab of Step 4, add a rule to override the default merge behavior. Select the "Merged Account IDs" field, the "Collect non-master values from other field" criteria, and "Account ID" as the other field.
You can use the Preview to see how this will preserve the Account IDs of all the duplicates in each duplicate group.
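In effect, the Merged Account IDs rule collects every duplicate's Account ID except the master's. Here is a minimal sketch. The field names `account_id` and `merged_account_ids`, the IDs, and the comma separator are all hypothetical and chosen only for illustration.

```python
def collect_non_master_values(master: dict, duplicates: list, source_field: str,
                              separator: str = ", ") -> str:
    """Join the duplicates' values, skipping blanks and the master's own value."""
    master_value = master.get(source_field)
    collected = [
        d[source_field] for d in duplicates
        if d.get(source_field) and d[source_field] != master_value
    ]
    return separator.join(collected)

master = {"account_id": "ACC-001", "merged_account_ids": ""}
duplicates = [{"account_id": "ACC-002"}, {"account_id": "ACC-001"}, {"account_id": "ACC-003"}]
master["merged_account_ids"] = collect_non_master_values(master, duplicates, "account_id")
print(master["merged_account_ids"])  # ACC-002, ACC-003
```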
You can set multiple criteria for the same field. This is useful for establishing a hierarchy when selecting the value, so that "Value A" takes precedence over "Value B."
For example, you could define two rules for the Lifecycle Stage field. The first rule states that if a record has the value "Customer," that value is merged into the master record. If no record matches the "Customer" criteria, the second rule, for "Opportunity," comes into play.
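The rule order acts as a preference list. This sketch models the Lifecycle Stage example as a hypothetical function. The fallback to default behavior when no rule matches is an assumption.

```python
def pick_by_hierarchy(group_values, preferred_order):
    """Return the first preferred value that any record in the group has."""
    for wanted in preferred_order:
        if wanted in group_values:
            return wanted
    return None  # no rule matched, so the default merge behavior would apply

rules = ["Customer", "Opportunity"]
print(pick_by_hierarchy(["Lead", "Opportunity", "Subscriber"], rules))  # Opportunity
print(pick_by_hierarchy(["Customer", "Opportunity"], rules))            # Customer
```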
Troubleshooting
Most of the time, when Insycle can't find duplicates, the cause is your matching rules in Step 1. To set up better rules, start by analyzing the underlying data. A useful exercise is to set up simple matching rules that look for exact matches of First Name and Last Name.
When you click Find, these rules can show you a broad overview of what duplicates are potentially in your database and what fields might be useful to include in your matching fields. These settings are just for discovery and should not be used for a final merge operation; many people can have the same first and last names and are not duplicates.
To get further context, on Step 2, click the layout gear button on the right side of the title bar. Here, you can add any field in your database as a column to the duplicate group review to better understand the data inside these records.
If the Message column of the CSV report displays this text:
Change rules in Step 4 'Master Selection'. Failed to pick master record because multiple records (X) meet the selection criteria. In 'Master Selection', change, add, or reorder the rules such that only one record matches (if cannot determine master based on field values, use 'Record ID is lowest' as the last rule).
This means that based on all the rules, Insycle could not figure out which record in the duplicate group would be the master. None of the records meet more of the rules than others.
There are a few things you can try to resolve this:
- Under Step 4, on the Record tab, experiment with reordering or adding additional fields that are likely to have unique values.
- In the Step 4 heading, make sure that Priority Match is selected, not Absolute Match. With Priority Match, the master record only has to match one rule; with Absolute Match, it has to meet all of the rule criteria. Most of the time, Priority Match is the best choice. If Priority Match was already selected, then none of the records in the duplicate group meets the criteria on the list better than the others, and you'll need to reorder the rules on the Record tab or add rules for fields that are likely to have unique values.
- As a last resort, you can add a rule on the Record tab of Step 4 that says Record ID is lowest, or Create Date is earliest.
It can take a while for Insycle to find and match duplicates if the fields used to identify them have very long values. The longer the values, the longer it takes Insycle to process the data and generate the results. This can come up when matching on long ID numbers, LinkedIn profile links, or other URLs with long strings attached (for example, https://www.linkedin.com/in/svadin%C3%ADr-n%C4%9Bmec-1234b31a3/).
You can speed this up by limiting how much of the value Insycle looks at.
If the beginning or ending portion of each value is unique, you can limit the comparison to the first or last several characters using the Match Parts parameter under Step 1.
Alternatively, select the Ignored > Text (substrings) option, then click the Terms button.
On the Ignored Text tab of the popup, add the common portion of the URL or text string.
Tips for Bulk Merging Duplicates
- Begin with easy-to-find duplicates. Iterate through fields and rules you know will surface duplicates. Don’t expect to resolve all your duplicates by setting up and running this process once. You will need to run this process multiple times for different fields or nuanced variations.
- Each time you get a Merge Duplicates process to run the way you want in your database, save it as a template. When you have a solid set of templates that reliably resolve most of your dupes, you can put them together as a Recipe that can run on a regular, automated schedule.
- You may also need to look for edge cases that fall outside your standard rules. These may be templates you run manually so you can make adjustments based on what you find.
- Do some experimentation. Use the Preview mode CSV report to analyze patterns in the duplicates. You may learn what is causing the duplicates and learn how to avoid having them in the first place. You may also want to explore your data in the Grid Edit module to understand what you have so you can design templates that catch all potential variations.
Frequently Asked Questions
Yes. You can use Insycle to merge records while the sync is active, even though this isn't supported within the HubSpot app. To learn more, see the article Deduplicate HubSpot Companies and Salesforce Accounts.
No. You can deduplicate either Salesforce or HubSpot records first—the "Deduplication Master Record" field will be populated automatically.
Yes. The Deduplication Master Record field is a key requirement for deduplicating across Salesforce and HubSpot without breaking the sync. Keeping the master records consistently labeled across both platforms is how you are able to keep the sync active.
Yes. If your HubSpot objects have attachments, the attachments will be merged into the master record. Note, though, that there may be a short delay before they appear in the merged record.
When merging HubSpot contact records using the “From master record (even empty)” data retention rule, the property history in HubSpot shows that Insycle set the value to “empty.” This is a nuance of how HubSpot manages the history of empty values. You can verify that the master record value before the merge was indeed empty by reviewing the Activity Tracker report in Insycle.
Additional Resources
Related Help Articles
- Deduplicate Across Salesforce Leads and Contacts
- Deduplicate HubSpot Companies and Salesforce Accounts
- Bulk Merge Duplicate People, Companies
- Salesforce Merge Duplicates Overview
Related Blog Posts
- How to Merge Duplicates in HubSpot and Salesforce and Keep them Syncing
- Salesforce Duplicate Management: How to Automate Salesforce Deduplication
- Hidden Duplicates: 11 Advanced Ways to Identify & Deduplicate Customer Data
- Data Duplication and HubSpot: Dealing With Duplicates and the Impact They Have on Your Business