The fusionSpan Blog

How to Effectively Compare Data from Two Different Salesforce Systems

By Alan Luo |October 14, 2021
Best Practices

Comparing data between systems is necessary for businesses that are looking to migrate, adjust or back up data. The issue, however, is that data comparisons are tedious, repetitive, and time-consuming. Users have to detect changes that occurred in multiple cells under the same record, with potentially millions of records existing in the compared objects.

In a member-based organization, associations need to compare contacts, sales, orders, memberships, and objects before implementing a data migration. The objective is to avoid data duplication, as duplicate records can cause record issues such as membership miscalculation, address errors, mis-identified contacts, and more.

The good news is that there are code packages specifically designed to handle data comparison. At fusionSpan, our analysts regularly use one of these packages to speed up data comparison, ensuring that the process is kept simple.

In the following paragraphs, I will describe what steps we take here at fusionSpan to complete our data comparison. Our example compares data from two Salesforce systems.

New call-to-action

 

Step 1: Downloading and Prepping the Data

Say we have two Salesforce systems, one new and another old. We want to compare the data between the new system and the old system. The first step is to download data from both systems. There are multiple ways to download data from salesforce. To ensure that we get all the fields from the objects we export, we usually use the native Salesforce “Data Export” function.

After obtaining the data from both systems, we need to define/select which fields we would be comparing. Then we remove all the fields we don’t need. This can be done in Excel by manually deleting columns or done in R or Python by subsetting the data.

Another thing to take note of is that the new system must have an External ID that maps to the old system. For instance, if the old system contact ID was “abc”, but it was changed to “def” when that same record was brought over to the new system, the new system should have an “External ID Field” that stores the old system ID “abc”. Only with this field will comparing/updating records between systems be possible.

Step 2: Using CompareDF package from R

CompareDF is an MIT Licensed software package developed by Alex Joseph and a group of data enthusiasts. The package is used to easily compare two different datasets using R. This R package is useful in that it provides a comprehensive report as to what changes were made on the datasets, cell by cell. The image below is a Rscript that compares Salesforce Contact Data, it loads the data, finds the common key for records, then does the comparison:

r script

In our example, we are trying to figure out which contacts had their state and phone number changed between 2019 and 2020. In the screenshot below you’ll see contact data for 2019. I’ve determined that id, first_name, last_name and email are unchanged, so the only columns I need to focus on are “state” and “phone_number”.

comun script r

After running the script, an excel file is generated that contains the report.

compare r script

Step 3: Interpreting the Compare Report

In the compare report shown below, the chng_type “+” sign means the record exists in the new contact file, while the “-” sign indicates that the record exists in the old contact file.

compare data

Red and green symbolize there being a difference between the datasets. We can see that the first record Denna Allington changed her state and phone_number between 2019 and 2020. Originally living in Pennsylvania, she moved to Alaska, with her phone number changing from 3100 to 3131.

The second record shows that Gian McNeill changed his state from California to Colorado, but didn’t change his phone number. At the bottom of the report. We can see that only green text exists, meaning that 14-18 were new records that only existed in the new contacts.

The comparedf package has successfully compared each record cell by cell and color coded the differences. The script is easy to write and execution time is short.

Most of the time on this project is spent on cleaning and auditing the data. Salesforce objects usually have a lot of columns, and the user needs to decide what fields they’d want to compare. A lot of the cleanup and audit process can be automated in R.

New call-to-action
Alan Luo

Alan has been working in the field of Ecommerce for 4 years and has extensive knowledge in Online Marketing and Data Analytics. He has practical experience scraping and cleaning data and has tackled a few data science projects in his career. During his free time, he would listen to 80s pop and J-rock, or would try a few riffs on his guitar.

More posts