Last night I went online and booked a return flight from Christchurch to London. The journey is around 28 hours each way, in two sectors. The booking site offered dozens of routes and I eventually selected my flights based on seat availability, journey duration, transit time and cost. When I went to pay, the booking site rejected my payment, saying one of the flights was fully booked. So what just happened?

This is an example of what's wrong with syncing: the working data goes stale and may be accurate for only a few seconds after it is copied from the real source. In my case, one of the flights I selected was already fully booked. Frustratingly, I had to "book" two more flights before I found an available flight that I could pay for.

I have no idea how often flight information is synced to booking sites, but I appreciate that the complex algorithms need a local dataset to provide a tidy list of flight options in a timely manner. So, yes, there may be good reasons to sync your data between two systems, but this approach has a cost: bad data.

Syncing data

Another common syncing scenario is collecting web form field values and submitting them to a CRM. Can syncing bite us in this scenario? Well, yes! Imagine a best-case scenario: your marketing pitch on social media goes viral, and visitors inundate your website and complete lead forms. In the CRM, all is quiet; nothing has been synced and the sales team goes to lunch. Some time later, say 3 hours, the sync task fires and creates 1000 new leads in the CRM. Aside from the delay this creates in responding to queries, any changes made to an existing CRM record during this interval will be overwritten by the sync.

Syncing creates other issues too! Consider a lead form completed by someone who already exists in the CRM as a contact, having previously been converted from a lead. The sync task looks for an existing lead, does not find one, and creates a new lead. The visitor now exists in the CRM as both a lead and a contact.
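To make the duplicate problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the in-memory Crm class, matching on email address, and the function names are hypothetical and do not represent any real CRM or syncing product. It contrasts a lead-only lookup with one that checks contacts first.

```python
from dataclasses import dataclass, field


@dataclass
class Crm:
    """Hypothetical in-memory CRM; leads and contacts are keyed by email."""
    leads: dict = field(default_factory=dict)
    contacts: dict = field(default_factory=dict)


def normalise(email):
    """Assumed matching rule: compare emails case-insensitively."""
    return email.strip().lower()


def naive_sync(crm, form):
    """Lead-only lookup: creates a lead even if the person is already a contact."""
    key = normalise(form["email"])
    if key not in crm.leads:
        crm.leads[key] = dict(form)
    return "lead"


def submit_form(crm, form):
    """Check contacts first, then leads, and only then create a new lead."""
    key = normalise(form["email"])
    if key in crm.contacts:
        return "contact"          # already converted: do not create a lead
    if key not in crm.leads:
        crm.leads[key] = dict(form)
    return "lead"
```

With an existing contact for ann@example.com, naive_sync leaves that person in the CRM twice, while submit_form recognises the contact and creates nothing new.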

In another example, a lead form has multiple checkboxes that allow visitors to select their email subscription preferences. Ideally, the checkboxes should be pre-populated with the visitor's current preferences when the form loads, but syncing does not support this, so the visitor cannot see their current subscriptions.

Another challenge is creating related objects in the CRM. Before creating a child object, the identifier (ID) of the parent object must be known and referenced in the child. This requires a level of complexity beyond most syncing engines.
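The ordering constraint can be sketched in a few lines of Python. The InMemoryCrm class, its create method, the object types and the company_id field are all hypothetical stand-ins, not a real CRM API. The point is simply that the parent must be created first so its ID can be written into the child.

```python
import itertools


class InMemoryCrm:
    """Hypothetical CRM stand-in that hands out an ID for each new record."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.records = {}

    def create(self, object_type, fields):
        record_id = f"{object_type}-{next(self._ids)}"
        self.records[record_id] = {"type": object_type, **fields}
        return record_id


def create_company_with_contact(crm, company_name, contact_name):
    """Create the parent first, then reference its ID from the child."""
    parent_id = crm.create("Company", {"name": company_name})
    child_id = crm.create(
        "Contact", {"name": contact_name, "company_id": parent_id}
    )
    return parent_id, child_id
```

A field-to-field mapping has no natural place for the second step, because the value of company_id does not exist until the first call returns.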

It is undesirable to sync an entire database to a destination system each time a sync occurs. To avoid this, syncing engines compare the "Last Modified" date of records in each system to determine whether a record has changed and therefore needs to be synced. This introduces an anomaly: if an object's schema changes, every record of that object is synced. For example, it is common for a CRM field (the field itself, not its value) to be created, edited, or deleted. When this happens, the last modified date of every record changes, so every record is synced even though the record data may not have changed.
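Here is a rough Python sketch of that anomaly. The record layout and both function names are assumptions, and add_field simply models the behaviour described above (a schema change re-stamping every record) rather than how any particular CRM works.

```python
from datetime import datetime


def records_to_sync(records, last_sync):
    """Select records whose Last Modified stamp is newer than the last sync."""
    return [r for r in records if r["last_modified"] > last_sync]


def add_field(records, name, now):
    """Model of a schema change: every record gains the field and is re-stamped."""
    for record in records:
        record[name] = None
        record["last_modified"] = now


last_sync = datetime(2024, 1, 2)
records = [
    {"id": 1, "email": "a@example.com", "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "email": "b@example.com", "last_modified": datetime(2024, 1, 3)},
]

# Before the schema change, only record 2 is newer than the last sync.
before = [r["id"] for r in records_to_sync(records, last_sync)]

# After the schema change, every record looks modified.
add_field(records, "region", datetime(2024, 1, 4))
after = [r["id"] for r in records_to_sync(records, last_sync)]
```

Here before contains only record 2, while after contains both records, even though no email address changed.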

Probably the worst attribute of syncing is the degradation of data quality. There are two ways this happens:

  • Syncing cannot inspect the destination data before overwriting it, i.e. there is no opportunity to apply business logic when creating or updating a record. For example, populated record fields can easily be overwritten with blank values.
  • Record updates made immediately prior to a sync are lost. This leaves both systems in an unknown data state, i.e. there is no single source of truth.
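As an illustration of the first point, here is a minimal Python sketch of the kind of business logic a blind overwrite skips. The merge_fields function and its "never replace a value with a blank" rule are assumptions chosen for the example; a real integration would apply whatever rules the business needs.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def merge_fields(existing, incoming):
    """Apply incoming values to a copy of the record, skipping blanks."""
    merged = dict(existing)
    for key, value in incoming.items():
        if is_blank(value):
            continue              # keep whatever the record already holds
        merged[key] = value
    return merged
```

For example, merging {"phone": "", "city": "London"} into a record holding {"phone": "555-0100", "city": "Christchurch"} keeps the phone number and updates only the city.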

So how do we overcome these challenges? The solution is to create a single source of truth and always read from or write to that source directly, using real-time transactions. This eliminates the gotchas associated with syncing and is highly effective when accessing simple records from large databases.

Is real-time transacting THAT amazing? It can be, but there are some situations where data duplication is required. The flight booking situation is an example, where complex logic needs to make multiple calls to a remote data source. While a couple of remote calls are fast, the processing time can become intolerable if a dozen or more calls are required. Unlike syncing solutions, which are often generic API mapping matrices (or any-to-any solutions), real-time integrations tend to be purpose-built for the technologies being integrated. They often have components in both systems.

FuseIT specializes in Sitecore to Salesforce integration. Our S4S integration is a real-time solution that maps web forms to Salesforce and enables the implicit and explicit personalization of Sitecore from Salesforce. Please contact us for more information or to see a demo of this in action.