Know your data

You cannot change the data you need to migrate, but you can determine exactly the problems that need to be handled in order for the data to be useful once the migration is completed.

I have listed some of the main problems you will face when moving data from one system to another.

Conflicting points of view

Conflicting points of view arise when the old and the new system represent data in different ways. One system might represent an order as a set of order lines with full references to product objects in the system, while another might store the same data as just a text field.

Some systems have strict views on data integrity and rigorously enforce mandatory fields, referential integrity and compliance with rules. Other systems might have a more casual approach that allows users to override the standard settings.

Depending upon the format you receive your source data in, you will need to identify these issues and resolve the differences.
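
A minimal sketch of how such differences could be surfaced before migration, assuming the source orders arrive as plain dictionaries. The field names ("id", "lines", "product_id"), the Issue class and the function name are hypothetical and only serve the illustration. The check reports problems; it does not change or drop any data.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    record_id: str
    problem: str


def find_integrity_issues(orders, known_product_ids, mandatory_fields):
    """Return a list of Issue objects; the source records are left untouched."""
    issues = []
    for order in orders:
        record_id = str(order.get("id", "<missing id>"))
        # Mandatory fields: a strict target system rejects absent or blank values.
        for field in mandatory_fields:
            value = order.get(field)
            if value is None or str(value).strip() == "":
                issues.append(Issue(record_id, f"missing mandatory field '{field}'"))
        # Referential integrity: every order line must point at a known product.
        for line in order.get("lines", []):
            product_id = line.get("product_id")
            if product_id not in known_product_ids:
                issues.append(Issue(record_id, f"unknown product '{product_id}'"))
    return issues
```

The resulting list can be handed to the business as a report, which is usually more useful than silently repairing or rejecting records during the load.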

Duplicate content

A friend of mine was fond of telling me a story about a telephone company that had used a free-text field for addresses in its IT system. This field was filled out manually each time a phone was sold to a customer. During the construction of a new CRM solution, they found that a single customer had more than 10,000 different representations of its address. Misspellings, address changes and just plain stupid data entry had accumulated over the years.

You cannot change the past, but you will have to shine a light on problems like this in order for people to understand the complexity of the task you have been given.

In the case of the telephone company, the right solution would probably be to ignore the address data in the old system and get the information from another source.
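
To shine a light on a problem like this, it helps to put a number on it. The following is a rough sketch, assuming the old data can be read as (customer id, raw address) pairs. The normalization is deliberately crude and the function names are made up for this example. It collapses differences in case, punctuation and spacing, but it will not catch real misspellings.

```python
import re
from collections import defaultdict


def normalize_address(raw):
    """Crude normalization: lowercase, drop punctuation, collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def address_variants(rows):
    """Map customer id -> {normalized address: set of raw spellings}."""
    variants = defaultdict(lambda: defaultdict(set))
    for customer_id, raw_address in rows:
        variants[customer_id][normalize_address(raw_address)].add(raw_address)
    return variants


def summarize(rows):
    """Return (customer_id, raw_count, normalized_count), worst offenders first."""
    summary = []
    for customer_id, by_norm in address_variants(rows).items():
        raw_count = sum(len(spellings) for spellings in by_norm.values())
        summary.append((customer_id, raw_count, len(by_norm)))
    return sorted(summary, key=lambda item: item[1], reverse=True)
```

If the normalized count stays high after this kind of cleanup, that is a strong hint that the field cannot be salvaged and another source is needed.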

Business processes

Business processes also impact data in the IT system. If the new and old systems support different ways of conducting business, you are sure to find differences in the data model. To take the classical example of Netflix and Blockbuster, I am sure that they both had a data model for movies, but I am also quite sure that the two models were not identical.

Changes in data due to differences in processes are hard to spot without knowing the context the data is used in. You will need to talk to the business and process owners to understand how the new system intends to use the data.

Historical artifacts

If you need to migrate historical data you will typically discover that the data will contain more and more variance the further back in time you go. One of the reasons is that the world is changing:

  • Companies change names, merge or go out of business
  • Countries change their currency
  • Tax and audit rules evolve

Is it really necessary for the new system to be able to support all this now deprecated data? Do you really want your new system to have a GUI where you can select “German D-Mark” even though Germany changed currency to the euro more than 10 years ago?

You will need to determine what kind of data will become first-class citizens in the new system. Most data will just become static read-only data that cannot be edited but is there for reference.
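
One way to make that decision explicit is to split the records during migration. This is only a sketch: the set of active currencies and the "currency" field are assumptions made for the example, and a real system would need a similar rule for every kind of reference data.

```python
# Assumption: the currencies the new system lets users select.
ACTIVE_CURRENCIES = {"EUR", "USD", "DKK"}


def split_by_currency(records, active=ACTIVE_CURRENCIES):
    """Split records into (editable, read_only) based on their currency code."""
    editable, read_only = [], []
    for record in records:
        # Records in a retired currency are kept, but only for reference.
        target = editable if record["currency"] in active else read_only
        target.append(record)
    return editable, read_only
```

With this split, an old invoice in "DEM" is still available for lookup, but the currency never shows up as a choice in the new GUI.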

Artifacts from system maintenance

Lifecycle management of the old system contributes to the problem. As business requirements changed over the years, the old system evolved. The people previously tasked with fixing the system will have been focused on making the new stuff work, not on leaving behind pristine historic data.

Columns or entire tables in the database might now be obsolete and not used by the application. Names in the GUI might no longer correspond to the names in the database.

It is a classical mistake to confuse data entities due to name changes. Testing is the solution. Make sure your migration is easy to test by nontechnical people, because they will spot your mistakes.
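
Obsolete columns can often be spotted with a simple profile of the source data. The sketch below assumes that a table has been read into a list of dictionaries with scalar values; the function names are invented for this example. A column that is never filled, or that only ever holds one value, is a candidate for closer inspection, not proof that the column is unused.

```python
def profile_columns(rows):
    """Return {column: (filled_count, distinct_count)} for a list of dict rows."""
    filled = {}
    distinct = {}
    for row in rows:
        for name, value in row.items():
            filled.setdefault(name, 0)
            distinct.setdefault(name, set())
            # Treat None and blank strings as "not filled".
            if value is not None and str(value).strip() != "":
                filled[name] += 1
                distinct[name].add(value)
    return {name: (filled[name], len(distinct[name])) for name in filled}


def suspicious_columns(rows):
    """Columns that are never filled or hold a single constant value."""
    profile = profile_columns(rows)
    return sorted(
        name for name, (count, unique) in profile.items()
        if count == 0 or unique == 1
    )
```

The list of suspicious columns is a good starting point for a conversation with the people who know the old system.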

Audit data

Who did what, when? In some systems, audit data can make up 90% of all the data in the system.

Don’t migrate audit data. You are better off leaving it in a read-only data dump than doing a full data integration in the new system.
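
A data dump does not have to be complicated. As a sketch, assuming the audit rows can be read as dictionaries, they could be written to a compressed JSON Lines file that is archived next to the new system. The file format and the function names are choices made for this example, not a requirement.

```python
import gzip
import json


def dump_audit_rows(rows, path):
    """Write rows as gzip-compressed JSON Lines and return the row count."""
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for row in rows:
            # default=str keeps dates and other non-JSON types readable.
            handle.write(json.dumps(row, default=str, sort_keys=True) + "\n")
            count += 1
    return count


def read_audit_rows(path):
    """Read the dump back, for the rare case where someone needs to look."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]
```

Making the file read-only is then a matter of file permissions or archive storage, outside the new system's data model.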