Data migration KPI

Everyone wants to track the progress of software projects, but how do you know when you are done with a migration project? Let me share my thoughts on how to track and predict progress in migration projects.

Framing your analogy correctly

I recently talked with my friends at Hopp.tech and was reminded of a common pitfall that people unfamiliar with migrations often stumble into.
It is a mistake of perception: you build the wrong analogy, and that analogy leads you to focus on the wrong aspects of the project.

It is easy to think of the data in the source system as an object that you have to move from A to B. That puts the emphasis on the data as an object of value in itself, but that is not the case.

You want to move business processes that support real customers; the data is just a means to an end. The business processes are going to be performed on the target system, not the source system, so your focus should be on the target system.

The data in the source system only needs to be moved to the extent that it is needed to support business processes on the target system.

Estimates and tracking progress
I work on complex migrations of proprietary legacy systems, where there is no prior migration to base estimates on. This makes tracking progress complicated and error-prone. But if you remember that the aim is to support business processes in the target system, you can build estimates around that.

Track progress by measuring your ability to service customers from the target system.

“%customers” – How many real customers can be serviced by processes through the new system.

While this is the ultimate aim, the metric is not great for tracking progress at the beginning of the migration. “%customers” will hover around 0% for a while, until you have a baseline solution that can import most datafields.
It is not uncommon for “%customers” to be at 0% even though you have mapped 80% of the datafields.

For this reason, it makes sense to use another metric to track progress in the early days of the project:
“%datafields” – How many data fields in the target system you can map data to from the source system.

These metrics are somewhat interlinked, but usually you will find that “%datafields” approaches 100% much sooner than “%customers” does.
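
To make the two metrics concrete, here is a minimal sketch of how they could be computed. The field names, the `Customer` class and its `serviceable` flag are assumptions made for illustration; in a real project "serviceable" would come from actually running the business processes for that customer on the target system.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    # Assumption: True when every required business process works
    # for this customer on the target system.
    serviceable: bool


def pct_datafields(target_fields: set[str], mapped_fields: set[str]) -> float:
    """Percentage of target fields that have a mapping from the source."""
    if not target_fields:
        return 0.0
    return 100.0 * len(target_fields & mapped_fields) / len(target_fields)


def pct_customers(customers: list[Customer]) -> float:
    """Percentage of customers that can be serviced from the target system."""
    if not customers:
        return 0.0
    return 100.0 * sum(c.serviceable for c in customers) / len(customers)


# Hypothetical early-project snapshot: 4 of 5 fields mapped,
# but no customer can be fully serviced yet.
target = {"name", "email", "status", "credit_limit", "segment"}
mapped = {"name", "email", "status", "credit_limit"}
customers = [Customer("c1", False), Customer("c2", False)]

print(pct_datafields(target, mapped))  # 80.0
print(pct_customers(customers))        # 0.0
```

The snapshot mirrors the situation described above: most fields are mapped while “%customers” is still at 0%.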

I believe the best practice is to switch metrics during the project.
You start with “%datafields” and transition to measuring “%customers” when that metric begins to rise above 0%.

Conceptually, that divides the migration project into two phases: “Mapping fields” and “Refining the migration”.

Mapping fields
Mapping fields is where you match datafields in target and source systems.
When you are at 100%, you will know where to acquire data for each and every field you need in the target system.

Each field specification should describe the common-scenario value: the mapping that is most likely to be right. It is OK at this stage to ignore corner cases where specific scenarios impact the value of the field.

The important goal here is to get almost-correct data into the system to evaluate the system and process impact of the additional data.

Refining the migration
Most systems are not uniform; they contain a variety of sub-concepts.
A customer might be a VIP customer, be in bad standing, have an open order, or carry other special information that makes it different from the other customers.

This variety will also show up in your data migration, so a VIP customer might need different mapping rules than the ones used for regular customers.
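
One way to picture this is a common-scenario mapping with case-specific overrides layered on top. The sketch below is only an illustration: the source columns (`CUST_NAME`, `CREDIT`, `VIP_CREDIT`, `VIP_FLAG`), the target fields and the `classify` helper are all hypothetical.

```python
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], object]

# Common-scenario mapping: one rule per target field.
DEFAULT_RULES: dict[str, Rule] = {
    "name": lambda src: src["CUST_NAME"],
    "credit_limit": lambda src: src["CREDIT"],
}

# Corner-case overrides, keyed by a hypothetical customer category.
OVERRIDES: dict[str, dict[str, Rule]] = {
    "vip": {"credit_limit": lambda src: src["VIP_CREDIT"]},
}


def classify(src: Record) -> str:
    """Decide which category a source record belongs to (assumed rule)."""
    return "vip" if src.get("VIP_FLAG") == "Y" else "regular"


def migrate(src: Record) -> Record:
    """Apply the default rules, replaced by overrides for the record's category."""
    rules = {**DEFAULT_RULES, **OVERRIDES.get(classify(src), {})}
    return {field: rule(src) for field, rule in rules.items()}


regular = {"CUST_NAME": "Ann", "CREDIT": 1000, "VIP_FLAG": "N"}
vip = {"CUST_NAME": "Bob", "CREDIT": 1000, "VIP_CREDIT": 5000, "VIP_FLAG": "Y"}

print(migrate(regular))
print(migrate(vip))
```

Each new corner case then becomes one more entry in the override table, which keeps the common-scenario mapping untouched.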

These corner cases will be your focus for the remainder of the migration.
Identify each case, prioritize the order in which they need to be solved, and handle them one at a time.

Projecting future progress

Ideally you would like a metric that scales linearly with the progress of the project, but unfortunately neither “%datafields” nor “%customers” does that.
Both metrics are logarithmic: progress is easier at the start and gets gradually slower.

While “%customers” as a number is not linear, the time it takes to solve each special corner case is roughly constant, so the total effort grows roughly linearly with the number of cases.
That means that when you have a list of special cases that need attention and a velocity for solving them, you can roughly predict when you will be done with all of them.
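
As a rough sketch of that prediction, assuming every open case takes about the same effort and that velocity is measured in cases per week (both simplifications), the projected finish date could be computed like this. The function name and the numbers are made up for illustration.

```python
import math
from datetime import date, timedelta


def projected_finish(open_cases: int, cases_per_week: float, today: date) -> date:
    """Estimate when all open corner cases are solved at the current velocity."""
    if cases_per_week <= 0:
        raise ValueError("velocity must be positive")
    # Round up: a partially used week still ends after the last case is solved.
    weeks = math.ceil(open_cases / cases_per_week)
    return today + timedelta(weeks=weeks)


# Hypothetical numbers: 12 open cases, 1.5 solved per week.
print(projected_finish(12, 1.5, date(2024, 1, 1)))  # 2024-02-26
```

The estimate is only as good as the case list; newly discovered corner cases push the date out, so it is worth recomputing whenever the list changes.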

Your velocity in this part of the project is often tied to the organisation's ability to make business decisions, not to your ability to code and map datafields.

Also remember that the end date of the migration should be when the effort to automatically migrate data exceeds the cost of manually fixing the data in the target system, NOT when you have migrated 100% of the data automatically.
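
A simple way to apply this stopping rule per corner case is to compare the two costs directly. The sketch below assumes you can estimate, in hours, both the effort of automating a case and the manual effort per affected record; the function name and figures are hypothetical.

```python
def worth_automating(
    affected_records: int,
    manual_hours_per_record: float,
    automation_hours: float,
) -> bool:
    """Return True when automating a corner case is cheaper than fixing it by hand."""
    manual_hours = affected_records * manual_hours_per_record
    return automation_hours < manual_hours


# A case touching 40 records at 0.25 hours each costs 10 hours by hand,
# so 16 hours of automation work is not worth it.
print(worth_automating(40, 0.25, 16.0))    # False

# The same rule applied to 2000 records costs 500 hours by hand.
print(worth_automating(2000, 0.25, 16.0))  # True
```

Once every remaining case on the list comes out as not worth automating, the automated part of the migration is done.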