The following is a brief explanation of the reference architecture that JoEm uses when implementing migration projects.
- Source: The old system(s) holding the data that must be moved. For most systems this is the system's database, but it can also include file-based sources.
- Extractor: A function that extracts data from the source system and places it in the Source Lake. You will typically want to keep this program very simple and not change the data you extract. One possible exception is denormalizing the data.
- Source Lake: A database that stores snapshots of how the data in the source systems looked at a specific point in time. It might seem unnecessary, but storing snapshots of the source data is vital because it keeps the data you work on static.
- Mapping: A function that converts data from the Source Lake format to a Target Lake format.
- Target Lake: A database that stores the results of applying a mapping function to a Source Lake object. There will typically be at least two Target Lake formats for each target system: one that closely resembles the format the target system accepts, and one that is debug-friendly. The Target Lake also holds data such as error logs and metadata about the mapping.
- Loader: A function that moves data from the Target Lake format into the format of the target system. Preferably, the loader is an integral component of the target system that can load data from the Target Lake, since that provides the best system integrity in the target system.
- Target: The new system(s) where the data is supposed to end up.
- Controller: The software that enables the team to control the flow of a migration. This software has a number of purposes:
  - Activate extractors and mappings to perform migrations, both for the whole dataset and for small subsets.
  - Visualize the content of the Target Lake in a friendly format.
- Profiler: A BI solution that enables the team to explore and understand whether the implemented mapping components perform as intended.
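
To make the flow between these components concrete, the following is a minimal Python sketch, not JoEm's actual implementation. All names (`Snapshot`, `MappingResult`, `extract`, `map_customer`, `run_mapping`, `load`) and the sample customer fields are hypothetical. In-memory lists stand in for the Source, the lakes, and the Target. The sketch shows an extractor that copies data unchanged into a snapshot, a mapping whose results land in a load-ready format plus a debug-friendly format with an error log, and a loader that moves the load-ready rows into the target.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    """Source Lake entry: source rows as they looked at one point in time."""
    taken_at: datetime
    rows: tuple  # copies of the source rows, stored unchanged


@dataclass
class MappingResult:
    """Target Lake entry: load-ready rows, a debug view, and an error log."""
    load_rows: list = field(default_factory=list)
    debug_rows: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def extract(source_rows):
    """Extractor: copy the source rows into a snapshot without changing them."""
    return Snapshot(datetime.now(timezone.utc), tuple(dict(r) for r in source_rows))


def map_customer(row):
    """Hypothetical mapping for one record; raises ValueError on bad input."""
    if not row.get("name"):
        raise ValueError("missing name")
    return {"id": int(row["cust_no"]), "full_name": row["name"].strip()}


def run_mapping(snapshot, mapping, subset=None):
    """Apply a mapping to a snapshot, optionally only to rows matching subset."""
    result = MappingResult()
    for index, row in enumerate(snapshot.rows):
        if subset is not None and not subset(row):
            continue
        try:
            mapped = mapping(row)
        except (KeyError, ValueError) as exc:
            # Failed rows go to the error log instead of stopping the run.
            result.errors.append({"row": index, "error": str(exc)})
            continue
        result.load_rows.append(mapped)
        result.debug_rows.append({"source": row, "target": mapped})
    return result


def load(result, target):
    """Loader: move load-ready rows into the target (a plain list here)."""
    target.extend(result.load_rows)
    return len(result.load_rows)


source = [
    {"cust_no": "1", "name": " Ada Lovelace "},
    {"cust_no": "2", "name": ""},
]
snapshot = extract(source)
result = run_mapping(snapshot, map_customer)
target = []
loaded = load(result, target)
print(loaded, len(result.errors))  # prints "1 1"
```

The optional `subset` predicate mirrors the Controller's ability to run a migration for a small part of the dataset, and because the mapping reads from the snapshot rather than the live source, rerunning it yields the same result.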