Managing Data Load and Migration for a Master Data Management (MDM) Project

January 17th, 2023 by Marc A.Paolo Sr Director, Client Success

Data load and migration are an integral and central part of any Master Data Management (MDM) implementation.

Most MDM projects involve two types of data loads:

  • Initial Data Load (IDL)
  • Ongoing Data Loads (ODL)

Data loads occur in various frequencies, which can include:

  • One-time – often an IDL
  • Batch
  • Real time

The method of the data load must also be considered.  Possible methods include:

  • Direct SQL load (available in some, but not all, MDM systems)
  • Flat file upload facilitated by MDM system’s built-in tools
  • Load via API
  • Load by data transport, middleware, or ETL tool.

Once the source data has been identified, often with the assistance of client Subject Matter Experts (SMEs), we begin with a data modelling and mapping exercise to ensure the source data will fit the target system.  In many cases, is necessary to understand data volumes, amount of duplication, invalid data in fields, missing data in required fields, and other data anomalies and characteristics; this informs the modelling and mapping results and is used in other aspects of MDM implementation, such as for developing match rules and understanding enterprise reference data.

Through the initial modelling and mapping exercise, we also take care to understand the level of confidentiality of the data.  If PHI and PII are involved, then we ensure only authorized personnel are allowed to handle the data, and the data must only be handled on client systems.  Our consultants are trained to prevent data breaches during data loads.

Mapping considerations often require the assistance of client subject matter experts; in many (most) cases, the data models from the source and target systems do not match, and judgment and experience are required to ensure all data from the source has an appropriate landing place in the target system.  In many cases, data transformations are also required, and these must be understood and designed into the load process.

Once modelling and mapping are complete, we conduct a small initial data load using a fractional subset of the initial data load set. The initial data load is used to validate the assumptions and design decisions made in the modelling and mapping exercise. The sample data load also is used to confirm connectivity between the source and target, where applicable.

The following are examples of the many things our data teams will validate upon load:

  • Record counts must match, and failed records must be accounted for, with reasons for failure understood.
  • Data types must be preserved or converted as expected.
  • Lists of values must load or map correctly.
  • Data integrity must be preserved on a field-by-field basis.
  • Field lengths must be validated to avoid truncating data.
  • Fields must be mapped correctly between the source and target.
  • There must be no corruption due to incompatible character sets or other issues introduced by the transport method.
  • NULL vs. blank characters must be handled correctly.
  • Date/Time fields must be loaded correctly – either with or without conversion, as needed.
  • Examination of failures (or unintended successes) due to required fields and other validation rules.
  • Time stamps and other audit information (such as “created by” fields) must be preserved or recorded correctly.

As a rule, Fresh Gravity will automate this process whenever possible and permissible.

Upon validating the sample data load worked correctly, a full data set will be loaded, and the same items will be examined as with the sample load, plus any issues due to volume will also be evaluated.

The process is similar for Ongoing Data Loads (ODL), however, additional considerations come into play, such as:

  • Desired behaviour when a record is updated in full or in part.
  • Desired behaviour when new records are inserted.
  • Source of the ongoing updates can often be different than the source of the initial data load, and these differences must be handled carefully.
  • Desired behaviour when records are hard deleted or marked for deletion/inactivation in the source system.

If you need help with a Data Load for your MDM project, please write to us at marc.paolo@freshgravity.com or sudarsana.roychoudhury@freshgravity.com.

Leave a Reply

Your email address will not be published. Required fields are marked *

Fresh Gravity Celebrates Nine Years of Success and Growth

We are thrilled to share that on April 8th we celebrated the ninth anniversary of Fresh Gravity’s founding. It’s an incredible milestone that wouldn’t have been possible without the hard work, dedication, and unwavering commitment of the Fresh Gravity family. We would also like to thank all our clients and partners for trusting us. Thank […]

» READ MORE

The Importance of Validated Data Systems under GxP Guidelines 

Imagine this: you’re working on a new drug, a potential lifesaver. Every decision you make relies on the data you collect. If that data is shaky, well, that life-saving drug could turn dangerous fast. Scary, right?  In the highly regulated sectors of pharmaceuticals, biotechnology, and medical devices, adherence to Good Practice (GxP) guidelines is paramount. […]

» READ MORE