AI in data migration

Mykhailo Saienko

Senior Manager, Cloud & Digital, PwC Switzerland

Nina Wolf

Senior Manager Data Transformation & Analytics, PwC Switzerland

Pramukhee Sirsi

Manager Technology & Data, PwC Switzerland

Artificial Intelligence (AI) is set to be the defining technological advance of the twenty-first century, and we have already seen evidence of AI transforming several industries. Data migration is no exception.

Technology pundits foresee a future where AI and automation cut the human effort in data migration by up to 80%. However, your company does not need to be at the forefront of innovation to harness the power of AI and make data migration activities easier. This article provides a peek into some of the most prevalent use cases of AI in the area of data migration.

Challenges with traditional data migration and AI use cases

Traditional data migration faces several challenges that create a risk of delays and of corrupt or incorrect data. Our first blog post describes these challenges and how to overcome them in more detail. In our second blog post, we outlined potential business use cases for automation in data migration.

In this blog post, we focus on AI. In general, it brings a higher level of intelligence and flexibility to data migration, making it more effective at improving data quality, supporting advanced mapping and validation activities, identifying and handling complex relationships, and ensuring compliance. While automation improves efficiency by streamlining repetitive tasks, AI elevates the migration process by introducing adaptability, context-aware decision-making, and continuous learning, while reducing the effort needed to develop the necessary code and documentation.

1. Data cleansing

Applying AI to data migration can significantly improve data quality, addressing one of the most critical challenges in migrating data between systems. AI enhances the data migration process by identifying and correcting data quality issues before, during, and after the migration.

Data cleansing, which takes place before the data is actually migrated, is one of the steps where AI-driven tools can carry out data profiling and anomaly detection. It involves analysing the source data to detect inconsistencies, redundancies, and errors.

Role of AI and use cases:

  • Identifying outliers and anomalies in data that may go unnoticed by traditional profiling methods.
  • Clustering and pattern recognition to highlight common issues, such as duplicated customer records or inconsistent data entry formats.
  • More recent Natural Language Processing (NLP) and GenAI technology makes it possible to perform much more intelligent classification and profiling of structured and unstructured data, which previously required intensive manual effort.
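
The profiling and outlier checks described above can be illustrated with a minimal sketch. It deliberately uses simple statistics from the Python standard library as a stand-in for the AI-driven tools mentioned in the text. The column names, sample values and the threshold of 3.5 are assumptions chosen for illustration only.

```python
from collections import Counter
from statistics import median


def format_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_formats(values):
    """Count how often each shape occurs, most common first."""
    return Counter(format_pattern(v) for v in values).most_common()


def mad_outliers(numbers, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(numbers)
    mad = median([abs(x - med) for x in numbers])
    if mad == 0:
        return []  # no spread to measure against
    return [x for x in numbers if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical source columns, invented for illustration.
order_dates = ["2021-03-04", "2022-11-30", "04/03/2021", "2020-01-15"]
order_amounts = [120.0, 135.5, 110.0, 128.0, 9800.0, 131.0]

format_profile = profile_formats(order_dates)    # rare shapes hint at inconsistent entry formats
suspicious_amounts = mad_outliers(order_amounts)  # candidates for manual review
```

Rare shapes in the format profile and values flagged by the outlier check are candidates for review, not confirmed errors. A learned model would replace both functions in a real project.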

 

2. Improvement in data quality

Poor data quality often stems from inconsistent formats, typographical errors, or incorrect values. AI tools can correct these issues by recognising patterns in the data and making intelligent adjustments before the actual migration. AI can also detect inconsistencies by comparing current data with historical norms or known good data patterns.

Role of AI and use cases:

  • NLP can interpret text fields and suggest corrections for misspellings, inconsistent terminology, or formatting errors.
  • Machine learning algorithms can flag and correct numerical data errors, such as mismatches in accounting data or incorrect product quantities.
  • AI uses probabilistic matching to analyse similarities in fields (e.g., name, address, phone number) and confidently merge or delete duplicates while maintaining data integrity.
  • Some machine learning algorithms can impute missing data points based on the patterns found in complete records. However, this should be used with caution, as it may introduce unintended bias into the dataset.
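
To make the idea of probabilistic matching more concrete, here is a small sketch that scores pairs of records by weighted field similarity. It uses `difflib.SequenceMatcher` from the Python standard library as a simple similarity measure. The field weights, the 0.85 threshold and the sample records are assumptions for illustration, and the sketch only proposes candidate pairs instead of merging or deleting anything.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumed field weights; a real project would tune or learn these.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def field_similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 for two field values."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of the per-field similarities."""
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def candidate_duplicates(records, threshold=0.85):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold."""
    pairs = []
    for a, b in combinations(records, 2):
        score = match_score(a, b)
        if score >= threshold:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


# Hypothetical customer records, invented for illustration.
customers = [
    {"id": 1, "name": "Anna Müller", "address": "Bahnhofstrasse 1, Zürich", "phone": "0441234567"},
    {"id": 2, "name": "Anna Mueller", "address": "Bahnhofstr. 1, Zürich", "phone": "0441234567"},
    {"id": 3, "name": "Peter Keller", "address": "Seestrasse 20, Luzern", "phone": "0419876543"},
]

duplicate_candidates = candidate_duplicates(customers)
```

Comparing every pair is only practical for small datasets. Larger migrations typically group records first (for example by postcode) and compare within each group.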

3. Advanced data mapping

Data mapping is an important step in the process of data migration. A simplistic example: “CH” in the source system is mapped to “Switzerland” in the target system. AI enhances data mapping by detecting complex relationships between fields in the source and target systems. Unlike manual or rule-based approaches, AI can identify patterns and correlations in data that may not be immediately obvious to human analysts. This includes recognising variations in naming conventions, discrepancies in data formats, or even relationships between seemingly unrelated data points. The result is data mapping that is faster, more accurate, and better able to handle the complexities of modern data environments.

Role of AI and use cases:

  • Creation of data mapping suggestions based on the schema description of the source and the target systems, as well as the description of the business processes relying on the affected data and any instructions in natural language provided by human experts.
  • Data gap detection: New NLP models can identify when either the data in the source system is not reflected in the target data schema or there are data fields in the target schema which cannot be covered by data in the source system. This is especially relevant for large multinational companies when consolidating local databases into a unified database, e.g., for SAP transformations or implementing unified data models.
  • Identifying obsolete and inconsistent (master) data: Very often, the data that requires the most manual checks and causes the most delays is the data with the least relevance for the company's operations, e.g., old data that is no longer used or maintained, or corrupt entries that went undetected and were superseded by newer entries. Using a mixture of analytics and AI methods, one can detect such entries mostly automatically and exclude them from the data migration.
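
As a rough illustration of mapping suggestions and data gap detection, the sketch below proposes a target field for each source field based purely on name similarity, then lists the fields left uncovered on either side. It uses `difflib.get_close_matches` from the Python standard library in place of the NLP models described above. The field names and the 0.6 cutoff are hypothetical.

```python
from difflib import get_close_matches


def suggest_mapping(source_fields, target_fields, cutoff=0.6):
    """Suggest one target field per source field and report gaps on both sides."""
    suggestions = {}
    for field in source_fields:
        best = get_close_matches(field, target_fields, n=1, cutoff=cutoff)
        if best:
            suggestions[field] = best[0]
    # Source fields with no plausible target: data that may be lost.
    unmapped_source = [f for f in source_fields if f not in suggestions]
    # Target fields no source field points to: data that may be missing.
    uncovered_target = [t for t in target_fields if t not in suggestions.values()]
    return suggestions, unmapped_source, uncovered_target


# Hypothetical schemas, invented for illustration.
source_schema = ["cust_name", "cust_country", "phone_no", "fax_no"]
target_schema = ["customer_name", "country", "phone_number", "email"]

mapping, unmapped_source, uncovered_target = suggest_mapping(source_schema, target_schema)
```

Name similarity alone cannot tell that two differently named fields hold the same business concept, which is where schema descriptions, process documentation and expert instructions come in. The suggestions are a starting point for human review.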

4. Post-migration data validation

AI can assist with post-migration reconciliation by comparing the migrated data with the source data, identifying discrepancies, and ensuring data integrity without the need for extensive manual validation.
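
The baseline that such reconciliation builds on can be sketched in a few lines: fingerprint every row on both sides and compare by key. This is plain deterministic comparison, not AI, and it assumes that any expected transformations have already been applied to the source rows. The key name `id` and the sample rows are illustrative assumptions.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """Hash a row's content in a column-order-independent way."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key="id"):
    """Compare two row sets by key and report the discrepancies."""
    source = {r[key]: row_fingerprint(r) for r in source_rows}
    target = {r[key]: row_fingerprint(r) for r in target_rows}
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        "changed": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical extracts, invented for illustration.
source_rows = [
    {"id": 1, "country": "CH", "amount": 100},
    {"id": 2, "country": "DE", "amount": 250},
    {"id": 3, "country": "FR", "amount": 75},
]
target_rows = [
    {"id": 1, "country": "CH", "amount": 100},
    {"id": 2, "country": "DE", "amount": 205},  # value altered during migration
    {"id": 4, "country": "IT", "amount": 60},   # not present in the source
]

report = reconcile(source_rows, target_rows)
```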

The most recent NLP and GenAI models can be employed to automate the end-to-end validation of data flows, checking that transformations, aggregations, and calculations produce correct results in the target system. AI can also cross-reference results against historical data to detect inconsistencies and validate data accuracy through automated regression testing. This approach can be particularly powerful if combined with a description of the business processes that will draw on the target data schema, as generated, for example, by the PwC BPMN AI tool.

Role of AI and use cases:

  • Referential integrity: AI models can automatically verify that all dependencies are satisfied in the migrated data. This approach goes beyond simply identifying missing records or checking foreign keys: it also detects semantic mismatches (e.g., Berlin being attributed to the UK, or an employee being attributed to the wrong business unit). Newer GenAI models can also suggest corrective actions, which are hard to generate using traditional automation methods.
  • Intelligent anomaly detection for business logic validation: Beyond verifying that data has been accurately moved, newer NLP and GenAI models can also validate that the migrated data adheres to the expected business logic. AI models can be trained or prompted to understand the normal behaviour of business-critical datasets, such as sales orders, transactions, or inventory levels, and flag any anomalies after migration that do not align with the historical patterns. If, for example, an unusually high volume of transactions appears post-migration, AI’s anomaly detection can quickly determine whether this is a migration error (e.g., duplicated records or missing transactions) or a valid business event. This context-aware validation ensures that the data not only complies with the target schema but also maintains logical integrity based on business expectations.
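
The transaction volume example above can be approximated with a simple statistical check, shown here as a sketch. It compares the migrated record count against historical daily volumes using a z-score and then looks for duplicated identifiers to tell a likely migration error from a case that needs a business explanation. The threshold of 3.0, the function names and the sample data are assumptions, and a trained or prompted model would replace this logic in practice.

```python
from collections import Counter
from statistics import mean, stdev


def volume_is_anomalous(history, observed, z_threshold=3.0):
    """True if the observed count deviates strongly from historical volumes."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


def duplicated_keys(transaction_ids):
    """Return identifiers that occur more than once."""
    return sorted(k for k, n in Counter(transaction_ids).items() if n > 1)


def assess_volume(history, migrated_ids):
    """Classify the migrated volume as ok, a likely error, or needing review."""
    if not volume_is_anomalous(history, len(migrated_ids)):
        return "ok"
    if duplicated_keys(migrated_ids):
        return "likely migration error: duplicated records"
    return "needs review: unusual volume without duplicates"


# Hypothetical daily transaction counts before migration.
daily_history = [100, 104, 98, 101, 97, 102]

normal_day = assess_volume(daily_history, list(range(101)))
doubled_load = assess_volume(daily_history, list(range(100)) * 2)
busy_day = assess_volume(daily_history, list(range(200)))
```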

5. Compliance & governance checks

Assessing compliance with the data governance policies set by the regulator and your company is an activity that can be outsourced to AI, helping to ensure that sensitive information is handled correctly. A financial institution migrating customer data might deploy AI to identify and anonymise or mask Personally Identifiable Information (PII) in order to comply with regulations.

AI’s cooler cousin, generative AI, has some proven applications in document generation, whether for auditing and future migration reference or for reporting and monitoring. However, the value provided versus the effort involved needs to be determined before a company invests in GenAI.

Role of AI and use cases:

  • AI can automatically classify data based on predefined criteria, such as sensitive data, regulatory requirements or data that scores highly as a trade secret / differentiator. As an example, a healthcare company migrating patient records can configure AI to automatically classify data into categories such as “Protected Health Information” (PHI) or non-PHI. Identifying and classifying the data makes it easier to plan the next steps and prioritise the secure handling and migration of the PHI data.
  • AI can be leveraged to generate a detailed audit trail of the data migration process, which can then be used as a reference for future migrations, regulatory audits and internal reviews. As an example, a government agency migrating citizen records can utilise AI to automatically generate an audit trail that documents every action taken during the migration process. This includes who accessed the data, what transformations were applied (if any), and what changes were made, providing a comprehensive record.
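
A very reduced sketch of masking with an audit trail is shown below. It relies on two regular expressions as a rule-based stand-in for AI-driven classification, so it will miss many real-world forms of PII. The patterns, the record layout and the audit entry fields are assumptions made for illustration.

```python
import re

# Assumed, deliberately simple patterns; real PII detection needs far more coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d ]{7,}\d"),
}


def mask_record(record: dict, audit_log: list) -> dict:
    """Mask PII in string fields and append one audit entry per masked type."""
    masked = {}
    for field, value in record.items():
        if not isinstance(value, str):
            masked[field] = value  # leave non-text values untouched
            continue
        new_value = value
        for label, pattern in PII_PATTERNS.items():
            new_value, hits = pattern.subn(f"<{label} masked>", new_value)
            if hits:
                audit_log.append({"field": field, "pii_type": label, "action": "masked"})
        masked[field] = new_value
    return masked


# Hypothetical record, invented for illustration.
audit_log = []
record = {
    "id": 17,
    "note": "Contact anna@example.com or +41 00 000 00 00",
    "city": "Basel",
}
masked_record = mask_record(record, audit_log)
```

The audit log here records only what was masked and where. An audit trail for a real migration would also capture who ran the step and when.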

Are you ready to harness the transformative power of AI in your data migration project? Whether you are looking to enhance data quality, increase speed or automate the time-consuming, traditionally manual process of data mapping, we can help you realise your goals. Please get in touch should you require any further information.

Contact us

Joscha Milinski

Partner and Data Strategy & Management Leader, PwC Switzerland

+41 58 792 23 58

Nina Wolf

Senior Manager Data Transformation & Analytics, PwC Switzerland

+41 79 193 07 00
