This article provides a high-level overview of the Powered by Jumbo Customer Migrator.
Prerequisites
- Access to the PBJ Admin platform
- “Migrator Admin” capability enabled
- CSV file with customer data in the correct format
What Is The PBJ Customer Migrator?
The PBJ Customer Migrator allows customer data from your existing CRM software to be imported quickly and easily into a PBJ system. This set of services is available through PBJ AdminUI. All that is needed is the correct capabilities to access the Customer Migrator, plus CSV files containing the customer data in the correct format.
How Does The PBJ Customer Migrator Work?
A CSV file includes data for one file type only. Within each file, the data is keyed by an external ID that must be unique within that file. Each row in the file defines a Desired End State for the record described by the relevant specification. This may lead to a new record being inserted in PBJ, or to an existing record being updated in PBJ.
As the system processes the data, each row of each submitted file is updated to include derived information (such as the PBJ Client ID associated with the row), as well as any validation errors encountered while accepting that data.
After a batch file is processed, the user may download the marked-up batch to examine the results.
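As a purely illustrative example (the column names below are hypothetical; the real columns are defined by the spec templates described under Migration Specifications), a customer-details file keyed by an external ID might look like this:

```
EXTERNAL_ID,FIRST_NAME,LAST_NAME,EMAIL
CRM-1001,Jane,Doe,jane.doe@example.com
CRM-1002,Sam,Lee,sam.lee@example.com
```

The marked-up batch downloaded after processing carries the same rows plus generated columns such as PBJ_CUSTOMER_ID and ERROR.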
Customer Data that can be migrated:
- Customer details
- Customer notes
- Payment methods
- Ticket history
- Recurring purchase
- Communication preferences
PBJ Customer Migrator Logic
Migration Specifications
To make sure you are uploading a file in the correct format, click the “Download these spec documents” link on the Customer Migrator page to download starter templates for the CSV files. The following specifications are covered:
- Customer Details
- Customer Notes
- Payment Methods
- Ticket History
- Recurring Purchase
- Communication Preferences
- Card Gateway Migration
Technical Processes Of The PBJ Customer Migrator
Technical File Upload Process
When a file is uploaded, the following events occur:
- REST Handler is invoked to manage the file-upload request;
- Once the upload is complete, the file is parsed to confirm it is valid CSV and split into rows;
- The first row, which contains the labels for all the columns, is compared against known types to determine the file type;
- If a file type is detected, validation is run on the file;
- If the file passes validation, the file_upload and csv_row data for the whole CSV file are saved into the Metrics Data Warehouse;
- The file status is marked as MIGRATE_FILE_READY, and each row is set to MIGRATE_ROW_READY.
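To illustrate the sequence, the sketch below strings those steps together in Go. The ProcessUpload function and the UploadDeps helpers (Detect, Validate, Save) are assumptions made for the example rather than the service's actual API; the detection and validation steps are expanded in the next two sections.

```go
// A minimal sketch of the upload pipeline described above; the helper
// signatures are assumptions, and the real handler wires these to the
// service's own model package and database layer.
package migrator

import (
	"context"
	"encoding/csv"
	"errors"
	"io"
)

type UploadDeps struct {
	Detect   func(header []string) (fileType string, ok bool)
	Validate func(fileType string, header []string, rows [][]string) error
	Save     func(ctx context.Context, fileType string, rows [][]string) error
}

// ProcessUpload parses the uploaded CSV, determines its type, validates it
// and, on success, persists the file_upload / csv_row records so they can be
// marked MIGRATE_FILE_READY / MIGRATE_ROW_READY.
func ProcessUpload(ctx context.Context, upload io.Reader, deps UploadDeps) error {
	records, err := csv.NewReader(upload).ReadAll()
	if err != nil {
		return err // not valid CSV
	}
	if len(records) < 2 {
		return errors.New("file has no data rows")
	}
	header, rows := records[0], records[1:]

	fileType, ok := deps.Detect(header)
	if !ok {
		return errors.New("unknown file type")
	}
	if err := deps.Validate(fileType, header, rows); err != nil {
		return err // validation errors are reported back to AdminUI
	}
	return deps.Save(ctx, fileType, rows)
}
```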
Determining Incoming File Type
To determine the file type from the first row (the column titles):
- Iterate through each file type as defined in the pkg/model directory;
- Get the expected Column Names from the “migrate” struct tag;
- Compare each struct tag, in order, against the column names found in the file;
- Generated fields (as defined by the “generated” struct tag) are optional. This allows a generated results file to be re-submitted for migration;
- Repeat for each file type until a successful match is found;
- If no match is found, return an error.
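As a self-contained illustration of that matching rule, the sketch below checks one hypothetical row type against a header. The CustomerDetails struct, its column names and the matches helper are invented for the example; the real row types and their “migrate” / “generated” tags live under pkg/model, and the service repeats this check for each known type until one matches.

```go
package main

import (
	"fmt"
	"reflect"
)

// CustomerDetails is an invented example row type; real types live in pkg/model.
type CustomerDetails struct {
	ExternalID string `migrate:"EXTERNAL_ID"`
	FirstName  string `migrate:"FIRST_NAME"`
	PBJID      string `migrate:"PBJ_CUSTOMER_ID" generated:"true"`
}

// matches reports whether the CSV header matches the expected columns for t.
// Columns carrying a "generated" tag are optional, which allows a previously
// generated results file to be re-submitted.
func matches(t reflect.Type, header []string) bool {
	i := 0
	for f := 0; f < t.NumField(); f++ {
		field := t.Field(f)
		want := field.Tag.Get("migrate")
		if want == "" {
			continue // field does not map to a CSV column
		}
		if i < len(header) && header[i] == want {
			i++ // expected column present, in order
			continue
		}
		if field.Tag.Get("generated") == "" {
			return false // required column missing or out of order
		}
	}
	return i == len(header) // no unexpected trailing columns
}

func main() {
	header := []string{"EXTERNAL_ID", "FIRST_NAME"}                 // generated column omitted
	fmt.Println(matches(reflect.TypeOf(CustomerDetails{}), header)) // prints true
}
```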
File Validation
Once a CSV file has been detected as a known type, every row of the file is validated using the following checks (see the sketch after this list):
- Iterate through each column in the row and compare to each column in the struct that has a “migrate” struct tag;
- If the field has a “required” struct tag, ensure the data is not empty;
- If the field has a “unique” struct tag, check all rows to ensure the value in that column is not repeated anywhere in the file;
- Check that the field contents are valid for the given type (string / int / uint / bool / date);
- If the struct has a “Validate()” method, call it on the row to check any custom validations;
- If any validations fail, then return an error to the user (on AdminUI), and do not migrate.
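The sketch below illustrates those checks for a single hypothetical row type. The TicketHistoryRow struct, its tags and the validateRows helper are assumptions for the example; the real service iterates its pkg/model types and also invokes each type's Validate() method where one exists.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// TicketHistoryRow is an invented example row type with "migrate",
// "required" and "unique" tags; real types live in pkg/model.
type TicketHistoryRow struct {
	ExternalID string `migrate:"EXTERNAL_ID" required:"true" unique:"true"`
	TicketQty  int    `migrate:"TICKET_QTY"  required:"true"`
}

// Validator is the hook for custom per-row checks; types that need them
// implement Validate().
type Validator interface{ Validate() error }

// validateRows applies the required / unique / type checks to every row,
// assuming columns appear in struct-field order for simplicity.
func validateRows(rows [][]string) []error {
	var errs []error
	t := reflect.TypeOf(TicketHistoryRow{})
	seen := map[int]map[string]bool{} // column index -> values already seen

	for rowNum, row := range rows {
		for col := 0; col < t.NumField() && col < len(row); col++ {
			field, value := t.Field(col), row[col]
			name := field.Tag.Get("migrate")

			// "required": the cell must not be empty.
			if field.Tag.Get("required") != "" && value == "" {
				errs = append(errs, fmt.Errorf("row %d: %s is required", rowNum+1, name))
			}
			// "unique": the value must not repeat anywhere in the file.
			if field.Tag.Get("unique") != "" {
				if seen[col] == nil {
					seen[col] = map[string]bool{}
				}
				if seen[col][value] {
					errs = append(errs, fmt.Errorf("row %d: duplicate %s %q", rowNum+1, name, value))
				}
				seen[col][value] = true
			}
			// Type check: the cell must parse as the field's Go type.
			if field.Type.Kind() == reflect.Int && value != "" {
				if _, err := strconv.Atoi(value); err != nil {
					errs = append(errs, fmt.Errorf("row %d: %s must be an integer", rowNum+1, name))
				}
			}
		}
		// In the real service the row is also unmarshalled into its struct and
		// Validate() is called when the type implements the Validator interface.
	}
	return errs
}

func main() {
	rows := [][]string{{"CRM-1", "2"}, {"CRM-1", "x"}}
	for _, err := range validateRows(rows) {
		fmt.Println(err) // duplicate EXTERNAL_ID, TICKET_QTY must be an integer
	}
}
```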
Technical Migration Process
When the user selects a file for migration, the following occurs:
- Start a worker thread for this file upload;
- Set the file_upload status to MIGRATE_FILE_INPROGRESS;
- Send all rows with status MIGRATE_ROW_READY to the MIGRATOR.JL topic in Kafka;
- Mark each sent row as status MIGRATE_ROW_PENDING;
- Stamp the file_upload record with the current timestamp in the file_upload_sent_timestamp column;
- Wait for replies on the JL.MIGRATOR topic;
- Mark off each row seen by updating its status to MIGRATE_ROW_SUCCESS or MIGRATE_ROW_ERROR, depending on the reply;
- Stamp additional fields in the CSV row with further details from the JL Kafka message (external PBJ ID, ERROR, etc.);
- Check whether all rows have been received; if so, mark the file_upload status as MIGRATE_FILE_COMPLETE or MIGRATE_FILE_COMPLETE_WITH_ERRORS;
- If the global timeout expires while waiting for data, set the file_upload status to MIGRATE_FILE_STOPPED to signal to the user that the migration appears to have stalled;
- Once complete, wind up the worker thread.
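A simplified sketch of that worker loop is shown below. The Producer, Consumer and Store interfaces and the Row / Reply types are stand-ins for the service's real Kafka client and database layer; only the statuses and topic names come from the description above.

```go
package migrator

import (
	"context"
	"time"
)

// Row and Reply are simplified stand-ins for the service's row data and the
// messages that come back on the JL.MIGRATOR topic.
type Row struct {
	ID     int64
	Status string // MIGRATE_ROW_READY, MIGRATE_ROW_PENDING, ...
}

type Reply struct {
	RowID int64
	Error string
	PBJID string
	CRN   string
}

type Producer interface {
	Send(ctx context.Context, topic string, row Row) error
}

type Consumer interface {
	Replies() <-chan Reply // messages read from the JL.MIGRATOR topic
}

type Store interface {
	SetFileStatus(fileID int64, status string) error
	SetRowResult(rowID int64, status, pbjID, crn, errMsg string) error
	ReadyRows(fileID int64) ([]Row, error)
}

// runMigration is the worker started for one file upload.
func runMigration(ctx context.Context, fileID int64, p Producer, c Consumer, db Store, timeout time.Duration) error {
	db.SetFileStatus(fileID, "MIGRATE_FILE_INPROGRESS")

	rows, err := db.ReadyRows(fileID)
	if err != nil {
		return err
	}
	for _, row := range rows {
		// Send each MIGRATE_ROW_READY row to MIGRATOR.JL and mark it pending.
		if err := p.Send(ctx, "MIGRATOR.JL", row); err != nil {
			return err
		}
		db.SetRowResult(row.ID, "MIGRATE_ROW_PENDING", "", "", "")
	}
	// (The real service also stamps file_upload_sent_timestamp at this point.)

	remaining, hadErrors := len(rows), false
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	for remaining > 0 {
		select {
		case reply := <-c.Replies():
			status := "MIGRATE_ROW_SUCCESS"
			if reply.Error != "" {
				status, hadErrors = "MIGRATE_ROW_ERROR", true
			}
			db.SetRowResult(reply.RowID, status, reply.PBJID, reply.CRN, reply.Error)
			remaining--
		case <-timer.C:
			// No reply within the timeout: flag the migration as stalled.
			return db.SetFileStatus(fileID, "MIGRATE_FILE_STOPPED")
		}
	}

	if hadErrors {
		return db.SetFileStatus(fileID, "MIGRATE_FILE_COMPLETE_WITH_ERRORS")
	}
	return db.SetFileStatus(fileID, "MIGRATE_FILE_COMPLETE")
}
```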
Database Store Of CSV Data
The Migrator service uses a PostgreSQL database to store migration-specific data. This database is used to buffer incoming CSV data before it is sent to the Metrics Data Warehouse, and to mark up the same CSV data with responses from the data warehouse servers once the data is migrated.
On local dev environments, the Migrator uses its own PostgreSQL instance; in production environments, the migrator database is hosted as a schema in the production Metrics Data Warehouse.
The Migrator database is divided into two sections:
- Static Lookup Data
- file_types / file_status / csv_row_status – simple lookup tables for status codes
- CSV Data
- file_uploads – header records, 1 per file upload.
- csv_rows – 1 record per row in each CSV upload.
It is important to note that the CSV row data is stored in the PostgreSQL database as a single string, one per record in the csv_rows table. The header row with the column names is not stored in the database.
Therefore, any change to the layout of the CSV file for a given file type does not require any database changes to accommodate it.
The generated fields such as PBJ_CUSTOMER_ID, ASSIGNED_ID, CRN (customer reference number) and ERROR are stored as separate fields in the same csv_row database record, which again leaves the original contents unaffected.
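As a rough picture of that layout, a csv_rows record could be modelled along the following lines; the field names are assumptions for illustration, not the actual schema.

```go
// Illustrative shape of one csv_rows record; field names are assumptions,
// not the actual schema.
type CSVRow struct {
	ID            int64
	FileUploadID  int64  // foreign key to the file_uploads header record
	Status        string // MIGRATE_ROW_READY, MIGRATE_ROW_PENDING, ...
	RawRow        string // the original CSV row, stored verbatim as one string
	PBJCustomerID string // generated
	AssignedID    string // generated
	CRN           string // generated (customer reference number)
	Error         string // generated
}
```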
Technical File Downloading Process
When a user selects “download” on a completed file migration, the following occurs:
- The system generates a fresh CSV file, in memory, using the data from the file_upload / csv_row tables, including additional “generated” fields such as the PBJ ID or ERROR;
- This in-memory CSV file is then streamed to the browser with an HTTP file-download header;
- Note that all these CSV files are generated in-memory, and on demand.
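A minimal sketch of that on-demand generation, reusing the hypothetical CSVRow shape from the previous example, might look like the following; the real handler reassembles the column order from the file type's specification.

```go
package migrator

import (
	"encoding/csv"
	"net/http"
	"strings"
)

// writeDownload regenerates the CSV in memory and streams it straight to the
// browser; nothing is written to disk. The generated column names here are
// examples only.
func writeDownload(w http.ResponseWriter, fileName string, header []string, rows []CSVRow) error {
	w.Header().Set("Content-Type", "text/csv")
	w.Header().Set("Content-Disposition", `attachment; filename="`+fileName+`"`)

	cw := csv.NewWriter(w)
	// Original column names followed by the generated columns.
	if err := cw.Write(append(header, "PBJ_CUSTOMER_ID", "ERROR")); err != nil {
		return err
	}
	for _, row := range rows {
		// Re-split the stored raw row and append the generated fields.
		cols, err := csv.NewReader(strings.NewReader(row.RawRow)).Read()
		if err != nil {
			return err
		}
		if err := cw.Write(append(cols, row.PBJCustomerID, row.Error)); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}
```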
Service Error Recovery
On startup, the system will check all file_upload records, and respawn a worker thread for any file_uploads that have started but have not yet completed.
This way, if the service restarts or recovers from a crash, any in-progress migrations will resume until completion.
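In terms of the earlier worker sketch, the recovery step amounts to something like the following, assuming the list of unfinished file_upload IDs has already been read from the database:

```go
// On startup: respawn a worker for every file_upload that has started but not
// yet completed, reusing the hypothetical Store / Producer / Consumer and
// runMigration pieces from the worker sketch above.
func resumeInProgress(ctx context.Context, db Store, p Producer, c Consumer, timeout time.Duration, unfinishedFileIDs []int64) {
	for _, fileID := range unfinishedFileIDs {
		go runMigration(ctx, fileID, p, c, db, timeout) // one worker per unfinished upload
	}
}
```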
Component Structure
The Migrator service composes a number of subsystems that work together to provide the capability of uploading, validating, importing and reporting on the results of each CSV batch file.
PBJ AdminUI
The AdminUI interface provides a screen that allows CSV files to be interactively uploaded, with options to begin the migration of the file and view the results of the migration.
Go Service
The Go service provides a number of REST endpoints that are hit directly by the AdminUI frontend. In any installation, there is typically a single instance of the Go service, configured with a minimal build. The Go service has relatively small CPU / Memory requirements compared to the other components of the system.
These endpoints are used to upload data to the system, where the Migrator service parses the CSV file and performs initial validation. If there are errors, these are reported back to AdminUI.
If the file passes validation, the file data (a file_upload header, and one record for each row in the CSV file) is written to the PostgreSQL database with status metadata.
Note that the original CSV file is not saved anywhere; it is parsed and converted to an internal store in the database, with one record per CSV row.
When the user requests that the validated file should be migrated from the AdminUI, this hits a REST endpoint on the Migrator service to begin the migration process.
On Migrate Data – the Migrator service creates a message on the MIGRATOR.JL queue for each row in the file and marks the row in the database as sent. It then waits for replies to each message by monitoring the JL.MIGRATOR topic, marks off each row as complete, and updates the row in the database with any additional metadata.
Additional metadata per row includes:
- Any processing errors at the JL end
- Any assigned PBJ ID
- CRN (customer reference number) created in JL
After the batch is complete, the user may select Download File from the AdminUI interface, which requests the data from the Migrator service via a REST call.
On Download File – the Migrator service reconstructs the CSV file from its database records, including additional fields such as the assigned PBJ ID and any error messages.
This generated CSV file can be edited by the user and re-submitted if desired.
PHP Consumer
The PHP consumer reads messages from the Kafka topic MIGRATOR.JL, where each message represents one row of one file upload.
Each record defines a Desired End State for the given file type, which may involve inserting new data, or it may involve updating existing data.
The consumer then applies the following logic, in general:
- Determine the file type;
- If the record of the data exists in JL, then update the data from the message;
- If the record does not exist in JL, then insert the data from the message;
- Create a result payload and send it to JL.MIGRATOR topic.
The actual business logic varies by file type.
Incoming messages must have a customer-external ID, and may have a PBJ ID as well. This combination of ID fields is used to determine whether the message is dealing with a record insert or a record update.
On the result payload which is returned, the following data may be provided:
- Any errors in being able to process the message
- Any assigned PBJ IDs
- CRN (customer reference number)
With the assigned PBJ IDs – in the case of a record insert, the assigned PBJ ID is the new ID assigned by PBJ, and the CRN is generated from the PBJ ID and is unique for every customer.
If the original message did not provide a PBJ ID, and the system determined that the data existed in JL and needed to be updated, then the Assigned PBJ ID would be the PBJ ID of the record affected.
This aspect can be slightly complicated because there is no insert/update intent recorded on the incoming message; it is just a statement of Desired End State for the given data.
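To make the Desired End State idea concrete, the self-contained sketch below shows how a message becomes either an insert or an update. The real consumer is written in PHP and also takes the supplied PBJ ID into account; this Go example is simplified to key only on the external ID, with an in-memory map standing in for the JL database.

```go
package main

import "fmt"

type Message struct {
	ExternalID string
	PBJID      string // may be empty on the incoming message
	Name       string
}

type Result struct {
	AssignedPBJID string
	CRN           string
	Error         string
}

var jlRecords = map[string]Message{} // keyed by external ID, stand-in for JL

var nextID = 1000

// applyDesiredEndState turns one incoming message into an insert or an update
// and builds the result payload that would be published to JL.MIGRATOR.
func applyDesiredEndState(msg Message) Result {
	if existing, found := jlRecords[msg.ExternalID]; found {
		// Record exists in JL: update it and report the affected PBJ ID,
		// even when the incoming message did not carry one.
		msg.PBJID = existing.PBJID
		jlRecords[msg.ExternalID] = msg
		return Result{AssignedPBJID: existing.PBJID}
	}
	// Record does not exist: insert it, assign a new PBJ ID and derive a CRN.
	nextID++
	msg.PBJID = fmt.Sprintf("PBJ-%d", nextID)
	jlRecords[msg.ExternalID] = msg
	return Result{AssignedPBJID: msg.PBJID, CRN: "CRN-" + msg.PBJID}
}

func main() {
	fmt.Println(applyDesiredEndState(Message{ExternalID: "CRM-1", Name: "Jane"}))  // insert
	fmt.Println(applyDesiredEndState(Message{ExternalID: "CRM-1", Name: "Janet"})) // update
}
```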
Kafka Message Queue
All communication between the Go Service and the PHP consumer is asynchronous via Kafka, with two topics in use:
- MIGRATOR.JL for messages from the Go microservice to the PHP consumer
- JL.MIGRATOR for messages from the PHP consumer back to the Go microservice
The full communication round trip looks like this:
AdminUI → HTTP/REST → Migrator Service (Go) → MIGRATOR.JL Kafka topic → PHP Consumer → JL.MIGRATOR Kafka topic → Migrator Service (Go) → HTTP response → AdminUI
Depending on the size of the deployed platform (Prod vs Staging vs Local, etc), the Kafka configuration will typically have a number of partitions, and an equal number of PHP consumers.
Messages sent over Kafka are therefore distributed in a round-robin format across the partitions and consumers and effectively handled in parallel at the consumer end.
The method used to avoid race conditions in this parallel processing is to apply the validation rule that each CSV file upload has a unique value for the external ID of the record being referenced. Therefore, within each batch, there will be at most one row for any given record that needs to be inserted or updated.
As long as one batch at a time is processed, there is no possibility of race conditions.
If multiple batches of the same type are migrated at the same time, without waiting for one to complete, then there may be a race condition if the second batch contains data for the same external IDs as the first batch. Because of the Desired State approach to the data updates, the last message seen will define the state for that given record.
It is recommended to process each batch to completion before starting any subsequent batches.