Migration Failback

Note:

These instructions assume you have already installed MOLT and completed the prerequisites for your source dialect.

Prepare the CockroachDB cluster

Tip:

For details on enabling CockroachDB changefeeds, refer to Create and Configure Changefeeds.

If you are migrating to a CockroachDB self-hosted cluster, enable rangefeeds on the cluster:

SET CLUSTER SETTING kv.rangefeed.enabled = true;

Use the following optional settings to increase changefeed throughput.

Warning:

The following settings can impact source cluster performance and stability, especially SQL foreground latency during writes. For details, refer to Advanced Changefeed Configuration.

To lower changefeed emission latency at the cost of increased SQL foreground latency:

SET CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval = '250ms';

To lower the closed timestamp lag duration:

SET CLUSTER SETTING kv.closed_timestamp.target_duration = '1s';

To improve catch-up speed at the cost of increased cluster CPU usage:

SET CLUSTER SETTING kv.rangefeed.concurrent_catchup_iterators = 64;
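
To confirm the new values, read each setting back with SHOW CLUSTER SETTING:

```sql
SHOW CLUSTER SETTING kv.rangefeed.enabled;
SHOW CLUSTER SETTING kv.closed_timestamp.target_duration;
```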

Grant target database user permissions

You should have already created a migration user on the target database (your original source database) with the necessary privileges. Refer to Create migration user on source database.

For failback replication, grant the user additional privileges to write data back to the target database:

PostgreSQL:

-- Grant INSERT and UPDATE on tables to fail back to
GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA migration_schema TO migration_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT INSERT, UPDATE ON TABLES TO migration_user;

MySQL:

-- Grant SELECT, INSERT, and UPDATE on tables to fail back to
GRANT SELECT, INSERT, UPDATE ON source_database.* TO 'migration_user'@'%';
FLUSH PRIVILEGES;

Oracle:

-- Grant SELECT, INSERT, UPDATE, and FLASHBACK on tables to fail back to
GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.employees TO MIGRATION_USER;
GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.payments TO MIGRATION_USER;
GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.orders TO MIGRATION_USER;

Configure Replicator

When you run replicator, you can configure the following options for replication:

Connection strings

For failback, MOLT Replicator uses --targetConn to specify the destination database where you want to replicate CockroachDB changes, and --stagingConn for the CockroachDB staging database.

--targetConn is the connection string of the database you want to replicate changes to (the database you originally migrated from).

For example:

--targetConn 'postgres://postgres:postgres@localhost:5432/molt?sslmode=verify-full'
--targetConn 'mysql://user:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt'
--targetConn 'oracle://C%23%23MIGRATION_USER:password@host:1521/ORCLPDB1'

--stagingConn is the CockroachDB connection string for staging operations:

--stagingConn 'postgres://crdb_user@localhost:26257/defaultdb?sslmode=verify-full'

Secure connections

  • To keep your database credentials out of shell history and logs, follow these best practices when specifying your source and target connection strings:

    • Avoid plaintext connection strings.
    • Provide your connection strings as environment variables. For example:

      export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/molt?sslmode=verify-full"
      export TARGET="postgres://root@localhost:26257/molt?sslmode=verify-full"
      

      Afterward, reference the environment variables in MOLT commands:

      --source $SOURCE
      --target $TARGET
      
    • If possible, use an external secrets manager to load the environment variables from stored secrets.

  • Use TLS-enabled connection strings to encrypt data in transit from MOLT to the database. When using TLS certificates, ensure certificate files are accessible to the MOLT binary on the same machine.

    For example, a PostgreSQL connection string with TLS certificates:

    postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/molt/certs/ca.pem&sslcert=/etc/molt/certs/client.crt&sslkey=/etc/molt/certs/client.key
    
  • URL-encode connection strings for the source database and CockroachDB so special characters in passwords are handled correctly.

    • Given a password a$52&, pass it to the molt escape-password command with single quotes:

      molt escape-password --password 'a$52&'
      

      Use the encoded password in your connection string. For example:

      postgres://migration_user:a%2452%26@localhost:5432/replicationload
      
  • Remove sslmode=disable from production connection strings.
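
If the molt binary is not at hand, the same percent-encoding can be reproduced with Python's standard library (a quick sanity check, not a replacement for molt escape-password):

```python
from urllib.parse import quote

# Percent-encode every reserved character so the password
# is safe to embed in a connection string.
encoded = quote("a$52&", safe="")
print(encoded)  # a%2452%26
```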

TLS certificate and key

Always use secure TLS connections for failback replication to protect data in transit. Do not use insecure configurations in production: avoid the --disableAuthentication and --tlsSelfSigned Replicator flags and insecure_tls_skip_verify=true query parameter in the changefeed webhook URI.

Generate self-signed TLS certificates or certificates from an external CA. Ensure the TLS server certificate and key are accessible on the MOLT Replicator host machine via a relative or absolute file path. When you start failback with Replicator, specify the paths with --tlsCertificate and --tlsPrivateKey. For example:

replicator start \
... \
--tlsCertificate ./certs/server.crt \
--tlsPrivateKey ./certs/server.key

The client certificates defined in the changefeed webhook URI must correspond to the server certificates specified in the replicator command. This ensures proper TLS handshake between the changefeed and MOLT Replicator. To include client certificates in the changefeed webhook URL, encode them with base64 and then URL-encode the output with jq:

base64 -i ./client.crt | jq -R -r '@uri'
base64 -i ./client.key | jq -R -r '@uri'
base64 -i ./ca.crt | jq -R -r '@uri'
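
If jq is not available, an equivalent base64-then-percent-encode step can be done with Python's standard library (a sketch; the helper name and file paths are illustrative):

```python
import base64
from urllib.parse import quote

def encode_cert(path: str) -> str:
    """Base64-encode a certificate file, then percent-encode the result
    so it can be embedded in the changefeed webhook URL."""
    with open(path, "rb") as f:
        return quote(base64.b64encode(f.read()).decode("ascii"), safe="")

# For example (paths are illustrative):
# client_cert = encode_cert("./client.crt")
```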

When you create the changefeed, pass the encoded certificates in the changefeed URL, where client_cert, client_key, and ca_cert are webhook sink parameters. For example:

CREATE CHANGEFEED FOR TABLE table1, table2
INTO 'webhook-https://host:port/database/schema?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}'
WITH ...;

For additional details on the webhook sink URI, refer to Webhook sink.

Replication flags

The following MOLT Replicator flags configure continuous replication; flags marked Required must be set. For details on all available flags, refer to the MOLT Replicator documentation.

Flag Description
--stagingSchema Required. Staging schema name for the changefeed checkpoint table.
--bindAddr Required. Network address to bind the webhook sink for the changefeed. For example, :30004.
--tlsCertificate Path to the server TLS certificate for the webhook sink. Refer to TLS certificate and key.
--tlsPrivateKey Path to the server TLS private key for the webhook sink. Refer to TLS certificate and key.
--metricsAddr Enable Prometheus metrics at a specified {host}:{port}. Metrics are served at http://{host}:{port}/_/varz.
  • The staging schema is first created during initial replication setup with --stagingCreateSchema.

  • When configuring a secure changefeed for failback, you must include --tlsCertificate and --tlsPrivateKey, which specify the paths to the server certificate and private key for the webhook sink connection.

Tuning parameters

Configure the following replicator flags to optimize replication throughput and resource usage. Test different combinations in a pre-production environment to find the optimal balance of stability and performance for your workload.

Note:

The following parameters apply to PostgreSQL, Oracle, and CockroachDB (failback) sources.

Flag Description
--parallelism Control the maximum number of concurrent target transactions. Higher values increase throughput but require more target connections. Start with a conservative value and increase based on target database capacity.
--flushSize Balance throughput and latency. Controls how many mutations are batched into each query to the target. Increase for higher throughput at the cost of higher latency.
--targetApplyQueueSize Control memory usage during operation. Increase to allow higher throughput at the expense of memory; decrease to apply backpressure and limit memory consumption.
--targetMaxPoolSize Set larger than --parallelism by a safety factor to avoid exhausting target pool connections. Replicator caps --parallelism at 80% of this value.
--collapseMutations Reduce the number of queries to the target by combining multiple mutations on the same primary key within each batch. Disable only if exact mutation order matters more than end state.
--enableParallelApplies Improve apply throughput for independent tables and table groups that share foreign key dependencies. Increases memory and target connection usage, so ensure you increase --targetMaxPoolSize or reduce --parallelism.
--flushPeriod Set to the maximum allowable time between flushes (for example, 10s if data must be applied within 10 seconds). Works with --flushSize to control when buffered mutations are committed to the target.
--quiescentPeriod Lower this value if constraint violations resolve quickly on your workload to make retries more frequent and reduce latency. Do not lower if constraint violations take time to resolve.
--scanSize Applies to failback (replicator start) scenarios only. Balance memory usage and throughput. Increase to read more rows at once from the CockroachDB staging cluster for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability.

Replicator metrics

MOLT Replicator can export Prometheus metrics by setting the --metricsAddr flag to a port (for example, --metricsAddr :30005). Metrics are not enabled by default. When enabled, metrics are available at the path /_/varz. For example: http://localhost:30005/_/varz.

Cockroach Labs recommends monitoring the following metrics during replication:

Metric Name Description
commit_to_stage_lag_seconds Time between when a mutation is written to the source CockroachDB cluster and when it is written to the staging database.
source_commit_to_apply_lag_seconds End-to-end lag from when a mutation is written to the source CockroachDB cluster to when it is applied to the target database.
stage_mutations_total Number of mutations staged for application to the target database.
apply_conflicts_total Number of rows that experienced a compare-and-set (CAS) conflict.
apply_deletes_total Number of rows deleted.
apply_duration_seconds Length of time it took to successfully apply mutations.
apply_errors_total Number of times an error was encountered while applying mutations.
apply_resolves_total Number of rows that experienced a compare-and-set (CAS) conflict and which were resolved.
apply_upserts_total Number of rows upserted.
target_apply_queue_depth Number of batches in the target apply queue. Indicates how backed up the applier flow is between receiving changefeed data and applying it to the target database.
target_apply_queue_utilization_percent Utilization percentage (0.0-100.0) of the target apply queue capacity. Use this to understand how close the queue is to capacity and to set alerting thresholds for backpressure conditions.
core_parallelism_utilization_percent Current utilization percentage of the applier flow parallelism capacity. Shows what percentage of the configured parallelism is actively being used.

You can use the Replicator Grafana dashboard to visualize the metrics.

For Oracle-specific metrics, import the Oracle Grafana dashboard.

To check MOLT Replicator health when metrics are enabled, run curl http://localhost:30005/_/healthz (replacing the port with your --metricsAddr value). This returns a status code of 200 if Replicator is running.
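
The metrics endpoint can be scraped by Prometheus with a minimal job definition (a sketch assuming Replicator runs locally on port 30005; adjust the target to match your --metricsAddr value):

```yaml
scrape_configs:
  - job_name: 'molt_replicator'
    metrics_path: '/_/varz'
    static_configs:
      - targets: ['localhost:30005']
```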

Stop forward replication

  1. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated to CockroachDB. There are two ways to determine that replication has fully drained:

    • When replication is caught up, new "upserted rows" log messages stop appearing.
    • If you set up the replication metrics endpoint with --metricsAddr in the preceding steps, metrics are available at:

      http://{host}:{port}/_/varz
      

      Use the following Prometheus alert expression to observe when the combined rate of upserts and deletes is 0 for each schema:

      sum by (schema) (rate(apply_upserts_total[$__rate_interval]) + rate(apply_deletes_total[$__rate_interval]))
      
  2. Cancel replication to CockroachDB by entering ctrl-c to send a termination signal. Replicator exits with code 0 after a clean shutdown.

Start Replicator

  1. Run the MOLT Replicator start command to begin failback replication from CockroachDB to your source database. In this example, --metricsAddr :30005 enables a Prometheus endpoint for monitoring replication metrics, and --bindAddr :30004 sets up the webhook endpoint for the changefeed.

    --stagingSchema specifies the staging database name (_replicator in this example) used for replication checkpoints and metadata. This staging database was created during initial forward replication when you first ran MOLT Replicator with --stagingCreateSchema.

    replicator start \
    --targetConn $TARGET \
    --stagingConn $STAGING \
    --stagingSchema _replicator \
    --metricsAddr :30005 \
    --bindAddr :30004 \
    --tlsCertificate ./certs/server.crt \
    --tlsPrivateKey ./certs/server.key \
    -v
    

Create the CockroachDB changefeed

Create a CockroachDB changefeed to send changes to MOLT Replicator.

  1. Get the current logical timestamp from CockroachDB, after ensuring that forward replication has fully drained:

    SELECT cluster_logical_timestamp();
    
        cluster_logical_timestamp
    ----------------------------------
      1759246920563173000.0000000000
    
  2. Create the CockroachDB changefeed pointing to the MOLT Replicator webhook endpoint. Use cursor to specify the logical timestamp from the preceding step.

    Note:

    Ensure that only one changefeed points to MOLT Replicator at a time to avoid mixing streams of incoming data.

    Tip:

    For details on the webhook sink URI, refer to Webhook sink.

    PostgreSQL or CockroachDB target:

    CREATE CHANGEFEED FOR TABLE employees, payments, orders \
    INTO 'webhook-https://replicator-host:30004/migration_schema/public?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \
    WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}';

    MySQL target:

    CREATE CHANGEFEED FOR TABLE employees, payments, orders \
    INTO 'webhook-https://replicator-host:30004/migration_schema?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \
    WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}';

    Oracle target:

    CREATE CHANGEFEED FOR TABLE employees, payments, orders \
    INTO 'webhook-https://replicator-host:30004/MIGRATION_SCHEMA?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \
    WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}';


    The output shows the job ID:

            job_id
    -----------------------
      1101234051444375553
    
  3. Monitor the changefeed status, specifying the job ID:

    SHOW CHANGEFEED JOB 1101234051444375553;
    
            job_id        | ... | status  |              running_status               | ...
    ----------------------+-----+---------+-------------------------------------------+----
      1101234051444375553 | ... | running | running: resolved=1759246920563173000,0 | ...
    

    To confirm the changefeed is active and replicating changes to the target database, check that status is running and running_status shows running: resolved={timestamp}.

    Warning:

    running: resolved may be reported even if data isn't being sent properly. This typically indicates incorrect host/port configuration or network connectivity issues.

  4. Verify that Replicator is reporting incoming HTTP requests from the changefeed. To do so, check the MOLT Replicator logs. Since you enabled debug logging with -v, you should see periodic HTTP request successes:

    DEBUG  [Aug 25 11:52:47]  httpRequest="&{0x14000b068c0 45 200 3 9.770958ms   false false}"
    DEBUG  [Aug 25 11:52:48]  httpRequest="&{0x14000d1a000 45 200 3 13.438125ms   false false}"
    

    These debug messages confirm successful changefeed connections to MOLT Replicator. You can disable verbose logging after verifying the connection.

Troubleshooting

Failback issues

If the changefeed shows connection errors in SHOW CHANGEFEED JOB:

Connection refused
transient error: Post "https://replicator-host:30004/molt/public": dial tcp [::1]:30004: connect: connection refused

This indicates that Replicator is down, the webhook URL is incorrect, or the port is misconfigured.

Resolution: Verify that MOLT Replicator is running on the port specified in the changefeed INTO configuration. Confirm the host and port are correct.

Incorrect schema path errors

This error occurs when the CockroachDB changefeed webhook URL path does not match the target database schema naming convention:

transient error: 400 Bad Request: unknown schema:

The webhook URL path is specified in the INTO clause when you create the changefeed. For example: webhook-https://replicator-host:30004/database/schema.

Resolution: Verify the webhook path format matches your target database type:

  • PostgreSQL or CockroachDB targets: Use /database/schema format. For example, webhook-https://replicator-host:30004/migration_schema/public.
  • MySQL targets: Use /database format (schema is implicit). For example, webhook-https://replicator-host:30004/migration_schema.
  • Oracle targets: Use /DATABASE format in uppercase. For example, webhook-https://replicator-host:30004/MIGRATION_SCHEMA.

For details on configuring the webhook sink URI, refer to Webhook sink.

GC threshold error

batch timestamp * must be after replica GC threshold

This indicates that the changefeed is starting from a cursor timestamp that has already been garbage collected.

Resolution: Double-check the cursor to ensure it represents a valid range that has not been garbage collected, or extend the GC TTL on the source CockroachDB cluster:

ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = {gc_ttl_in_seconds};

Duplicated data re-application

This occurs when resuming a changefeed from a cursor causes excessive data duplication.

Resolution: Clear the staging database to prevent duplication. This deletes all checkpoints and buffered data, so use with caution:

DROP DATABASE _replicator;

For more targeted cleanup, delete mutations from specific staging tables:

DELETE FROM _replicator.employees WHERE true;

See also
