Paul Chilton

Rubyrep with PostgreSQL DB syncing

I’m using Rubyrep to sync databases between a pair of PostgreSQL instances for this blog. Thought I would share the process that I used with you.

Install Rubyrep

sudo gem install rubyrep

Configuration

Use the following of scripts/rubyrep.conf with:


RR::Initializer::run do |config|
  config.left = {
    :adapter  => 'postgresql',
    :database => 'example_development',
    :username => 'example',
    :password => 'example',
    :host     => '127.0.0.1',
    :port     => 5432,
  }
  config.right = {
    :adapter  => 'postgresql',
    :database => 'example_production',
    :username => 'example',
    :password => 'example',
    :host     => '127.0.0.1',
    :port     => 5555,
  }
  
  config.options[:auto_key_limit] = 2
  config.options[:rep_prefix] = 'rr_example'

  config.include_tables 'schema_migrations', :key => 'version'

  config.include_tables /./, :sync_conflict_handling => lambda { |sync_helper,conflict_type,rows|
    time_left  = rows[0]['updated_at'].to_i
    time_right = rows[1]['updated_at'].to_i
    if time_left > 0 and time_right > 0 and time_left != time_right
      if time_left > time_right
        sync_helper.update_record(:right, sync_helper.right_table(), rows[0])
      else
        sync_helper.update_record(:left, sync_helper.left_table(), rows[1])
      end
    end
  }
end

This gives a local left connection, and a remote right connection when configured with SSH tunneling. Conflict resolution is based on the updated_at column of the database.

SSH Tunneling

I already have SSH keys setup on the remote host. Using the following to create a connection locally on port 5555 that connects to the remote PostgreSQL port of 5432

ssh -f -N -T -C -L 5555:localhost:5432 example@example.com

Scan and sync

I do the scan and sync manually, rather than continual replication.


rubyrep scan -c scripts/rubyrep.conf
rubyrep sync -c scripts/rubyrep.conf

Fun things with replication

It is possible to have a master-master replication setup where databases are not always connected. Requires:

  • Application aware that databases are synced intermittently.
  • Unique ID columns. I’ve used GUID values. Even/odd sequences may work.
  • Use of updated_at column, for resolving conflicts.
  • Do not delete rows. These will sync back into existence.

There are some significant constraints on what can be done, but it does enable some unique applications.