Paul Chilton

Rubyrep with PostgreSQL DB syncing

I’m using Rubyrep to sync databases between a pair of PostgreSQL instances for this blog. Thought I would share the process that I used with you.

Install Rubyrep

sudo gem install rubyrep

Configuration

Use the following of scripts/rubyrep.conf with:


RR::Initializer::run do |config|
  config.left = {
    :adapter  => 'postgresql',
    :database => 'example_development',
    :username => 'example',
    :password => 'example',
    :host     => '127.0.0.1',
    :port     => 5432,
  }
  config.right = {
    :adapter  => 'postgresql',
    :database => 'example_production',
    :username => 'example',
    :password => 'example',
    :host     => '127.0.0.1',
    :port     => 5555,
  }
  
  config.options[:auto_key_limit] = 2
  config.options[:rep_prefix] = 'rr_example'

  config.include_tables 'schema_migrations', :key => 'version'

  config.include_tables /./, :sync_conflict_handling => lambda { |sync_helper,conflict_type,rows|
    time_left  = rows[0]['updated_at'].to_i
    time_right = rows[1]['updated_at'].to_i
    if time_left > 0 and time_right > 0 and time_left != time_right
      if time_left > time_right
        sync_helper.update_record(:right, sync_helper.right_table(), rows[0])
      else
        sync_helper.update_record(:left, sync_helper.left_table(), rows[1])
      end
    end
  }
end

This gives a local left connection, and a remote right connection when configured with SSH tunneling. Conflict resolution is based on the updated_at column of the database.

SSH Tunneling

I already have SSH keys setup on the remote host. Using the following to create a connection locally on port 5555 that connects to the remote PostgreSQL port of 5432

ssh -f -N -T -C -L 5555:localhost:5432 example@example.com

Scan and sync

I do the scan and sync manually, rather than continual replication.


rubyrep scan -c scripts/rubyrep.conf
rubyrep sync -c scripts/rubyrep.conf

Fun things with replication

It is possible to have a master-master replication setup where databases are not always connected. Requires:

  • Application aware that databases are synced intermittently.
  • Unique ID columns. I’ve used GUID values. Even/odd sequences may work.
  • Use of updated_at column, for resolving conflicts.
  • Do not delete rows. These will sync back into existence.

There are some significant constraints on what can be done, but it does enable some unique applications.

Enki based blog

It has been a while since I did any postings to my blog. Restarting on an Enki based blog, will start to post on a regular basis to improve my writing and communication skills.

Inspired by a post by Jeff Atwood about writing without writing