Notes

PgQ daemon

Londiste runs as a consumer on PgQ. Thus the pgqadm.py ticker must be running on the provider database. It is preferable to run it on the same machine, because it needs low latency, but that is not a requirement.

For monitoring you can use the pgqadm.py status command.
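For example, using the same <config.ini> placeholder convention as the londiste.py commands below (the config here is the ticker's own, and the -d switch runs it as a daemon):

pgqadm.py <config.ini> ticker -d
pgqadm.py <config.ini> status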

Table Names

Londiste internally always uses fully schema-qualified table names. If a table name is given on the command line without a schema, it simply puts "public." in front of it, without looking at search_path.
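For example, these two invocations register the same table:

londiste.py <config.ini> provider add mytable
londiste.py <config.ini> provider add public.mytable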

PgQ events

Table change event

These events are inserted by the triggers placed on replicated tables. The event payload is a partial SQL fragment that does not contain the table name. Such partial SQL format is used for two reasons: to conserve space, and to make it possible to redirect events to another table.
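For illustration, assuming a table with columns id and name, the events might look roughly like this (the exact encoding is defined by the log trigger, so treat this only as a sketch):

INSERT -> type 'I', data: (id,name) values ('1','foo')
UPDATE -> type 'U', data: name='bar' where id='1'
DELETE -> type 'D', data: id='1'

Since the fragment carries no table name, prepending a different target table is enough to redirect the event.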

Registration change event

These events are inserted by the provider add and provider remove commands: the full list of registered tables is sent to the queue, so that subscribers can update their own registrations.

Currently, subscribers only remove tables that were removed from the provider. In the future it may be possible to make subscribers also automatically add tables that were added on the provider.

log file

Londiste's normal log consists of statistics log-lines: key-value pairs between {}. Their meaning:

  count     number of events in the processed batch
  duration  how long processing the batch took, in seconds

Example:

{count: 110, duration: 0.88}

Commands for managing provider database

provider install

londiste.py <config.ini> provider install

Installs code into the provider database and creates the queue. Equivalent to doing the following by hand:

CREATE LANGUAGE plpgsql;
CREATE LANGUAGE plpython;
\i .../contrib/txid.sql
\i .../contrib/logtriga.sql
\i .../contrib/pgq.sql
\i .../contrib/londiste.sql
select pgq.create_queue('<queue name>');
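For example, with the queue name used in the sample config file at the end of this document, the last step would be:

select pgq.create_queue('londiste.replika');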

Notes:

provider add

londiste.py <config.ini> provider add <table name> ...

Registers the table on the provider database and adds a trigger to the table that will send events to the queue.

provider remove

londiste.py <config.ini> provider remove <table name> ...

Unregisters the table on the provider side and removes the triggers from the table. An event about the table removal is also sent to the queue, so that all subscribers unregister the table on their end as well.

provider tables

londiste.py <config.ini> provider tables

Shows the registered tables on the provider side.

provider seqs

londiste.py <config.ini> provider seqs

Shows the registered sequences on the provider side.

Commands for managing subscriber database

subscriber install

londiste.py <config.ini> subscriber install

Installs code into the subscriber database. Equivalent to doing the following by hand:

CREATE LANGUAGE plpgsql;
\i .../contrib/londiste.sql

This will be done as the Londiste user; if the tables should be owned by someone else, it needs to be done by hand.

subscriber add

londiste.py <config.ini> subscriber add <table name> ... [--expect-sync | --skip-truncate | --force]

Registers table on subscriber side.

Switches

--expect-sync    Table is tagged as in-sync, so the initial COPY is skipped.
--skip-truncate  When doing the initial COPY, don't remove old data.
--force          Ignore table structure differences.
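For example, to register a table whose contents are already identical on both sides, skipping the initial COPY:

londiste.py <config.ini> subscriber add public.mytable --expect-sync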

subscriber remove

londiste.py <config.ini> subscriber remove <table name> ...

Unregisters the table from the subscriber. No events will be applied to the table anymore. The actual table will not be touched.

subscriber resync

londiste.py <config.ini> subscriber resync <table name> ...

Tags the tables as "not synced". The replay process will later notice this and launch a copy process to sync the table again.

Replication commands

replay

The actual replication process. Should be run as a daemon with the -d switch, because it needs to be running at all times.

Its main task is to get batches from PgQ and apply them in one transaction.

Basic logic:
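In terms of the underlying PgQ API, one iteration of that loop boils down to roughly the following calls (a sketch only: queue name, consumer name and batch id are placeholders, and the real consumer adds event filtering and error handling):

-- on the provider: ask PgQ for the next unprocessed batch
-- (returns NULL if there is nothing to do yet)
select pgq.next_batch('<queue name>', '<consumer name>');

-- fetch the events belonging to that batch
select * from pgq.get_batch_events(<batch id>);

-- after applying the events on the subscriber in one transaction,
-- mark the batch as done on the provider
select pgq.finish_batch(<batch id>);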

copy (internal)

Internal command for the initial SYNC. Launched by replay if it notices that some tables are not in sync. The reason for doing table copying in a separate process is to avoid locking up the main replay process for a long time.

When using either -s or -k to terminate a running londiste instance, londiste will check whether a COPY subprocess is running and, if so, kill it first by sending it SIGTERM. When replay starts, it will check whether any table is in a state that should be handled by a COPY subprocess, and if one is, it will make sure such a process exists, launching it if necessary.

The basic logic is driven by state changes between replay and copy:

State                | Owner  | What is done
---------------------+--------+--------------------
NULL                 | replay | Changes state to "in-copy", launches londiste.py copy process, continues with its own work
in-copy              | copy   | drops indexes, truncates, copies data in, restores indexes, changes state to "catching-up"
catching-up          | copy   | replay events for that table only until no more batches (means current moment),
                     |        | change state to "wanna-sync:<tick_id>" and wait for state to change
wanna-sync:<tick_id> | replay | catch up to given tick_id, change state to "do-sync:<tick_id>" and wait for state to change
do-sync:<tick_id>    | copy   | catch up to given tick_id, both replay and copy must now be at same position. change state to "ok" and exit
ok                   | replay | synced table, events can be applied

Such state changes must guarantee that any process can die at any time and, just by being restarted, can continue where it left off.

"subscriber add" registers table with NULL state. "subscriber add —expect-sync" registers table with ok state.

"subscriber resync" sets table state to NULL.

Utility commands

repair

Tries to bring the tables to a state where they should be in sync, then compares them and writes out SQL statements that would fix the differences.

Syncing happens by locking provider tables against updates and then waiting until replay has applied all pending changes to the subscriber database. As this is a dangerous operation, it has a hardwired limit of 10 seconds for locking. If the replay process does not catch up in that time, the locks are released and the operation is canceled.

Comparing happens by dumping out the table from both sides, sorting the dumps and then comparing them line-by-line. As this is a CPU- and memory-hungry operation, good practice is to run the repair command on a third machine, to avoid consuming resources on either the provider or the subscriber.
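Conceptually the comparison works like this (a sketch of the idea only, not the exact queries or tools repair uses; mytable and id are placeholders):

psql -h provider_host -c "copy (select * from mytable order by id) to stdout" > provider.dump
psql -h subscriber_host -c "copy (select * from mytable order by id) to stdout" > subscriber.dump
diff provider.dump subscriber.dump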

compare

Syncs tables like repair, but then just runs SELECT count(*) on both sides, giving a cheaper but less precise way of checking whether the tables are in sync.

Config file

[londiste]
job_name = test_to_subscriber
# source database, where the queue resides
provider_db = dbname=provider port=6000 host=127.0.0.1
# destination database
subscriber_db = dbname=subscriber port=6000 host=127.0.0.1
# the queue to listen on
pgq_queue_name = londiste.replika
# where to log
logfile = ~/log/%(job_name)s.log
# pidfile is used for avoiding duplicate processes
pidfile = ~/pid/%(job_name)s.pid
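
With such a config file in place, the replication itself is started as described under the replay command above:

londiste.py <config.ini> replay -d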