bireme

Bireme is an incremental synchronization tool for the Greenplum / HashData data warehouse


Getting Started Guide

Part 1 Maxwell Data Source

In this part, we demonstrate how to use bireme together with Maxwell to synchronize the MySQL table demo.test to the Greenplum table public.test. Before starting, you need to deploy MySQL, Kafka and Greenplum.

Note: Every table must have a primary key.

1. Preparation

1.1 MySQL configuration

Maxwell only works when row-based replication is enabled.

$ vi my.cnf

[mysqld]
server-id=1
log-bin=master
binlog_format=row
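Before restarting MySQL, it can help to confirm that the option file really contains the three settings above. The following is a minimal sketch; the helper name check_mycnf is hypothetical, and it simply scans the whole file without interpreting section headers.

```shell
# check_mycnf FILE: succeed only if FILE sets binlog_format=row, server-id and log-bin.
# Simplification: the whole file is scanned; [section] headers are not interpreted.
check_mycnf() {
    cnf="$1"
    # Drop spaces and tabs so "binlog_format = row" and "binlog_format=row" compare equal.
    settings=$(tr -d ' \t' < "$cnf") || return 1
    printf '%s\n' "$settings" | grep -Eiq '^binlog[_-]format=row$' ||
        { echo "binlog_format is not row" >&2; return 1; }
    printf '%s\n' "$settings" | grep -Eq '^server[_-]id=[0-9]+$' ||
        { echo "server-id is not set" >&2; return 1; }
    printf '%s\n' "$settings" | grep -Eq '^log[_-]bin(=.*)?$' ||
        { echo "log-bin is not set" >&2; return 1; }
    echo "ok"
}
```

Call it with the path of your option file, for example check_mycnf my.cnf. It prints ok when all three settings are present and a short message on stderr otherwise.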

Create a maxwell user and grant it the required privileges.

mysql> GRANT ALL on maxwell.* to 'maxwell'@'%' identified by 'XXXXXX';
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';

Create a new table, demo.test.

create database demo;
create table demo.test (id int primary key, name varchar(10));

1.2 Create topic in Kafka

Create a topic named mytopic for Maxwell to produce messages to.

bin/kafka-topics.sh --create --zookeeper zookeeperhost:2181 --replication-factor 1 --partitions 1 --topic mytopic

1.3 Create the corresponding table in Greenplum

create table test (id int primary key, name varchar(10));

1.4 Start Maxwell

bin/maxwell --user='maxwell' --password='XXXXXX' \
--host=mysqlhost --producer=kafka \
--kafka_topic=mytopic \
--kafka.bootstrap.servers=kafkahost:9092

Use the --host and --kafka.bootstrap.servers options to specify the MySQL host and the Kafka host.

2. Configure and start bireme

2.1 Edit etc/config.properties

Specify the connection information for the target database, as well as the data source information.

target.url = jdbc:postgresql://gpdbhost:5432/XXXXXX
target.user = XXXXXX
target.passwd = XXXXXX

data_source = mysql

mysql.type = maxwell
mysql.kafka.server = kafkahost:9092
mysql.kafka.topic = mytopic

2.2 Edit table mapping file

Because we named the data source mysql, create a file mysql.properties in the etc directory with the following content.

demo.test = public.test
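A malformed line in the mapping file is easy to miss, so a quick format check may be worthwhile. The sketch below assumes each mapping has the form schema.table = schema.table, as in the example above, and that blank lines and lines starting with # can be ignored; the helper name check_mapping is hypothetical.

```shell
# check_mapping FILE: report any line that is not "schema.table = schema.table".
# Assumption: blank lines and "#" comment lines are allowed and skipped.
check_mapping() {
    name='[A-Za-z0-9_]+\.[A-Za-z0-9_]+'
    bad=$(grep -Ev '^[[:space:]]*(#.*)?$' "$1" |
          grep -Ev "^[[:space:]]*${name}[[:space:]]*=[[:space:]]*${name}[[:space:]]*\$" || true)
    if [ -n "$bad" ]; then
        printf 'malformed mapping: %s\n' "$bad" >&2
        return 1
    fi
    echo "ok"
}
```

For example, check_mapping etc/mysql.properties prints ok for the file shown above.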

2.3 Start bireme

bin/bireme start

When the message "The bireme has started" appears, bireme has started successfully. Data inserted into MySQL will then be synced to Greenplum.

To stop bireme, execute the following.
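To check the pipeline end to end, you can insert one row in MySQL and read it back from Greenplum. This is only a sketch under several assumptions: the mysql and psql clients are on PATH, mysqlhost and gpdbhost are the hosts used earlier, credentials come from the clients' own configuration (such as ~/.my.cnf and ~/.pgpass), and the 5-second wait is an arbitrary guess. The function name sync_smoke_test is hypothetical.

```shell
# sync_smoke_test GPDB_NAME: insert a row into demo.test in MySQL,
# wait briefly, then read the matching row from public.test in Greenplum.
sync_smoke_test() {
    gpdb_name="$1"
    mysql -h mysqlhost -e "insert into demo.test values (1, 'hello');" || return 1
    # Give bireme time to consume the change from Kafka and apply it (arbitrary delay).
    sleep 5
    # -A: unaligned output, -t: rows only, so a synced row prints as id|name.
    psql -h gpdbhost -d "$gpdb_name" -At -c "select id, name from public.test where id = 1;"
}
```

Pass the name of your target database as the argument. If the row has been synced, the query should return it; an empty result means the change has not arrived yet.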

bin/bireme stop

Log files are located in the logs directory.

JMX

Environment Variable   Default   Description
JMX_PORT               (none)    Enables JMX and specifies the port number that will be used for JMX.
JMX_AUTH               false     Whether JMX clients must use password authentication when connecting.
JMX_SSL                false     Whether JMX clients connect using SSL/TLS.
JMX_LOG4J              true      Whether the Log4J JMX MBeans should be disabled.

By default, local JMX connections are allowed. To disable JMX, set the environment variable JMX_DISABLE=false.

HEAP SIZE

You can use the MAX_HEAP variable to set the maximum heap size of the JVM. Its value is passed to the JVM as the parameter -Xmx${MAX_HEAP}m.
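As a small illustration of the substitution described above, the snippet below shows what the JVM parameter becomes for one example value. The value 2048 and the variable JVM_HEAP_OPT are purely illustrative; only MAX_HEAP and the -Xmx${MAX_HEAP}m pattern come from this guide.

```shell
# Illustrative value: cap the JVM heap at 2048 MB.
export MAX_HEAP=2048

# The value is interpolated into the JVM option as described in the text.
JVM_HEAP_OPT="-Xmx${MAX_HEAP}m"
echo "$JVM_HEAP_OPT"

# Afterwards, start bireme from the same shell so it inherits MAX_HEAP:
#   bin/bireme start
```

Because the m suffix is appended for you, MAX_HEAP should be a plain number of megabytes without a unit.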

Part 2 Debezium Data Source

In this part, we demonstrate how to use bireme together with Debezium to synchronize the Postgres table public.source to the Greenplum table public.target. For convenience, we use Docker to set up Postgres, Kafka and Kafka Connect.

Note: Every table must have a primary key.

1. Preparation

1.1 Set up with Docker

Start Postgres

$ docker run -it --name Postgres -p 5432:5432 -d debezium/postgres:latest

Start Zookeeper

$ docker run -it --name Zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 -d debezium/zookeeper:0.5

Start Kafka

$ docker run -it --name Kafka -p 9092:9092 \
				-e ZOOKEEPER_CONNECT=Zookeeper:2181 \
				--link Zookeeper:Zookeeper -d debezium/kafka:0.5

Start Kafka Connect

$ docker run -it --name Connect -p 8083:8083 \
				-e BOOTSTRAP_SERVERS=Kafka:9092 \
				-e CONFIG_STORAGE_TOPIC=my_connect_configs \
				-e OFFSET_STORAGE_TOPIC=my_connect_offsets \
				--link Zookeeper:Zookeeper --link Kafka:Kafka --link Postgres:Postgres \
				-d debezium/connect:0.5

1.2 Monitor the Postgres database

We use curl to submit a JSON request to the Kafka Connect service, describing the connector we want to start.

$ CONNECTOR_CONFIG='
{
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "postgres",
        "database.server.name": "debezium"
    }
}'

$ curl -i -X POST -H "Accept:application/json" \
                -H "Content-Type:application/json" localhost:8083/connectors/ \
                -d "${CONNECTOR_CONFIG}"
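After registering the connector, you may want to confirm that it is running. Kafka Connect's REST API offers a status endpoint per connector; the sketch below extracts the connector-level state from its JSON reply with grep rather than a JSON parser, which assumes the connector state is the first "state" field in the response. The function name connector_state is hypothetical.

```shell
# connector_state NAME: print the connector's state as reported by Kafka Connect.
# Assumes Kafka Connect listens on localhost:8083, as in the docker setup above.
connector_state() {
    curl -s "localhost:8083/connectors/$1/status" |
        grep -o '"state" *: *"[A-Z_]*"' |   # every "state":"..." pair, one per line
        head -n 1 |                          # the first one belongs to the connector itself
        cut -d'"' -f4                        # keep only the value
}
```

For the connector registered above, run connector_state inventory-connector. A healthy connector is expected to report RUNNING.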

1.3 Create the tables in Postgres and Greenplum

In Postgres

create table source (id int primary key, name varchar(10));

In Greenplum

create table target (id int primary key, name varchar(10));

2. Configure and start bireme

2.1 Edit etc/config.properties

Specify the connection information for the target database, as well as the data source information.

target.url = jdbc:postgresql://gpdbhost:5432/XXXXXX
target.user = XXXXXX
target.passwd = XXXXXX

data_source = debezium

debezium.type = debezium
debezium.kafka.server = kafkahost:9092

Note: The value of data_source must be identical to the database.server.name setting in Section 1.2. No topic needs to be specified.
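A likely reason for this requirement is Debezium's topic naming: its Postgres connector publishes each table's changes to a topic named after the server name, schema and table. The sketch below only illustrates that naming rule. The helper debezium_topic is hypothetical, and whether bireme derives its topics in exactly this way is an assumption, not something this guide states.

```shell
# debezium_topic SERVER_NAME SCHEMA.TABLE: build the topic name Debezium uses for a table,
# i.e. <database.server.name>.<schema>.<table>.
debezium_topic() {
    printf '%s.%s\n' "$1" "$2"
}

# With database.server.name set to "debezium" and the table public.source:
debezium_topic debezium public.source
```

With the settings used in this part, the resulting topic name is debezium.public.source.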

2.2 Edit table mapping file

Because we named the data source debezium, create a file debezium.properties in the etc directory with the following content.

public.source = public.target

2.3 Start bireme

Follow the same step as in Part 1, Section 2.3.