With the PostgreSQL physical replication introduced in the previous article, we can synchronize data from a single PostgreSQL server to multiple servers, each of which can handle read requests independently. However, as also noted there, PostgreSQL does not natively support load balancing, high availability, and similar features; we need an extra proxy layer to provide them. This article introduces one of the more feature-complete options, Pgpool-II, along with a simple configuration.
Most of the content, images, and Pgpool-II configuration below are based on the official Pgpool-II documentation.
What is Pgpool-II? What Features Does It Support?
Pgpool-II manages a pool of PostgreSQL servers. As described above, it can be seen as a middle layer: to the application it looks like a PostgreSQL server, while the servers see Pgpool-II as a client:
```mermaid
flowchart TD
    App["Application"] --> Pgpool["Pgpool-II"]
    Pgpool --> cluster
    subgraph cluster["PostgreSQL Servers"]
        direction LR
        Master -- "Physical Replication" --> Slaves
    end
```
Its supported features are roughly as follows:
- Performance
    - Load balancing: distributes read queries across multiple servers (write queries can only be routed to the primary server).
    - Connection pooling: similar to pgbouncer and pgcat; reuses established connections to reduce connection overhead.
    - In-memory query cache: caches `SELECT` query results to speed up repeated queries.
- High availability
    - For PostgreSQL databases
        - Replication: Pgpool-II also supports its own replication mode, but PostgreSQL's native physical (streaming) replication is generally recommended instead; see the previous article.
        - Automatic failover: when the primary server fails, service automatically fails over to another standby server so requests can keep being served.
        - Online recovery: attaches a detached server back to the cluster as a standby server.
    - For Pgpool-II
        - Watchdog: a coordination service that coordinates multiple Pgpool-II nodes and performs failure detection, avoiding a single point of failure and split brain. When the current active Pgpool-II fails, a standby Pgpool-II is promoted to active and takes over the virtual IP that upstream applications connect to.
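As an example of how these features are configured, the in-memory query cache is controlled by a handful of pgpool.conf parameters. The fragment below is a minimal sketch with illustrative values; see the official documentation for the full parameter list:

```
memory_cache_enabled = on      # turn on the in-memory query cache
memqcache_method = 'shmem'     # cache storage: shared memory (alternative: 'memcached')
memqcache_total_size = 64MB    # total memory reserved for cached results
memqcache_expire = 60          # cache entry lifetime in seconds
```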
Setup Steps and Configuration
As the previous section shows, Pgpool-II supports a rich set of features. In this section I will walk through the rough setup flow for load balancing and automatic failover.
The commands and configuration below are for reference only; adjust them to fit your actual environment and requirements.
General Setup
- Set up PostgreSQL physical replication. For detailed setup instructions, refer to the example in the last article.
- Create a role for replication and grant it permission to show more information in the `SHOW POOL NODES` command result:

  ```shell
  psql -U postgres -tA -c "CREATE ROLE replication WITH LOGIN REPLICATION;"
  psql -U postgres -tA -c "GRANT pg_monitor TO replication;"
  ```
- Set up the general configuration in pgpool.conf:

  ```
  backend_clustering_mode = 'streaming_replication'
  listen_addresses = '*'
  port = 9999
  sr_check_user = 'replication'      # streaming replication check for detecting replication delay
  sr_check_period = 5
  health_check_user = 'replication'  # health check for failover
  health_check_period = 10
  ```
Load Balancing
- Set up the backend settings in pgpool.conf:

  ```
  # backend_application_name*: to show more information in `SHOW POOL NODES` command result
  backend_hostname0 = '${PRIMARY_IP}'
  backend_port0 = 5432
  backend_weight0 = 1
  backend_data_directory0 = '/var/lib/pgsql/data'
  backend_application_name0 = 'primary'
  backend_flag0 = 'ALLOW_TO_FAILOVER'

  backend_hostname1 = '${REPLICA_1_IP}'
  backend_port1 = 5432
  backend_weight1 = 1
  backend_data_directory1 = '/var/lib/pgsql/data'
  backend_application_name1 = 'replica_1'
  backend_flag1 = 'ALLOW_TO_FAILOVER'

  backend_hostname2 = '${REPLICA_2_IP}'
  backend_port2 = 5432
  backend_weight2 = 1
  backend_data_directory2 = '/var/lib/pgsql/data'
  backend_application_name2 = 'replica_2'
  backend_flag2 = 'ALLOW_TO_FAILOVER'
  ```
- Start Pgpool-II:

  ```shell
  # -n: do not run in daemon mode
  sudo -u postgres pgpool -n -f ${LOCATION_OF_PGPOOL_CONF} &
  # To stop:
  sudo -u postgres pgpool -f ${LOCATION_OF_PGPOOL_CONF} stop
  ```
- Monitor:

  ```shell
  psql -U postgres -p 9999 -c "SHOW POOL NODES;"
  ```
Automatic Failover
Set up `failover_command` and `follow_primary_command` in pgpool.conf (for the meaning of the command options below, please refer to the corresponding configuration links):

```
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
```
General steps of the scripts are described below. Please refer to failover.sh.sample and follow_primary.sh.sample for sample scripts.
`failover_command`: triggered when a backend node gets detached.
- If there is no new main node (`%m`) anymore, skip failover.
- If the detached node is a standby node (`%d` != `%P`):
    - Drop the replication slot for the detached standby node:

      ```shell
      OLD_PRIMARY_NODE_HOST="${11}"  # %N: old primary node hostname
      OLD_PRIMARY_NODE_PORT="${12}"  # %S: old primary node port number
      # REPLICA_SLOT_NAME can be composed of a self-defined slot name and $2 (%h: failed node hostname)
      psql -U postgres -h ${OLD_PRIMARY_NODE_HOST} -p ${OLD_PRIMARY_NODE_PORT} -c "SELECT pg_drop_replication_slot('${REPLICA_SLOT_NAME}');"
      ```
    - Skip failover.
- Otherwise, the detached node is the primary node, so we promote the selected standby node to be the new primary node:

  ```shell
  # The following command should be executed on the standby node via SSH or other remote access methods.
  NEW_MAIN_NODE_PGDATA="${10}"  # %R: new main database cluster path
  pg_ctl -D ${NEW_MAIN_NODE_PGDATA} -w promote
  ```
`follow_primary_command`: executed on each standby node after `failover_command` to synchronize the standby node with the new primary node.
- Check the connection status of the standby node; skip the follow primary command if the node is not running:

  ```shell
  NODE_HOST="$2"  # %h: hostname
  NODE_PORT="$3"  # %p: port number
  pg_isready -U postgres -h ${NODE_HOST} -p ${NODE_PORT}
  ```
- Synchronize the standby node with the new primary node via pg_rewind:
    - Checkpoint on the new primary node before running pg_rewind.
    - Create a new replication slot for this standby node.
    - Run pg_rewind.
    - Try pg_basebackup if pg_rewind failed.
- If the synchronization succeeded, attach this node back to Pgpool-II; otherwise, drop the replication slot we just created.
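The "pg_rewind with pg_basebackup fallback" step can be sketched roughly as below. This is a hypothetical sketch, not the official sample: the function and variable names are illustrative, and the real follow_primary.sh.sample additionally handles checkpoints, replication slots, and restarting the standby.

```shell
#!/bin/sh
# Hypothetical sketch: try pg_rewind first; if it fails, take a fresh base backup.
sync_standby() {
    new_primary_host="$1"  # host of the new primary node
    standby_pgdata="$2"    # data directory of this (stopped) standby node
    if pg_rewind --target-pgdata="${standby_pgdata}" \
                 --source-server="host=${new_primary_host} user=postgres dbname=postgres"; then
        echo "pg_rewind succeeded"
    else
        # pg_rewind can fail (e.g. when wal_log_hints is off); fall back to a full base backup
        rm -rf "${standby_pgdata}"
        pg_basebackup -h "${new_primary_host}" -U replication -D "${standby_pgdata}" -X stream
    fi
}
```

The appeal of pg_rewind is that it copies only the blocks that diverged, so it is much faster than a full pg_basebackup on large clusters.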
Issues about the Integration of Pgpool-II
At this point we have integrated Pgpool-II as a proxy in front of the PostgreSQL physical replication servers, with support for load balancing, read/write splitting, and automatic failover. For how to configure the other Pgpool-II features, refer to the official documentation and the examples it provides.
To close, let's pick up the discussion from the end of the previous article and look at issues you may run into in practice after integrating Pgpool-II.
Network Latency
The discussion at the end of the previous article already noted that network latency must be factored into performance. In particular, if the production workload consists mostly of lightweight queries or queries that frequently hit the buffer cache, the scaled-out solution can actually perform worse, and adding Pgpool-II makes the impact even more pronounced. Consider the lifecycle of a query sent from client to server:
- Send query to PostgreSQL directly
- send a query string to PostgreSQL
- process the query on PostgreSQL
- return the query result
- Send query to Pgpool-II
- send a query string to Pgpool-II
- parse the query on Pgpool-II
- send a query string to PostgreSQL
- process the query on PostgreSQL
- return the query result to Pgpool-II
- return the query result to client
Looking at the Pgpool-II case, there are clearly a few extra rounds of process communication, which amplifies the impact of network latency. Of course, real-world workloads are usually more complex, so a scaled-out solution generally still pays off.
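A rough back-of-envelope model of the extra hop makes the point concrete. All numbers below are assumptions for illustration, not measurements, and the model ignores Pgpool-II's own parsing overhead:

```shell
# Model each hop as one network round trip added on top of server-side work.
total_latency_us() {
    # $1: server-side query processing time (us)
    # $2: one-way network latency per hop (us)
    # $3: number of hops (1: client->PostgreSQL, 2: client->Pgpool-II->PostgreSQL)
    echo $(( $3 * 2 * $2 + $1 ))
}

direct=$(total_latency_us 500 200 1)   # 900 us
proxied=$(total_latency_us 500 200 2)  # 1300 us: ~44% slower for this cheap query
```

For a query that takes 50 ms of server-side work instead, the same extra 400 us is negligible, which is why the overhead matters mainly for lightweight queries.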
Connection Pooling: Pgpool-II vs Pgbouncer
Connection pooling means maintaining a cache (pool) of database connections and reusing them when requests with the same attributes (user, database, etc.) come in, reducing the overhead of repeatedly establishing connections. For PostgreSQL, the best-known example is Pgbouncer.
As mentioned above, Pgpool-II's feature set is comprehensive and includes connection pooling. Although the setup was not covered earlier, Pgpool-II actually enables connection pooling by default, and parameters such as `connection_cache` and `max_pool` can be tuned to control it.
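A minimal pgpool.conf sketch of those parameters (the values are illustrative defaults, not recommendations):

```
connection_cache = on    # enable connection pooling (on by default)
max_pool = 4             # max cached connections per Pgpool-II child process
num_init_children = 32   # number of preforked child processes
```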
Experimenting with both, however, reveals that Pgpool-II and Pgbouncer differ in their connection pooling mechanisms. Let's first look at Pgpool-II's rough architecture:
```mermaid
flowchart TD
    parent["Pgpool-II parent"] -- Fork --> children
    subgraph children["Preforked processes"]
        direction LR
        child1["Pgpool-II child"]
        child2["Pgpool-II child"]
        child3["Pgpool-II child"]
    end
    client["SQL Client"] -- Query --> children
```
That is, Pgpool-II preforks `num_init_children` (default 32) processes and serves clients through these child processes. As mentioned, the connection pooling mechanisms of Pgpool-II and Pgbouncer differ:
- Pgpool-II: each child process maintains its own connection pool, and we cannot control which child process a client connects to.
- PgBouncer: the connection pool is shared among all clients.
In other words, with Pgpool-II a connection cache miss can happen simply because a client landed on a different child process. This does not matter when all clients share the same connection attributes (user, database, etc.), but if the database is accessed by many different applications, consider another solution, such as disabling Pgpool-II's connection pool and running a Pgbouncer per application.
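If you go the per-application Pgbouncer route, a minimal pgbouncer.ini might look like the following sketch. The hostnames, database names, and file paths are illustrative:

```
[databases]
; route "appdb" through the pool to the PostgreSQL primary
appdb = host=10.0.0.10 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; return server connections to the pool after each transaction
max_client_conn = 200
default_pool_size = 20
```

`pool_mode = transaction` gives the highest connection reuse, but it is incompatible with session-level features such as prepared statements held across transactions, so check your driver's behavior before enabling it.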