Architecture

Preview release v0.7.1

Consider these main architectural aspects when deploying EDB Postgres Distributed in Kubernetes.

EDB Postgres Distributed for Kubernetes is a Kubernetes operator designed to deploy and manage EDB Postgres Distributed clusters running in private, public, hybrid, or multi-cloud environments.

Relationship with EDB Postgres Distributed

EDB Postgres Distributed (PGD) is a multi-master implementation of Postgres designed for high performance and availability. PGD generally requires deployment using Trusted Postgres Architect (TPA), a tool that uses Ansible to provision and deploy PGD clusters.

EDB Postgres Distributed for Kubernetes offers a different way of deploying PGD clusters, leveraging containers and Kubernetes. The advantages are that the resulting architecture:

Is self-healing and robust.
Is managed through declarative configuration.
Takes advantage of the vast and growing Kubernetes ecosystem.

Relationship with EDB Postgres for Kubernetes

A PGD cluster consists of one or more PGD groups, each having one or more PGD nodes. A PGD node is a Postgres database. EDB Postgres Distributed for Kubernetes internally manages each PGD node using the Cluster resource as defined by EDB Postgres for Kubernetes, specifically a cluster with a single instance (that is, no replicas).

You can configure the single PostgreSQL instance created by each cluster declaratively using the .spec.cnp section of the PGD group spec.

In EDB Postgres Distributed for Kubernetes, as in EDB Postgres for Kubernetes, the underlying database implementation is responsible for data replication. However, it's important to note that failover and switchover work differently, entailing Raft election and nominating new write leaders. EDB Postgres for Kubernetes handles only the deployment and healing of data nodes.

Managing PGD using EDB Postgres Distributed for Kubernetes

The EDB Postgres Distributed for Kubernetes operator can manage the complete lifecycle of PGD clusters. As such, in addition to PGD nodes (represented as single-instance clusters), it needs to manage other objects associated with PGD.

PGD relies on the Raft algorithm for distributed consensus to manage node metadata, specifically agreement on a write leader. Consensus among data nodes is also required for operations such as generating new global sequences or performing distributed DDL.

These considerations force additional actors in PGD above database nodes.

EDB Postgres Distributed for Kubernetes manages the following:

Data nodes. A node is a database and is managed by EDB Postgres for Kubernetes, creating a cluster with a single instance.
Witness nodes are basic database instances that don't participate in data replication. Their function is to guarantee that consensus is possible in groups with an even number of data nodes or after network partitions. Witness nodes are also managed using a single-instance Cluster resource.
PGD proxies act as Postgres proxies with knowledge of the write leader. PGD proxies need information from Raft to route writes to the current write leader.

Proxies and routing

PGD groups assume full mesh connectivity of PGD nodes. Each node must be able to connect to every other node using the appropriate connection string (a libpq-style DSN). Write operations don't need to be sent to every node. PGD takes care of replicating data after it's committed to one node.

For performance, we often recommend sending write operations mostly to a single node: the write leader. Raft identifies the node that's the write leader and holds metadata about the PGD nodes. PGD proxies transparently route writes to write leaders and can quickly pivot to the new write leader in case of switchover or failover.

It's possible to configure Raft subgroups, each of which can maintain a separate write leader. In EDB Postgres Distributed for Kubernetes, a PGD group containing a PGD proxy comprises a Raft subgroup.

Two kinds of routing are available with PGD proxies:

Global routing uses the top-level Raft group and maintains one global write leader.
Local routing uses subgroups to maintain separate write leaders. Local routing is often used to achieve geographical separation of writes.

In EDB Postgres Distributed for Kubernetes, local routing is used by default, and a configuration option is available to select global routing.

For more information, see Proxies, Raft, and Raft subgroups in the PGD documentation.

PGD architectures and high availability

To make good use of PGD's distributed multi-master capabilities and to offer high availability, we recommend several architectures.

The Always On architectures are built from either one group in a single location or two groups in two separate locations. See Choosing your architecture in the PGD documentation for more information.

Deploying PGD on Kubernetes

EDB Postgres Distributed for Kubernetes leverages Kubernetes to deploy and manage PGD clusters. As such, some adaptations are necessary to translate PGD into the Kubernetes ecosystem.

Images and operands

PGD can be configured to run one of three Postgres distributions. See Choosing a Postgres distribution in the PGD documentation to understand the features of each distribution.

To function in Kubernetes, containers are provided for each Postgres distribution. These are the operands. In addition, the operator images are kept in those same repositories.

See EDB private image registries for details on accessing the images.

Kubernetes architecture

Kubernetes natively provides the possibility to span separate physical locations connected to each other by way of redundant, low-latency, private network connectivity. These physical locations are also known as data centers, failure zones, or, more frequently, availability zones.

Being a distributed system, the recommended minimum number of availability zones for a Kubernetes cluster is three to make the control plane resilient to the failure of a single zone. This means that each data center is active at any time and can run workloads simultaneously.

EDB Postgres Distributed for Kubernetes can be installed in a single Kubernetes cluster or across multiple Kubernetes clusters.

Single Kubernetes cluster

A multi-availability-zone Kubernetes architecture is typical of Kubernetes services managed by cloud providers. Such an architecture enables the EDB Postgres Distributed for Kubernetes and the EDB Postgres for Kubernetes operators to schedule workloads and nodes across availability zones, considering all zones active.

Kubernetes cluster spanning over 3 independent data centers

PGD clusters can be deployed in a single Kubernetes cluster and take advantage of Kubernetes availability zones to enable high-availability architectures, including the Always On recommended architectures.

You can realize the Always On, single-location architecture shown in Choosing your architecture in the PGD documentation on a single Kubernetes cluster with three availability zones.

Always On Single Region

The EDB Postgres Distributed for Kubernetes operator can control the scheduling of pods (that is, which pods go to which data center) using affinity, tolerations, and node selectors, as is the case with EDB Postgres for Kubernetes. Individual scheduling controls are available for proxies as well as nodes.

See the Kubernetes documentation on scheduling, and Scheduling in the EDB Postgres for Kubernetes documentation for more information.

Multiple Kubernetes clusters

PGD clusters can also be deployed in multiple Kubernetes clusters that can reliably communicate with each other.

Multiple Kubernetes clusters

Always On multi-location PGD architectures can be realized on multiple Kubernetes clusters that meet the connectivity requirements. For more information, see Connectivity.