Backup on object stores

Preview release v0.7.1

EDB Postgres Distributed for Kubernetes supports online/hot backup of PGD clusters through physical backup and WAL archiving on an object store. This means that the database is always up (no downtime required) and that point-in-time recovery (PITR) is available.

Common object stores

Multiple object stores are supported, such as AWS S3, Microsoft Azure Blob Storage, Google Cloud Storage, MinIO Gateway, or any S3-compatible provider. Given that EDB Postgres Distributed for Kubernetes configures the connection with object stores by relying on EDB Postgres for Kubernetes, see the EDB Postgres for Kubernetes cloud provider support documentation for more information.

Important

The EDB Postgres for Kubernetes documentation's Cloud Provider configuration section is available at spec.backup.barmanObjectStore. In EDB Postgres Distributed for Kubernetes examples, the object store section is at a different path: spec.backup.configuration.barmanObjectStore.

WAL archive

WAL archiving is the process that sends WAL files to the object storage, and it's essential to execute online/hot backups or PITR. In EDB Postgres Distributed for Kubernetes, each PGD node is set up to archive WAL files in the object store independently.

The WAL archive is defined in the PGD group spec.backup.configuration.barmanObjectStore stanza and is enabled as soon as a destination path and cloud credentials are set. You can choose to compress WAL files before uploading them and also or alternatively encrypt them. In adddition, you can enable parallel WAL archiving.

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      barmanObjectStore:
        [...]
        wal:
          compression: gzip
          encryption: AES256
          maxParallel: 8

For more information, see the EDB Postgres for Kubernetes WAL archiving documentation.

Scheduled backups

Scheduled backups are the recommended way to configure your backup strategy in EDB Postgres Distributed for Kubernetes. When the PGD group spec.backup.configuration.barmanObjectStore stanza is configured, the operator selects one of the PGD data nodes as the elected backup node for which it creates a Scheduled Backup resource.

The .spec.backup.cron.schedule field allows you to define a cron schedule specification, expressed in the Go cron package format.

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    cron:
      schedule: "0 0 0 * * *"
      backupOwnerReference: self
      suspend: false
      immediate: true

You can suspend scheduled backups by setting .spec.backup.cron.suspend to true. This setting prevents any new backup from being scheduled.

If you want to execute a backup as soon as the ScheduledBackup resource is created, you can set .spec.backup.cron.immediate to true.

.spec.backupOwnerReference indicates the ownerReference to use in the created backup resources. The choices are:

none — No owner reference for created backup objects.
self — Sets the scheduled backup object as owner of the backup.
cluster — Sets the cluster as owner of the backup.

Note

The EDB Postgres for Kubernetes ScheduledBackup object contains the cluster option to specify the cluster to back up. This option is currently not supported by EDB Postgres Distributed for Kubernetes and is ignored if specified.

If an elected backup node is deleted, the operator transparently elects a new backup node and reconciles the Scheduled Backup resource accordingly.

Retention policies

EDB Postgres Distributed for Kubernetes can manage the automated deletion of backup files from the backup object store using retention policies based on the recovery window. This process also takes care of removing unused WAL files and WALs associated with backups that are scheduled for deletion.

You can define your backups with a retention policy of 30 days:

apiVersion: pgd.k8s.enterprisedb.io/v1beta1
kind: PGDGroup
[...]
spec:
  backup:
    configuration:
      retentionPolicy: "30d"

For more information, see the EDB Postgres for Kubernetes retention policies in the EDB Postgres for Kubernetes documentation.

Important

Currently, the retention policy is applied only for the elected backup node backups and WAL files. Given that every other PGD node also archives its own WALs independently, it's your responsibility to manage the lifecycle of those WAL files, for example by leveraging the object storage data-retention policy. Also, in case you have an object storage data retention policy set up on every PGD node directory, make sure it's not overlapping or interfering with the retention policy managed by the operator.

Compression algorithms

Backups and WAL files are uncompressed by default. However, multiple compression algorithms are supported. For more information, see the EDB Postgres for Kubernetes compression algorithms documentation.

Tagging of backup objects

It's possible to specify tags as key-value pairs for the backup objects, namely base backups, WAL files, and history files. For more information, see the EDB Postgres for Kubernetes documentation about tagging of backup objects.

On-demand backups of a PGD node

A PGD node is represented as single-instance EDB Postgres for Kubernetes Cluster object. As such, in case of need, it's possible to request an on-demand backup of a specific PGD node by creating a EDB Postgres for Kubernetes Backup resource. To do that, see EDB Postgres for Kubernetes on-demand backups in the EDB Postgres for Kubernetes documentation.

Hint

You can retrieve the list of EDB Postgres for Kubernetes clusters that make up your PGD group by running: kubectl get cluster -l k8s.pgd.enterprisedb.io/group=my-pgd-group -n my-namespace