Replicate data with Airbyte
Learn how to replicate data from Neon with Airbyte
Neon's logical replication feature allows you to replicate data from your Neon Postgres database to external destinations.
Airbyte is an open-source data integration platform that moves data from a source to a destination system. Airbyte offers a large library of connectors for various data sources and destinations.
In this guide, you will learn how to define your Neon Postgres database as a data source in Airbyte so that you can stream data to one or more of Airbyte's supported destinations.
Prerequisites
- An Airbyte account
- A Neon account
- Read the important notices about logical replication in Neon before you begin
Prepare your source Neon database
This section describes how to prepare your source Neon database (the publisher) for replicating data to your destination Neon database (the subscriber).
Enable logical replication in Neon
important
Enabling logical replication modifies the Postgres wal_level
configuration parameter, changing it from replica
to logical
for all databases in your Neon project. Once the wal_level
setting is changed to logical
, it cannot be reverted. Enabling logical replication also restarts all computes in your Neon project, meaning active connections will be dropped and have to reconnect.
To enable logical replication in Neon:
- Select your project in the Neon Console.
- On the Neon Dashboard, select Settings.
- Select Logical Replication.
- Click Enable to enable logical replication.
You can verify that logical replication is enabled by running the following query from the Neon SQL Editor:
Create a Postgres role for replication
It's recommended that you create a dedicated Postgres role for replicating data. The role must have the REPLICATION
privilege. The default Postgres role created with your Neon project and roles created using the Neon CLI, Console, or API are granted membership in the neon_superuser role, which has the required REPLICATION
privilege.
Grant schema access to your Postgres role
If your replication role does not own the schemas and tables you are replicating from, make sure to grant access. For example, the following commands grant access to all tables in the public
schema to Postgres role replication_user
:
GRANT USAGE ON SCHEMA public TO replication_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replication_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO replication_user;
Granting SELECT ON ALL TABLES IN SCHEMA
instead of naming the specific tables avoids having to add privileges later if you add tables to your publication.
Create a replication slot
Airbyte requires a dedicated replication slot. Only one source should be configured to use this replication slot.
Airbyte uses the pgoutput
plugin in Postgres for decoding WAL changes into a logical replication stream. To create a replication slot called airbyte_slot
that uses the pgoutput
plugin, run the following command on your database using your replication role:
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
airbyte_slot
is the name assigned to the replication slot. You will need to provide this name when you set up your Airbyte source.
Create a publication
Perform the following steps for each table you want to replicate data from:
-
Add the replication identity (the method of distinguishing between rows) for each table you want to replicate:
ALTER TABLE <table_name> REPLICA IDENTITY DEFAULT;
In rare cases, if your tables use data types that support TOAST or have very large field values, consider using
REPLICA IDENTITY FULL
instead:ALTER TABLE <table_name> REPLICA IDENTITY FULL;
-
Create the Postgres publication. Include all tables you want to replicate as part of the publication:
CREATE PUBLICATION airbyte_publication FOR TABLE <table_name, table_name, table_name>;
Alternatively, you can create a publication for all tables:
CREATE PUBLICATION airbyte_publication FOR ALL TABLES;
The publication name is customizable. Refer to the Postgres docs if you need to add or remove tables from your publication.
note
The Airbyte UI currently allows selecting any table for Change Data Capture (CDC). If a table is selected that is not part of the publication, it will not be replicated even though it is selected. If a table is part of the publication but does not have a replication identity, the replication identity will be created automatically on the first run if the Postgres role you use with Airbyte has the necessary permissions.
Create a Postgres source in Airbyte
-
From your Airbyte Cloud account, select Sources from the left navigation bar, search for Postgres, and then create a new Postgres source.
-
Enter the connection details for your Neon database. You can get these details from your Neon connection string, which you'll find in the Connection Details widget on the Dashboard of your Neon project. For example, given a connection string like this:
postgresql://alex:AbC123dEf@ep-cool-darkness-123456.us-east-2.aws.neon.tech/dbname?sslmode=require
Enter the details in the Airbyte Create a source dialog as shown below. Your values will differ.
- Host: ep-cool-darkness-123456.us-east-2.aws.neon.tech
- Port: 5432
- Database Name: dbname
- Username: replication_user
- Password: AbC123dEf
-
Under Optional fields, list the schemas you want to sync. Schema names are case-sensitive, and multiple schemas may be specified. By default,
public
is the only selected schema. -
Select an SSL mode. You will most frequently choose
require
orverify-ca
. Both of these options always require encryption. Theverify-ca
mode requires a certificate. Refer to Connect securely for information about the location of certificate files you can use with Neon. -
Under Advanced:
- Select Read Changes using Write-Ahead Log (CDC) from available replication methods.
- In the Replication Slot field, enter the name of the replication slot you created previously:
airbyte_slot
. - In the Publication field, enter the name of the publication you created previously:
airbyte_publication
.
Allow inbound traffic
If you are on Airbyte Cloud, and you are using Neon's IP Allow feature to limit IP addresses that can connect to Neon, you will need to allow inbound traffic from Airbyte's IP addresses. You can find a list of IPs that need to be allowlisted in the Airbyte Security docs. For information about configuring allowed IPs in Neon, see Configure IP Allow.
Complete the source setup
To complete your source setup, click Set up source in the Airbyte UI. Airbyte will test the connection to your database. Once this succeeds, you've successfully configured an Airbyte Postgres source for your Neon database.
Configure a destination
To complete your data integration setup, you can now add one of Airbyte's many supported destinations, such as Snowflake, BigQuery, or Kafka, to name a few. After configuring a destination, you'll need to set up a connection between your Neon source database and your chosen destination. Refer to the Airbyte documentation for instructions:
References
- What is an ELT data pipeline?
- Logical replication - PostgreSQL documentation
- Publications - PostgreSQL documentation
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.