Postgres Data Export to S3

This cluster focuses on tools, scripts, and strategies for exporting or syncing data from Postgres (and similar databases) to S3 in Parquet or Iceberg formats for querying with Athena, DuckDB, BigQuery, or ClickHouse, often for analytics workloads.
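
As a rough illustration of the general pattern (not any specific tool from this cluster), here is a minimal DuckDB sketch that reads a Postgres table and writes it to S3 as Parquet; the connection string, table name, and bucket are placeholders, and S3 credentials are assumed to be configured separately:

  -- DuckDB: export a Postgres table to Parquet on S3 (all names are placeholders)
  INSTALL postgres; LOAD postgres;
  INSTALL httpfs;   LOAD httpfs;

  -- Attach the source database read-only so the export cannot write back
  ATTACH 'host=localhost dbname=app user=app' AS pg (TYPE postgres, READ_ONLY);

  -- Stream the table straight into a columnar file on S3
  COPY (SELECT * FROM pg.public.events)
    TO 's3://my-bucket/exports/events.parquet' (FORMAT parquet);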

📉 Falling 0.4x · Databases
Comments: 2,336
Years Active: 19
Top Authors: 5
Topic ID: #8144

Activity Over Time (comments per year)

2008: 4     2009: 12    2010: 12    2011: 16    2012: 39
2013: 64    2014: 88    2015: 106   2016: 102   2017: 127
2018: 143   2019: 162   2020: 230   2021: 224   2022: 214
2023: 262   2024: 323   2025: 199   2026: 9

Keywords

RAM, e.g, S3, OSS, COPY, AWS, CSV, BigQuery, NFS, VALUES, parquet, clickhouse, postgres, duckdb, s3, sql, csv, data, query, etl

Sample Comments

exAspArk Nov 7, 2024 View on HN

Thank you! Yes, absolutely! 1) You could use BemiDB to sync your Postgres data (e.g., partition time-series tables) to S3 in Iceberg format. Iceberg is essentially a "table" abstraction on top of columnar Parquet data files with a schema, history, etc. 2) If you don't need strong consistency and are fine with delayed data (the main trade-off), you can use just BemiDB to query and visualize all data directly from S3. From a query perspective, it's like DuckDB that talks Post
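
BemiDB itself handles the sync, but the read side this comment describes can be approximated with plain DuckDB and its iceberg extension; a sketch, assuming a hypothetical Iceberg table already written under s3://my-bucket/iceberg/events and S3 credentials already configured:

  -- DuckDB: query an Iceberg table on S3 directly (table path is a placeholder)
  INSTALL iceberg; LOAD iceberg;
  INSTALL httpfs;  LOAD httpfs;

  SELECT date_trunc('day', created_at) AS day, count(*) AS n
  FROM iceberg_scan('s3://my-bucket/iceberg/events')
  GROUP BY day
  ORDER BY day;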

mind-blight Sep 26, 2024 View on HN

I've been using DuckDB to import data into postgres (especially CSVs and JSON) and it has been really effective. DuckDB can run SQL across the different data formats and insert or update directly into postgres. I run duckdb with python and Prefect for batch jobs, but you can use whatever language or scheduler you prefer. I can't recommend this setup enough. The only weird thing I've run into is that a really complex join across multiple postgres tables and parquet files had a bug
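
A minimal sketch of that import pattern using DuckDB's postgres extension; the connection string, file names, and target table are placeholders, and the target table is assumed to exist already:

  -- DuckDB: join a CSV and a JSON file, then load the result into Postgres
  INSTALL postgres; LOAD postgres;
  ATTACH 'host=localhost dbname=app user=app' AS pg (TYPE postgres);

  -- Target table pg.public.orders_enriched is assumed to exist already
  INSERT INTO pg.public.orders_enriched
  SELECT o.order_id, o.amount, c.segment
  FROM read_csv_auto('orders.csv') AS o
  JOIN read_json_auto('customers.json') AS c
    ON o.customer_id = c.customer_id;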

oulipo Oct 17, 2024 View on HN

Cool, would this be better than using a clickhouse / duckdb extension that reads postgres and saves to Parquet? What would be recommended to regularly output old data to S3 as Parquet files? To use a cron job which launches a second Postgres process connecting to the database and extracting the data, or to use the regular database instance? Doesn't that slow down the instance too much?
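
One common answer is a scheduled job in a separate tool, so the Postgres instance only serves the read; pointing it at a read replica keeps load off the primary. A nightly DuckDB run could look roughly like the sketch below, where the connection details, cutoff, and output path are all placeholders:

  -- DuckDB, run from cron: copy rows older than a cutoff to Parquet on S3
  INSTALL postgres; LOAD postgres;
  INSTALL httpfs;   LOAD httpfs;
  ATTACH 'host=replica.internal dbname=app user=readonly' AS pg (TYPE postgres, READ_ONLY);

  COPY (
    SELECT *
    FROM pg.public.events
    WHERE created_at < now() - INTERVAL 90 DAY   -- "old" data cutoff; adjust to your retention policy
  ) TO 's3://my-bucket/archive/events_old.parquet' (FORMAT parquet);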

pid-1 Jul 6, 2023 View on HN

Why not use parquet files + AWS Athena?

gizmodo59 Sep 7, 2022 View on HN

Yeah.. It would be much easier to copy the data to S3/any object storage (better to convert it into a columnar format like Parquet) and query it directly using a SQL-on-lake engine like Dremio or Athena; S3 Select would work too.
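
For the Athena route, once the Parquet files are in S3 the table definition is a one-time piece of DDL; a sketch with placeholder bucket, database, and columns:

  -- Athena: define an external table over Parquet files already sitting in S3
  CREATE EXTERNAL TABLE analytics.events (
    id         bigint,
    user_id    bigint,
    created_at timestamp
  )
  STORED AS PARQUET
  LOCATION 's3://my-bucket/exports/events/';

  -- Then it queries like any other table, billed by data scanned
  SELECT date(created_at) AS day, count(*) AS events
  FROM analytics.events
  GROUP BY 1
  ORDER BY 1;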

eatonphil Jun 25, 2021 View on HN

Why won't you just copy your parquet data into Postgres?

jhgg Jan 3, 2015 View on HN

You should be able to achieve that with this tool paired with postgres foreign data wrappers!
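
For reference, the foreign data wrapper side of that pairing looks roughly like this in plain Postgres (postgres_fdw ships with Postgres; the server, credentials, and table names here are placeholders):

  -- Postgres: expose a remote table locally via postgres_fdw
  CREATE EXTENSION IF NOT EXISTS postgres_fdw;

  CREATE SERVER analytics_src
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'replica.internal', dbname 'app', port '5432');

  CREATE USER MAPPING FOR CURRENT_USER
    SERVER analytics_src
    OPTIONS (user 'readonly', password 'secret');

  CREATE FOREIGN TABLE events_remote (
    id          bigint,
    created_at  timestamptz
  )
  SERVER analytics_src
  OPTIONS (schema_name 'public', table_name 'events');

  -- The remote table now reads (and can be copied out) like a local one
  SELECT count(*) FROM events_remote;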

scottpersinger Mar 18, 2023 View on HN

Built this recently to help a friend set up a Snowflake warehouse from their Postgres database. Also tested it with ClickHouse, which is cool for running locally. Uses simple COPY, which is fast and doesn't require binlog access, but doesn't support real-time replication as a result.
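
The COPY-based approach described here amounts to periodic full (or filtered) snapshots; a minimal psql sketch, with placeholder table and file names:

  -- psql: dump a snapshot with plain COPY; no replication slot or binlog access needed
  \copy (SELECT * FROM public.events WHERE created_at < now()) TO 'events_snapshot.csv' WITH (FORMAT csv, HEADER)

The resulting CSV can then be bulk-loaded into Snowflake or ClickHouse, or converted to Parquet (for example with DuckDB) before landing in S3.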

rockostrich May 5, 2022 View on HN

Cloud SQL has BigQuery connections that can be leveraged. But yea, this seems like a nice solution if you have a postgres instance outside of Cloud SQL. Another approach would be to write the CDC to a message queue and archive that to parquet.
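
The Cloud SQL connection mentioned here surfaces in BigQuery as a federated query; a sketch, where the connection ID, table, and inner Postgres query are placeholders:

  -- BigQuery: federated query against a Cloud SQL Postgres instance
  SELECT *
  FROM EXTERNAL_QUERY(
    'my-project.us.my-cloudsql-connection',
    "SELECT id, created_at FROM public.events WHERE created_at >= now() - interval '1 day'"
  );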

bduerst Apr 10, 2014 View on HN

I know it's a pain to get data into it, but how about Big Query?