Best way to get two billion rows out

I have a partitioned table that contains ~2 billion records in total. I need to get each row out and publish it to a queuing system for follow-on processing.

I started with SELECT ... FROM ... ORDER BY created_at ASC LIMIT <num> OFFSET <num>, using a limit of 10,000. I noticed that as I got further into the results (around 2 million rows in), the queries were taking longer and longer to return the next 10,000 rows.

Is there a better way to get this data out? Should I perform an export and read the files from disk instead?

Thanks in advance!

If you can use a client that supports protocol-level cursors, you could do a SELECT * FROM t without ORDER BY, LIMIT, or OFFSET and still fetch only a couple of hundred or thousand records at a time. (With LIMIT/OFFSET pagination, the server has to process and skip all of the preceding rows for every page, which is why the queries slow down the further in you get.)

The JDBC client does support this; another client that does is asyncpg:

#!/usr/bin/env python3

import asyncpg
import asyncio


async def main():
    conn = await asyncpg.connect(host='localhost', user='crate')
    async with conn.transaction():
        # By default this fetches 50 records at a time; pass e.g. prefetch=1000
        # to conn.cursor() to fetch more records per round trip.
        async for record in conn.cursor('SELECT generate_series(0, 100)'):
            print(record)


if __name__ == "__main__":
    asyncio.run(main())

With JDBC:

try (Connection conn = DriverManager.getConnection(url(RW), properties)) {
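    // Fetch size only takes effect when autocommit is disabled,
    // which lets the driver fetch the result set in batches via a cursor.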
    conn.setAutoCommit(false);
    try (Statement st = conn.createStatement()) {
        st.setFetchSize(1000);
        try (ResultSet resultSet = st.executeQuery("select x from t")) {
            while (resultSet.next()) {
                // resultSet.getInt(1);
            }
        }
    }
}

Thanks for the suggestion. I am using Go at the moment, but could try Python or JDBC.

Might be that Go also supports it; we haven't tested it yet. Are you using pgx?
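
If it is pgx, a minimal sketch of the same idea could look like the following (untested against CrateDB; the connection string, table t and column x are placeholders):

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/jackc/pgx/v5"
)

func main() {
    ctx := context.Background()

    // Placeholder connection string; adjust host, port and user for your cluster.
    conn, err := pgx.Connect(ctx, "postgres://crate@localhost:5432/doc")
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close(ctx)

    // pgx reads rows off the connection as they are iterated instead of
    // buffering the whole result set in memory.
    rows, err := conn.Query(ctx, "SELECT x FROM t")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var x int
        if err := rows.Scan(&x); err != nil {
            log.Fatal(err)
        }
        fmt.Println(x)
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
}

Note that this is not a protocol-level cursor like the JDBC fetch size; pgx simply streams the single result set row by row, so client-side memory stays bounded while the server keeps sending rows.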