Crate DB stream result with .NET and Npgsql

Hello,

I’m querying data from a big table (let say > 50Gb) in .NET Core with Npgsql.As soon as reader executes line “ReadAsync”, CrateDB starts reading entire dataset into memory, failing with OutOfMemory exception, once it reaches 32Gb. Is it possible to do “streaming read”, only loading partial result. One obvious use case would be read data, make some transformation and output into CSV.

I can send .NET solution if needed.

.NET 6
Crate BD version 5.0.0
Npgsql 6.0.6

using Npgsql;
using System.Data;

const string ConnectionString = "Host=127.0.0.1;Username=crate;SSL Mode=Prefer;Write Buffer Size=65536;";

string query = "select * from anyBigTable";

await foreach (var record in ExecuteQueryAsync(query))
{
    string row = string.Empty;
    foreach (var field in record)
    {
        row += $"{field} ";
    }

    Console.WriteLine(row);
}

Console.ReadLine();


async IAsyncEnumerable<object[]> ExecuteQueryAsync(string query)
{
    using var connection = new NpgsqlConnection(ConnectionString);
    await connection.OpenAsync();
    
    using var command = new NpgsqlCommand(connection: connection, cmdText: query);
    using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
    if (!reader.HasRows)
    {
        yield break;
    }

    bool read = await reader.ReadAsync();
    while (read)
    {
        object[] row = new object[reader.FieldCount];
        reader.GetValues(row);

        yield return row;

        read = await reader.ReadAsync();
    }
}

Hi @srj2013

CrateDB does support the ability to use a fetch size and stream results with compatible clients / drivers. (e.g. JDBC / asyncpg / …). Unfortunately I don’t think Npgsql supports this. So your best option is to define a column, by which the data can be split into chunks (e.g.) time interval and iterate over it with WHERE <condition> or use a file based approach with COPY TO

soon…
It is planned for the next version CrateDB 5.1 (to be released end of Q3 / Start of Q4) to support server-side CURSORs. This will allow also other clients to scroll through results sets and fetch parts at a time.

1 Like