Row-oriented reader needs to be able to skip fields #190

GKrivosheev-rms · 2021-04-29T01:23:28Z

Currently, all fields and properties of TRow must be present in the file for the row-based reader to work.
Row properties are often computed, left unfilled, or otherwise do not need to be read from the file. We need a way to mark the type so that those members are not deserialized from the file.

Proposal:
Add an IgnoreColumn attribute to mark members that must be skipped, for example:

struct MyRow
{
    [IgnoreColumn]
    public DateTime CurrentDate => DateTime.Now;

    [MapToColumn("ColumnB")]
    public string MyValue;
}
using var reader = ParquetFile.CreateRowReader<MyRow>("example.parquet");
...
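
As a rough sketch of how the reader could honor such an attribute, the member discovery step could filter on it via reflection. Both `IgnoreColumnAttribute` and the `MemberSelector.GetReadableMembers` helper below are hypothetical names used for illustration, not existing library types. `MapToColumn` is left out so the sample stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical attribute from the proposal; assumed here, not an existing library type.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public sealed class IgnoreColumnAttribute : Attribute { }

public struct MyRow
{
    [IgnoreColumn]
    public DateTime CurrentDate => DateTime.Now;

    public string MyValue;
}

public static class MemberSelector
{
    // Returns the public instance fields and properties of rowType
    // that are not marked with [IgnoreColumn].
    public static IReadOnlyList<MemberInfo> GetReadableMembers(Type rowType)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        return rowType.GetFields(flags).Cast<MemberInfo>()
            .Concat(rowType.GetProperties(flags))
            .Where(m => m.GetCustomAttribute<IgnoreColumnAttribute>() == null)
            .ToList();
    }
}
```

With this filter, `MemberSelector.GetReadableMembers(typeof(MyRow))` would yield only `MyValue`, so the file would not need a column for `CurrentDate`.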

Alternatively, make the reader and writer symmetrical and allow the reader to be customized with a list of columns, as below. Note that the columns are the names of members in the class, not in the file. This would allow only a subset of the type's members to be set.

public static ParquetRowReader<TTuple> CreateRowReader<TTuple>(string path, string[] columnNames = null);
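
One way the reader might resolve such a list is to match each requested name against the public members of the row type and fail early on unknown names. This is only a sketch: `MemberFilter.SelectMembers` is a hypothetical helper, and treating `null` as "all members" is an assumption based on the default argument in the signature above.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class MemberFilter
{
    // Hypothetical helper: resolves the requested member names against rowType.
    // A null list keeps every public instance field and property (assumed default behavior).
    public static MemberInfo[] SelectMembers(Type rowType, string[] columnNames = null)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        var all = rowType.GetFields(flags).Cast<MemberInfo>()
            .Concat(rowType.GetProperties(flags))
            .ToArray();

        if (columnNames == null) return all;

        // Preserve the caller's order and reject names that do not match any member.
        return columnNames
            .Select(name => all.FirstOrDefault(m => m.Name == name)
                ?? throw new ArgumentException(
                    $"'{name}' is not a public member of {rowType.Name}", nameof(columnNames)))
            .ToArray();
    }
}
```

Members not named in the list would simply keep their default values after a row is read.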

GPSnoopy added the enhancement New feature or request label May 23, 2021
