Feature Request / Improvement
When we create a `TaskContext`, the `PartitionSpec` and `Schema` are serialized to JSON strings and passed to each `BaseContentScanTask`. When we later want to inspect the schema and spec of a task, the strings have to be deserialized again. This can be very expensive when the schema has many columns. In our use case, the table has over 2k columns.
So I think we should store the schema/spec in `TaskContext` and pass them to `BaseContentScanTask`, so that the tasks do not each have to deserialize them.
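To illustrate the idea, here is a minimal, self-contained sketch. It does not use Iceberg's classes: `ParsedSchema`, `StringBackedTask`, `SharedContext`, and `ContextBackedTask` are hypothetical stand-ins, and the `parse` method only simulates JSON deserialization. It assumes each task parses its own string copy on first access, and compares that with tasks that share one parsed object held by a context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SchemaSharingSketch {
    // Counts how often the (simulated) expensive parse runs.
    static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    // Hypothetical stand-in for a parsed schema; not Iceberg's Schema class.
    static final class ParsedSchema {
        final List<String> columns;
        ParsedSchema(List<String> columns) { this.columns = columns; }
    }

    // Stand-in for JSON deserialization; cost grows with the number of columns.
    static ParsedSchema parse(String schemaString) {
        PARSE_COUNT.incrementAndGet();
        return new ParsedSchema(Arrays.asList(schemaString.split(",")));
    }

    // Assumed current behavior: each task holds the string and parses it itself.
    static final class StringBackedTask {
        private final String schemaString;
        private ParsedSchema schema;
        StringBackedTask(String schemaString) { this.schemaString = schemaString; }
        ParsedSchema schema() {
            if (schema == null) {
                schema = parse(schemaString);
            }
            return schema;
        }
    }

    // Proposed behavior: the context parses once and keeps the object.
    static final class SharedContext {
        private final String schemaString;
        private ParsedSchema schema;
        SharedContext(String schemaString) { this.schemaString = schemaString; }
        synchronized ParsedSchema schema() {
            if (schema == null) {
                schema = parse(schemaString);
            }
            return schema;
        }
    }

    // Tasks only keep a reference to the shared context.
    static final class ContextBackedTask {
        private final SharedContext context;
        ContextBackedTask(SharedContext context) { this.context = context; }
        ParsedSchema schema() { return context.schema(); }
    }

    // Number of parses when every task deserializes its own copy.
    static int parsesPerTask(String schemaString, int taskCount) {
        PARSE_COUNT.set(0);
        List<StringBackedTask> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new StringBackedTask(schemaString));
        }
        for (StringBackedTask task : tasks) {
            task.schema();
        }
        return PARSE_COUNT.get();
    }

    // Number of parses when all tasks share the context's parsed schema.
    static int parsesShared(String schemaString, int taskCount) {
        PARSE_COUNT.set(0);
        SharedContext context = new SharedContext(schemaString);
        for (int i = 0; i < taskCount; i++) {
            new ContextBackedTask(context).schema();
        }
        return PARSE_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("per-task parses: " + parsesPerTask("id,name,ts", 1000));
        System.out.println("shared parses:   " + parsesShared("id,name,ts", 1000));
    }
}
```

In this sketch the per-task variant parses once per task, while the shared variant parses once in total regardless of the number of tasks. With a schema of over 2k columns, each avoided parse is correspondingly more valuable.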
Query engine
None
Willingness to contribute
I can contribute this improvement/feature independently
I would be willing to contribute this improvement/feature with guidance from the Iceberg community
I cannot contribute this improvement/feature at this time