ManifestGroup::TaskContext should cache partition spec #11235

Open
2 of 3 tasks
lirui-apache opened this issue Sep 29, 2024 · 0 comments
Labels
improvement PR that improves existing functionality

Comments

@lirui-apache
Contributor

Feature Request / Improvement

When a TaskContext is created, the PartitionSpec and Schema are serialized to JSON strings, and those strings are passed to each BaseContentScanTask. When we later want to inspect the schema or spec of a task, the strings have to be deserialized. This can be very expensive when the schema has many columns. In our use case, the table has over 2,000 columns.

So I think we should store the Schema and PartitionSpec objects in TaskContext and pass them to BaseContentScanTask.
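A minimal sketch of the idea, not the actual Iceberg code: every class and method name below (`SpecCacheSketch`, `Context`, `Task`, `parseSpec`) is a hypothetical stand-in, and the "parse" is a simple string split standing in for JSON deserialization. It only illustrates that when the shared context holds the parsed object, the parse cost is paid once per context rather than once per task.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; none of these are Iceberg classes.
final class SpecCacheSketch {
  // Counts how often the (supposedly expensive) parse runs.
  static final AtomicInteger PARSE_COUNT = new AtomicInteger();

  // Stand-in for deserializing a spec from its string form.
  static List<String> parseSpec(String specString) {
    PARSE_COUNT.incrementAndGet();
    return Arrays.asList(specString.split(","));
  }

  // Shared context: keeps the string and caches the parsed result.
  static final class Context {
    private final String specString;
    private volatile List<String> spec; // parsed lazily, at most once

    Context(String specString) {
      this.specString = specString;
    }

    List<String> spec() {
      List<String> result = spec;
      if (result == null) {
        synchronized (this) {
          result = spec;
          if (result == null) {
            result = parseSpec(specString);
            spec = result;
          }
        }
      }
      return result;
    }
  }

  // Each task delegates to the shared context instead of parsing itself.
  static final class Task {
    private final Context context;

    Task(Context context) {
      this.context = context;
    }

    List<String> spec() {
      return context.spec();
    }
  }

  // Number of parses when every task reads the spec through one context.
  static int parsesWithSharedContext(int taskCount) {
    PARSE_COUNT.set(0);
    Context context = new Context("id,ts,region");
    for (int i = 0; i < taskCount; i++) {
      new Task(context).spec();
    }
    return PARSE_COUNT.get();
  }

  // Number of parses when every task parses the string on its own.
  static int parsesPerTask(int taskCount) {
    PARSE_COUNT.set(0);
    for (int i = 0; i < taskCount; i++) {
      parseSpec("id,ts,region");
    }
    return PARSE_COUNT.get();
  }

  public static void main(String[] args) {
    System.out.println("per-task parses: " + parsesPerTask(1000));
    System.out.println("shared-context parses: " + parsesWithSharedContext(1000));
  }
}
```

The lazy, double-checked initialization is one possible design choice. Parsing eagerly when the context is built would be simpler if the parsed objects are always needed.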

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time
@lirui-apache lirui-apache added the improvement PR that improves existing functionality label Sep 29, 2024