-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data: Add partition stats writer and reader #11216
base: main
Are you sure you want to change the base?
Conversation
b7406ff
to
941505a
Compare
941505a
to
05a80f6
Compare
|
||
@Override | ||
@SuppressWarnings("checkstyle:CyclomaticComplexity") | ||
public boolean equals(Object other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StructLikeMap
was previously handling this implicitly. But when PartitionStatsRecord
wraps PartitionStats
now for the writers, it needs to override equals
and hashcode
StructLike coercedPartition = | ||
PartitionUtil.coercePartition(partitionType, spec, file.partition()); | ||
StructLike key = keyTemplate.copyFor(coercedPartition); | ||
Record key = coercedPartitionRecord(file, spec, partitionType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need Record
instead of PartitionData
for the writers.
Cannot keep this conversion in the data module as it just to wraps the same PartitionStats
object.
|
||
/** Wraps the {@link PartitionStats} as {@link Record}. Used by generic writers and readers. */ | ||
public class PartitionStatsRecord implements Record, StructLike { | ||
private static final LoadingCache<StructType, Map<String, Integer>> NAME_MAP_CACHE = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class is similar to GenericRecord
but for a specific partition stats schema.
partitionData.set(1, c3); | ||
return partitionData; | ||
private static StructLike partitionData(Types.StructType partitionType, String c2, String c3) { | ||
GenericRecord record = GenericRecord.create(partitionType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stores the Record
instead of PartitionData
in the core logic. So, testcase is updated to expect Record
.
Schema schema, | ||
PartitionSpec spec, | ||
int formatVersion, | ||
Map<String, String> properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was no option to pass the table properties before.
Needed to pass different file format for paramterized test.
@aokolnychyi: This PR is ready. But as we discussed previously, this PR wraps the I will explore adding the internal writers for Parquet and Orc. Similar to #11108. |
Introduce APIs to write the partition stats into files in table default format using Iceberg generic writers and readers.