Skip to content

Commit

Permalink
Refactor Confusing Dwrf Reader RowNumber Method (#11029)
Browse files Browse the repository at this point in the history
Summary:
Pull Request resolved: #11029

We noticed that the `RowNumber` method in Dwrf reader produces two different outputs based on when it is called. It returns the current position of the reader after a seek operation and then the previous position of the reader before a `next` read operation was called. This makes it almost unusable without an understanding of its underlying implementation. In this diff, we modify it's design to return the current position of the reader for all calls.

Differential Revision: D62412793
  • Loading branch information
MacVincent Agha-Oko authored and facebook-github-bot committed Sep 19, 2024
1 parent f9d2ba4 commit bf950ce
Show file tree
Hide file tree
Showing 2 changed files with 12 additions and 3 deletions.
11 changes: 11 additions & 0 deletions velox/dwio/dwrf/reader/DwrfReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -588,6 +588,17 @@ int64_t DwrfRowReader::nextRowNumber() {
return kAtEnd;
}

uint64_t DwrfRowReader::rowNumber() {
const auto nextRow = nextRowNumber();
if (nextRow == kAtEnd) {
if (isEmptyFile()) {
return 0;
}
return getReader().footer().numberOfRows();
}
return firstRowOfStripe_[currentStripe_] + currentRowInStripe_;
}

int64_t DwrfRowReader::nextReadSize(uint64_t size) {
VELOX_DCHECK_GT(size, 0);
if (nextRowNumber() == kAtEnd) {
Expand Down
4 changes: 1 addition & 3 deletions velox/dwio/dwrf/reader/DwrfReader.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,9 +79,7 @@ class DwrfRowReader : public StrideIndexProvider,
return selectedSchema_;
}

uint64_t rowNumber() const {
return previousRow_;
}
uint64_t rowNumber();

uint64_t seekToRow(uint64_t rowNumber);

Expand Down

0 comments on commit bf950ce

Please sign in to comment.