Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve testing of mismatched field numbers. #13812

Merged
merged 4 commits into from
Sep 20, 2024

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Sep 19, 2024

This improves testing of mismatched field numbers by

  • improving AssertingDocValuesProducer to detect mismatched field numbers,
  • introducing a MismatchedCodecReader to actually test mismatched field numbers on DocValuesProducer (a MismatchedLeafReader wrapping a SlowCodecReaderWrapper doesn't work since SlowCodecReaderWrapper implicitly resolves the correct FieldInfo object),
  • introducing an explicit test for mismatched field numbers in BaseDocValuesFormatTestCase.

These new tests uncovered a bug when merging sorted doc values, which would call the underlying doc values producer with the merged field info.

Closes #13805

This improves testing of mismatched field numbers by
 - improving `AssertingDocValuesProducer` to detect mismatched field numbers,
 - introducing a `MismatchedCodecReader` to actually test mismatched field
   numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a
`SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly
resolves the correct `FieldInfo` object),
 - introducing an explicit test for mismatched field numbers in
   `BaseDocValuesFormatTestCase`.

These new tests uncovered a bug when merging sorted doc values, which would
call the underlying doc values producer with the merged field info.

Closes apache#13805
@@ -613,7 +613,7 @@ public void mergeSortedField(FieldInfo fieldInfo, final MergeState mergeState)
if (docValuesProducer != null) {
FieldInfo readerFieldInfo = mergeState.fieldInfos[i].fieldInfo(fieldInfo.name);
if (readerFieldInfo != null && readerFieldInfo.getDocValuesType() == DocValuesType.SORTED) {
values = docValuesProducer.getSorted(fieldInfo);
values = docValuesProducer.getSorted(readerFieldInfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -229,6 +234,7 @@ static class AssertingDocValuesProducer extends DocValuesProducer {

@Override
public NumericDocValues getNumeric(FieldInfo field) throws IOException {
assert fieldInfos.fieldInfo(field.name).number == field.number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@ChrisHegarty
Copy link
Contributor

I can confirm that the previously failing tests seen in the 9.x branch pass successfully with the one-liner fix in this PR. 👍

@iverase
Copy link
Contributor

iverase commented Sep 19, 2024

The failure seems legit. It happens because of this method in MismatchedCodecReader:

  @Override
  public FieldInfos getFieldInfos() {
    return shuffled;
  }

if you replace this with return in.getFieldInfos() then it is successful. I think the method FiedInfos#getMergedFieldInfos get confused because of soft deletes and parent fields?

@jpountz
Copy link
Contributor Author

jpountz commented Sep 19, 2024

We haven't had failures in thes test case for a very long time, so I suspect it's a genuine failure indeed, which got triggered by the additional testing of mismatched field numbers?

@rmuir
Copy link
Member

rmuir commented Sep 19, 2024

Seems the test/functionality uses points and docvalues. Original failing test that inspired this PR testSparseDocValuesVsStoredFields. Maybe we need a similar one to cover DocValuesVsPoints?

I guess my concern here is that fixes/test heres are focused on docvalues, but fieldinfo numbers are used by all index structures and there might be bugs elsewhere too... just do they have the tests to find issues?

Thanks for digging into the field numbers issues.

@jpountz
Copy link
Contributor Author

jpountz commented Sep 20, 2024

I found the bug, it's in the points merging logic, which assumes that it can look up the internal map<fieldnumber, PointValues> based on the field infos that are returned by the reader. This is incorrect when the reader is a wrapper than renumbers fields such as MismatchedCodecReader. It's unlikely that this bug would have hit anyone in production, but it's still a bug. I'll push a fix and more tests shortly.

@jpountz jpountz merged commit da1f954 into apache:main Sep 20, 2024
3 checks passed
@jpountz jpountz deleted the better_test_mismatched_field_numbers branch September 20, 2024 12:37
jpountz added a commit that referenced this pull request Sep 20, 2024
This improves testing of mismatched field numbers by
 - improving `AssertingDocValuesProducer` to detect mismatched field numbers,
 - introducing a `MismatchedCodecReader` to actually test mismatched field
   numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a
`SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly
resolves the correct `FieldInfo` object),
 - introducing an explicit test for mismatched field numbers for doc values, points,
postings and knn vectors.

These new tests uncovered a bug when merging sorted doc values, which would
call the underlying doc values producer with the merged field info.

Closes #13805
jpountz added a commit that referenced this pull request Sep 20, 2024
This improves testing of mismatched field numbers by
 - improving `AssertingDocValuesProducer` to detect mismatched field numbers,
 - introducing a `MismatchedCodecReader` to actually test mismatched field
   numbers on `DocValuesProducer` (a `MismatchedLeafReader` wrapping a
`SlowCodecReaderWrapper` doesn't work since `SlowCodecReaderWrapper` implicitly
resolves the correct `FieldInfo` object),
 - introducing an explicit test for mismatched field numbers for doc values, points,
postings and knn vectors.

These new tests uncovered a bug when merging sorted doc values, which would
call the underlying doc values producer with the merged field info.

Closes #13805
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestLucene90DocValuesFormat fails with ArrayIndexOutOfBoundsException
4 participants