Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch aplay error messages in apause.exp and exit #1226

Merged
merged 3 commits into from
Aug 1, 2024

Conversation

marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Aug 1, 2024

Always better to "fail fast".

See commit messages.

The script was originally tested with arecord but it works with aplay
exactly the same. Stop saying "recording" when playing, it's confusing.

Signed-off-by: Marc Herbert <[email protected]>
Fail fast and avoid timeouts.

This is especially important considering pause will soon be disabled by
default:
- thesofproject/linux#5041

Signed-off-by: Marc Herbert <[email protected]>
Experience with thesofproject/linux#5109 shows
that this warning never seems harmless: the test ends up timing out and
failing anyway.

So, better failing fast for clearer and better logs.

Also increase the log level of press_space() to avoid state confusion.

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 1, 2024

Perfect demonstration of the value of this PR:

https://sof-ci.01.org/softestpr/PR1226/build710/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-pause-resume-playback-100

t=13242 ms: aplay: Pressing SPACE
t=14869 ms: aplay: (32/100) Found   === PAUSE ===  ,  pausing for 137 ms
t=14869 ms: aplay: ERROR: aplay: do_pause:1567:
pause push error: File descriptor in bad state

Apparent memory corruption here:

https://sof-ci.01.org/softestpr/PR1226/build710/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=multiple-pause-resume-50

[ 1515.873143] <inf> ipc: ipc_cmd: rx	: 0x13000004|0x0
[ 1515.873160] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 8
[ 1515.873168] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0119000 0xa00ebf54U priority 0 flags 0x0
������������������������������������������������������������������������������������������������������������������������������������������������������
������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
��������������������������������������������������
���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

https://sof-ci.01.org/softestpr/PR1226/build710/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=verify-kernel-boot-log is unrelated issue thesofproject/linux#5124

Everything else is 100% green.

@marc-hb marc-hb marked this pull request as ready for review August 1, 2024 05:00
@marc-hb marc-hb requested a review from a team as a code owner August 1, 2024 05:00
@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 1, 2024

SOFCI TEST

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but the last time I used TCL was in 1996, can't really approve or reject.

# When multiple patterns match, first pattern wins.

-nocase -re {error.*\r|PAUSE.*no[[:blank:]]*hw[[:blank:]]*support.*\r} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

laughing at my desk at 8am, no idea what this does haha.

Copy link
Collaborator Author

@marc-hb marc-hb Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me only a couple days of practice to become productive in Tcl and expect (without any "stackoverflow garbage in, AI garbage out!"); it's not so bad.

{ } is like the single quotes in shell except they can be nested.

The basic syntax of the expect instruction is just:

expect {
 { pattern1 } { action1 }
 { pattern2 } { action2 }
 ...
}

-nocase -re means the pattern is a case-insensitive regular expression (as opposed to globbing).

That regular expression has itself nothing specific to Tcl.

Thanks for taking a look!

log 0 "ERROR: $buffer_with_lf"
exit 1
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear on the commit message: PAUSE will be disabled unless we have a kernel parameter that keeps it enabled. All CI platforms SHALL keep the pause enabled.

Copy link
Collaborator Author

@marc-hb marc-hb Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I tried to say in the commit message but more briefly and without getting into Intel specifics. Too briefly? I mean, should I change the commit message or is your comment here enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats' fine, just wanted to check we were aligned.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 1, 2024

LNL https://sof-ci.01.org/softestpr/PR1226/build716/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=multiple-pause-resume-50 has pause push error: File descriptor in bad state across three models.

@plbossart
Copy link
Member

plbossart commented Aug 1, 2024

"pause push error: File descriptor in bad state" is not necessarily a problem, we see it all the time when resuming and INFO_RESUME is not supported.

I've seen this log countless times and things were just fine, alsa-lib reinitialized whatever was needed with a prepare.

Adding @ranj063 and @ujfalusi to make sure I am not completely mistaken.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 1, 2024

"pause push error: File descriptor in bad state" is not necessarily a problem, we see it all the time when resuming and INFO_RESUME is not supported.

Mmmm.... questions:

  • why are we seeing this error message ONLY on LNL? Before this PR catches this error, LNL was also the only platform where the test failed (later and in a more confusing way).
  • Can you please elaborate on "INFO_RESUME is not supported." ? Is it the same as "pause not supported" that's coming with ASoC: SOF: core/Intel: Follow the pause_supported flag from topology to disable or keep the pause support enabled linux#5041? If it's the same and it's not supported in some configuration, then the test should not be run or "fail fast" with a relevant and visible error message thanks to this sof-test fix. Not later after some obscure timeout looking like a real bug.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 1, 2024

Can you please elaborate on "INFO_RESUME is not supported." ? Is it the same as "pause not supported"

Yes it's the same according to @ranj063 .

Thanks for the reviews. This is a well tested and small change - easy to revert parts of it if needed so I'm merging it.

@marc-hb marc-hb merged commit 000633a into thesofproject:main Aug 1, 2024
4 of 7 checks passed
@marc-hb marc-hb deleted the harden-apause branch August 1, 2024 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants