[utils] extract_timezone() removes trailing year from time+date-type date-time string #29948

Open
3 of 6 tasks
dirkf opened this issue Sep 12, 2021 · 1 comment

dirkf commented Sep 12, 2021

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.06.06
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

N/A

Description

Looking at a recent PR, the site uses this format for the date-time:

11:30, 06-Jun-2021

This isn't supported by unified_timestamp().

If you try to support it by adding '%H:%M %d-%b-%Y' to the DATE_FORMATS_DAY_FIRST list, you still get None from unified_timestamp().

extract_timezone() thinks that -2021 is a time-zone and removes it.
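
To illustrate the failure mode, here is a minimal, self-contained sketch. The pattern below is a simplified stand-in for a permissive trailing-offset regex (an assumption for illustration, not a copy of the code in utils.py), and `strip_loose_timezone` is a hypothetical helper name.

```python
import re

# Simplified stand-in (assumption, not the utils.py source): optional space,
# a sign, two "hour" digits, an optional colon and two "minute" digits,
# anchored at the end of the string.
LOOSE_TZ_RE = re.compile(
    r'\s?(?P<sign>[+-])(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$')


def strip_loose_timezone(date_str):
    """Return (matched offset text or None, string with that text removed)."""
    m = LOOSE_TZ_RE.search(date_str)
    if not m:
        return None, date_str
    return m.group(0), date_str[:m.start()]


# The comma has already been dropped here, as a pre-cleaning step might do.
matched, remainder = strip_loose_timezone('11:30 06-Jun-2021')
print(repr(matched), repr(remainder))
```

With a pattern of this shape the trailing `-2021` looks like an offset of minus 20 hours 21 minutes, so the year is gone before any date format is tried.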

unified_timestamp() should try to decode the string as given; then if that hasn't worked, extract_timezone() and retry.
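
A rough sketch of that ordering, using only the standard library. Everything here is hypothetical and for illustration only: `FORMATS`, `TZ_RE`, `_try_formats` and `parse_timestamp` are made-up names, the format list is deliberately tiny, and this is not the actual unified_timestamp() code.

```python
import calendar
import datetime
import re

# Hypothetical, deliberately short format list for the sketch.
FORMATS = ['%H:%M %d-%b-%Y', '%Y-%m-%d %H:%M:%S']

# Assumed trailing-offset pattern, e.g. ' +0200' or '-05:00'.
TZ_RE = re.compile(
    r'\s?(?P<sign>[+-])(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$')


def _try_formats(date_str):
    """Return a naive datetime for the first matching format, else None."""
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    return None


def parse_timestamp(date_str):
    """Parse the string as given; strip a UTC offset and retry only on failure."""
    date_str = date_str.replace(',', '').strip()
    offset = datetime.timedelta()
    dt = _try_formats(date_str)          # first pass: string as given
    if dt is None:
        m = TZ_RE.search(date_str)       # second pass: remove the offset
        if m:
            sign = 1 if m.group('sign') == '+' else -1
            offset = datetime.timedelta(
                hours=sign * int(m.group('hours')),
                minutes=sign * int(m.group('minutes')))
            dt = _try_formats(date_str[:m.start()])
    if dt is None:
        return None
    # Treat the parsed time as local to the offset and convert to UTC.
    return calendar.timegm((dt - offset).timetuple())
```

Because the first pass succeeds for `'11:30, 06-Jun-2021'`, the offset pattern never gets a chance to eat the year, while a string such as `'2021-06-06 11:30:00 +0200'` still falls through to the second pass.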

In the above case there are work-arounds:

    # move the date part in front of the time part: ' 06-Jun-2021 11:30'
    date_str = ' '.join(reversed(date_str.split(',', 1)))
    DATE_FORMATS_DAY_FIRST.append('%d-%b-%Y %H:%M')
    timestamp = unified_timestamp(date_str)

Or:

    # replace the hyphens inside the date with slashes: '11:30, 06/Jun/2021'
    date_str = re.sub(r'(?<=\w)-(?=\w|$)', '/', date_str)
    DATE_FORMATS_DAY_FIRST.append('%H:%M %d/%b/%Y')
    timestamp = unified_timestamp(date_str)

dirkf commented Sep 12, 2021

A check against this issue was added at 15ac841 but did not survive the subsequent introduction of unified_timestamp().

Separate processing for time-zones is needed because time-zone formatting is only available in Py>=3.2.

Another possible fix is to restrict the pattern used in extract_timezone() (utils.py ll.2940 ff.): see PR #29845 for implementation.
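
One way such a restriction could look, as a sketch only: this is not the pattern from PR #29845, and `STRICT_TZ_RE` and `find_offset` are hypothetical names. The assumption made here is that a numeric offset is accepted only when it directly follows an `hh:mm` or `hh:mm:ss` time.

```python
import re

# Sketch (assumption): accept a trailing numeric offset only if the two
# characters-colon-two-characters just before it look like the end of a time.
STRICT_TZ_RE = re.compile(
    r'(?<=[0-9]{2}:[0-9]{2})'                   # preceded by ...mm:ss or hh:mm
    r'\s?(?P<sign>[+-])'                         # optional space, then sign
    r'(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$')


def find_offset(date_str):
    """Return the matched offset text, or None if no offset is recognised."""
    m = STRICT_TZ_RE.search(date_str)
    return m.group(0) if m else None


year_case = find_offset('11:30 06-Jun-2021')          # year must survive
tz_case = find_offset('2021-06-06 11:30:00 +0200')    # real offset
iso_case = find_offset('2021-06-06T11:30:00-05:00')   # real offset, no space
print(year_case, repr(tz_case), repr(iso_case))
```

The trade-off of this particular restriction is that an offset following a bare date with no time part would no longer be recognised.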

Some more unit tests for the date-time utility functions would be good.

pukkandan added a commit to yt-dlp/yt-dlp that referenced this issue Sep 19, 2021
Lesmiscore added a commit to ytdl-patched/ytdl-patched that referenced this issue Sep 19, 2021
* 'master' of https://github.com/yt-dlp/yt-dlp:
  [CBC] Fix CBC Gem extractors (#1013)
  [Peertube] Add channel extractor (#1023)
  [youtube] Warn when trying to download clips
  [test/cookies] Improve logging
  [Nuvid] Fix extractor (#1022)
  [aes] Add `aes_gcm_decrypt_and_verify` (#1020)
  [CGTN] Add extractor (#981)
  [utils] Improve `extract_timezone` Code taken from: ytdl-org/youtube-dl#29845 Fixes: ytdl-org/youtube-dl#29948 Authored by: dirkf
nixxo pushed a commit to nixxo/yt-dlp that referenced this issue Nov 22, 2021