
Update Google Drive URL in the300w_lp_dataset_builder.py with confirmation #5526

Merged: 2 commits merged on Jul 19, 2024


Inokinoki (Collaborator) commented Jul 17, 2024

Thank you for your contribution!

Please read https://www.tensorflow.org/datasets/contribute#pr_checklist to make sure your PR follows the guidelines.

Fix Dataset

  • Dataset Name: the300w_lp
  • Issue Reference:

Description

Google Drive appears to redirect downloads of files it cannot scan for viruses to a warning page, so the download fails:

curl -L "https://drive.google.com/uc?export=download&id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k"
<!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/><style nonce="Cnthv5s43ZEpklfe8-kwQA">.goog-link-button{position:relative;color:#15c;text-decoration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-download-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-bottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}.goog-inline-block{position:relative;display:-moz-inline-box;display:inline-block}* html .goog-inline-block{display:inline}*:first-child+html .goog-inline-block{display:inline}sentinel{}</style><link rel="icon" href="//ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png"/></head><body><div class="uc-main"><div id="uc-dl-icon" class="image-container"><div class="drive-sprite-aux-download-file"></div></div><div id="uc-text"><p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p><p class="uc-warning-subcaption"><span class="uc-name-size"><a href="/open?id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k">300W-LP.zip</a> (2.6G)</span> is too large for Google to scan for viruses. 
Would you still like to download this file?</p><form id="download-form" action="https://drive.usercontent.google.com/download" method="get"><input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/><input type="hidden" name="id" value="0B7OEHD3T4eCkVGs0TkhUWFN6N1k"><input type="hidden" name="export" value="download"><input type="hidden" name="confirm" value="t"><input type="hidden" name="uuid" value="4fcfdc71-ca23-4264-8c6a-1322c7b1c73e"></form></div></div><div class="uc-footer"><hr class="uc-footer-divider"></div></body></html>
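The "Download anyway" button on this warning page submits a GET form to https://drive.usercontent.google.com/download with the hidden fields id, export, confirm and uuid. The following is a minimal sketch, using only the Python standard library, that detects this page and rebuilds the confirmed download URL from the form. The helper names are hypothetical (not part of TFDS), and the sketch assumes Google keeps the form id download-form seen in the curl output.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class _DownloadFormParser(HTMLParser):
    """Collects the action and hidden inputs of <form id="download-form">."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "download-form":
            # Assumption: the warning page uses this form id.
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("type") == "hidden":
            # Hidden inputs such as id, export, confirm and uuid.
            self.fields[attrs.get("name")] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def confirmed_url_from_warning_page(html_text):
    """Returns the 'Download anyway' URL, or None if this is not the warning page."""
    parser = _DownloadFormParser()
    parser.feed(html_text)
    if parser.action is None:
        return None
    # Re-encode the hidden fields as the query string the form would submit.
    return parser.action + "?" + urlencode(parser.fields)
```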

Using the new download URL with confirm=t (the parameter the warning page's "Download anyway" form sets) resolves this issue.
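When the file id is already known, the confirmed URL can be built directly. This sketch assumes the endpoint and parameters shown in the warning page's form (id, export=download, confirm=t); confirmed_drive_url is a hypothetical helper rather than a TFDS API, and the exact URL committed in this PR may differ.

```python
from urllib.parse import urlencode

# Endpoint taken from the warning page's form action (assumption: it stays stable).
_DRIVE_DOWNLOAD_ENDPOINT = "https://drive.usercontent.google.com/download"


def confirmed_drive_url(file_id: str) -> str:
    """Builds a download URL that pre-answers the virus-scan prompt with confirm=t."""
    query = urlencode({"id": file_id, "export": "download", "confirm": "t"})
    return f"{_DRIVE_DOWNLOAD_ENDPOINT}?{query}"


url = confirmed_drive_url("0B7OEHD3T4eCkVGs0TkhUWFN6N1k")
# url == "https://drive.usercontent.google.com/download?id=0B7OEHD3T4eCkVGs0TkhUWFN6N1k&export=download&confirm=t"
```

Building the query with urlencode rather than string concatenation keeps the parameters correctly escaped if an id ever contains reserved characters.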

Checklist

  • Address all TODOs
  • Add alphabetized import to subdirectory's __init__.py
  • Run download_and_prepare successfully
  • Add checksums file
  • Properly cite in BibTeX format
  • Add passing test(s)
  • Add test data
  • If using additional dependencies (e.g. scipy), use lazy_imports (if applicable)
  • Add data generation script (if applicable)
  • Lint code


google-cla bot commented Jul 17, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

camelia-tfds added the copybara-import (Internal label for PR management) label Jul 18, 2024
camelia-tfds self-requested a review July 18, 2024 12:50
camelia-tfds (Collaborator) left a comment

Some tests are failing; please fix them.

camelia-tfds (Collaborator) left a comment

Thank you!

camelia-tfds merged commit e9c6535 into tensorflow:master on Jul 19, 2024
18 checks passed
Inokinoki (Collaborator, Author) commented Jul 19, 2024

Thanks for the review, @camelia-tfds!

One more note: other datasets that download from Google Drive may hit the same issue (see #5525 (comment)). I haven't checked all of them, so perhaps the team can find and fix the rest.
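One way to look for other affected datasets is to flag legacy drive.google.com/uc download links and propose confirmed replacements. This sketch assumes that affected builders use the drive.google.com/uc?export=download&id=... form from the curl command above and that the new endpoint accepts the same file id; both helper names are hypothetical, and each rewritten URL would still need to be verified, for example by running download_and_prepare.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the endpoint seen in the warning page's form works for all Drive files.
_NEW_ENDPOINT = "https://drive.usercontent.google.com/download"


def rewrite_legacy_drive_url(url):
    """Returns a confirm=t URL for a legacy drive.google.com/uc link, else None."""
    parsed = urlparse(url)
    if parsed.netloc != "drive.google.com" or parsed.path != "/uc":
        return None
    file_ids = parse_qs(parsed.query).get("id")
    if not file_ids:
        return None  # Not a direct-download link we know how to rewrite.
    query = urlencode({"id": file_ids[0], "export": "download", "confirm": "t"})
    return f"{_NEW_ENDPOINT}?{query}"


def find_urls_to_update(urls):
    """Maps each legacy Google Drive URL in `urls` to its suggested replacement."""
    updates = {}
    for url in urls:
        new_url = rewrite_legacy_drive_url(url)
        if new_url is not None:
            updates[url] = new_url
    return updates
```

How the candidate URLs are collected (for example, by grepping the builder sources) is left open here; the sketch only handles the rewriting step.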

Labels
copybara-import Internal label for PR management

2 participants