Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly show progress bar for large files #627

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

minhtuev
Copy link

When following our documentation to download large datasets (such as Open Images), the default behavior is that all progress bars are hidden; hence, sometimes, the snippet appears stuck when downloading large files. This PR changes the behavior so that progress bars are shown explicitly when downloading files greater than a certain size.

Before:

In [20]: dataset = foz.load_zoo_dataset(
    ...:     "open-images-v7",
    ...:     split="train",
    ...:     max_samples=1000,
    ...:     seed=51,
    ...:     shuffle=True,
    ...: )
Downloading split 'train' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train' if necessary
Downloading 'https://storage.googleapis.com/openimages/2018_04/train/train-images-boxable-with-rotation.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/image_ids.csv'
 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|    4.8Gb/4.8Gb [58.0s elapsed, 0s remaining, 89.5Mb/s]
Downloading 'https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/classes.csv'
Downloading 'https://storage.googleapis.com/openimages/v6/oidv6-attributes-description.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/attributes.csv'
Downloading 'https://storage.googleapis.com/openimages/v5/classes-segmentation.txt' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/segmentation_classes.csv'
Downloading 'https://storage.googleapis.com/openimages/v7/oidv7-class-descriptions.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/point_classes.csv'
Downloading 'https://storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy.json' to '/var/folders/bg/cq0lv35x74s_8tlvqfdmf90r0000gn/T/tmptqnku3ue/metadata/hierarchy.json'
Downloading 'https://storage.googleapis.com/openimages/v5/train-annotations-human-imagelabels-boxable.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/labels/classifications.csv'

Downloading 'https://storage.googleapis.com/openimages/v6/oidv6-train-annotations-bbox.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/labels/detections.csv'
Downloading 'https://storage.googleapis.com/openimages/v7/oidv7-train-annotations-point-labels.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/labels/points.csv'

After:

In [1]: import fiftyone as fo
   ...: import fiftyone.zoo as foz

In [2]: dataset = foz.load_zoo_dataset(
   ...:     "open-images-v7",
   ...:     split="train",
   ...:     max_samples=1500,
   ...:     seed=51,
   ...:     shuffle=True,
   ...: )
Downloading split 'train' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train' if necessary
Downloading 'https://storage.googleapis.com/openimages/2018_04/train/train-images-boxable-with-rotation.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/image_ids.csv'
 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|    4.8Gb/4.8Gb [57.3s elapsed, 0s remaining, 88.3Mb/s]
Downloading 'https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/classes.csv'
Downloading 'https://storage.googleapis.com/openimages/v6/oidv6-attributes-description.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/attributes.csv'
Downloading 'https://storage.googleapis.com/openimages/v5/classes-segmentation.txt' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/segmentation_classes.csv'
Downloading 'https://storage.googleapis.com/openimages/v7/oidv7-class-descriptions.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/metadata/point_classes.csv'
Downloading 'https://storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy.json' to '/var/folders/bg/cq0lv35x74s_8tlvqfdmf90r0000gn/T/tmpklrabgfn/metadata/hierarchy.json'
Downloading 'https://storage.googleapis.com/openimages/v5/train-annotations-human-imagelabels-boxable.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/labels/classifications.csv'
 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|    2.8Gb/2.8Gb [33.2s elapsed, 0s remaining, 89.8Mb/s]
Downloading 'https://storage.googleapis.com/openimages/v6/oidv6-train-annotations-bbox.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/labels/detections.csv'
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|   16.8Gb/16.8Gb [3.3m elapsed, 0s remaining, 89.6Mb/s]
Downloading 'https://storage.googleapis.com/openimages/v7/oidv7-train-annotations-point-labels.csv' to '/Users/minhtuevo/.fiftyone/datasets/open-images-v7/train/labels/points.csv'
  21% |████████████████████████\-------------------------------------------------------------------------------------------|    5.6Gb/26.3Gb [1.1m elapsed, 4.1m remaining, 88.9Mb/s]

@minhtuev minhtuev marked this pull request as ready for review June 28, 2024 19:15
eta/core/web.py Outdated
@@ -168,8 +169,13 @@ def _do_download(self, r, f):
size_bits = 8 * size_bytes if size_bytes is not None else None
total_downloaded_bytes = 0

quiet = self.quiet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a caller says download_file(url, quiet=True), I think we should treat their request as sacred: there should never be a progress bar.

For context, in FO, there's an fo.config.show_progress_bars setting that can be set to False if a user wants to turn off all progress bars shown throughout the library and all its dependencies (ie invocations of this method). This can actually be of more than just an aesthetic significance: rendering progress bars requires some low-level sorcery like getting the dimensions of the terminal in the caller's environment, which is not possible in all environments and could actually cause hard errors. Obviously having this global flag be fully comprehensive requires that we appropriately pass this setting through to third-party libraries, which we mostly do, but inspecting fiftyone.utils.openimages I see that we haven't respected fo.config.show_progress_bars in this case (either due to laziness on our part and/or because downloading zoo datasets is more of an ad hoc thing, not something that would be built into automations that would happen many times).

All that to say, if we really want to introduce file size-based progress bar control, I think we should add a new syntax. For example:

# Show a progress bar only if file size exceeds 64MB
download_file(url, quiet=64*1024**2)

# Show a progress bar only if file size exceeds our internal default of, say, 128MB
download_file(url, quiet="best")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to note that in the motivating case here of the fiftyone.utils.openimages module, some effort was already put in to show progress bar/logging messages in a bespoke and hopefully helpful way:

Maybe there's a way to improve user experience just by tweaking the quiet parameters over there?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense; I was thinking about a case where a user wants to turn off all progress bars but couldn't figure out the exact logic.

I like that we can turn quiet into a more expressive parameter that is also backward compatible:

  • true/false (bool): if quiet is true, do not show any progress bar. if quiet is false, show all progress bars. This is the current behavior.
  • "best" (str): only shows the progress bar if the file size exceeds certain internal values.
  • (int): only show the progress bar if the file size exceeds the value passed in.

- true/false (bool): if quiet is true, do not show progress bar. if quiet is false, show all progress bars.
- "best" (str): only show progress bar if the file size exceeds certain internal values.
- (int): only show progress bar if the file size exceeds the value passed in.
@minhtuev minhtuev requested a review from brimoor July 2, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants