Stage 0: Anime and Fanart Downloading

CyberMeow edited this page Dec 24, 2023 · 5 revisions

Automatically download anime from nyaa.si and fanart images from Danbooru.

  • --src_dir is not relevant here
  • Output folder for anime: /path/to/dataset_dir/intermediate/{image_type}/animes
  • Output folder for fanarts: /path/to/dataset_dir/intermediate/{image_type}/raw
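
The two output locations above can be sketched with pathlib. This is only an illustration of the folder layout, not the pipeline's own code: the helper name stage0_output_dirs is hypothetical, and "screenshots" in the usage line is a placeholder value for {image_type}.

```python
from pathlib import Path


def stage0_output_dirs(dataset_dir, image_type):
    """Return the Stage 0 output folders for a given dataset directory.

    Hypothetical helper: it only mirrors the layout documented above.
    """
    base = Path(dataset_dir) / "intermediate" / image_type
    return {
        "anime": base / "animes",  # downloaded anime episodes
        "fanart": base / "raw",    # images downloaded from Danbooru
    }


# "screenshots" is a placeholder for {image_type}
dirs = stage0_output_dirs("/path/to/dataset_dir", "screenshots")
print(dirs["anime"])
print(dirs["fanart"])
```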

Anime downloading

We download anime by searching with the keyword "{submitter name} {anime name} {resolution}" and filtering by episode number when possible. Downloads go through torrents, so this stage hangs if there are no seeders. Moreover, the parsing of the anime name and episode number is hard-coded and may not always work. It may therefore be simpler to download the anime yourself instead of invoking this stage.

  • anime_name: Anime name used in the keyword search.
    Example usage: --anime_name "yuuki_yuuna_wa_yuusha_de_aru"
  • candidate_submitters: A list of candidate submitters to search for the anime. Only the first one for which we manage to find an anime to download is used.
    Example usage: --candidate_submitters erai subsplease
  • anime_resolution: Anime resolution to use in keyword search. Typical choices are 480, 720, and 1080. Defaults to 720.
    Example usage: --anime_resolution 1080
  • min_download_episode and max_download_episode: The range of episodes to download. To download all episodes, leave both as None (the default value).
    Example usage: --min_download_episode 2 --max_download_episode 10
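
The following Python sketch illustrates how the arguments above could fit together: the keyword format, the "first submitter that yields results" rule, and the episode range. It is not the pipeline's actual implementation. All function names are hypothetical, the search callable is a stand-in for a real nyaa.si query, and treating the episode range as inclusive on both ends is an assumption.

```python
def build_search_keyword(submitter, anime_name, resolution=720):
    """Build the keyword "{submitter name} {anime name} {resolution}"."""
    return f"{submitter} {anime_name} {resolution}"


def episode_in_range(episode, min_episode=None, max_episode=None):
    """Check an episode number against the requested range.

    None means "no bound". Assumption: both bounds are inclusive.
    """
    if min_episode is not None and episode < min_episode:
        return False
    if max_episode is not None and episode > max_episode:
        return False
    return True


def first_submitter_with_results(candidates, anime_name, resolution, search):
    """Try each candidate submitter in order and stop at the first hit.

    `search` is a hypothetical callable mapping a keyword to a list of results.
    """
    for submitter in candidates:
        results = search(build_search_keyword(submitter, anime_name, resolution))
        if results:
            return submitter, results
    return None, []


# Usage with a fake search function that only "finds" subsplease releases
def fake_search(keyword):
    return ["episode 01"] if keyword.startswith("subsplease") else []


print(first_submitter_with_results(
    ["erai", "subsplease"], "yama no susume", 1080, fake_search))
```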

Fanart downloading

For now, fanart is downloaded only from Danbooru, because its images come with existing character information. I may add the possibility of downloading from other sources later. Downloading can be slow and needs improvement on the waifuc side.

  • anime_name_booru: Name used to search for images on booru. It must match the name used on booru exactly. If this is not provided and --anime_name is provided, the latter is used.
    Example usage: --anime_name_booru "yama_no_susume"
  • character_info_file: Path to an optional csv file mapping booru character names to the character names you want to use for training. Character names that are not specified in the file remain untouched. Alternatively, you can use it to specify characters that you want to download (in which case you can leave the second column empty). Any characters that are not encountered in the anime downloading phase will then get downloaded if --download_for_characters is given.
    Example usage: --character_info_file "configs/csv_examples/character_mapping_example.csv"
  • download_for_characters: Whether to download characters in --character_info_file as explained above.
    Example usage: --download_for_characters
  • booru_download_limit: Limit on the total number of images to download from Danbooru. Defaults to no limit. Setting it to 0 also downloads all images. Note that if both --booru_download_limit and --booru_download_limit_per_character are set, the total number of downloaded images is not guaranteed to reach --booru_download_limit.
    Example usage: --booru_download_limit 1000
  • booru_download_limit_per_character: Sets a limit on the number of images to download for each character from Danbooru. If set to 0, there will be no limit for each character. The default value is 500.
    Example usage: --booru_download_limit_per_character 300
  • allowed_ratings: Specifies a list of allowed ratings to filter the images downloaded from Danbooru. Options include s (safe), g (general), q (questionable), and e (explicit). By default, this list is empty, indicating no filtering based on ratings.
    Example usage: --allowed_ratings s g
  • allowed_image_classes: Defines a list of allowed image classes for filtering the images. Options include illustration, bangumi (anime), comic, and 3d. By default, only illustration and bangumi images are downloaded. Set this to an empty list to disable class-based filtering.
    Example usage: --allowed_image_classes illustration comic
  • max_download_size: Sets the maximum size for the smaller dimension of downloaded images. If an image's smaller dimension exceeds this limit, it will be resized. The default value is 1024.
    Example usage: --max_download_size 800
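
Two of the options above can be made concrete with a short Python sketch: reading a character mapping csv and computing the target size implied by --max_download_size. This is a sketch under stated assumptions, not the pipeline's code. The function names and character names are hypothetical, the csv is assumed to have two columns and no header row, an empty second column is assumed to keep the booru name, and the resize is assumed to preserve the aspect ratio.

```python
import csv
import io


def load_character_mapping(csv_text):
    """Parse "booru name, training name" rows into a dict.

    Assumptions: no header row; an empty or missing second column
    keeps the booru name unchanged.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        booru_name = row[0].strip()
        new_name = row[1].strip() if len(row) > 1 else ""
        mapping[booru_name] = new_name or booru_name
    return mapping


def resized_dimensions(width, height, max_size=1024):
    """Scale an image so its smaller dimension is at most max_size.

    Assumption: the aspect ratio is preserved.
    """
    smaller = min(width, height)
    if smaller <= max_size:
        return width, height  # already small enough, leave untouched
    scale = max_size / smaller
    return round(width * scale), round(height * scale)


# Hypothetical character names, for illustration only
example_csv = "character_a_(series),Alice\ncharacter_b_(series),\n"
print(load_character_mapping(example_csv))
print(resized_dimensions(2048, 4096))  # smaller side 2048 is scaled down to 1024
print(resized_dimensions(800, 600))    # unchanged
```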