Stage 6: Dataset Arrangement
CyberMeow edited this page Dec 24, 2023 · 2 revisions
Arrange the training folder into a specific directory format for concept balancing.
- If you start from this stage, please set `--src_dir` to the training folder to arrange (`/path/to/dataset_dir/training/{image_type}` by default).
- In-place operation.
For more details please refer to Dataset Organization.
- `rearrange_up_levels`: This argument specifies the number of directory levels to ascend from the captioned directory when setting the source directory for the rearrange stage. By default, this is set to 0, meaning no change from the captioned directory level.
  Example usage: `--rearrange_up_levels 2`
- `arrange_format`: It defines the directory hierarchy for dataset arrangement. The default format is `n_characters/character`. Other valid components are `character_string` (useful in the case of further character refinement) and `image_type` (should be used with `--rearrange_up_levels` set to positive values).
  Example usage: `--arrange_format n_characters/character_string/image_type`
- `max_character_number`: This argument determines the naming convention for `n_characters` folders. When set, any image containing more than the specified number of characters will be grouped into a single folder named with the format `{n}+_characters`, where n is the number specified. The default value is 6.
  Example usage: `--max_character_number 2`
- `min_images_per_combination`: This sets the minimum number of images required for a specific character combination to have its own directory. If the number of images for a particular character combination is below this threshold, the images are placed in a `character_others` directory. The default number is 10.
  Example usage: `--min_images_per_combination 15`
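To make the interaction of these four arguments concrete, here is a minimal Python sketch of how a destination folder could be derived for one image. It is an illustration under stated assumptions, not the pipeline's actual code: the function names, the `image` dictionary layout, the folder names used below the `max_character_number` cap (`1_character`, `2_characters`, ...), and the `+`-joined combination names are all hypothetical. The `character_string` component is left out because its contents are not described here.

```python
from collections import Counter
from pathlib import Path


def source_dir(captioned_dir, rearrange_up_levels=0):
    """Ascend `rearrange_up_levels` directory levels from the captioned directory."""
    path = Path(captioned_dir)
    for _ in range(rearrange_up_levels):
        path = path.parent
    return path


def n_characters_folder(count, max_character_number=6):
    """Name the folder that groups images by their number of characters."""
    if count > max_character_number:
        # Images above the cap share a single "{n}+_characters" folder.
        return f"{max_character_number}+_characters"
    # Assumption: the naming below the cap is hypothetical.
    return f"{count}_character" if count == 1 else f"{count}_characters"


def character_folder(characters, combo_counts, min_images_per_combination=10):
    """Give a combination its own folder only if it has enough images."""
    combo = tuple(sorted(characters))
    # Counter returns 0 for unseen combinations, so they fall below the threshold.
    if combo_counts[combo] < min_images_per_combination:
        return "character_others"
    # Assumption: joining names with "+" is a hypothetical convention.
    return "+".join(combo)


def destination(image, combo_counts, arrange_format="n_characters/character",
                max_character_number=6, min_images_per_combination=10):
    """Build the relative destination folder from the arrange format."""
    characters = image["characters"]
    components = {
        "n_characters": n_characters_folder(len(characters), max_character_number),
        "character": character_folder(
            characters, combo_counts, min_images_per_combination
        ),
        "image_type": image["image_type"],
    }
    return Path(*(components[name] for name in arrange_format.split("/")))


# Hypothetical data: 12 images of Alice alone, 3 images of Alice with Bob.
combo_counts = Counter({("Alice",): 12, ("Alice", "Bob"): 3})
image = {"characters": ["Bob", "Alice"], "image_type": "fanart"}

print(destination(image, combo_counts).as_posix())
# → 2_characters/character_others

print(destination(image, combo_counts,
                  arrange_format="n_characters/character/image_type",
                  min_images_per_combination=2).as_posix())
# → 2_characters/Alice+Bob/fanart
```

With the default threshold of 10, the three Alice and Bob images do not get their own folder and land in `character_others`. Lowering the threshold to 2 gives the combination its own directory.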