Start Training
Once we go through the pipeline, the dataset is hierarchically organized in `/path/to/dataset_dir/training`, with a `multiply.txt` in each subfolder indicating how many times the images from that directory should be repeated. At this stage you can pretty much launch the training process with your favorite trainer, modulo a few more steps to make sure that the data are read correctly.

With `multiply.txt` in each folder, the above structure is directly compatible with EveryDream2.
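
For concreteness, a hypothetical layout could look as follows (the folder and file names are made up; each `multiply.txt` contains a single number, the repeat count for the images in that folder):

```
/path/to/dataset_dir/training
├── character_a
│   ├── multiply.txt    <- e.g. contains "3": images here are repeated 3 times
│   ├── image_001.png
│   └── ...
└── character_b
    ├── multiply.txt    <- e.g. contains "1"
    ├── image_042.png
    └── ...
```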
For kohya-ss/sd-scripts you need to perform one more step with `flatten_folder.py`:
```bash
python flatten_folder.py \
    --separator ~ \
    --src_dir /path/to/dataset_dir/training
```
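
Conceptually, this flattens the nested structure into a single level by joining folder names with the separator (hypothetical names again):

```
training/character_a/outfit_b  ->  training/character_a~outfit_b
```

This is also why reverting only works when the separator does not occur in the original folder names: splitting on `~` must reconstruct the hierarchy unambiguously.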
If the separator (`~` by default) does not appear in any folder name, you can undo the change with
```bash
python flatten_folder.py \
    --separator ~ \
    --src_dir /path/to/dataset_dir/training \
    --revert
```
It is important to switch between the two modes accordingly, as I rely on the folder structure to compute repeats for now.
HCP-Diffusion requires setting up a yaml file to specify the repeat of each data source, and its configuration is generally more complicated, so I have provided `prepare_hcp.py` to streamline the process (to be run in the hcp-diffusion python environment).
```bash
python prepare_hcp.py \
    --config_dst_dir /path/to/training_config_dir \
    --dataset_dir /path/to/dataset_dir/training \
    --pivotal \
    --trigger_word_file /path/to/dataset_dir/emb_init.json
```
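
As a mental model, a per-source repeat specification in the generated `dataset.yaml` is conceptually of the following shape (the key names here are illustrative assumptions rather than the exact HCP-Diffusion schema; the file written by `prepare_hcp.py` is authoritative):

```yaml
# Illustrative sketch only; actual key names may differ.
source:
  data_source_1:
    img_root: /path/to/dataset_dir/training/character_a
    repeat: 3      # mirrors multiply.txt for this folder
  data_source_2:
    img_root: /path/to/dataset_dir/training/character_b
    repeat: 1
```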
Once this is done, the embeddings are created in `/path/to/training_config_dir/embs` and you can start training with
```bash
accelerate launch -m hcpdiff.train_ac_single \
    --cfg /path/to/training_config_dir/lora_conventional.yaml
```
- `--pivotal` indicates pivotal tuning, i.e. training the embedding and the network at the same time (this is possible with neither kohya nor EveryDream). Remove this argument if you do not want to train embeddings.
- You can customize the embeddings you want to create and how they are initialized by modifying the content of `emb_init.json` (see the sketch after this list).
- Use `--help` to see more arguments. Notably you can set `--emb_dir`, `--exp_dir`, and `--main_config_file` (which defaults to `hcp_configs/lora_conventional.yaml`), among others.
- To modify training and dataset parameters, you can either edit the files in `hcp_configs` directly before running the script, or modify `dataset.yaml` and `lora_conventional.yaml` (or whichever config file you use) in `/path/to/training_config_dir` after running the script.
- You should not move the generated config files because some absolute paths are used.
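
For illustration, `emb_init.json` maps each embedding to its initialization; a hypothetical file could look like the following (the exact shape may differ; inspect the file generated by the pipeline):

```json
{
  "character_a": "1girl, silver hair, blue eyes",
  "character_b": "1boy, black hair"
}
```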
After training, the output files from HCP-Diffusion cannot be readily used by a1111/sd-webui. For conversion, please refer to Utilities: Conversion Scripts.
Each trainer has its strengths and drawbacks. If you know of another good trainer that I have overlooked here, please let me know.