
v2.0.0

@Abhishek-TAMU released this 30 Sep 21:03
3b150ab

New major features:

  1. Support for LoRA tuning for the following model architectures: llama3, llama3.1, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral, and allam
  2. Support for QLoRA tuning for the following model architectures: llama3, granite (GPTBigCode and LlamaForCausalLM), mistral, and mixtral
  3. Addition of a post-processing function that formats tuned adapters as required by vLLM for inference. Refer to the README for how to run it as a script. When tuning on the image, post-processing can be enabled using the flag lora_post_process_for_vllm; see the build README for details on how to set this flag.
  4. Enablement of new flags for throughput improvements: padding_free to process multiple examples without adding padding tokens, multipack for multi-GPU training to balance the number of tokens processed on each device, and fast_kernels for optimized tuning with fused operations and Triton kernels. See the README for details on how to set these flags and their use cases; an illustrative invocation follows this list.
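
A minimal sketch of how these throughput flags might be passed when launching the tuning script. The entry point path, model and dataset placeholders, and the flag values are assumptions for illustration only; consult the README for the exact invocation and the values each flag accepts.

```python
# Illustrative sketch only: the script path, model, dataset, and flag values
# below are assumptions; see the README for the exact invocation and the
# values each flag accepts.
import subprocess

subprocess.run(
    [
        "python", "tuning/sft_trainer.py",                # assumed entry point
        "--model_name_or_path", "<base-model-or-path>",   # placeholder model
        "--training_data_path", "train.jsonl",            # placeholder dataset
        "--output_dir", "out/",
        # new throughput flags from this release (values are illustrative)
        "--padding_free", "huggingface",
        "--multipack", "16",
        "--fast_kernels", "True", "True", "True",
    ],
    check=True,
)
```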

Dependency upgrades:

  1. Upgraded transformers to version 4.44.2, needed for tuning of all models
  2. Upgraded accelerate to version 0.33, needed for tuning of all models. Version 0.34.0 has a bug that affects FSDP.

API / interface changes:

  1. The train() API now returns a tuple of the trainer instance and additional metadata as a dict (see the sketch below).
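
A minimal sketch of the new return shape, assuming the train() entry point and config dataclasses are importable as shown; the import paths, dataclass names, and field names are assumptions, so see the README for exact usage.

```python
# Sketch of the v2.0.0 train() return shape: a (trainer, metadata) tuple
# instead of a bare trainer. The import paths and config fields shown here
# are assumptions; build the config objects as described in the README.
from tuning import sft_trainer
from tuning.config import configs  # assumed location of the config dataclasses

model_args = configs.ModelArguments(model_name_or_path="<base-model-or-path>")
data_args = configs.DataArguments(training_data_path="train.jsonl")
training_args = configs.TrainingArguments(output_dir="out/")

trainer, metadata = sft_trainer.train(model_args, data_args, training_args)
print(metadata)  # dict of additional metadata returned alongside the trainer
```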

Additional features and fixes:

  1. Support for resuming tuning from an existing checkpoint. Refer to the README for how to use the flag; resume_training defaults to True (an illustrative invocation follows this list).
  2. Addition of a default PAD token in the tokenizer when the EOS and PAD tokens are equal, to improve training quality.
  3. JSON compatibility for input datasets. See the docs for details on supported data formats.
  4. Fix to not resize the embedding layer by default; the embedding layer can still be resized as needed using the flag embedding_size_multiple_of.
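
A hedged sketch of resuming a tuning run with the new flags, reusing the assumed entry point from the earlier sketch; paths and flag values are illustrative only.

```python
# Sketch, not the documented interface: resume tuning from the checkpoints in
# --output_dir (resume_training defaults to True, shown explicitly here), feed
# a JSON dataset, and opt back into embedding resizing. Values are illustrative.
import subprocess

subprocess.run(
    [
        "python", "tuning/sft_trainer.py",              # assumed entry point
        "--model_name_or_path", "<base-model-or-path>", # placeholder model
        "--training_data_path", "train.json",           # JSON datasets now accepted
        "--output_dir", "out/",                         # existing checkpoints live here
        "--resume_training", "True",                    # default; set to False to start fresh
        "--embedding_size_multiple_of", "8",            # illustrative; resizing is off by default
    ],
    check=True,
)
```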

Full List of what's Changed

New Contributors

Full Changelog: v1.2.2...v2.0.0