
v2.0.0

@Abhishek-TAMU released this 30 Sep 21:03
3b150ab

New major features:

  1. Support for LoRA tuning for the following model architectures: llama3, llama3.1, granite (GPTBigCode and LlamaForCausalLM), mistral, mixtral, and allam
  2. Support for QLoRA tuning for the following model architectures: llama3, granite (GPTBigCode and LlamaForCausalLM), mistral, and mixtral
  3. Addition of a post-processing function that formats tuned adapters as required by vLLM for inference. Refer to the README for how to run it as a script. When tuning on the image, post-processing can be enabled using the flag lora_post_process_for_vllm; see the build README for details on how to set this flag.
  4. Enablement of new flags for throughput improvements: padding_free to process multiple examples without adding padding tokens, multipack for multi-GPU training to balance the number of tokens processed on each device, and fast_kernels for optimized tuning with fused operations and Triton kernels. See the README for details on how to set these flags and their use cases; an illustrative invocation follows this list.
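
A minimal sketch of how these throughput flags might be passed when launching the tuning script. The entry point path, model and dataset placeholders, and the flag values are assumptions for illustration only; consult the README for the exact invocation and the values each flag accepts.

```python
# Illustrative sketch only: the script path, model, dataset, and flag values
# below are assumptions; see the README for the exact invocation and the
# values each flag accepts.
import subprocess

subprocess.run(
    [
        "python", "tuning/sft_trainer.py",                # assumed entry point
        "--model_name_or_path", "<base-model-or-path>",   # placeholder model
        "--training_data_path", "train.jsonl",            # placeholder dataset
        "--output_dir", "out/",
        # new throughput flags from this release (values are illustrative)
        "--padding_free", "huggingface",
        "--multipack", "16",
        "--fast_kernels", "True", "True", "True",
    ],
    check=True,
)
```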

Dependency upgrades:

  1. Upgraded transformers to version 4.44.2, needed for tuning of all models
  2. Upgraded accelerate to version 0.33, needed for tuning of all models. Version 0.34.0 has a bug that affects FSDP.

API / interface changes:

  1. The train() API now returns a tuple of the trainer instance and additional metadata as a dict (see the sketch below).
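
A minimal sketch of the new return shape, assuming the train() entry point and config dataclasses are importable as shown; the import paths, dataclass names, and field names are assumptions, so see the README for exact usage.

```python
# Sketch of the v2.0.0 train() return shape: a (trainer, metadata) tuple
# instead of a bare trainer. The import paths and config fields shown here
# are assumptions; build the config objects as described in the README.
from tuning import sft_trainer
from tuning.config import configs  # assumed location of the config dataclasses

model_args = configs.ModelArguments(model_name_or_path="<base-model-or-path>")
data_args = configs.DataArguments(training_data_path="train.jsonl")
training_args = configs.TrainingArguments(output_dir="out/")

trainer, metadata = sft_trainer.train(model_args, data_args, training_args)
print(metadata)  # dict of additional metadata returned alongside the trainer
```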

Additional features and fixes:

  1. Support for resuming tuning from an existing checkpoint. Refer to the README for how to use the flag; resume_training defaults to True (an illustrative invocation follows this list).
  2. Addition of a default PAD token in the tokenizer when the EOS and PAD tokens are equal, to improve training quality.
  3. JSON compatibility for input datasets. See the docs for details on supported data formats.
  4. Fix to not resize the embedding layer by default; the embedding layer can still be resized as needed using the flag embedding_size_multiple_of.
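
A hedged sketch of resuming a tuning run with the new flags, reusing the assumed entry point from the earlier sketch; paths and flag values are illustrative only.

```python
# Sketch, not the documented interface: resume tuning from the checkpoints in
# --output_dir (resume_training defaults to True, shown explicitly here), feed
# a JSON dataset, and opt back into embedding resizing. Values are illustrative.
import subprocess

subprocess.run(
    [
        "python", "tuning/sft_trainer.py",              # assumed entry point
        "--model_name_or_path", "<base-model-or-path>", # placeholder model
        "--training_data_path", "train.json",           # JSON datasets now accepted
        "--output_dir", "out/",                         # existing checkpoints live here
        "--resume_training", "True",                    # default; set to False to start fresh
        "--embedding_size_multiple_of", "8",            # illustrative; resizing is off by default
    ],
    check=True,
)
```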

Full List of what's Changed

New Contributors

Full Changelog: v1.2.2...v2.0.0