Releases: huggingface/optimum-intel
v1.19.0: SentenceTransformers OpenVINO support
- Support SentenceTransformers models inference by @aleksandr-mokrov in #865
from optimum.intel.openvino import OVSentenceTransformer
model_id = "sentence-transformers/all-mpnet-base-v2"
model = OVSentenceTransformer.from_pretrained(model_id, export=True)
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
- Infer if the model needs to be exported or not by @echarlaix in #825
from optimum.intel import OVModelForCausalLM
- model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
+ model = OVModelForCausalLM.from_pretrained("gpt2")
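As a rough illustration of what inferring the export step could involve, the sketch below checks whether a local directory already holds OpenVINO IR files. The helper `needs_export` and the file names it looks for are assumptions made for this example, not the implementation from #825.

```python
import tempfile
from pathlib import Path

# Assumed IR file names for this sketch; the library may look for others.
IR_FILES = ("openvino_model.xml", "openvino_model.bin")


def needs_export(model_dir: str) -> bool:
    """Return True when the directory lacks a complete set of IR files."""
    root = Path(model_dir)
    return not all((root / name).is_file() for name in IR_FILES)


with tempfile.TemporaryDirectory() as tmp:
    before = needs_export(tmp)        # empty directory: export would be needed
    for name in IR_FILES:
        (Path(tmp) / name).touch()    # simulate an already exported model
    after = needs_export(tmp)         # IR files present: no export needed

print(before, after)
```

A remote model id would need a comparable check against the files listed in the Hub repository, which this local-only sketch does not cover.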
Compatible with transformers>=4.36,<=4.44
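The supported range can be checked before loading a model. The following is a minimal sketch: `is_supported` is a hypothetical helper, and comparing only the major and minor components (so that any 4.44.x patch release counts as in range) is an assumption about how the upper bound is meant.

```python
import re
from importlib import metadata

# Bounds taken from the compatibility note above, as (major, minor) pairs.
MIN_VERSION = (4, 36)
MAX_VERSION = (4, 44)


def is_supported(version: str) -> bool:
    """Compare only major.minor; patch and pre-release suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    major_minor = (int(match.group(1)), int(match.group(2)))
    return MIN_VERSION <= major_minor <= MAX_VERSION


try:
    installed = metadata.version("transformers")
    status = "supported" if is_supported(installed) else "outside the supported range"
    print(f"transformers {installed}: {status}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```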
Full Changelog: v1.18.0...v1.19.0
v1.18.3: Patch release
Full Changelog: v1.18.2...v1.18.3
v1.18.2: Patch release
- Fix model patching for internlm2 by @eaidova in #814
- Fix loading models from cache by @eaidova in #820
- Disable tpp for un-verified models by @jiqing-feng in #822
- Update default NNCF configurations by @KodiaqQ in #824
- Fix update causal mask for transformers 4.42 by @eaidova in #852
- Fix bf16 inference accuracy for mistral, phi3, dbrx by @eaidova in #833
- Revert rotary embedding patching for recovering gpu accuracy by @eaidova in #855
- Support transformers 4.43 by @IlyasMoutawwakil in #856
Full Changelog: v1.18.1...v1.18.2
v1.18.1: Patch release
- OV configurations alignment by @KodiaqQ in #787
- Enable transformers v4.42.0 by @echarlaix in #793
- Deprecate onnx/ort model export and quantization by @IlyasMoutawwakil in #795
- Free memory after model export by @eaidova in #800
- Update config import path for neural-compressor v2.6 by @changwangss in #801
- Pin library name to transformers for feature extraction by @IlyasMoutawwakil in #804
Full Changelog: v1.18.0...v1.18.1
v1.18.0: Arctic, Jais, OpenVINO pipelines
OpenVINO
- Enable Arctic, Jais export by @eaidova in #726
- Enable GLM-4 export by @eaidova in #776
- Move data-driven quantization after model export for text-generation models by @nikita-savelyevv in #721
- Create default token_type_ids when needed for inference by @echarlaix in #757
- Resolve default int4 config for local models by @eaidova in #760
- Update to NNCF 2.11 by @nikita-savelyevv in #763
- Fix quantization config by @echarlaix in #773
- Expose trust remote code argument when generating calibration dataset for datasets >= v2.20.0 by @echarlaix in #767
- Add pipelines by @echarlaix in #740
from optimum.intel.pipelines import pipeline
# Load openvino model
ov_pipe = pipeline("text-generation", "helenai/gpt2-ov", accelerator="openvino")
# Load pytorch model and convert it to openvino before inference
pipe = pipeline("text-generation", "gpt2", accelerator="openvino")
IPEX
- Enable IPEX patching for llama for >= v2.3 by @jiqing-feng in #725
- Refactor llama modeling for IPEX patching by @faaany in #728
- Refactor model loading by @jiqing-feng in #752
v1.17.2: Patch release
- Fix compatibility with transformers < v4.39.0 by @echarlaix in #754
v1.17.1: Patch release
- Add setuptools to fix issue with Python 3.12 by @helena-intel in #747
- Disable warnings by @helena-intel in #748
- Fix Windows TemporaryDirectory issue by @helena-intel in #749
- Fix generation config loading and saving by @eaidova in #750
v1.17.0: ITREX WOQ, IPEX pipeline, extended OpenVINO export
OpenVINO
- Enable BioGPT, Cohere, Persimmon, XGLM export by @eaidova in #709
- Add OVModelForVision2Seq class by @eaidova in #634
from optimum.intel import OVModelForVision2Seq
model = OVModelForVision2Seq.from_pretrained("nlpconnect/vit-gpt2-image-captioning", export=True)
gen_tokens = model.generate(**inputs)
- Introduce OVQuantizationConfig for NNCF quantization by @nikita-savelyevv in #638
- Enable hybrid StableDiffusion models export via optimum-cli by @l-bat in #618
optimum-cli export openvino --model SimianLuo/LCM_Dreamshaper_v7 --task latent-consistency --dataset conceptual_captions --weight-format int8 <output_dir>
- Convert Tokenizers by default by @apaniukov in #580
- Custom tasks modeling by @IlyasMoutawwakil in #669
- Add dynamic quantization config by @echarlaix in #661
from optimum.intel import OVModelForCausalLM, OVDynamicQuantizationConfig
model_id = "meta-llama/Meta-Llama-3-8B"
q_config = OVDynamicQuantizationConfig(bits=8, activations_group_size=32)
model = OVModelForCausalLM.from_pretrained(model_id, export=True, quantization_config=q_config)
- Transition to a newer NNCF API for PyTorch model quantization by @nikita-savelyevv in #630
ITREX
- Add ITREX weight-only quantization support by @PenghuiCheng in #455
IPEX
- Add IPEX pipeline by @jiqing-feng in #501
v1.16.1: Patch release
- Bump transformers version by @echarlaix in #682
v1.16.0: OpenVINO config, SD hybrid quantization
- Add hybrid quantization for Stable Diffusion pipelines by @l-bat in #584
from optimum.intel import OVStableDiffusionPipeline, OVWeightQuantizationConfig
model_id = "echarlaix/stable-diffusion-v1-5-openvino"
quantization_config = OVWeightQuantizationConfig(bits=8, dataset="conceptual_captions")
model = OVStableDiffusionPipeline.from_pretrained(model_id, quantization_config=quantization_config)
- Add OpenVINO export configs by @eaidova in #568
OpenVINO export is now enabled for the following architectures: Mixtral, ChatGLM, Baichuan, MiniCPM, Qwen, Qwen2, StableLM
- Add support for export and inference for StarCoder2 models by @eaidova in #619