
TensorFlow BERT Large Pretraining

This document has instructions for running BERT Large Pretraining on baremetal using Intel-optimized TensorFlow.

Setup on baremetal

  • Create and activate virtual environment.

    virtualenv -p python <virtualenv_name>
    source <virtualenv_name>/bin/activate
  • Install Intel-optimized TensorFlow

    pip install intel-tensorflow
  • Note: On kernel version 5.16 and newer, AVX512_CORE_AMX is enabled by default. If the kernel version is older than 5.16, set the following environment variable to enable AMX:

    export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX
    # To run VNNI, set
    export DNNL_MAX_CPU_ISA=AVX512_CORE_BF16
  • Clone Intel AI Reference Models repository

    git clone https://github.com/IntelAI/models
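
The kernel-version check described in the note above can be automated. A minimal sketch (not part of the official setup steps) that compares the running kernel against 5.16 using sort -V:

```shell
# Sketch: pick the DNNL ISA override based on the running kernel version.
# Kernels 5.16+ enable AVX512_CORE_AMX by default; older kernels need the
# explicit override from the note above.
kernel="$(uname -r | cut -d- -f1)"
if [ "$(printf '%s\n' "$kernel" 5.16 | sort -V | head -n1)" = "5.16" ]; then
    echo "Kernel $kernel >= 5.16: AMX is enabled by default"
else
    export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX
    echo "Kernel $kernel < 5.16: set DNNL_MAX_CPU_ISA=$DNNL_MAX_CPU_ISA"
fi
```

The sort -V comparison treats the two values as version strings, so 5.4 correctly sorts before 5.16.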

Quick Start Scripts

Script name      Description
pretraining.sh   Uses mpirun to execute one process per socket for BERT Large pretraining with the specified precision (fp32, bfloat16, or bfloat32). Logs for each instance are saved to the output directory.
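Since pretraining.sh launches one MPI process per socket, the expected process count equals the host's socket count. A minimal sketch of how that count can be read on Linux (an assumption about the host, not the script's exact implementation):

```shell
# Sketch: derive the number of sockets (and thus expected MPI processes)
# from lscpu. Falls back to 1 if lscpu is unavailable.
sockets=$(lscpu 2>/dev/null | awk -F: '/^Socket\(s\)/ {gsub(/ /,"",$2); print $2}')
sockets=${sockets:-1}
echo "This host has $sockets socket(s); mpirun would launch $sockets process(es)"
```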

Datasets

SQuAD data

Download and unzip the BERT Large uncased (whole word masking) model from the Google BERT repo, then download the SQuAD dev set into the same directory. Set DATASET_DIR to point to this directory when running BERT Large.

mkdir -p $DATASET_DIR && cd $DATASET_DIR
wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip
unzip wwm_uncased_L-24_H-1024_A-16.zip

wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -P wwm_uncased_L-24_H-1024_A-16

Follow the instructions to generate the BERT pre-training dataset in the TensorFlow record file format. The output TensorFlow record files are expected to be located in ${DATASET_DIR}/tf_records. An example TF record file path is ${DATASET_DIR}/tf_records/part-00430-of-00500.
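Before training, it can help to confirm the shards are actually in place. A minimal sketch assuming DATASET_DIR is already exported (the fallback to the current directory exists only to keep the snippet self-contained):

```shell
# Sketch: count the generated TF record shards matching the
# part-XXXXX-of-XXXXX naming shown in the example path above.
DATASET_DIR="${DATASET_DIR:-$PWD}"
num_shards=$(find "${DATASET_DIR}/tf_records" -name 'part-*-of-*' 2>/dev/null | wc -l)
if [ "$num_shards" -eq 0 ]; then
    echo "No TF record shards found under ${DATASET_DIR}/tf_records" >&2
else
    echo "Found $num_shards TF record shards"
fi
```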

Download checkpoints:

wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/bert_large_checkpoints.zip
unzip bert_large_checkpoints.zip
export CHECKPOINT_DIR=$(pwd)/bert_large_checkpoints

Run the model

Set environment variables to specify the dataset directory, precision to run, and an output directory.

# Navigate to the cloned models directory
cd models

# Install pre-requisites for the model:
./quickstart/language_modeling/tensorflow/bert_large/training/cpu/setup.sh

# Set the required environment vars
export PRECISION=<specify the precision to run: fp32, bfloat16, or bfloat32>
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<directory where log files will be written>
export CHECKPOINT_DIR=<path to the downloaded checkpoints folder>

# Run the pretraining.sh quickstart script
./quickstart/language_modeling/tensorflow/bert_large/training/cpu/pretraining.sh

License

Licenses can be found in the model package, in the licenses directory.