diff --git a/demo/README.md b/demo/README.md
index 26deb453bb58..df53b05bb568 100644
--- a/demo/README.md
+++ b/demo/README.md
@@ -145,7 +145,7 @@ Send a PR to add a one sentence description:)
 ## Tools using XGBoost
 
 - [BayesBoost](https://github.com/mpearmain/BayesBoost) - Bayesian Optimization using xgboost and sklearn API
-- [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library
+- [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library designed to automatically produce accurate machine learning models with low computational cost. FLAML includes [XGBoost as one of the default learners](https://github.com/microsoft/FLAML/blob/main/flaml/model.py) and can also be used as a fast hyperparameter tuning tool for XGBoost ([code example](https://microsoft.github.io/FLAML/docs/Examples/AutoML-for-XGBoost)).
 - [gp_xgboost_gridsearch](https://github.com/vatsan/gp_xgboost_gridsearch) - In-database parallel grid-search for XGBoost on [Greenplum](https://github.com/greenplum-db/gpdb) using PL/Python
 - [tpot](https://github.com/rhiever/tpot) - A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
diff --git a/doc/tutorials/spark_estimator.rst b/doc/tutorials/spark_estimator.rst
index 545403a34ad0..44bdd7733f56 100644
--- a/doc/tutorials/spark_estimator.rst
+++ b/doc/tutorials/spark_estimator.rst
@@ -35,13 +35,13 @@ We can create a ``SparkXGBRegressor`` estimator like:
 )
 
 
-The above snippet creates a spark estimator which can fit on a spark dataset,
-and return a spark model that can transform a spark dataset and generate dataset
-with prediction column. We can set almost all of xgboost sklearn estimator parameters
-as ``SparkXGBRegressor`` parameters, but some parameter such as ``nthread`` is forbidden
-in spark estimator, and some parameters are replaced with pyspark specific parameters
-such as ``weight_col``, ``validation_indicator_col``, ``use_gpu``, for details please see
-``SparkXGBRegressor`` doc.
+The above snippet creates a Spark estimator that can fit on a Spark dataset and returns a
+Spark model that can transform a Spark dataset, generating a dataset with a prediction
+column. We can set almost all XGBoost sklearn estimator parameters as
+``SparkXGBRegressor`` parameters, but some parameters such as ``nthread`` are forbidden
+in the Spark estimator, and some are replaced with PySpark-specific parameters such as
+``weight_col`` and ``validation_indicator_col``; for details, please see the
+``SparkXGBRegressor`` doc.
 
 The following code snippet shows how to train a spark xgboost regressor model,
 first we need to prepare a training dataset as a spark dataframe contains
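As a hedged illustration of the parameter mapping described in the hunk above: sklearn-style constructor parameters pass straight through, while per-row inputs are supplied as column names. The SparkSession setup, the toy DataFrame, and the column names (`weight`, `is_val`) are illustrative assumptions, not part of this patch:

```python
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

from xgboost.spark import SparkXGBRegressor

spark = SparkSession.builder.master("local[2]").getOrCreate()

# A tiny illustrative dataset: features, label, per-row weight, and a
# boolean column marking validation rows.
train_df = spark.createDataFrame(
    [
        (Vectors.dense(1.0, 2.0), 1.0, 1.0, False),
        (Vectors.dense(2.0, 3.0), 2.0, 2.0, False),
        (Vectors.dense(3.0, 4.0), 3.0, 1.0, True),
        (Vectors.dense(4.0, 5.0), 4.0, 1.0, True),
    ],
    ["features", "label", "weight", "is_val"],
)

# sklearn-style parameters (e.g. max_depth) pass through unchanged, while
# per-row inputs are given as column names instead of arrays.
regressor = SparkXGBRegressor(
    features_col="features",
    label_col="label",
    weight_col="weight",                # plays the role of sample_weight
    validation_indicator_col="is_val",  # plays the role of eval_set
    max_depth=5,
)
model = regressor.fit(train_df)
model.transform(train_df).show()        # appends a "prediction" column
```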
@@ -88,7 +88,7 @@ XGBoost PySpark fully supports GPU acceleration. Users are not only able to enab
 efficient training but also utilize their GPUs for the whole PySpark pipeline including
 ETL and inference. In below sections, we will walk through an example of training on a
 PySpark standalone GPU cluster. To get started, first we need to install some additional
-packages, then we can set the ``use_gpu`` parameter to ``True``.
+packages, then we can set the ``device`` parameter to ``cuda`` or ``gpu``.
 
 Prepare the necessary packages
 ==============================
@@ -128,7 +128,7 @@ Write your PySpark application
 ==============================
 
 Below snippet is a small example for training xgboost model with PySpark. Notice that we are
-using a list of feature names and the additional parameter ``use_gpu``:
+using a list of feature names and the additional parameter ``device``:
 
 .. code-block:: python
 
@@ -148,12 +148,12 @@ using a list of feature names and the additional parameter ``use_gpu``:
     # get a list with feature column names
     feature_names = [x.name for x in train_df.schema if x.name != label_name]
 
-    # create a xgboost pyspark regressor estimator and set use_gpu=True
+    # create a xgboost pyspark regressor estimator and set device="cuda"
     regressor = SparkXGBRegressor(
         features_col=feature_names,
         label_col=label_name,
         num_workers=2,
-        use_gpu=True,
+        device="cuda",
     )
 
     # train and return the model
@@ -163,6 +163,7 @@ using a list of feature names and the additional parameter ``use_gpu``:
     predict_df = model.transform(test_df)
     predict_df.show()
 
+Like other distributed interfaces, the ``device`` parameter doesn't support specifying the ordinal as GPUs are managed by Spark instead of XGBoost (good: ``device=cuda``, bad: ``device=cuda:0``).
 
 Submit the PySpark application
 ==============================
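A minimal sketch of the ordinal rule documented above, assuming the 2.0 PySpark estimator API; in this version the check fires eagerly when parameters are set, though it may also surface at `fit` time:

```python
from xgboost.spark import SparkXGBRegressor

# Accepted: no ordinal given, Spark decides which GPU each task uses.
regressor = SparkXGBRegressor(device="cuda", num_workers=2)

# Rejected: an explicit ordinal such as "cuda:0" raises ValueError, because
# GPUs are assigned by Spark rather than by XGBoost.
try:
    SparkXGBRegressor(device="cuda:0", num_workers=2)
except ValueError as err:
    print(err)
```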
diff --git a/jvm-packages/README.md b/jvm-packages/README.md
index 451a0d981b08..78f9a5e0f9a1 100644
--- a/jvm-packages/README.md
+++ b/jvm-packages/README.md
@@ -3,161 +3,15 @@
 
 [![Documentation Status](https://readthedocs.org/projects/xgboost/badge/?version=latest)](https://xgboost.readthedocs.org/en/latest/jvm/index.html)
 [![GitHub license](http://dmlc.github.io/img/apache2.svg)](../LICENSE)
 
-[Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) |
+[Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) |
 [Resources](../demo/README.md) |
 [Release Notes](../NEWS.md)
 
-XGBoost4J is the JVM package of xgboost. It brings all the optimizations
-and power xgboost into JVM ecosystem.
+XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power of
+xgboost into the JVM ecosystem.
 
-- Train XGBoost models in scala and java with easy customizations.
-- Run distributed xgboost natively on jvm frameworks such as
-Apache Flink and Apache Spark.
+- Train XGBoost models in scala and java with easy customization.
+- Run distributed xgboost natively on jvm frameworks such as Apache Flink and Apache
+Spark.
 
-You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) and [Resource Page](../demo/README.md).
-
-## Add Maven Dependency
-
-XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5.
-
-### Access release version
-
-Maven
-
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.12</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.12</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-```
-or
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.13</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.13</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-```
-
-sbt
-```sbt
-libraryDependencies ++= Seq(
-  "ml.dmlc" %% "xgboost4j" % "latest_version_num",
-  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
-)
-```
-
-For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).
-
-### Access SNAPSHOT version
-
-First add the following Maven repository hosted by the XGBoost project:
-
-Maven:
-
-```xml
-<repository>
-  <id>XGBoost4J Snapshot Repo</id>
-  <name>XGBoost4J Snapshot Repo</name>
-  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
-</repository>
-```
-
-sbt:
-
-```sbt
-resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"
-```
-
-Then add XGBoost4J as a dependency:
-
-Maven
-
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.12</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.12</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-```
-or with scala 2.13
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.13</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.13</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-```
-
-sbt
-```sbt
-libraryDependencies ++= Seq(
-  "ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
-  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
-)
-```
-
-For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).
-
-### GPU algorithm
-To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
-Note that scala 2.13 is not supported by the [NVIDIA/spark-rapids#1525](https://github.com/NVIDIA/spark-rapids/issues/1525) yet, so the GPU algorithm can only be used with scala 2.12.
-
-## Examples
-
-Full code examples for Scala, Java, Apache Spark, and Apache Flink can
-be found in the [examples package](https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example).
-
-**NOTE on LIBSVM Format**:
-
-There is an inconsistent issue between XGBoost4J-Spark and other language bindings of XGBoost.
-
-When users use Spark to load trainingset/testset in LIBSVM format with the following code snippet:
-
-```scala
-spark.read.format("libsvm").load("trainingset_libsvm")
-```
-
-Spark assumes that the dataset is 1-based indexed. However, when you do prediction with other bindings of XGBoost (e.g. Python API of XGBoost), XGBoost assumes that the dataset is 0-based indexed. It creates a pitfall for the users who train model with Spark but predict with the dataset in the same format in other bindings of XGBoost.
-
-## Development
-
-You can build/package xgboost4j locally with the following steps:
-
-**Linux:**
-1. Ensure [Docker for Linux](https://docs.docker.com/install/) is installed.
-2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
-3. Run the following command:
-    - With Tests: `./xgboost/jvm-packages/dev/build-linux.sh`
-    - Skip Tests: `./xgboost/jvm-packages/dev/build-linux.sh --skip-tests`
-
-**Windows:**
-1. Ensure [Docker for Windows](https://docs.docker.com/docker-for-windows/install/) is installed.
-2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
-3. Run the following command:
-    - With Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd`
-    - Skip Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests`
-
-*Note: this will create jars for deployment on Linux machines.*
+You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) and [Resource Page](../demo/README.md).
\ No newline at end of file
diff --git a/jvm-packages/dev/.gitattributes b/jvm-packages/dev/.gitattributes
deleted file mode 100644
index ed670ecedb5e..000000000000
--- a/jvm-packages/dev/.gitattributes
+++ /dev/null
@@ -1,3 +0,0 @@
-# Set line endings to LF, even on Windows.
Otherwise, execution within Docker fails. -# See https://help.github.com/articles/dealing-with-line-endings/ -*.sh text eol=lf diff --git a/jvm-packages/dev/.gitignore b/jvm-packages/dev/.gitignore deleted file mode 100644 index eb713db19674..000000000000 --- a/jvm-packages/dev/.gitignore +++ /dev/null @@ -1 +0,0 @@ -.m2 diff --git a/jvm-packages/dev/Dockerfile b/jvm-packages/dev/Dockerfile deleted file mode 100644 index 72ccdeba0825..000000000000 --- a/jvm-packages/dev/Dockerfile +++ /dev/null @@ -1,58 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -FROM centos:7 - -# Install all basic requirements -RUN \ - yum -y update && \ - yum install -y bzip2 make tar unzip wget xz git centos-release-scl yum-utils java-1.8.0-openjdk-devel && \ - yum-config-manager --enable centos-sclo-rh-testing && \ - yum -y update && \ - yum install -y devtoolset-7-gcc devtoolset-7-binutils devtoolset-7-gcc-c++ && \ - # Python - wget https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh && \ - bash Miniconda3-4.5.12-Linux-x86_64.sh -b -p /opt/python && \ - # CMake - wget -nv -nc https://cmake.org/files/v3.18/cmake-3.18.3-Linux-x86_64.sh --no-check-certificate && \ - bash cmake-3.18.3-Linux-x86_64.sh --skip-license --prefix=/usr && \ - # Maven - wget https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \ - tar xvf apache-maven-3.6.1-bin.tar.gz -C /opt && \ - ln -s /opt/apache-maven-3.6.1/ /opt/maven - -# Set the required environment variables -ENV PATH=/opt/python/bin:/opt/maven/bin:$PATH -ENV CC=/opt/rh/devtoolset-7/root/usr/bin/gcc -ENV CXX=/opt/rh/devtoolset-7/root/usr/bin/c++ -ENV CPP=/opt/rh/devtoolset-7/root/usr/bin/cpp -ENV JAVA_HOME=/usr/lib/jvm/java - -# Install Python packages -RUN \ - pip install numpy pytest scipy scikit-learn wheel kubernetes urllib3==1.22 awscli - -ENV GOSU_VERSION 1.10 - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -WORKDIR /xgboost diff --git a/jvm-packages/dev/build-linux.cmd b/jvm-packages/dev/build-linux.cmd deleted file mode 100644 index a5d962f5fe52..000000000000 --- a/jvm-packages/dev/build-linux.cmd +++ /dev/null @@ -1,44 +0,0 @@ -@echo off - -rem -rem Licensed to the Apache Software Foundation (ASF) under one -rem or more contributor license agreements. See the NOTICE file -rem distributed with this work for additional information -rem regarding copyright ownership. The ASF licenses this file -rem to you under the Apache License, Version 2.0 (the -rem "License"); you may not use this file except in compliance -rem with the License. 
You may obtain a copy of the License at -rem -rem http://www.apache.org/licenses/LICENSE-2.0 -rem -rem Unless required by applicable law or agreed to in writing, -rem software distributed under the License is distributed on an -rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -rem KIND, either express or implied. See the License for the -rem specific language governing permissions and limitations -rem under the License. -rem - -rem The the local path of this file -set "BASEDIR=%~dp0" - -rem The local path of .m2 directory for maven -set "M2DIR=%BASEDIR%\.m2\" - -rem Create a local .m2 directory if needed -if not exist "%M2DIR%" mkdir "%M2DIR%" - -rem Build and tag the Dockerfile -docker build -t dmlc/xgboost4j-build %BASEDIR% - -docker run^ - -it^ - --rm^ - --memory 12g^ - --env JAVA_OPTS="-Xmx9g"^ - --env MAVEN_OPTS="-Xmx3g"^ - --ulimit core=-1^ - --volume %BASEDIR%\..\..:/xgboost^ - --volume %M2DIR%:/root/.m2^ - dmlc/xgboost4j-build^ - /xgboost/jvm-packages/dev/package-linux.sh "%*" diff --git a/jvm-packages/dev/build-linux.sh b/jvm-packages/dev/build-linux.sh deleted file mode 100755 index 1509a375236c..000000000000 --- a/jvm-packages/dev/build-linux.sh +++ /dev/null @@ -1,41 +0,0 @@ -#!/usr/bin/env bash -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -BASEDIR="$( cd "$( dirname "$0" )" && pwd )" # the directory of this file - -docker build -t dmlc/xgboost4j-build "${BASEDIR}" # build and tag the Dockerfile - -exec docker run \ - -it \ - --rm \ - --memory 12g \ - --env JAVA_OPTS="-Xmx9g" \ - --env MAVEN_OPTS="-Xmx3g -Dmaven.repo.local=/xgboost/jvm-packages/dev/.m2" \ - --env CI_BUILD_UID=`id -u` \ - --env CI_BUILD_GID=`id -g` \ - --env CI_BUILD_USER=`id -un` \ - --env CI_BUILD_GROUP=`id -gn` \ - --ulimit core=-1 \ - --volume "${BASEDIR}/../..":/xgboost \ - dmlc/xgboost4j-build \ - /xgboost/tests/ci_build/entrypoint.sh jvm-packages/dev/package-linux.sh "$@" - -# CI_BUILD_UID, CI_BUILD_GID, CI_BUILD_USER, CI_BUILD_GROUP -# are used by entrypoint.sh to create the user with the same uid in a container -# so all produced artifacts would be owned by your host user \ No newline at end of file diff --git a/jvm-packages/dev/package-linux.sh b/jvm-packages/dev/package-linux.sh deleted file mode 100755 index 1fd777d9b90b..000000000000 --- a/jvm-packages/dev/package-linux.sh +++ /dev/null @@ -1,36 +0,0 @@ -#!/usr/bin/env bash -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. 
You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. -# -cd jvm-packages - -case "$1" in - --skip-tests) SKIP_TESTS=true ;; - "") SKIP_TESTS=false ;; -esac - -if [[ -n ${SKIP_TESTS} ]]; then - if [[ ${SKIP_TESTS} == "true" ]]; then - mvn --batch-mode clean package -DskipTests - elif [[ ${SKIP_TESTS} == "false" ]]; then - mvn --batch-mode clean package - fi -else - echo "Usage: $0 [--skip-tests]" - exit 1 -fi diff --git a/python-package/xgboost/core.py b/python-package/xgboost/core.py index d41976e8bc7c..4cacd61f3bb9 100644 --- a/python-package/xgboost/core.py +++ b/python-package/xgboost/core.py @@ -276,6 +276,27 @@ def _check_call(ret: int) -> None: raise XGBoostError(py_str(_LIB.XGBGetLastError())) +def _check_distributed_params(kwargs: Dict[str, Any]) -> None: + """Validate parameters in distributed environments.""" + device = kwargs.get("device", None) + if device and not isinstance(device, str): + msg = "Invalid type for the `device` parameter" + msg += _expect((str,), type(device)) + raise TypeError(msg) + + if device and device.find(":") != -1: + raise ValueError( + "Distributed training doesn't support selecting device ordinal as GPUs are" + " managed by the distributed framework. use `device=cuda` or `device=gpu`" + " instead." + ) + + if kwargs.get("booster", None) == "gblinear": + raise NotImplementedError( + f"booster `{kwargs['booster']}` is not supported for distributed training." + ) + + def build_info() -> dict: """Build information of XGBoost. The returned value format is not stable. Also, please note that build time dependency is not the same as runtime dependency. For diff --git a/python-package/xgboost/dask.py b/python-package/xgboost/dask.py index 32dd2a4a7ce3..271a5e458efe 100644 --- a/python-package/xgboost/dask.py +++ b/python-package/xgboost/dask.py @@ -70,6 +70,7 @@ Metric, Objective, QuantileDMatrix, + _check_distributed_params, _deprecate_positional_args, _expect, ) @@ -924,17 +925,7 @@ async def _train_async( ) -> Optional[TrainReturnT]: workers = _get_workers_from_data(dtrain, evals) _rabit_args = await _get_rabit_args(len(workers), dconfig, client) - - if params.get("booster", None) == "gblinear": - raise NotImplementedError( - f"booster `{params['booster']}` is not yet supported for dask." - ) - device = params.get("device", None) - if device and device.find(":") != -1: - raise ValueError( - "The dask interface for XGBoost doesn't support selecting specific device" - " ordinal. Use `device=cpu` or `device=cuda` instead." - ) + _check_distributed_params(params) def dispatched_train( parameters: Dict, diff --git a/python-package/xgboost/sklearn.py b/python-package/xgboost/sklearn.py index d69cb3a014d7..46a3ffa4aec1 100644 --- a/python-package/xgboost/sklearn.py +++ b/python-package/xgboost/sklearn.py @@ -1004,13 +1004,17 @@ def fit( Validation metrics will help us track the performance of the model. eval_metric : str, list of str, or callable, optional + .. deprecated:: 1.6.0 - Use `eval_metric` in :py:meth:`__init__` or :py:meth:`set_params` instead. + + Use `eval_metric` in :py:meth:`__init__` or :py:meth:`set_params` instead. early_stopping_rounds : int + .. 
deprecated:: 1.6.0 - Use `early_stopping_rounds` in :py:meth:`__init__` or - :py:meth:`set_params` instead. + + Use `early_stopping_rounds` in :py:meth:`__init__` or :py:meth:`set_params` + instead. verbose : If `verbose` is True and an evaluation set is used, the evaluation metric measured on the validation set is printed to stdout at each boosting stage. diff --git a/python-package/xgboost/spark/core.py b/python-package/xgboost/spark/core.py index 283999c6dd9a..998afbf77fde 100644 --- a/python-package/xgboost/spark/core.py +++ b/python-package/xgboost/spark/core.py @@ -60,7 +60,7 @@ import xgboost from xgboost import XGBClassifier from xgboost.compat import is_cudf_available -from xgboost.core import Booster +from xgboost.core import Booster, _check_distributed_params from xgboost.sklearn import DEFAULT_N_ESTIMATORS, XGBModel, _can_use_qdm from xgboost.training import train as worker_train @@ -92,6 +92,7 @@ get_class_name, get_logger, serialize_booster, + use_cuda, ) # Put pyspark specific params here, they won't be passed to XGBoost. @@ -108,7 +109,6 @@ "arbitrary_params_dict", "force_repartition", "num_workers", - "use_gpu", "feature_names", "features_cols", "enable_sparse_data_optim", @@ -132,8 +132,7 @@ _inverse_pyspark_param_alias_map = {v: k for k, v in _pyspark_param_alias_map.items()} _unsupported_xgb_params = [ - "gpu_id", # we have "use_gpu" pyspark param instead. - "device", # we have "use_gpu" pyspark param instead. + "gpu_id", # we have "device" pyspark param instead. "enable_categorical", # Use feature_types param to specify categorical feature instead "use_label_encoder", "n_jobs", # Do not allow user to set it, will use `spark.task.cpus` value instead. @@ -198,11 +197,24 @@ class _SparkXGBParams( "The number of XGBoost workers. Each XGBoost worker corresponds to one spark task.", TypeConverters.toInt, ) + device = Param( + Params._dummy(), + "device", + ( + "The device type for XGBoost executors. Available options are `cpu`,`cuda`" + " and `gpu`. Set `device` to `cuda` or `gpu` if the executors are running " + "on GPU instances. Currently, only one GPU per task is supported." + ), + TypeConverters.toString, + ) use_gpu = Param( Params._dummy(), "use_gpu", - "A boolean variable. Set use_gpu=true if the executors " - + "are running on GPU instances. Currently, only one GPU per task is supported.", + ( + "Deprecated, use `device` instead. A boolean variable. Set use_gpu=true " + "if the executors are running on GPU instances. Currently, only one GPU per" + " task is supported." + ), TypeConverters.toBoolean, ) force_repartition = Param( @@ -336,10 +348,20 @@ def _validate_params(self) -> None: f"It cannot be less than 1 [Default is 1]" ) + tree_method = self.getOrDefault(self.getParam("tree_method")) + if ( + self.getOrDefault(self.use_gpu) or use_cuda(self.getOrDefault(self.device)) + ) and not _can_use_qdm(tree_method): + raise ValueError( + f"The `{tree_method}` tree method is not supported on GPU." + ) + if self.getOrDefault(self.features_cols): - if not self.getOrDefault(self.use_gpu): + if not use_cuda(self.getOrDefault(self.device)) and not self.getOrDefault( + self.use_gpu + ): raise ValueError( - "features_col param with list value requires enabling use_gpu." + "features_col param with list value requires `device=cuda`." ) if self.getOrDefault("objective") is not None: @@ -392,17 +414,7 @@ def _validate_params(self) -> None: "`pyspark.ml.linalg.Vector` type." 
) - if self.getOrDefault(self.use_gpu): - tree_method = self.getParam("tree_method") - if ( - self.getOrDefault(tree_method) is not None - and self.getOrDefault(tree_method) != "gpu_hist" - ): - raise ValueError( - f"tree_method should be 'gpu_hist' or None when use_gpu is True," - f"found {self.getOrDefault(tree_method)}." - ) - + if use_cuda(self.getOrDefault(self.device)) or self.getOrDefault(self.use_gpu): gpu_per_task = ( _get_spark_session() .sparkContext.getConf() @@ -424,8 +436,8 @@ def _validate_params(self) -> None: # so it's okay for printing the below warning instead of checking the real # gpu numbers and raising the exception. get_logger(self.__class__.__name__).warning( - "You enabled use_gpu in spark local mode. Please make sure your local node " - "has at least %d GPUs", + "You enabled GPU in spark local mode. Please make sure your local " + "node has at least %d GPUs", self.getOrDefault(self.num_workers), ) else: @@ -558,6 +570,7 @@ def __init__(self) -> None: # they are added in `setParams`. self._setDefault( num_workers=1, + device="cpu", use_gpu=False, force_repartition=False, repartition_random_shuffle=False, @@ -566,9 +579,7 @@ def __init__(self) -> None: arbitrary_params_dict={}, ) - def setParams( - self, **kwargs: Dict[str, Any] - ) -> None: # pylint: disable=invalid-name + def setParams(self, **kwargs: Any) -> None: # pylint: disable=invalid-name """ Set params for the estimator. """ @@ -613,6 +624,8 @@ def setParams( ) raise ValueError(err_msg) _extra_params[k] = v + + _check_distributed_params(kwargs) _existing_extra_params = self.getOrDefault(self.arbitrary_params_dict) self._set(arbitrary_params_dict={**_existing_extra_params, **_extra_params}) @@ -709,9 +722,6 @@ def _get_distributed_train_params(self, dataset: DataFrame) -> Dict[str, Any]: # TODO: support "num_parallel_tree" for random forest params["num_boost_round"] = self.getOrDefault("n_estimators") - if self.getOrDefault(self.use_gpu): - params["tree_method"] = "gpu_hist" - return params @classmethod @@ -883,8 +893,9 @@ def _fit(self, dataset: DataFrame) -> "_SparkXGBModel": dmatrix_kwargs, ) = self._get_xgb_parameters(dataset) - use_gpu = self.getOrDefault(self.use_gpu) - + run_on_gpu = use_cuda(self.getOrDefault(self.device)) or self.getOrDefault( + self.use_gpu + ) is_local = _is_local(_get_spark_session().sparkContext) num_workers = self.getOrDefault(self.num_workers) @@ -903,7 +914,7 @@ def _train_booster( dev_ordinal = None use_qdm = _can_use_qdm(booster_params.get("tree_method", None)) - if use_gpu: + if run_on_gpu: dev_ordinal = ( context.partitionId() if is_local else _get_gpu_id(context) ) diff --git a/python-package/xgboost/spark/estimator.py b/python-package/xgboost/spark/estimator.py index ba75aca7f8e0..f11a0eda856b 100644 --- a/python-package/xgboost/spark/estimator.py +++ b/python-package/xgboost/spark/estimator.py @@ -3,8 +3,8 @@ # pylint: disable=fixme, too-many-ancestors, protected-access, no-member, invalid-name # pylint: disable=unused-argument, too-many-locals - -from typing import Any, Dict, List, Optional, Type, Union +import warnings +from typing import Any, List, Optional, Type, Union import numpy as np from pyspark import keyword_only @@ -77,27 +77,35 @@ def set_param_attrs(attr_name: str, param: Param) -> None: set_param_attrs(name, param_obj) +def _deprecated_use_gpu() -> None: + warnings.warn( + "`use_gpu` is deprecated since 2.0.0, use `device` instead", FutureWarning + ) + + class SparkXGBRegressor(_SparkXGBEstimator): """SparkXGBRegressor is a PySpark ML estimator. 
It implements the XGBoost regression algorithm based on XGBoost python library, and it can be used in PySpark Pipeline - and PySpark ML meta algorithms like :py:class:`~pyspark.ml.tuning.CrossValidator`/ - :py:class:`~pyspark.ml.tuning.TrainValidationSplit`/ - :py:class:`~pyspark.ml.classification.OneVsRest` + and PySpark ML meta algorithms like + - :py:class:`~pyspark.ml.tuning.CrossValidator`/ + - :py:class:`~pyspark.ml.tuning.TrainValidationSplit`/ + - :py:class:`~pyspark.ml.classification.OneVsRest` SparkXGBRegressor automatically supports most of the parameters in :py:class:`xgboost.XGBRegressor` constructor and most of the parameters used in - :py:meth:`xgboost.XGBRegressor.fit` and :py:meth:`xgboost.XGBRegressor.predict` method. + :py:meth:`xgboost.XGBRegressor.fit` and :py:meth:`xgboost.XGBRegressor.predict` + method. - SparkXGBRegressor doesn't support setting `device` but supports another param - `use_gpu`, see doc below for more details. + To enable GPU support, set `device` to `cuda` or `gpu`. - SparkXGBRegressor doesn't support setting `base_margin` explicitly as well, but support - another param called `base_margin_col`. see doc below for more details. + SparkXGBRegressor doesn't support setting `base_margin` explicitly as well, but + support another param called `base_margin_col`. see doc below for more details. SparkXGBRegressor doesn't support `validate_features` and `output_margin` param. - SparkXGBRegressor doesn't support setting `nthread` xgboost param, instead, the `nthread` - param for each xgboost worker will be set equal to `spark.task.cpus` config value. + SparkXGBRegressor doesn't support setting `nthread` xgboost param, instead, the + `nthread` param for each xgboost worker will be set equal to `spark.task.cpus` + config value. Parameters @@ -133,8 +141,11 @@ class SparkXGBRegressor(_SparkXGBEstimator): How many XGBoost workers to be used to train. Each XGBoost worker corresponds to one spark task. use_gpu: - Boolean value to specify whether the executors are running on GPU - instances. + .. deprecated:: 2.0.0 + + Use `device` instead. + device: + Device for XGBoost workers, available options are `cpu`, `cuda`, and `gpu`. force_repartition: Boolean value to specify if forcing the input dataset to be repartitioned before XGBoost training. @@ -193,14 +204,17 @@ def __init__( weight_col: Optional[str] = None, base_margin_col: Optional[str] = None, num_workers: int = 1, - use_gpu: bool = False, + use_gpu: Optional[bool] = None, + device: Optional[str] = None, force_repartition: bool = False, repartition_random_shuffle: bool = False, enable_sparse_data_optim: bool = False, - **kwargs: Dict[str, Any], + **kwargs: Any, ) -> None: super().__init__() input_kwargs = self._input_kwargs + if use_gpu: + _deprecated_use_gpu() self.setParams(**input_kwargs) @classmethod @@ -238,27 +252,29 @@ class SparkXGBClassifier(_SparkXGBEstimator, HasProbabilityCol, HasRawPrediction """SparkXGBClassifier is a PySpark ML estimator. 
It implements the XGBoost classification algorithm based on XGBoost python library, and it can be used in PySpark Pipeline and PySpark ML meta algorithms like - :py:class:`~pyspark.ml.tuning.CrossValidator`/ - :py:class:`~pyspark.ml.tuning.TrainValidationSplit`/ - :py:class:`~pyspark.ml.classification.OneVsRest` + - :py:class:`~pyspark.ml.tuning.CrossValidator`/ + - :py:class:`~pyspark.ml.tuning.TrainValidationSplit`/ + - :py:class:`~pyspark.ml.classification.OneVsRest` SparkXGBClassifier automatically supports most of the parameters in :py:class:`xgboost.XGBClassifier` constructor and most of the parameters used in - :py:meth:`xgboost.XGBClassifier.fit` and :py:meth:`xgboost.XGBClassifier.predict` method. + :py:meth:`xgboost.XGBClassifier.fit` and :py:meth:`xgboost.XGBClassifier.predict` + method. - SparkXGBClassifier doesn't support setting `device` but support another param - `use_gpu`, see doc below for more details. + To enable GPU support, set `device` to `cuda` or `gpu`. - SparkXGBClassifier doesn't support setting `base_margin` explicitly as well, but support - another param called `base_margin_col`. see doc below for more details. + SparkXGBClassifier doesn't support setting `base_margin` explicitly as well, but + support another param called `base_margin_col`. see doc below for more details. - SparkXGBClassifier doesn't support setting `output_margin`, but we can get output margin - from the raw prediction column. See `raw_prediction_col` param doc below for more details. + SparkXGBClassifier doesn't support setting `output_margin`, but we can get output + margin from the raw prediction column. See `raw_prediction_col` param doc below for + more details. SparkXGBClassifier doesn't support `validate_features` and `output_margin` param. - SparkXGBClassifier doesn't support setting `nthread` xgboost param, instead, the `nthread` - param for each xgboost worker will be set equal to `spark.task.cpus` config value. + SparkXGBClassifier doesn't support setting `nthread` xgboost param, instead, the + `nthread` param for each xgboost worker will be set equal to `spark.task.cpus` + config value. Parameters @@ -300,8 +316,11 @@ class SparkXGBClassifier(_SparkXGBEstimator, HasProbabilityCol, HasRawPrediction How many XGBoost workers to be used to train. Each XGBoost worker corresponds to one spark task. use_gpu: - Boolean value to specify whether the executors are running on GPU - instances. + .. deprecated:: 2.0.0 + + Use `device` instead. + device: + Device for XGBoost workers, available options are `cpu`, `cuda`, and `gpu`. force_repartition: Boolean value to specify if forcing the input dataset to be repartitioned before XGBoost training. @@ -360,11 +379,12 @@ def __init__( weight_col: Optional[str] = None, base_margin_col: Optional[str] = None, num_workers: int = 1, - use_gpu: bool = False, + use_gpu: Optional[bool] = None, + device: Optional[str] = None, force_repartition: bool = False, repartition_random_shuffle: bool = False, enable_sparse_data_optim: bool = False, - **kwargs: Dict[str, Any], + **kwargs: Any, ) -> None: super().__init__() # The default 'objective' param value comes from sklearn `XGBClassifier` ctor, @@ -372,6 +392,8 @@ def __init__( # binary or multinomial input dataset, and we need to remove the fixed default # param value as well to avoid causing ambiguity. 
input_kwargs = self._input_kwargs + if use_gpu: + _deprecated_use_gpu() self.setParams(**input_kwargs) self._setDefault(objective=None) @@ -422,19 +444,20 @@ class SparkXGBRanker(_SparkXGBEstimator): :py:class:`xgboost.XGBRanker` constructor and most of the parameters used in :py:meth:`xgboost.XGBRanker.fit` and :py:meth:`xgboost.XGBRanker.predict` method. - SparkXGBRanker doesn't support setting `device` but support another param `use_gpu`, - see doc below for more details. + To enable GPU support, set `device` to `cuda` or `gpu`. SparkXGBRanker doesn't support setting `base_margin` explicitly as well, but support another param called `base_margin_col`. see doc below for more details. SparkXGBRanker doesn't support setting `output_margin`, but we can get output margin - from the raw prediction column. See `raw_prediction_col` param doc below for more details. + from the raw prediction column. See `raw_prediction_col` param doc below for more + details. SparkXGBRanker doesn't support `validate_features` and `output_margin` param. - SparkXGBRanker doesn't support setting `nthread` xgboost param, instead, the `nthread` - param for each xgboost worker will be set equal to `spark.task.cpus` config value. + SparkXGBRanker doesn't support setting `nthread` xgboost param, instead, the + `nthread` param for each xgboost worker will be set equal to `spark.task.cpus` + config value. Parameters @@ -467,13 +490,15 @@ class SparkXGBRanker(_SparkXGBEstimator): :py:class:`xgboost.XGBRanker` fit method. qid_col: Query id column name. - num_workers: How many XGBoost workers to be used to train. Each XGBoost worker corresponds to one spark task. use_gpu: - Boolean value to specify whether the executors are running on GPU - instances. + .. deprecated:: 2.0.0 + + Use `device` instead. + device: + Device for XGBoost workers, available options are `cpu`, `cuda`, and `gpu`. force_repartition: Boolean value to specify if forcing the input dataset to be repartitioned before XGBoost training. 
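The constructor changes in this file swap the boolean `use_gpu` for `device` across the regressor, classifier, and ranker. A small sketch of the migration path (hypothetical usage; assumes pyspark is installed):

```python
import warnings

from xgboost.spark import SparkXGBClassifier

# New style: request GPU execution through the `device` parameter.
clf = SparkXGBClassifier(device="cuda", num_workers=2)

# Old style: still accepted for now, but it emits a FutureWarning that
# points users at `device` instead.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    SparkXGBClassifier(use_gpu=True, num_workers=2)
assert any(issubclass(w.category, FutureWarning) for w in caught)
```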
@@ -538,14 +563,17 @@ def __init__( base_margin_col: Optional[str] = None, qid_col: Optional[str] = None, num_workers: int = 1, - use_gpu: bool = False, + use_gpu: Optional[bool] = None, + device: Optional[str] = None, force_repartition: bool = False, repartition_random_shuffle: bool = False, enable_sparse_data_optim: bool = False, - **kwargs: Dict[str, Any], + **kwargs: Any, ) -> None: super().__init__() input_kwargs = self._input_kwargs + if use_gpu: + _deprecated_use_gpu() self.setParams(**input_kwargs) @classmethod diff --git a/python-package/xgboost/spark/utils.py b/python-package/xgboost/spark/utils.py index 46e465dde4a6..5f3bb19bacbd 100644 --- a/python-package/xgboost/spark/utils.py +++ b/python-package/xgboost/spark/utils.py @@ -7,7 +7,7 @@ import sys import uuid from threading import Thread -from typing import Any, Callable, Dict, Set, Type +from typing import Any, Callable, Dict, Optional, Set, Type import pyspark from pyspark import BarrierTaskContext, SparkContext, SparkFiles @@ -186,3 +186,8 @@ def deserialize_booster(model: str) -> Booster: f.write(model) booster.load_model(tmp_file_name) return booster + + +def use_cuda(device: Optional[str]) -> bool: + """Whether xgboost is using CUDA workers.""" + return device in ("cuda", "gpu") diff --git a/src/gbm/gbtree.cc b/src/gbm/gbtree.cc index e97b27665354..0806c13a7c7f 100644 --- a/src/gbm/gbtree.cc +++ b/src/gbm/gbtree.cc @@ -98,8 +98,8 @@ void MismatchedDevices(Context const* booster, Context const* data) { - Use a data structure that matches the device ordinal in the booster. - Set the device for booster before call to inplace_predict. -This warning will only be shown once, and subsequent warnings made by the current thread will be -suppressed. +This warning will only be shown once for each thread. Subsequent warnings made by the +current thread will be suppressed. 
)"; logged = true; } diff --git a/tests/buildkite/build-containers.sh b/tests/buildkite/build-containers.sh index 899976a7ddf7..f46e6ccd089e 100755 --- a/tests/buildkite/build-containers.sh +++ b/tests/buildkite/build-containers.sh @@ -20,16 +20,18 @@ case "${container}" in cpu) ;; - gpu|rmm) + gpu) BUILD_ARGS="$BUILD_ARGS --build-arg CUDA_VERSION_ARG=$CUDA_VERSION" BUILD_ARGS="$BUILD_ARGS --build-arg RAPIDS_VERSION_ARG=$RAPIDS_VERSION" - if [[ $container == "rmm" ]] - then - BUILD_ARGS="$BUILD_ARGS --build-arg NCCL_VERSION_ARG=$NCCL_VERSION" - fi ;; - gpu_build_centos7|jvm_gpu_build) + gpu_build_centos7) + BUILD_ARGS="$BUILD_ARGS --build-arg CUDA_VERSION_ARG=$CUDA_VERSION" + BUILD_ARGS="$BUILD_ARGS --build-arg NCCL_VERSION_ARG=$NCCL_VERSION" + BUILD_ARGS="$BUILD_ARGS --build-arg RAPIDS_VERSION_ARG=$RAPIDS_VERSION" + ;; + + jvm_gpu_build) BUILD_ARGS="$BUILD_ARGS --build-arg CUDA_VERSION_ARG=$CUDA_VERSION" BUILD_ARGS="$BUILD_ARGS --build-arg NCCL_VERSION_ARG=$NCCL_VERSION" ;; diff --git a/tests/buildkite/build-cuda-with-rmm.sh b/tests/buildkite/build-cuda-with-rmm.sh index 2e0b9fe2c916..46bc9802863d 100755 --- a/tests/buildkite/build-cuda-with-rmm.sh +++ b/tests/buildkite/build-cuda-with-rmm.sh @@ -2,9 +2,11 @@ set -euo pipefail +WHEEL_TAG=manylinux2014_x86_64 + source tests/buildkite/conftest.sh -echo "--- Build with CUDA ${CUDA_VERSION}, RMM enabled" +echo "--- Build with CUDA ${CUDA_VERSION} with RMM" if [[ ($is_pull_request == 1) || ($is_release_branch == 0) ]] then @@ -13,14 +15,40 @@ else arch_flag="" fi -command_wrapper="tests/ci_build/ci_build.sh rmm docker --build-arg "` +command_wrapper="tests/ci_build/ci_build.sh gpu_build_centos7 docker --build-arg "` `"CUDA_VERSION_ARG=$CUDA_VERSION --build-arg "` - `"RAPIDS_VERSION_ARG=$RAPIDS_VERSION --build-arg "` - `"NCCL_VERSION_ARG=$NCCL_VERSION" + `"NCCL_VERSION_ARG=$NCCL_VERSION --build-arg "` + `"RAPIDS_VERSION_ARG=$RAPIDS_VERSION" echo "--- Build libxgboost from the source" -$command_wrapper tests/ci_build/build_via_cmake.sh --conda-env=gpu_test -DUSE_CUDA=ON \ - -DUSE_NCCL=ON -DPLUGIN_RMM=ON ${arch_flag} +$command_wrapper tests/ci_build/prune_libnccl.sh +$command_wrapper tests/ci_build/build_via_cmake.sh -DCMAKE_PREFIX_PATH="/opt/grpc;/opt/rmm" \ + -DUSE_CUDA=ON -DUSE_NCCL=ON -DUSE_OPENMP=ON -DHIDE_CXX_SYMBOLS=ON -DPLUGIN_FEDERATED=ON \ + -DPLUGIN_RMM=ON -DUSE_NCCL_LIB_PATH=ON -DNCCL_INCLUDE_DIR=/usr/include \ + -DNCCL_LIBRARY=/workspace/libnccl_static.a ${arch_flag} +echo "--- Build binary wheel" +$command_wrapper bash -c \ + "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . 
--wheel-dir dist/" +$command_wrapper python tests/ci_build/rename_whl.py python-package/dist/*.whl \ + ${BUILDKITE_COMMIT} ${WHEEL_TAG} + +echo "--- Audit binary wheel to ensure it's compliant with manylinux2014 standard" +tests/ci_build/ci_build.sh auditwheel_x86_64 docker auditwheel repair \ + --plat ${WHEEL_TAG} python-package/dist/*.whl +$command_wrapper python tests/ci_build/rename_whl.py wheelhouse/*.whl \ + ${BUILDKITE_COMMIT} ${WHEEL_TAG} +mv -v wheelhouse/*.whl python-package/dist/ +# Make sure that libgomp.so is vendored in the wheel +tests/ci_build/ci_build.sh auditwheel_x86_64 docker bash -c \ + "unzip -l python-package/dist/*.whl | grep libgomp || exit -1" + +echo "--- Upload Python wheel" +buildkite-agent artifact upload python-package/dist/*.whl +if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] +then + aws s3 cp python-package/dist/*.whl s3://xgboost-nightly-builds/experimental_build_with_rmm/ \ + --acl public-read --no-progress +fi echo "-- Stash C++ test executable (testxgboost)" buildkite-agent artifact upload build/testxgboost diff --git a/tests/buildkite/build-cuda.sh b/tests/buildkite/build-cuda.sh index c180695e820f..1926754b8ab7 100755 --- a/tests/buildkite/build-cuda.sh +++ b/tests/buildkite/build-cuda.sh @@ -17,11 +17,12 @@ fi command_wrapper="tests/ci_build/ci_build.sh gpu_build_centos7 docker --build-arg "` `"CUDA_VERSION_ARG=$CUDA_VERSION --build-arg "` - `"NCCL_VERSION_ARG=$NCCL_VERSION" + `"NCCL_VERSION_ARG=$NCCL_VERSION --build-arg "` + `"RAPIDS_VERSION_ARG=$RAPIDS_VERSION" echo "--- Build libxgboost from the source" $command_wrapper tests/ci_build/prune_libnccl.sh -$command_wrapper tests/ci_build/build_via_cmake.sh -DCMAKE_PREFIX_PATH=/opt/grpc \ +$command_wrapper tests/ci_build/build_via_cmake.sh -DCMAKE_PREFIX_PATH="/opt/grpc" \ -DUSE_CUDA=ON -DUSE_NCCL=ON -DUSE_OPENMP=ON -DHIDE_CXX_SYMBOLS=ON -DPLUGIN_FEDERATED=ON \ -DUSE_NCCL_LIB_PATH=ON -DNCCL_INCLUDE_DIR=/usr/include \ -DNCCL_LIBRARY=/workspace/libnccl_static.a ${arch_flag} diff --git a/tests/buildkite/pipeline-mgpu.yml b/tests/buildkite/pipeline-mgpu.yml index aff2d078bda3..3229646d5467 100644 --- a/tests/buildkite/pipeline-mgpu.yml +++ b/tests/buildkite/pipeline-mgpu.yml @@ -12,7 +12,7 @@ steps: queue: pipeline-loader - wait - block: ":rocket: Run this test job" - if: build.pull_request.id != null + if: build.pull_request.id != null || build.branch =~ /^dependabot\// #### -------- CONTAINER BUILD -------- - label: ":docker: Build containers" commands: diff --git a/tests/buildkite/pipeline-win64.yml b/tests/buildkite/pipeline-win64.yml index 0a1f7f1648c0..d4491148ee37 100644 --- a/tests/buildkite/pipeline-win64.yml +++ b/tests/buildkite/pipeline-win64.yml @@ -6,7 +6,7 @@ steps: queue: pipeline-loader - wait - block: ":rocket: Run this test job" - if: build.pull_request.id != null + if: build.pull_request.id != null || build.branch =~ /^dependabot\// #### -------- BUILD -------- - label: ":windows: Build XGBoost for Windows with CUDA" command: "tests/buildkite/build-win64-gpu.ps1" diff --git a/tests/buildkite/pipeline.yml b/tests/buildkite/pipeline.yml index fa09242bffb7..905535c526ff 100644 --- a/tests/buildkite/pipeline.yml +++ b/tests/buildkite/pipeline.yml @@ -9,14 +9,13 @@ steps: queue: pipeline-loader - wait - block: ":rocket: Run this test job" - if: build.pull_request.id != null + if: build.pull_request.id != null || build.branch =~ /^dependabot\// #### -------- CONTAINER BUILD -------- - label: ":docker: Build containers" commands: - "tests/buildkite/build-containers.sh cpu" - 
"tests/buildkite/build-containers.sh gpu" - "tests/buildkite/build-containers.sh gpu_build_centos7" - - "tests/buildkite/build-containers.sh rmm" key: build-containers agents: queue: linux-amd64-cpu diff --git a/tests/buildkite/test-cpp-gpu.sh b/tests/buildkite/test-cpp-gpu.sh index 7c8f5e505d46..58d25030852c 100755 --- a/tests/buildkite/test-cpp-gpu.sh +++ b/tests/buildkite/test-cpp-gpu.sh @@ -16,8 +16,8 @@ echo "--- Run Google Tests with CUDA, using a GPU, RMM enabled" rm -rfv build/ buildkite-agent artifact download "build/testxgboost" . --step build-cuda-with-rmm chmod +x build/testxgboost -tests/ci_build/ci_build.sh rmm nvidia-docker \ +tests/ci_build/ci_build.sh gpu nvidia-docker \ --build-arg CUDA_VERSION_ARG=$CUDA_VERSION \ --build-arg RAPIDS_VERSION_ARG=$RAPIDS_VERSION \ - --build-arg NCCL_VERSION_ARG=$NCCL_VERSION bash -c \ - "source activate gpu_test && build/testxgboost --use-rmm-pool" + --build-arg NCCL_VERSION_ARG=$NCCL_VERSION \ + build/testxgboost --use-rmm-pool diff --git a/tests/ci_build/Dockerfile.gpu_build_centos7 b/tests/ci_build/Dockerfile.gpu_build_centos7 index bfc79c2162b9..4f9823baab07 100644 --- a/tests/ci_build/Dockerfile.gpu_build_centos7 +++ b/tests/ci_build/Dockerfile.gpu_build_centos7 @@ -2,6 +2,7 @@ ARG CUDA_VERSION_ARG FROM nvidia/cuda:$CUDA_VERSION_ARG-devel-centos7 ARG CUDA_VERSION_ARG ARG NCCL_VERSION_ARG +ARG RAPIDS_VERSION_ARG # Install all basic requirements RUN \ @@ -16,8 +17,8 @@ RUN \ bash conda.sh -b -p /opt/mambaforge && \ /opt/mambaforge/bin/python -m pip install awscli && \ # CMake - wget -nv -nc https://cmake.org/files/v3.18/cmake-3.18.0-Linux-x86_64.sh --no-check-certificate && \ - bash cmake-3.18.0-Linux-x86_64.sh --skip-license --prefix=/usr + wget -nv -nc https://cmake.org/files/v3.24/cmake-3.24.0-linux-x86_64.sh --no-check-certificate && \ + bash cmake-3.24.0-linux-x86_64.sh --skip-license --prefix=/usr # NCCL2 (License: https://docs.nvidia.com/deeplearning/sdk/nccl-sla/index.html) RUN \ @@ -33,9 +34,21 @@ ENV PATH=/opt/mambaforge/bin:/usr/local/ninja:$PATH ENV CC=/opt/rh/devtoolset-9/root/usr/bin/gcc ENV CXX=/opt/rh/devtoolset-9/root/usr/bin/c++ ENV CPP=/opt/rh/devtoolset-9/root/usr/bin/cpp +ENV CUDAHOSTCXX=/opt/rh/devtoolset-9/root/usr/bin/c++ ENV GOSU_VERSION 1.10 +# Install RMM +RUN git clone -b v${RAPIDS_VERSION_ARG}.00 https://github.com/rapidsai/rmm.git --recurse-submodules --depth 1 && \ + pushd rmm && \ + mkdir build && \ + pushd build && \ + cmake .. -GNinja -DCMAKE_INSTALL_PREFIX=/opt/rmm -DCUDA_STATIC_RUNTIME=ON && \ + cmake --build . 
--target install && \ + popd && \ + popd && \ + rm -rf rmm + # Install gRPC RUN git clone -b v1.49.1 https://github.com/grpc/grpc.git \ --recurse-submodules --depth 1 && \ diff --git a/tests/ci_build/Dockerfile.rmm b/tests/ci_build/Dockerfile.rmm deleted file mode 100644 index 16db377c2260..000000000000 --- a/tests/ci_build/Dockerfile.rmm +++ /dev/null @@ -1,49 +0,0 @@ -ARG CUDA_VERSION_ARG -FROM nvidia/cuda:$CUDA_VERSION_ARG-devel-ubuntu20.04 -ARG CUDA_VERSION_ARG -ARG RAPIDS_VERSION_ARG -ARG NCCL_VERSION_ARG - -# Environment -ENV DEBIAN_FRONTEND noninteractive -SHELL ["/bin/bash", "-c"] # Use Bash as shell - -# Install all basic requirements -RUN \ - apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub && \ - apt-get update && \ - apt-get install -y wget unzip bzip2 libgomp1 build-essential ninja-build git && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/22.11.1-2/Mambaforge-22.11.1-2-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/mambaforge - -# NCCL2 (License: https://docs.nvidia.com/deeplearning/sdk/nccl-sla/index.html) -RUN \ - export CUDA_SHORT=`echo $CUDA_VERSION_ARG | grep -o -E '[0-9]+\.[0-9]'` && \ - export NCCL_VERSION=$NCCL_VERSION_ARG && \ - apt-get update && \ - apt-get install -y --allow-downgrades --allow-change-held-packages libnccl2=${NCCL_VERSION}+cuda${CUDA_SHORT} libnccl-dev=${NCCL_VERSION}+cuda${CUDA_SHORT} - -ENV PATH=/opt/mambaforge/bin:$PATH - -# Create new Conda environment with RMM -RUN \ - conda install -c conda-forge mamba && \ - mamba create -n gpu_test -c rapidsai-nightly -c rapidsai -c nvidia -c conda-forge -c defaults \ - python=3.10 rmm=$RAPIDS_VERSION_ARG* cudatoolkit=$CUDA_VERSION_ARG cmake && \ - mamba clean --all - -ENV GOSU_VERSION 1.10 - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/tests/ci_build/prune_libnccl.sh b/tests/ci_build/prune_libnccl.sh index 5b6e48ad5bb5..a81d6e4ac7e7 100755 --- a/tests/ci_build/prune_libnccl.sh +++ b/tests/ci_build/prune_libnccl.sh @@ -26,7 +26,7 @@ set_property(TARGET test PROPERTY CUDA_ARCHITECTURES \${CMAKE_CUDA_ARCHITECTURES set(CMAKE_EXPORT_COMPILE_COMMANDS ON) EOF -cmake . -GNinja +cmake . 
-GNinja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON gen_code=$(grep -o -- '--generate-code=\S*' compile_commands.json | paste -sd ' ') nvprune ${gen_code} /usr/lib64/libnccl_static.a -o ../libnccl_static.a diff --git a/tests/test_distributed/test_gpu_with_spark/test_gpu_spark.py b/tests/test_distributed/test_gpu_with_spark/test_gpu_spark.py index 1f986f96ea10..a962f778e888 100644 --- a/tests/test_distributed/test_gpu_with_spark/test_gpu_spark.py +++ b/tests/test_distributed/test_gpu_with_spark/test_gpu_spark.py @@ -154,7 +154,7 @@ def spark_diabetes_dataset_feature_cols(spark_session_with_gpu): def test_sparkxgb_classifier_with_gpu(spark_iris_dataset): from pyspark.ml.evaluation import MulticlassClassificationEvaluator - classifier = SparkXGBClassifier(use_gpu=True, num_workers=num_workers) + classifier = SparkXGBClassifier(device="cuda", num_workers=num_workers) train_df, test_df = spark_iris_dataset model = classifier.fit(train_df) pred_result_df = model.transform(test_df) @@ -169,7 +169,7 @@ def test_sparkxgb_classifier_feature_cols_with_gpu(spark_iris_dataset_feature_co train_df, test_df, feature_names = spark_iris_dataset_feature_cols classifier = SparkXGBClassifier( - features_col=feature_names, use_gpu=True, num_workers=num_workers + features_col=feature_names, device="cuda", num_workers=num_workers ) model = classifier.fit(train_df) @@ -185,7 +185,7 @@ def test_cv_sparkxgb_classifier_feature_cols_with_gpu(spark_iris_dataset_feature train_df, test_df, feature_names = spark_iris_dataset_feature_cols classifier = SparkXGBClassifier( - features_col=feature_names, use_gpu=True, num_workers=num_workers + features_col=feature_names, device="cuda", num_workers=num_workers ) grid = ParamGridBuilder().addGrid(classifier.max_depth, [6, 8]).build() evaluator = MulticlassClassificationEvaluator(metricName="f1") @@ -197,11 +197,24 @@ def test_cv_sparkxgb_classifier_feature_cols_with_gpu(spark_iris_dataset_feature f1 = evaluator.evaluate(pred_result_df) assert f1 >= 0.97 + clf = SparkXGBClassifier( + features_col=feature_names, use_gpu=True, num_workers=num_workers + ) + grid = ParamGridBuilder().addGrid(clf.max_depth, [6, 8]).build() + evaluator = MulticlassClassificationEvaluator(metricName="f1") + cv = CrossValidator( + estimator=clf, evaluator=evaluator, estimatorParamMaps=grid, numFolds=3 + ) + cvModel = cv.fit(train_df) + pred_result_df = cvModel.transform(test_df) + f1 = evaluator.evaluate(pred_result_df) + assert f1 >= 0.97 + def test_sparkxgb_regressor_with_gpu(spark_diabetes_dataset): from pyspark.ml.evaluation import RegressionEvaluator - regressor = SparkXGBRegressor(use_gpu=True, num_workers=num_workers) + regressor = SparkXGBRegressor(device="cuda", num_workers=num_workers) train_df, test_df = spark_diabetes_dataset model = regressor.fit(train_df) pred_result_df = model.transform(test_df) @@ -215,7 +228,7 @@ def test_sparkxgb_regressor_feature_cols_with_gpu(spark_diabetes_dataset_feature train_df, test_df, feature_names = spark_diabetes_dataset_feature_cols regressor = SparkXGBRegressor( - features_col=feature_names, use_gpu=True, num_workers=num_workers + features_col=feature_names, device="cuda", num_workers=num_workers ) model = regressor.fit(train_df) diff --git a/tests/test_distributed/test_with_spark/test_spark_local.py b/tests/test_distributed/test_with_spark/test_spark_local.py index 124f36d02034..50eafb0a170a 100644 --- a/tests/test_distributed/test_with_spark/test_spark_local.py +++ b/tests/test_distributed/test_with_spark/test_spark_local.py @@ -741,11 +741,6 @@ def 
test_early_stop_param_validation(self, clf_data: ClfData) -> None: with pytest.raises(ValueError, match="early_stopping_rounds"): classifier.fit(clf_data.cls_df_train) - def test_gpu_param_setting(self, clf_data: ClfData) -> None: - py_cls = SparkXGBClassifier(use_gpu=True) - train_params = py_cls._get_distributed_train_params(clf_data.cls_df_train) - assert train_params["tree_method"] == "gpu_hist" - def test_classifier_with_list_eval_metric(self, clf_data: ClfData) -> None: classifier = SparkXGBClassifier(eval_metric=["auc", "rmse"]) model = classifier.fit(clf_data.cls_df_train) @@ -756,6 +751,53 @@ def test_classifier_with_string_eval_metric(self, clf_data: ClfData) -> None: model = classifier.fit(clf_data.cls_df_train) model.transform(clf_data.cls_df_test).collect() + def test_regressor_params_basic(self) -> None: + py_reg = SparkXGBRegressor() + assert hasattr(py_reg, "n_estimators") + assert py_reg.n_estimators.parent == py_reg.uid + assert not hasattr(py_reg, "gpu_id") + assert hasattr(py_reg, "device") + assert py_reg.getOrDefault(py_reg.n_estimators) == 100 + assert py_reg.getOrDefault(getattr(py_reg, "objective")), "reg:squarederror" + py_reg2 = SparkXGBRegressor(n_estimators=200) + assert py_reg2.getOrDefault(getattr(py_reg2, "n_estimators")), 200 + py_reg3 = py_reg2.copy({getattr(py_reg2, "max_depth"): 10}) + assert py_reg3.getOrDefault(getattr(py_reg3, "n_estimators")), 200 + assert py_reg3.getOrDefault(getattr(py_reg3, "max_depth")), 10 + + def test_classifier_params_basic(self) -> None: + py_clf = SparkXGBClassifier() + assert hasattr(py_clf, "n_estimators") + assert py_clf.n_estimators.parent == py_clf.uid + assert not hasattr(py_clf, "gpu_id") + assert hasattr(py_clf, "device") + + assert py_clf.getOrDefault(py_clf.n_estimators) == 100 + assert py_clf.getOrDefault(getattr(py_clf, "objective")) is None + py_clf2 = SparkXGBClassifier(n_estimators=200) + assert py_clf2.getOrDefault(getattr(py_clf2, "n_estimators")) == 200 + py_clf3 = py_clf2.copy({getattr(py_clf2, "max_depth"): 10}) + assert py_clf3.getOrDefault(getattr(py_clf3, "n_estimators")) == 200 + assert py_clf3.getOrDefault(getattr(py_clf3, "max_depth")), 10 + + def test_classifier_kwargs_basic(self, clf_data: ClfData) -> None: + py_clf = SparkXGBClassifier(**clf_data.cls_params) + assert hasattr(py_clf, "n_estimators") + assert py_clf.n_estimators.parent == py_clf.uid + assert not hasattr(py_clf, "gpu_id") + assert hasattr(py_clf, "device") + assert hasattr(py_clf, "arbitrary_params_dict") + assert py_clf.getOrDefault(py_clf.arbitrary_params_dict) == {} + + # Testing overwritten params + py_clf = SparkXGBClassifier() + py_clf.setParams(x=1, y=2) + py_clf.setParams(y=3, z=4) + xgb_params = py_clf._gen_xgb_params_dict() + assert xgb_params["x"] == 1 + assert xgb_params["y"] == 3 + assert xgb_params["z"] == 4 + def test_regressor_model_save_load(self, reg_data: RegData) -> None: with tempfile.TemporaryDirectory() as tmpdir: path = "file:" + tmpdir @@ -826,6 +868,24 @@ def test_regressor_model_pipeline_save_load(self, reg_data: RegData) -> None: ) assert_model_compatible(model.stages[0], tmpdir) + def test_device_param(self, reg_data: RegData, clf_data: ClfData) -> None: + clf = SparkXGBClassifier(device="cuda", tree_method="exact") + with pytest.raises(ValueError, match="not supported on GPU"): + clf.fit(clf_data.cls_df_train) + regressor = SparkXGBRegressor(device="cuda", tree_method="exact") + with pytest.raises(ValueError, match="not supported on GPU"): + regressor.fit(reg_data.reg_df_train) + + reg = 
SparkXGBRegressor(device="cuda", tree_method="gpu_hist") + reg._validate_params() + reg = SparkXGBRegressor(device="cuda") + reg._validate_params() + + clf = SparkXGBClassifier(device="cuda", tree_method="gpu_hist") + clf._validate_params() + clf = SparkXGBClassifier(device="cuda") + clf._validate_params() + class XgboostLocalTest(SparkTestCase): def setUp(self): @@ -1020,55 +1080,6 @@ def test_convert_to_sklearn_model_reg(self) -> None: assert sklearn_regressor.max_depth == 3 assert sklearn_regressor.get_params()["sketch_eps"] == 0.5 - def test_regressor_params_basic(self): - py_reg = SparkXGBRegressor() - self.assertTrue(hasattr(py_reg, "n_estimators")) - self.assertEqual(py_reg.n_estimators.parent, py_reg.uid) - self.assertFalse(hasattr(py_reg, "gpu_id")) - self.assertFalse(hasattr(py_reg, "device")) - self.assertEqual(py_reg.getOrDefault(py_reg.n_estimators), 100) - self.assertEqual(py_reg.getOrDefault(py_reg.objective), "reg:squarederror") - py_reg2 = SparkXGBRegressor(n_estimators=200) - self.assertEqual(py_reg2.getOrDefault(py_reg2.n_estimators), 200) - py_reg3 = py_reg2.copy({py_reg2.max_depth: 10}) - self.assertEqual(py_reg3.getOrDefault(py_reg3.n_estimators), 200) - self.assertEqual(py_reg3.getOrDefault(py_reg3.max_depth), 10) - - def test_classifier_params_basic(self): - py_cls = SparkXGBClassifier() - self.assertTrue(hasattr(py_cls, "n_estimators")) - self.assertEqual(py_cls.n_estimators.parent, py_cls.uid) - self.assertFalse(hasattr(py_cls, "gpu_id")) - self.assertFalse(hasattr(py_cls, "device")) - self.assertEqual(py_cls.getOrDefault(py_cls.n_estimators), 100) - self.assertEqual(py_cls.getOrDefault(py_cls.objective), None) - py_cls2 = SparkXGBClassifier(n_estimators=200) - self.assertEqual(py_cls2.getOrDefault(py_cls2.n_estimators), 200) - py_cls3 = py_cls2.copy({py_cls2.max_depth: 10}) - self.assertEqual(py_cls3.getOrDefault(py_cls3.n_estimators), 200) - self.assertEqual(py_cls3.getOrDefault(py_cls3.max_depth), 10) - - def test_classifier_kwargs_basic(self): - py_cls = SparkXGBClassifier(**self.cls_params_kwargs) - self.assertTrue(hasattr(py_cls, "n_estimators")) - self.assertEqual(py_cls.n_estimators.parent, py_cls.uid) - self.assertFalse(hasattr(py_cls, "gpu_id")) - self.assertFalse(hasattr(py_cls, "device")) - self.assertTrue(hasattr(py_cls, "arbitrary_params_dict")) - expected_kwargs = {"sketch_eps": 0.03} - self.assertEqual( - py_cls.getOrDefault(py_cls.arbitrary_params_dict), expected_kwargs - ) - - # Testing overwritten params - py_cls = SparkXGBClassifier() - py_cls.setParams(x=1, y=2) - py_cls.setParams(y=3, z=4) - xgb_params = py_cls._gen_xgb_params_dict() - assert xgb_params["x"] == 1 - assert xgb_params["y"] == 3 - assert xgb_params["z"] == 4 - def test_param_alias(self): py_cls = SparkXGBClassifier(features_col="f1", label_col="l1") self.assertEqual(py_cls.getOrDefault(py_cls.featuresCol), "f1") @@ -1200,16 +1211,6 @@ def test_num_workers_param(self): classifier = SparkXGBClassifier(num_workers=0) self.assertRaises(ValueError, classifier._validate_params) - def test_use_gpu_param(self): - classifier = SparkXGBClassifier(use_gpu=True, tree_method="exact") - self.assertRaises(ValueError, classifier._validate_params) - regressor = SparkXGBRegressor(use_gpu=True, tree_method="exact") - self.assertRaises(ValueError, regressor._validate_params) - regressor = SparkXGBRegressor(use_gpu=True, tree_method="gpu_hist") - regressor = SparkXGBRegressor(use_gpu=True) - classifier = SparkXGBClassifier(use_gpu=True, tree_method="gpu_hist") - classifier = 
SparkXGBClassifier(use_gpu=True) - def test_feature_importances(self): reg1 = SparkXGBRegressor(**self.reg_params) model = reg1.fit(self.reg_df_train)
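For reference, the tests above ultimately exercise the new shared validator `_check_distributed_params` added in `core.py`. A compact sketch of its contract (a private helper, so subject to change):

```python
from xgboost.core import _check_distributed_params

_check_distributed_params({"device": "cuda"})  # fine: no ordinal requested

try:
    _check_distributed_params({"device": "cuda:0"})
except ValueError:
    pass  # ordinals are rejected: the distributed framework assigns GPUs

try:
    _check_distributed_params({"booster": "gblinear"})
except NotImplementedError:
    pass  # gblinear is not supported for distributed training

try:
    _check_distributed_params({"device": 0})
except TypeError:
    pass  # `device` must be a string
```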