
This is a stable release of version 0.7.

Released by @hcho3 on 30 Dec 21:52 · commit 4aa346c

Changes

  • This version represents a major change from the last release (v0.6), which was released a year and a half ago.
  • Updated Sklearn API
    • Add compatibility layer for scikit-learn v0.18: sklearn.cross_validation is now deprecated
    • Updated to allow use of all XGBoost parameters via **kwargs.
    • Renamed nthread to n_jobs and seed to random_state (per scikit-learn convention); nthread and seed are now deprecated
    • Updated to allow choice of booster (gbtree, gblinear, or dart)
    • XGBRegressor now supports instance weights (specify the sample_weight parameter)
    • Pass the n_jobs parameter to the DMatrix constructor
    • Add the xgb_model parameter to the fit method, to allow continuation of training (see the sketch after this list)
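
  A minimal sketch of the updated scikit-learn interface, using toy data purely for illustration; the get_booster() accessor used for training continuation is an assumption about the wrapper's API.

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(100, 5)
    y = rng.rand(100)
    w = rng.rand(100)  # per-instance weights

    model = XGBRegressor(
        booster='gbtree',   # choice of booster: 'gbtree', 'gblinear', or 'dart'
        n_jobs=4,           # replaces the deprecated nthread
        random_state=42,    # replaces the deprecated seed
        max_depth=3,        # any native XGBoost parameter passes through **kwargs
        n_estimators=20,
    )
    model.fit(X, y, sample_weight=w)  # new instance-weight support

    # Continue training from an existing model via the new xgb_model parameter.
    continued = XGBRegressor(n_estimators=10, random_state=42)
    continued.fit(X, y, xgb_model=model.get_booster())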
  • Refactored gbm to allow a more cache-friendly strategy
    • Specialized some prediction routines
  • Robust DMatrix construction from a sparse matrix
  • Faster construction of DMatrix from 2D NumPy matrices: copies are elided and multiple threads are used (see the sketch below)
  • Automatically remove nan from input data when it is sparse.
    • This fixes some user-reported occurrences of the istart != hist.size error
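
  A minimal sketch of the two construction paths above, assuming SciPy is available for the sparse input; the data is illustrative.

    import numpy as np
    import scipy.sparse
    import xgboost as xgb

    dense = np.random.rand(100, 10)
    dense[dense < 0.1] = np.nan          # mark some entries as missing

    # Dense path: DMatrix from a 2D NumPy matrix (fewer copies, multi-threaded).
    dmat_dense = xgb.DMatrix(dense, missing=np.nan)

    # Sparse path: nan values become stored entries in the CSR matrix...
    sparse = scipy.sparse.csr_matrix(dense)
    # ...and are now stripped automatically during DMatrix construction.
    dmat_sparse = xgb.DMatrix(sparse)
    print(dmat_dense.num_row(), dmat_dense.num_col())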
  • Fix the single-instance prediction function to obtain correct predictions
  • Minor fixes
    • Thread-local variables are upgraded so they are automatically freed at thread exit.
    • Fix saving and loading count::poisson models
    • Fix CalcDCG to use base-2 logarithm
    • Messages are now written to stderr instead of stdout
    • Keep built-in evaluations while using customized evaluation functions
    • Use bst_float consistently to minimize type conversion
    • Copy the base margin when slicing DMatrix
    • Evaluation metrics are now saved to the model file
    • Use int32_t explicitly when serializing version
    • In distributed training, synchronize the number of features after loading a data matrix.
  • Migrate to C++11
    • The current master version now requires a C++11-enabled compiler (g++ 4.8 or higher)
  • Predictor interface was factored out (in a manner similar to the updater interface).
  • Makefile support for Solaris and ARM
  • Test code coverage using Codecov
  • Add CPP tests
  • Add Dockerfile and Jenkinsfile to support continuous integration for GPU code
  • New functionality
    • Ability to adjust a tree model's statistics to a new dataset without changing tree structures.
    • Ability to extract feature contributions from individual predictions, as described here and here.
    • Faster, histogram-based tree algorithm (tree_method='hist'); see the sketch after this list.
    • GPU/CUDA-accelerated tree algorithms (tree_method='gpu_hist' or 'gpu_exact'), including the GPU-based predictor.
    • Monotonic constraints: when other features are fixed, force the prediction to be monotonically increasing with respect to a specified feature.
    • Faster gradient calculation using AVX SIMD
    • Ability to export models in JSON format
    • Support for Tweedie regression
    • Additional dropout options for DART: binomial+1, epsilon
    • Ability to update an existing model in-place: this is useful for many applications, such as determining feature importance
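
  A sketch exercising several of the new options via the native API; the data is illustrative, and the GPU variants require a CUDA-enabled build.

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(200, 3)
    y = X[:, 0] + 0.1 * np.random.rand(200)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        'tree_method': 'hist',              # new histogram-based tree algorithm
        # 'tree_method': 'gpu_hist',        # GPU-accelerated variant
        'monotone_constraints': '(1,0,0)',  # force monotone increase in feature 0
        # 'objective': 'reg:tweedie',       # new Tweedie regression objective
        'eta': 0.1,
    }
    bst = xgb.train(params, dtrain, num_boost_round=20)

    # Feature contributions per prediction: one column per feature plus a bias term.
    contribs = bst.predict(dtrain, pred_contribs=True)
    print(contribs.shape)  # (n_samples, n_features + 1)

    # Export the model as JSON (one JSON string per tree).
    json_trees = bst.get_dump(dump_format='json')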
  • Python package:
    • New parameters (exercised in the sketch after this list):
      • learning_rates in cv()
      • shuffle in mknfold()
      • max_features and show_values in plot_importance()
      • sample_weight in XGBRegressor.fit()
    • Support binary wheel builds
    • Fix MultiIndex detection to support Pandas 0.21.0 and higher
    • Support metrics and evaluation sets whose names contain -
    • Support feature maps when plotting trees
    • Compatibility fix for Python 2.6
    • Call print_evaluation callback at last iteration
    • Use appropriate integer types when calling native code, to prevent truncation and memory error
    • Fix shared library loading on Mac OS X
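
  A sketch of the new Python-package parameters; the names are taken verbatim from the items above, matplotlib is assumed for the importance plot, and the data is illustrative.

    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.rand(100, 4)
    y = (X[:, 0] > 0.5).astype(int)
    dtrain = xgb.DMatrix(X, label=y)
    params = {'objective': 'binary:logistic', 'eta': 0.3}

    # Cross-validation with shuffled folds and a per-round learning-rate schedule.
    history = xgb.cv(params, dtrain, num_boost_round=5, nfold=3,
                     shuffle=True, learning_rates=[0.3, 0.3, 0.2, 0.2, 0.1])

    # Importance plot capped to the top features, with value labels shown.
    bst = xgb.train(params, dtrain, num_boost_round=5)
    xgb.plot_importance(bst, max_features=3, show_values=True)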
  • R package:
    • New parameters:
      • silent in xgb.DMatrix()
      • use_int_id in xgb.model.dt.tree()
      • predcontrib in predict()
      • monotone_constraints in xgb.train()
    • Default value of the save_period parameter in xgboost() changed to NULL (consistent with xgb.train()).
    • It's possible to custom-build the R package with GPU acceleration support.
    • Integration with AppVeyor CI
    • Improved safety for garbage collection
    • Store numeric attributes with higher precision
    • Easier installation for devel version
    • Improved xgb.plot.tree()
    • Various minor fixes to improve user experience and robustness
    • Register native code to pass CRAN check
    • Updated CRAN submission
  • JVM packages
    • Enable JVM build for Mac OS X and Windows
    • Add Spark pipeline persistence API
    • Fix data persistence: loss evaluation on test data had wrongly used caches for training data.
    • Clean external cache after training
    • Implement early stopping
    • Enable training of multiple models by distinguishing stage IDs
    • Better Spark integration: support RDD / dataframe / dataset, integrate with Spark ML package
    • XGBoost4j now supports ranking task
    • Support training with missing data
    • Refactor JVM package to separate regression and classification models to be consistent with other machine learning libraries
    • Support XGBoost4j compilation on Windows
    • Parameter tuning tool
    • Publish source code for XGBoost4j to maven local repo
    • Scala implementation of the Rabit tracker (drop-in replacement for the Java implementation)
    • Better exception handling for the Rabit tracker
    • Persist num_class, the number of classes (for classification tasks)
    • XGBoostModel now holds BoosterParams
    • libxgboost4j is now part of CMake build
    • Release DMatrix when no longer needed, to conserve memory
    • Expose baseMargin, to allow initialization of boosting with predictions from an external model
    • Support instance weights
    • Use SparkParallelismTracker to prevent jobs from hanging forever
    • Expose train-time evaluation metrics via XGBoostModel.summary
    • Option to specify host-ip explicitly in the Rabit tracker
  • Documentation
    • Better math notation for gradient boosting
    • Updated build instructions for Mac OS X
    • Template for GitHub issues
    • Add CITATION file for citing XGBoost in scientific writing
    • Fix dropdown menu in xgboost.readthedocs.io
    • Document updater_seq parameter
    • Style fixes for Python documentation
    • Links to additional examples and tutorials
    • Clarify installation requirements
  • Changes that break backward compatibility
    • #1519 XGBoost-spark no longer contains APIs for DMatrix; use the public booster interface instead.
    • #2476 XGBoostModel.predict() now has a different signature