Treelite API

API of the Treelite Python package.

Main API

Treelite: a model compiler for decision tree ensembles

Classes:

Annotator()

Branch annotator class: annotate branches in a given model using frequency patterns in the training data

Model([handle])

Decision tree ensemble model

ModelBuilder(num_feature[, num_class, ...])

Builder class for tree ensemble model: provides tools to iteratively build an ensemble of decision trees

Exceptions:

TreeliteError

Error thrown by Treelite

Functions:

create_shared(toolchain, dirpath, *[, ...])

Create shared library.

generate_cmakelists(dirpath[, options])

Generate a CMakeLists.txt for a given directory of headers and sources.

generate_makefile(dirpath, platform, toolchain)

Generate a Makefile for a given directory of headers and sources.

class treelite.Annotator

Branch annotator class: annotate branches in a given model using frequency patterns in the training data

Methods:

annotate_branch(model, dmat[, nthread, verbose])

Annotate branches in a given model using frequency patterns in the training data.

save(path)

Save branch annotation information as a JSON file.

annotate_branch(model, dmat, nthread=None, verbose=False)

Annotate branches in a given model using frequency patterns in the training data. Each node gets the count of the instances that belong to it. Any prior annotation information stored in the annotator will be replaced with the new annotation returned by this method.

Parameters:
  • model (object of type Model) – decision tree ensemble model

  • dmat (object of type DMatrix) – data matrix representing the training data

  • nthread (int, optional) – number of threads to use while annotating. If missing, use all physical cores in the system.

  • verbose (bool, optional) – whether to produce extra messages

save(path)

Save branch annotation information as a JSON file.

Parameters:

path (str) – location of saved JSON file
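
Example

A minimal sketch of the annotation workflow. Here model is assumed to be an existing Model object and dtrain an existing DMatrix holding the training data:

annotator = treelite.Annotator()
annotator.annotate_branch(model=model, dmat=dtrain, verbose=True)
annotator.save(path='annotation.json')

The saved JSON file can then be supplied to the compiler (via the annotate_in compiler parameter) so that branch annotation informs code generation.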

class treelite.Model(handle=None)

Decision tree ensemble model

Parameters:

handle (ctypes.c_void_p, optional) – Initial value of model handle

Methods:

compile(dirpath[, params, compiler, verbose])

Generate prediction code from a tree ensemble model.

concatenate(model_objs)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

deserialize(filename)

Deserialize (recover) the model from a checkpoint file on disk.

deserialize_bytes(model_bytes)

Deserialize (recover) the model from a byte sequence.

dump_as_json(*[, pretty_print])

Dump the model as a JSON string.

export_lib(toolchain, libpath[, params, ...])

Convenience function: Generate prediction code and immediately turn it into a dynamic shared library.

export_srcpkg(platform, toolchain, pkgpath, ...)

Convenience function: Generate prediction code and create a zipped source package for deployment.

from_lightgbm(booster)

Load a tree ensemble model from a LightGBM Booster object

from_xgboost(booster)

Load a tree ensemble model from an XGBoost Booster object

from_xgboost_json(json_str[, ...])

Load a tree ensemble model from a string containing XGBoost JSON

import_from_json(json_str)

Import a tree ensemble model from a JSON string.

load(filename, model_format[, ...])

Load a tree ensemble model from a file

serialize(filename)

Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation.

serialize_bytes()

Serialize (persist) the model to a byte sequence, using a fast binary representation.

set_tree_limit(tree_limit)

Set the first n trees to be kept; the remaining trees will be dropped

Attributes:

num_class

Number of classes of the model (1 if the model is not a multi-class classifier)

num_feature

Number of features used in the model

num_tree

Number of decision trees in the model

compile(dirpath, params=None, compiler='ast_native', verbose=False)

Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c). Use create_shared() method to package prediction code as a dynamic shared library (.so/.dll/.dylib).

Parameters:
  • dirpath (str) – directory to store header and source files

  • params (dict, optional) – parameters for compiler. See this page for the list of compiler parameters.

  • compiler (str, optional) – name of compiler to use

  • verbose (bool, optional) – Whether to print extra messages during compilation

Example

The following populates the directory ./my/model with source and header files:

model.compile(dirpath='./my/model', params={}, verbose=True)

If parallel compilation is enabled (parameter parallel_comp), the files are in the form of ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c, ./my/model/tu1.c and so forth, depending on the value of parallel_comp. Otherwise, there will be exactly two files: ./my/model/header.h and ./my/model/main.c.

classmethod concatenate(model_objs)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

Parameters:

model_objs (List[Model]) – List of Model objects

Returns:

model – Concatenated model

Return type:

Model object

Example

concatenated_model = Model.concatenate([model1, model2, model3])
classmethod deserialize(filename)

Deserialize (recover) the model from a checkpoint file on disk. It is expected that the file was generated by a call to the serialize() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:

filename (str) – Path to checkpoint

Returns:

model – Recovered model

Return type:

Model object

classmethod deserialize_bytes(model_bytes)

Deserialize (recover) the model from a byte sequence. It is expected that the byte sequence was generated by a call to the serialize_bytes() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:

model_bytes (bytes) – Byte sequence representing the serialized model

Returns:

model – Recovered model

Return type:

Model object

dump_as_json(*, pretty_print=True)

Dump the model as a JSON string. This is useful for inspecting details of the tree ensemble model.

Note

The operation performed in dump_as_json() is strictly one-way. So the output of dump_as_json() will differ from the JSON string you used in calling import_from_json().

Parameters:

pretty_print (bool, optional) – Whether to pretty-print the JSON string, set this to False to make the string compact

Returns:

json_str – JSON string representing the model

Return type:

str
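
Example

A minimal sketch, assuming model is an existing Model object:

json_str = model.dump_as_json(pretty_print=False)
print(json_str)   # compact JSON representation of the model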

export_lib(toolchain, libpath, params=None, compiler='ast_native', verbose=False, nthread=None, options=None)

Convenience function: Generate prediction code and immediately turn it into a dynamic shared library. A temporary directory will be created to hold the source files.

Parameters:
  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • libpath (str) – location to save the generated dynamic shared library

  • params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.

  • compiler (str, optional) – name of compiler to use in C code generation

  • verbose (bool, optional) – whether to produce extra messages

  • nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • options (list of str, optional) – Additional options to pass to toolchain

Example

The one-line command

model.export_lib(toolchain='msvc', libpath='./mymodel.dll',
                 params={}, verbose=True)

is equivalent to the following sequence of commands:

model.compile(dirpath='/temporary/directory', params={}, verbose=True)
treelite.create_shared(toolchain='msvc', dirpath='/temporary/directory',
                       verbose=True)
# move the library out of the temporary directory
shutil.move('/temporary/directory/mymodel.dll', './mymodel.dll')
export_srcpkg(platform, toolchain, pkgpath, libname, params=None, compiler='ast_native', verbose=False, options=None)

Convenience function: Generate prediction code and create a zipped source package for deployment. The resulting zip file will also contain a Makefile.

Parameters:
  • platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)

  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, ‘gcc’, and ‘cmake’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • pkgpath (str) – location to save the zipped source package

  • libname (str) – name of model shared library to be built

  • params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.

  • compiler (str, optional) – name of compiler to use in C code generation

  • verbose (bool, optional) – whether to produce extra messages

  • nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • options (list of str, optional) – Additional options to pass to toolchain

Example

The one-line command

model.export_srcpkg(platform='unix', toolchain='gcc',
                    pkgpath='./mymodel_pkg.zip', libname='mymodel.so',
                    params={}, verbose=True)

is equivalent to the following sequence of commands:

model.compile(dirpath='/temporary/directory/mymodel',
              params={}, verbose=True)
generate_makefile(dirpath='/temporary/directory/mymodel',
                  platform='unix', toolchain='gcc')
# zip the directory containing C code and Makefile
shutil.make_archive(base_name=pkgpath, format='zip',
                    root_dir='/temporary/directory',
                    base_dir='mymodel/')
classmethod from_lightgbm(booster)

Load a tree ensemble model from a LightGBM Booster object

Parameters:

booster (object of type lightgbm.Booster) – Python handle to LightGBM model

Returns:

model – loaded model

Return type:

Model object

Example

bst = lightgbm.train(params, dtrain, 10, valid_sets=[dtrain],
                     valid_names=['train'])
tl_model = Model.from_lightgbm(bst)
classmethod from_xgboost(booster)

Load a tree ensemble model from an XGBoost Booster object

Parameters:

booster (object of type xgboost.Booster) – Python handle to XGBoost model

Returns:

model – loaded model

Return type:

Model object

Example

bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
tl_model = Model.from_xgboost(bst)
classmethod from_xgboost_json(json_str, allow_unknown_field=False)

Load a tree ensemble model from a string containing XGBoost JSON

Parameters:
  • json_str (bytearray | str) – a string specifying an XGBoost model in the XGBoost JSON format

  • allow_unknown_field (bool, optional) – Whether to allow extra fields with unrecognized keys

Returns:

model – loaded model

Return type:

Model object

Example

bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
bst.save_model('model.json')
with open('model.json') as file_:
    json_str = file_.read()
tl_model = Model.from_xgboost_json(json_str)
classmethod import_from_json(json_str)

Import a tree ensemble model from a JSON string.

See Specifying models using JSON string for details.

Note

import_from_json() is strict about which JSON strings to accept

Some tree libraries let users export models as JSON strings, but in general import_from_json() will not accept them. See the warning at the top of Specifying models using JSON string.

Note

The operation performed in import_from_json() is strictly one-way. So the output of dump_as_json() will differ from the JSON string you used in calling import_from_json().

Parameters:

json_str (str) – JSON string representing a tree ensemble model

Returns:

model – Imported model

Return type:

Model object
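
Example

A hedged sketch; my_ensemble.json is a hypothetical file containing a model specification in the JSON format described in Specifying models using JSON string:

with open('my_ensemble.json') as f:
    model = treelite.Model.import_from_json(f.read())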

classmethod load(filename, model_format, allow_unknown_field=False)

Load a tree ensemble model from a file

Note

To load scikit-learn models, use import_model() instead.

Parameters:
  • filename (str) – Path to model file

  • model_format (str) – Model file format. Must be “xgboost”, “xgboost_json”, or “lightgbm”

  • allow_unknown_field (bool, optional) – Whether to allow extra fields with unrecognized keys. This flag is only applicable if model_format = “xgboost_json”

Returns:

model – loaded model

Return type:

Model object

Example

xgb_model = Model.load('xgboost_model.model', 'xgboost')
property num_class

Number of classes of the model (1 if the model is not a multi-class classifier)

property num_feature

Number of features used in the model

property num_tree

Number of decision trees in the model

serialize(filename)

Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation. To recover the model from the checkpoint, use the deserialize() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:

filename (str) – Path to checkpoint

serialize_bytes()

Serialize (persist) the model to a byte sequence, using a fast binary representation. To recover the model from the byte sequence, use deserialize_bytes() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Return type:

bytes
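
Example

A minimal sketch of both round trips, assuming model is an existing Model object:

# File round trip
model.serialize('checkpoint.bin')
restored = treelite.Model.deserialize('checkpoint.bin')

# Byte-sequence round trip
model_bytes = model.serialize_bytes()
restored = treelite.Model.deserialize_bytes(model_bytes)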

set_tree_limit(tree_limit)

Set the first n trees to be kept; the remaining trees will be dropped
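
Example

A minimal sketch, assuming model is an existing Model object with at least ten trees:

model.set_tree_limit(10)   # keep only the first 10 trees; drop the rest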

class treelite.ModelBuilder(num_feature, num_class=1, average_tree_output=False, threshold_type='float32', leaf_output_type='float32', **kwargs)

Builder class for tree ensemble model: provides tools to iteratively build an ensemble of decision trees

Parameters:
  • num_feature (int) – number of features used in model being built. We assume that all feature indices are between 0 and (num_feature - 1)

  • num_class (int, optional) – number of output groups; >1 indicates multiclass classification

  • average_tree_output (bool, optional) – whether the model is a random forest; True indicates a random forest and False indicates gradient boosted trees

  • threshold_type (str, optional) – data type for split thresholds (e.g. ‘float32’)

  • leaf_output_type (str, optional) – data type for leaf outputs (e.g. ‘float32’)

  • **kwargs – model parameters, to be used to specify the resulting model. Refer to this page for the full list of model parameters.

Classes:

Node()

Handle to a node in a tree

Tree([threshold_type, leaf_output_type])

Handle to a decision tree in a tree ensemble Builder

Value(init_value, dtype)

Value whose type may be specified at runtime

Methods:

append(tree)

Add a tree at the end of the ensemble

commit()

Finalize the ensemble model

insert(index, tree)

Insert a tree at specified location in the ensemble

class Node

Handle to a node in a tree

Methods:

set_categorical_test_node(feature_id, ...)

Set the node as a test node with categorical split.

set_leaf_node(leaf_value[, leaf_value_type])

Set the node as a leaf node

set_numerical_test_node(feature_id, opname, ...)

Set the node as a test node with numerical split.

set_root()

Set the node as the root

set_categorical_test_node(feature_id, left_categories, default_left, left_child_key, right_child_key)

Set the node as a test node with a categorical split. The left_categories list defines all categories that are classified to the left side. Categories are integers ranging from 0 to n-1, where n is the number of categories for that particular feature.

Parameters:
  • feature_id (int) – feature index

  • left_categories (list of int) – list of categories belonging to the left child.

  • default_left (bool) – default direction for missing values (True for left; False for right)

  • left_child_key (int) – unique integer key to identify the left child node

  • right_child_key (int) – unique integer key to identify the right child node
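
Example

A hedged snippet; tree is a ModelBuilder.Tree, and feature 3 is assumed to be a categorical feature with four categories (0-3). Categories 0 and 2 are sent to the left child:

tree[0].set_categorical_test_node(
    feature_id=3, left_categories=[0, 2], default_left=False,
    left_child_key=1, right_child_key=2)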

set_leaf_node(leaf_value, leaf_value_type='float32')

Set the node as a leaf node

Parameters:
  • leaf_value (float / list of float) – Usually a single leaf value (weight) of the leaf node. For multiclass random forest classifier, leaf_value should be a list of leaf weights.

  • leaf_value_type (str) – Data type used for leaf_value (e.g. ‘float32’)

set_numerical_test_node(feature_id, opname, threshold, default_left, left_child_key, right_child_key, threshold_type='float32')

Set the node as a test node with a numerical split. The test is of the form [feature value] OP [threshold]. Depending on the result of the test, either the left or the right child is taken.

Parameters:
  • feature_id (int) – feature index

  • opname (str) – binary operator to use in the test

  • threshold (float) – threshold value

  • default_left (bool) – default direction for missing values (True for left; False for right)

  • left_child_key (int) – unique integer key to identify the left child node

  • right_child_key (int) – unique integer key to identify the right child node

  • threshold_type (str) – data type for threshold value (e.g. ‘float32’)

set_root()

Set the node as the root
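
Example

A hedged sketch that builds a single decision stump with one numerical split. The node keys 0, 1, and 2 are arbitrary unique integers chosen for this example:

builder = treelite.ModelBuilder(num_feature=2)
tree = treelite.ModelBuilder.Tree()
tree[0].set_root()
tree[0].set_numerical_test_node(
    feature_id=0, opname='<', threshold=0.5, default_left=True,
    left_child_key=1, right_child_key=2)
tree[1].set_leaf_node(-1.0)   # taken when feature 0 < 0.5
tree[2].set_leaf_node(1.0)    # taken otherwise
builder.append(tree)
model = builder.commit()      # obtain a Model object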

class Tree(threshold_type='float32', leaf_output_type='float32')

Handle to a decision tree in a tree ensemble Builder

class Value(init_value, dtype)

Value whose type may be specified at runtime

Parameters:
  • init_value – initial value of the object

  • dtype (str) – data type of the value (e.g. ‘float32’)

append(tree)

Add a tree at the end of the ensemble

Parameters:

tree (Tree object) – tree to be added

Example

builder = ModelBuilder(num_feature=4227)
tree = ...               # build tree somehow
builder.append(tree)     # add tree at the end of the ensemble
commit()

Finalize the ensemble model

Returns:

model – finished model

Return type:

Model object

Example

builder = ModelBuilder(num_feature=4227)
for i in range(100):
  tree = ...                    # build tree somehow
  builder.append(tree)          # add one tree at a time

model = builder.commit()        # now get a Model object
model.compile(dirpath='test')   # compile model into C code
insert(index, tree)

Insert a tree at specified location in the ensemble

Parameters:
  • index (int) – index of the element before which to insert the tree

  • tree (Tree object) – tree to be inserted

Example

builder = ModelBuilder(num_feature=4227)
tree = ...               # build tree somehow
builder.insert(0, tree)  # insert tree at index 0
exception treelite.TreeliteError

Error thrown by Treelite

treelite.create_shared(toolchain, dirpath, *, nthread=None, verbose=False, options=None, long_build_time_warning=True)

Create shared library.

Parameters:
  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.

  • nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • verbose (bool, optional) – whether to produce extra messages

  • options (list of str, optional) – Additional options to pass to toolchain

  • long_build_time_warning (bool, optional) – If set to False, suppress the warning about potentially long build time

Returns:

libpath – absolute path of created shared library

Return type:

str

Example

The following command uses the Visual C++ toolchain to generate ./my/model/model.dll:

model.compile(dirpath='./my/model', params={}, verbose=True)
create_shared(toolchain='msvc', dirpath='./my/model', verbose=True)

Later, the shared library can be referred to by its directory name:

predictor = Predictor(libpath='./my/model', verbose=True)
# looks for ./my/model/model.dll

Alternatively, one may specify the library down to its file name:

predictor = Predictor(libpath='./my/model/model.dll', verbose=True)
treelite.generate_cmakelists(dirpath, options=None)

Generate a CMakeLists.txt for a given directory of headers and sources. The resulting CMakeLists.txt will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters:
  • dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.

  • options (list of str, optional) – Additional options to pass to toolchain
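
Example

A minimal sketch, assuming model is an existing Model object:

model.compile(dirpath='./my/model', params={}, verbose=True)
treelite.generate_cmakelists(dirpath='./my/model')
# ./my/model now contains CMakeLists.txt alongside the generated sources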

treelite.generate_makefile(dirpath, platform, toolchain, options=None)

Generate a Makefile for a given directory of headers and sources. The resulting Makefile will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters:
  • dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.

  • platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)

  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • options (list of str, optional) – Additional options to pass to toolchain
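
Example

A minimal sketch, assuming model is an existing Model object:

model.compile(dirpath='./my/model', params={}, verbose=True)
treelite.generate_makefile(dirpath='./my/model',
                           platform='unix', toolchain='gcc')
# ./my/model now contains a Makefile alongside the generated sources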

Scikit-learn importer

Converter to ingest scikit-learn models into Treelite

Functions:

import_model(sklearn_model)

Load a tree ensemble model from a scikit-learn model object

import_model_with_model_builder(sklearn_model)

Load a tree ensemble model from a scikit-learn model object using the model builder API.

treelite.sklearn.import_model(sklearn_model)

Load a tree ensemble model from a scikit-learn model object

Note

For ‘IsolationForest’, the outlier score is computed using the standardized ratio proposed in the original reference. This matches ‘IsolationForest._compute_chunked_score_samples’ but differs slightly from ‘IsolationForest.decision_function’.

Parameters:

sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier / HistGradientBoostingRegressor / HistGradientBoostingClassifier / IsolationForest) – Python handle to scikit-learn model

Returns:

model – loaded model

Return type:

Model object

Example

import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_boston(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)

import treelite.sklearn
model = treelite.sklearn.import_model(clf)

Notes

This function does not yet support categorical splits in HistGradientBoostingRegressor and HistGradientBoostingClassifier. If you are using either of these estimators, make sure that all test nodes have numerical test conditions.

treelite.sklearn.import_model_with_model_builder(sklearn_model)

Load a tree ensemble model from a scikit-learn model object using the model builder API.

Note

Use import_model for production use

This function exists to demonstrate the use of the model builder API and is slow with large models. For production, please use import_model() which is significantly faster.

Parameters:

sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier) – Python handle to scikit-learn model

Returns:

model – loaded model

Return type:

Model object

Example

import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_boston(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)

import treelite.sklearn
model = treelite.sklearn.import_model_with_model_builder(clf)