Treelite API

API of the Treelite Python package.

Treelite: a model compiler for decision tree ensembles

class treelite.DMatrix(data, data_format=None, missing=None, feature_names=None, feature_types=None, verbose=False, nthread=None)

Data matrix used in treelite.

Parameters
  • data (str / numpy.ndarray / scipy.sparse.csr_matrix / pandas.DataFrame) – Data source. When data is str type, it indicates that data should be read from a file.

  • data_format (str, optional) – Format of input data file. Applicable only when data is read from a file. If missing, the svmlight (.libsvm) format is assumed.

  • missing (float, optional) – Value in the data that represents a missing entry. If set to None, numpy.nan will be used.

  • verbose (bool, optional) – Whether to print extra messages during construction

  • feature_names (list, optional) – Human-readable names for features

  • feature_types (list, optional) – Types for features

  • nthread (int, optional) – Number of threads
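
Example

A minimal sketch of constructing data matrices (the array values and the file name data.libsvm below are illustrative):

import numpy as np
import treelite

# construct from an in-memory dense matrix
X = np.random.rand(100, 4)
dmat = treelite.DMatrix(X, missing=np.nan, verbose=True)

# construct from a file; with data_format omitted, the svmlight (.libsvm)
# format is assumed
dmat2 = treelite.DMatrix('data.libsvm')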

class treelite.Model(handle=None)

Decision tree ensemble model

Parameters
  • handle (ctypes.c_void_p, optional) – Initial value of model handle

compile(dirpath, params=None, compiler='ast_native', verbose=False)

Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c). Use the create_shared() method to package the prediction code as a dynamic shared library (.so/.dll/.dylib).

Parameters
  • dirpath (str) – directory to store header and source files

  • params (dict, optional) – parameters for compiler. See this page for the list of compiler parameters.

  • compiler (str, optional) – name of compiler to use

  • verbose (bool, optional) – Whether to print extra messages during compilation

Example

The following populates the directory ./my/model with source and header files:

model.compile(dirpath='./my/model', params={}, verbose=True)

If parallel compilation is enabled (parameter parallel_comp), the files are in the form of ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c, ./my/model/tu1.c and so forth, depending on the value of parallel_comp. Otherwise, there will be exactly two files: ./my/model/header.h and ./my/model/main.c.

export_lib(toolchain, libpath, params=None, compiler='ast_native', verbose=False, nthread=None, options=None)

Convenience function: Generate prediction code and immediately turn it into a dynamic shared library. A temporary directory will be created to hold the source files.

Parameters
  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • libpath (str) – location to save the generated dynamic shared library

  • params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.

  • compiler (str, optional) – name of compiler to use in C code generation

  • verbose (bool, optional) – whether to produce extra messages

  • nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • options (list of str, optional) – Additional options to pass to toolchain

Example

The one-line command

model.export_lib(toolchain='msvc', libpath='./mymodel.dll',
                 params={}, verbose=True)

is equivalent to the following sequence of commands:

model.compile(dirpath='/temporary/directory', params={}, verbose=True)
treelite.create_shared(toolchain='msvc', dirpath='/temporary/directory',
                       verbose=True)
# move the library out of the temporary directory
shutil.move('/temporary/directory/mymodel.dll', './mymodel.dll')

export_protobuf(filename)

Export a tree ensemble model in the Protocol Buffers format. Protocol Buffers (google/protobuf) is a language- and platform-neutral mechanism for serializing structured data. See src/tree.proto for the format spec.

Parameters
  • filename (str) – path to save Protocol Buffers output

Example

model.export_protobuf('./my.buffer')

export_srcpkg(platform, toolchain, pkgpath, libname, params=None, compiler='ast_native', verbose=False, options=None)

Convenience function: Generate prediction code and create a zipped source package for deployment. The resulting zip file will also contain a Makefile.

Parameters
  • platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)

  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • pkgpath (str) – location to save the zipped source package

  • libname (str) – name of model shared library to be built

  • params (dict, optional) – parameters to be passed to the compiler. See this page for the list of compiler parameters.

  • compiler (str, optional) – name of compiler to use in C code generation

  • verbose (bool, optional) – whether to produce extra messages

  • nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • options (list of str, optional) – Additional options to pass to toolchain

Example

The one-line command

model.export_srcpkg(platform='unix', toolchain='gcc',
                    pkgpath='./mymodel_pkg.zip', libname='mymodel.so',
                    params={}, verbose=True)

is equivalent to the following sequence of commands:

model.compile(dirpath='/temporary/directory/mymodel',
              params={}, verbose=True)
generate_makefile(dirpath='/temporary/directory/mymodel',
                  platform='unix', toolchain='gcc')
# zip the directory containing C code and Makefile
shutil.make_archive(base_name=pkgpath, format='zip',
                    root_dir='/temporary/directory',
                    base_dir='mymodel/')

classmethod from_xgboost(booster)

Load a tree ensemble model from an XGBoost Booster object

Parameters
  • booster (object of type xgboost.Booster) – Python handle to XGBoost model

Returns

model – loaded model

Return type

Model object

Example

bst = xgboost.train(params, dtrain, 10, [(dtrain, 'train')])
xgb_model = Model.from_xgboost(bst)

classmethod load(filename, model_format)

Load a tree ensemble model from a file

Parameters
  • filename (str) – path to model file

  • model_format (str) – model file format. Must be one of ‘xgboost’, ‘lightgbm’, ‘protobuf’

Returns

model – loaded model

Return type

Model object

Example

xgb_model = Model.load('xgboost_model.model', 'xgboost')

property num_feature

Number of features used in the model

property num_output_group

Number of output groups of the model

property num_tree

Number of decision trees in the model

set_tree_limit(n)

Keep only the first n trees; the remaining trees will be dropped.
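
Example

An illustrative sketch of inspecting a loaded model and truncating it (the model file name is assumed):

model = treelite.Model.load('xgboost_model.model', 'xgboost')
print(model.num_feature)        # number of features used in the model
print(model.num_output_group)   # 1 unless the model is a multiclass classifier
print(model.num_tree)           # number of decision trees in the ensemble
model.set_tree_limit(10)        # keep only the first 10 trees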

class treelite.ModelBuilder(num_feature, num_output_group=1, random_forest=False, **kwargs)

Builder class for tree ensemble model: provides tools to iteratively build an ensemble of decision trees

Parameters
  • num_feature (int) – number of features used in the model being built. We assume that all feature indices are between 0 and (num_feature - 1)

  • num_output_group (int, optional) – number of output groups; >1 indicates multiclass classification

  • random_forest (bool, optional) – whether the model is a random forest; True indicates a random forest and False indicates gradient boosted trees

  • **kwargs – model parameters, to be used to specify the resulting model. Refer to this page for the full list of model parameters.

class Node

Handle to a node in a tree

set_categorical_test_node(feature_id, left_categories, default_left, left_child_key, right_child_key)

Set the node as a test node with a categorical split. The list left_categories defines all the categories that will be directed to the left child; categories are integers ranging from 0 to n-1, where n is the number of categories in that particular feature.

Parameters
  • feature_id (int) – feature index

  • left_categories (list of int) – list of categories belonging to the left child

  • default_left (bool) – default direction for missing values (True for left; False for right)

  • left_child_key (int) – unique integer key to identify the left child node

  • right_child_key (int) – unique integer key to identify the right child node
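
Example

A sketch of a categorical split, assuming feature 2 takes categories 0-3 and that categories 0 and 3 should be sent to the left child (the feature index, categories, and node keys are illustrative; node handles are assumed to be obtained by indexing into a ModelBuilder.Tree):

tree = treelite.ModelBuilder.Tree()
tree[0].set_root()
# categories 0 and 3 go to the left child (key 1); all others go right (key 2)
tree[0].set_categorical_test_node(feature_id=2, left_categories=[0, 3],
                                  default_left=True,
                                  left_child_key=1, right_child_key=2)
tree[1].set_leaf_node(1.0)
tree[2].set_leaf_node(-1.0)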

set_leaf_node(leaf_value)

Set the node as a leaf node

Parameters
  • leaf_value (float / list of float) – Usually a single leaf value (weight) of the leaf node. For multiclass random forest classifiers, leaf_value should be a list of leaf weights.

set_numerical_test_node(feature_id, opname, threshold, default_left, left_child_key, right_child_key)

Set the node as a test node with a numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either the left or the right child is taken.

Parameters
  • feature_id (int) – feature index

  • opname (str) – binary operator to use in the test

  • threshold (float) – threshold value

  • default_left (bool) – default direction for missing values (True for left; False for right)

  • left_child_key (int) – unique integer key to identify the left child node

  • right_child_key (int) – unique integer key to identify the right child node

set_root()

Set the node as the root
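
Example

A minimal end-to-end sketch that uses the node setters above to build a single-tree ensemble (the feature index, threshold, and leaf values are illustrative; node handles are assumed to be obtained by indexing into a ModelBuilder.Tree):

import treelite

builder = treelite.ModelBuilder(num_feature=3)
tree = treelite.ModelBuilder.Tree()
# root node: test whether feature 0 is less than 5.0; missing values go left
tree[0].set_numerical_test_node(feature_id=0, opname='<', threshold=5.0,
                                default_left=True,
                                left_child_key=1, right_child_key=2)
tree[0].set_root()
tree[1].set_leaf_node(0.6)      # prediction when the test is satisfied
tree[2].set_leaf_node(-0.4)     # prediction when the test fails
builder.append(tree)
model = builder.commit()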

class Tree

Handle to a decision tree in a tree ensemble Builder

append(tree)

Add a tree at the end of the ensemble

Parameters
  • tree (Tree object) – tree to be added

Example

builder = ModelBuilder(num_feature=4227)
tree = ...               # build tree somehow
builder.append(tree)     # add tree at the end of the ensemble

commit()

Finalize the ensemble model

Returns

model – finished model

Return type

Model object

Example

builder = ModelBuilder(num_feature=4227)
for i in range(100):
  tree = ...                    # build tree somehow
  builder.append(tree)          # add one tree at a time

model = builder.commit()        # now get a Model object
model.compile(dirpath='test')   # compile model into C code

insert(index, tree)

Insert a tree at specified location in the ensemble

Parameters
  • index (int) – index of the element before which to insert the tree

  • tree (Tree object) – tree to be inserted

Example

builder = ModelBuilder(num_feature=4227)
tree = ...               # build tree somehow
builder.insert(0, tree)  # insert tree at index 0

class treelite.Annotator

Branch annotator class: annotate branches in a given model using frequency patterns in the training data

annotate_branch(model, dmat, nthread=None, verbose=False)

Annotate branches in a given model using frequency patterns in the training data. Each node gets the count of the instances that belong to it. Any prior annotation information stored in the annotator will be replaced with the new annotation returned by this method.

Parameters
  • model (object of type Model) – decision tree ensemble model

  • dmat (object of type DMatrix) – data matrix representing the training data

  • nthread (int, optional) – number of threads to use while annotating. If missing, use all physical cores in the system.

  • verbose (bool, optional) – whether to produce extra messages

save(path)

Save branch annotation information as a JSON file.

Parameters
  • path (str) – location of saved JSON file
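
Example

A sketch of annotating branches with training data and saving the result (the model file, training data file, and output path are illustrative):

import treelite

model = treelite.Model.load('xgboost_model.model', 'xgboost')
dmat = treelite.DMatrix('train.libsvm')

annotator = treelite.Annotator()
annotator.annotate_branch(model=model, dmat=dmat, verbose=True)
annotator.save(path='./annotation.json')

The saved JSON file can then be supplied to Model.compile() through the compiler parameters (see the compiler parameters page) so that branch frequency information is used when generating prediction code.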

treelite.create_shared(toolchain, dirpath, nthread=None, verbose=False, options=None)

Create a shared library.

Parameters
  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.

  • nthread (int, optional) – number of threads to use in creating the shared library. Defaults to the number of cores in the system.

  • verbose (bool, optional) – whether to produce extra messages

  • options (list of str, optional) – Additional options to pass to toolchain

Returns

libpath – absolute path of created shared library

Return type

str

Example

The following commands use the Visual C++ toolchain to generate ./my/model/model.dll:

model.compile(dirpath='./my/model', params={}, verbose=True)
create_shared(toolchain='msvc', dirpath='./my/model', verbose=True)

Later, the shared library can be referred to by its directory name:

predictor = Predictor(libpath='./my/model', verbose=True)
# looks for ./my/model/model.dll

Alternatively, one may specify the library down to its file name:

predictor = Predictor(libpath='./my/model/model.dll', verbose=True)

treelite.save_runtime_package(destdir)

Save a copy of the (zipped) runtime package, containing all glue code necessary to deploy compiled models into the wild

Parameters
  • destdir (str) – directory to save the zipped package
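
Example

A one-line sketch (the destination directory is illustrative):

treelite.save_runtime_package(destdir='./deploy')
# a zipped copy of the runtime package is written into ./deploy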

treelite.generate_makefile(dirpath, platform, toolchain, options=None)

Generate a Makefile for a given directory of headers and sources. The resulting Makefile will be stored in the directory. This function is useful for deploying a model on a different machine.

Parameters
  • dirpath (str) – directory containing the header and source files previously generated by Model.compile(). The directory must contain recipe.json which specifies build dependencies.

  • platform (str) – name of the operating system on which the headers and sources shall be compiled. Must be one of the following: ‘windows’ (Microsoft Windows), ‘osx’ (Mac OS X), ‘unix’ (Linux and other UNIX-like systems)

  • toolchain (str) – which toolchain to use. You may choose one of ‘msvc’, ‘clang’, and ‘gcc’. You may also specify a specific variation of clang or gcc (e.g. ‘gcc-7’)

  • options (list of str, optional) – Additional options to pass to toolchain
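
Example

A sketch of preparing sources for compilation on another UNIX-like machine (paths are illustrative):

model.compile(dirpath='./my/model', params={}, verbose=True)
treelite.generate_makefile(dirpath='./my/model', platform='unix',
                           toolchain='gcc')
# ./my/model now contains the sources, recipe.json, and a Makefile;
# copy the directory to the target machine and run `make` there.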


treelite.gallery.sklearn.import_model(sklearn_model)

Load a tree ensemble model from a scikit-learn model object

Parameters
  • sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / GradientBoostingRegressor / GradientBoostingClassifier) – Python handle to scikit-learn model

Returns

model – loaded model

Return type

Model object

Example

import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_boston(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)

import treelite.gallery.sklearn
model = treelite.gallery.sklearn.import_model(clf)