Treelite API

API of Treelite Python package.

Model loaders 

Functions to load and build model objects

Functions:

`load_xgboost_model_legacy_binary`(filename)	Load a tree ensemble model from XGBoost model, stored using the legacy binary format.
`load_xgboost_model`(filename, *[, ...])	Load a tree ensemble model from XGBoost model, stored using JSON or UBJSON format.
`load_lightgbm_model`(filename)	Load a tree ensemble model from a LightGBM model file.
`from_xgboost`(booster)	Load a tree ensemble model from an XGBoost Booster object
`from_xgboost_json`(model_json_str, *[, ...])	Load a tree ensemble model from a string containing XGBoost JSON
`from_xgboost_ubjson`(model_ubjson_str, *[, ...])	Load a XGBoost model from a byte sequence containing UBJSON
`from_lightgbm`(booster)	Load a tree ensemble model from a LightGBM Booster object

treelite.frontend.load_xgboost_model_legacy_binary(filename)

Load a tree ensemble model from XGBoost model, stored using the legacy binary format. Note: new XGBoost models should be stored in the JSON format, to take advantage of the latest functionalities of XGBoost.

Parameters:: filename (str | Path) – Path to model file
Returns:: model – Loaded model
Return type:: Model

Example

xgb_model = treelite.frontend.load_xgboost_model_legacy_binary(
    "xgboost_model.model")

treelite.frontend.load_xgboost_model(filename, *, format_choice='use_suffix', allow_unknown_field=False)

Load a tree ensemble model from XGBoost model, stored using JSON or UBJSON format.

Parameters:

filename (str | Path) – Path to model file
format_choice (str) –
Method to select the model format
- use_suffix (default): Use the suffix of the file name (also known as file extension) to detect the format. Files whose names end with .json will be parsed as JSON; all other files will be parsed as UBJSON.
- inspect: Inspect the first few bytes of the file to heuristically determine whether the file is JSON or UBJSON.
- ubjson: Parse the file as UBJSON.
- json: Parse the file as JSON.
allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys

Returns:

model – Loaded model

Return type:

Model

Example

xgb_model = treelite.frontend.load_xgboost_model("xgboost_model.json")

treelite.frontend.load_lightgbm_model(filename)

Load a tree ensemble model from a LightGBM model file.

Parameters:: filename (str | Path) – Path to model file
Returns:: model – Loaded model
Return type:: Model

Example

lgb_model = treelite.frontend.load_lightgbm_model("lightgbm_model.txt")

treelite.frontend.from_xgboost(booster)

Load a tree ensemble model from an XGBoost Booster object

Parameters:: booster (Object of type xgboost.Booster) – Python handle to XGBoost model
Returns:: model – Loaded model
Return type:: Model

treelite.frontend.from_xgboost_json(model_json_str, *, allow_unknown_field=False)

Load a tree ensemble model from a string containing XGBoost JSON

Parameters:

model_json_str (bytes | bytearray | str) – A string specifying an XGBoost model in the XGBoost JSON format
allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys

Returns:

model – Loaded model

Return type:

Model

treelite.frontend.from_xgboost_ubjson(model_ubjson_str, *, allow_unknown_field=False)

Load a XGBoost model from a byte sequence containing UBJSON

Parameters:

model_ubjson_str (bytes | bytearray) – A byte sequence specifying an XGBoost model in the UBJSON format
allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys

Returns:

model – Loaded model

Return type:

Model

treelite.frontend.from_lightgbm(booster)

Load a tree ensemble model from a LightGBM Booster object

Parameters:: booster (object of type lightgbm.Booster) – Python handle to LightGBM model
Returns:: model – Loaded model
Return type:: Model

Scikit-learn importer 

Model loader to ingest scikit-learn models into Treelite

Functions:

`import_model`(sklearn_model)	Load a tree ensemble model from a scikit-learn model object
`export_model`(model)	Export a model as a scikit-learn RandomForest.
`import_model_with_model_builder`(sklearn_model)	This function was removed in Treelite 4.0; please use `import_model()` instead.

treelite.sklearn.import_model(sklearn_model)

Load a tree ensemble model from a scikit-learn model object

Note

For IsolationForest, the loaded model will calculate the outlier score using the standardized ratio as proposed in the original reference, which matches with score_samples() but is a bit different from decision_function().

More precisely, the following relation holds:

treelite.gtil.predict(tl_model, X) == -clf.score_samples(X)
# clf is an IsolationForest
# tl_model is a Treelite representation of clf

Parameters:: sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier / HistGradientBoostingRegressor / HistGradientBoostingClassifier / IsolationForest) – Python handle to scikit-learn model
Returns:: model – Loaded model
Return type:: Model

Example

import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)

import treelite.sklearn
model = treelite.sklearn.import_model(clf)

treelite.sklearn.export_model(model)

Export a model as a scikit-learn RandomForest.

Note

Currently only random forests can be exported as scikit-learn model objects. Support for gradient boosted trees and other kinds of tree models will be added in the future.

Parameters:: model (Model) – Treelite mobel to export
Returns:: sklearn_model – Scikit-learn model
Return type:: object of type RandomForestRegressor / RandomForestClassifier / GradientBoostingRegressor / GradientBoostingClassifier

treelite.sklearn.import_model_with_model_builder(sklearn_model): This function was removed in Treelite 4.0; please use import_model() instead.

Model builder 

Treelite Model builder class

Classes:

`Metadata`(num_feature, task_type, ...)	Metadata object, consisting of metadata information about the model at large.
`TreeAnnotation`(num_tree, target_id, class_id)	Annotation for individual trees.
`PostProcessorFunc`(name[, sigmoid_alpha, ratio_c])	Specification for postprocessor of prediction outputs
`ModelBuilder`(*, threshold_type, ...[, ...])	Model builder class, to iteratively build a tree ensemble model.

class treelite.model_builder.Metadata(num_feature, task_type, average_tree_output, num_target, num_class, leaf_vector_shape)

Metadata object, consisting of metadata information about the model at large.

Parameters:

num_feature (int) – Number of features used in the model. We assume that all feature indices are between 0 and num_feature - 1.
task_type (str) – Task type. Can be one of kBinaryClf, kRegressor, kMultiClf, kLearningToRank, or kIsolationForest.
average_tree_output (bool) – Whether to average outputs of trees
num_target (int) – Number of targets
num_class (List[int]) – Number of classes. num_class[i] is the number of classes of target i.
leaf_vector_shape (Tuple[int, int]) – Shape of the output from each leaf node

Methods:

asdict()

Convert to dictionary

asdict()

Convert to dictionary

Return type:: Dict[str, Any]

class treelite.model_builder.TreeAnnotation(num_tree, target_id, class_id)

Annotation for individual trees. Use this object to look up which target and class each tree is associated with.

The output of each target / class is obtained by summing the outputs of all trees that are associated with that target / class. target_id[i] indicates the target the i-th tree is associated with. (-1 indicates that the tree is a multi-target tree, whose output gets counted for all targets.) class_id[i] indicates the class the i-th tree is associated with. (-1 indicates that the tree’s output gets counted for all classes.)

Parameters:

num_tree (int) – Number of trees
target_id (List[int]) – Target that each tree is associated with (see explanation above)
class_id (List[int]) – Class that each tree is associated with (see explanation above)

Methods:

asdict()

Convert to dictionary

asdict()

Convert to dictionary

Return type:: Dict[str, Any]

class treelite.model_builder.PostProcessorFunc(name, sigmoid_alpha=1.0, ratio_c=1.0)

Specification for postprocessor of prediction outputs

Parameters:

name (str) – Name of the postprocessor. Consult List of postprocessor functions for the list of available postprocessor functions.
sigmoid_alpha (float) – Scaling parameter for sigmoid function sigmoid(x) = 1 / (1 + exp(-alpha * x)). This parameter is applicable only when name="sigmoid" or name="multiclass_ova". It must be strictly positive.
ratio_c (float) – Scaling parameter for exponential standard ratio transformation expstdratio(x) = exp2(-x / c). This parameter is applicable only when name="exponential_standard_ratio".

Methods:

asdict()

Convert to dictionary

asdict()

Convert to dictionary

Return type:: Dict[str, Any]

class treelite.model_builder.ModelBuilder(*, threshold_type, leaf_output_type, metadata, tree_annotation, postprocessor, base_scores, attributes=None)

Model builder class, to iteratively build a tree ensemble model.

Note

The model builder object must be only accessed by a single thread. To build multiple trees in parallel, create multiple builder objects and use model concatenation (concatenate()).

Parameters:

threshold_type (str) – Type of thresholds in the tree model
leaf_output_type (str) – Type of leaf outputs in the tree model
metadata (Metadata) – Model metadata
tree_annotation (TreeAnnotation) – Annotation for individual trees
postprocessor (PostProcessorFunc) – Postprocessor for prediction outputs
base_scores (List[float]) – Baseline scores for targets and classes, before adding tree outputs. Also known as the intercept.
attributes (Dict[Any, Any] | None) – Arbitrary JSON object, to be stored in the “attributes” field in the model object.

Attributes:

handle

Access the handle to the associated C++ object

Methods:

`start_tree`()	Start a new tree
`end_tree`()	End the current tree
`start_node`(node_key)	Start a new node
`end_node`()	End the current node
`numerical_test`(feature_id, threshold, *, ...)	Declare the current node as a numerical test node, where the test is of form [feature value] [op] [threshold].
`categorical_test`(feature_id, *, ...)	Declare the current node as a categorical test node, where the test is of form [feature value] in [category list].
`leaf`(leaf_value)	Declare the current node as a leaf node
`gain`(gain)	Specify the gain (loss reduction) that's resulted from the current split.
`data_count`(data_count)	Specify the number of data points (samples) that are mapped to the current node.
`sum_hess`(sum_hess)	Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.
`commit`()	Conclude model building and obtain the final model object.

property handle: Access the handle to the associated C++ object

start_tree(): Start a new tree

end_tree(): End the current tree

start_node(node_key)

Start a new node

Parameters:: node_key (int) – Integer key that unique identifies the node

end_node(): End the current node

numerical_test(feature_id, threshold, *, default_left, opname, left_child_key, right_child_key)

Declare the current node as a numerical test node, where the test is of form [feature value] [op] [threshold]. Data points for which the test evaluates to True will be mapped to the left child node; all other data points (for which the test evaluates to False) will be mapped to the right child node.

Parameters:

feature_id (int) – Feature ID
threshold (float) – Threshold
default_left (bool) – Whether the missing value should be mapped to the left child
opname (str) – Comparison operator
left_child_key (int) – Integer key that unique identifies the left child node.
right_child_key (int) – Integer key that unique identifies the right child node.

categorical_test(feature_id, *, default_left, category_list, category_list_right_child, left_child_key, right_child_key)

Declare the current node as a categorical test node, where the test is of form [feature value] in [category list].

Parameters:

feature_id (int) – Feature ID
default_left (bool) – Whether the missing value should be mapped to the left child
category_list (List[int]) – List of categories to be tested for match
category_list_right_child (bool) – Whether the data points for which the test evaluates to True should be mapped to the right child or the left child.
left_child_key (int) – Integer key that unique identifies the left child node.
right_child_key (int) – Integer key that unique identifies the right child node.

leaf(leaf_value)

Declare the current node as a leaf node

Parameters:: leaf_value (float | Sequence[float]) – Value of leaf output

gain(gain)

Specify the gain (loss reduction) that’s resulted from the current split.

Parameters:: gain (float) – Gain (loss reduction)

data_count(data_count)

Specify the number of data points (samples) that are mapped to the current node.

Parameters:: data_count (int) – Number of data points

sum_hess(sum_hess)

Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.

Parameters:: sum_hess (float) – Weighted sample count or the sum of Hessians

commit()

Conclude model building and obtain the final model object.

Returns:: model – Finished model
Return type:: Model

Model class 

class treelite.Model(*, handle=None)

Decision tree ensemble model

Parameters:: handle (Optional[Any]) – Handle to C++ object

Attributes:

`handle`	Access the handle to the associated C++ object
`num_tree`	Number of decision trees in the model
`num_feature`	Number of features used in the model
`input_type`	Input type
`output_type`	Output type

Methods:

`concatenate`(model_objs)	Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.
`dump_as_json`(*[, pretty_print])	Dump the model as a JSON string.
`get_tree_depth`()	Query the depth of each tree.
`get_header_accessor`()	Obtain accessor for fields in the header.
`get_tree_accessor`(tree_id)	Obtain accessor for fields in a tree.
`serialize`(filename)	Serialize (persist) the model to a checkpoint file in the disk, using a fast binary representation.
`serialize_bytes`()	Serialize (persist) the model to a byte sequence, using a fast binary representation.
`deserialize`(filename)	Deserialize (recover) the model from a checkpoint file in the disk.
`deserialize_bytes`(model_bytes)	Deserialize (recover) the model from a byte sequence.

property handle: Access the handle to the associated C++ object

property num_tree: int: Number of decision trees in the model

property num_feature: int: Number of features used in the model

property input_type: str: Input type

property output_type: str: Output type

classmethod concatenate(model_objs)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

Parameters:: model_objs (List[Model]) – List of Model objects
Returns:: model – Concatenated model
Return type:: Model

Example

concatenated_model = treelite.Model.concatenate([model1, model2, model3])

dump_as_json(*, pretty_print=True)

Dump the model as a JSON string. This is useful for inspecting details of the tree ensemble model.

Parameters:: pretty_print (bool) – Whether to pretty-print the JSON string, set this to False to make the string compact
Returns:: json_str – JSON string representing the model
Return type:: str

get_tree_depth()

Query the depth of each tree.

Return type:: ndarray

get_header_accessor()

Obtain accessor for fields in the header. See Field accessors (Advanced) for more details.

Return type:: HeaderAccessor

get_tree_accessor(tree_id)

Obtain accessor for fields in a tree. See Field accessors (Advanced) for more details.

Parameters:: tree_id (int) – ID of the tree
Return type:: TreeAccessor

serialize(filename)

Serialize (persist) the model to a checkpoint file in the disk, using a fast binary representation. To recover the model from the checkpoint, use deserialize() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:: filename (str | Path) – Path to checkpoint

serialize_bytes()

Serialize (persist) the model to a byte sequence, using a fast binary representation. To recover the model from the byte sequence, use deserialize_bytes() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Return type:: bytes

classmethod deserialize(filename)

Deserialize (recover) the model from a checkpoint file in the disk. It is expected that the file was generated by a call to the serialize() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:: filename (str | Path) – Path to checkpoint
Returns:: model – Recovered model
Return type:: Model

classmethod deserialize_bytes(model_bytes)

Deserialize (recover) the model from a byte sequence. It is expected that the byte sequence was generated by a call to the serialize_bytes() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:: model_bytes (bytes) – Byte sequence representing the serialized model
Returns:: model – Recovered model
Return type:: Model

Field accessors (Advanced)

Using field accessors, users can query and modify the value of fields in a Model object. See Editing tree models (Advanced) for more details.

Note

Modifying a field is an unsafe operation

Treelite does not prevent users from assigning an invalid value to a field. Setting an invalid value may cause undefined behavior. Always consult the model spec to carefully examine model invariants and constraints on fields. For example, most tree fields must have an array of length num_nodes.

class treelite.model.HeaderAccessor(model)

Accessor for fields in the header

Parameters:: model (Model) – The model object

Methods:

`get_field`(name)	Get a field
`set_field`(name, value)	Set a field

get_field(name)

Get a field

Parameters:: name (str) – Name of the field. Consult the model spec for the list of fields.
Returns:: field – Value in the field (str for a string field, np.ndarray for other fields)
Return type:: numpy.ndarray or str

set_field(name, value)

Set a field

Parameters:

name (str) – Name of the field. Consult the model spec for the list of fields.
value (ndarray | str) – New value for the field (str for a string field, np.ndarray for other fields)

Return type:

None

class treelite.model.TreeAccessor(model, *, tree_id)

Accessor for fields in a tree

Parameters:

model (Model) – The model object
tree_id (int) – ID of the tree

Methods:

`get_field`(name)	Get a field
`set_field`(name, value)	Set a field

get_field(name)

Get a field

Parameters:: name (str) – Name of the field. Consult the model spec for the list of fields.
Returns:: field – Value in the field
Return type:: numpy.ndarray

set_field(name, value)

Set a field

Parameters:

name (str) – Name of the field. Consult the model spec for the list of fields.
value (ndarray) – New value for the field

Return type:

None

Treelite API

Model loaders

Scikit-learn importer

Model builder

Model class

Field accessors (Advanced)

Model loaders 

Scikit-learn importer 

Model builder 

Model class 