Treelite API

API of Treelite Python package.

Model loaders

Functions to load and build model objects

Functions:

load_xgboost_model_legacy_binary(filename)

Load a tree ensemble model from an XGBoost model stored in the legacy binary format.

load_xgboost_model(filename, *[, ...])

Load a tree ensemble model from an XGBoost model stored in the JSON format.

load_lightgbm_model(filename)

Load a tree ensemble model from a LightGBM model file.

from_xgboost(booster)

Load a tree ensemble model from an XGBoost Booster object

from_xgboost_json(model_json_str, *[, ...])

Load a tree ensemble model from a string containing XGBoost JSON

from_lightgbm(booster)

Load a tree ensemble model from a LightGBM Booster object

treelite.frontend.load_xgboost_model_legacy_binary(filename)

Load a tree ensemble model from an XGBoost model stored in the legacy binary format. Note: new XGBoost models should be stored in the JSON format to take advantage of the latest XGBoost functionality.

Parameters:

filename (str | Path) – Path to model file

Returns:

model – Loaded model

Return type:

Model

Example

xgb_model = treelite.frontend.load_xgboost_model_legacy_binary(
    "xgboost_model.model")

treelite.frontend.load_xgboost_model(filename, *, allow_unknown_field=False)

Load a tree ensemble model from an XGBoost model stored in the JSON format.

Parameters:
  • filename (str | Path) – Path to model file

  • allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys

Returns:

model – Loaded model

Return type:

Model

Example

xgb_model = treelite.frontend.load_xgboost_model("xgboost_model.json")

treelite.frontend.load_lightgbm_model(filename)

Load a tree ensemble model from a LightGBM model file.

Parameters:

filename (str | Path) – Path to model file

Returns:

model – Loaded model

Return type:

Model

Example

lgb_model = treelite.frontend.load_lightgbm_model("lightgbm_model.txt")

treelite.frontend.from_xgboost(booster)

Load a tree ensemble model from an XGBoost Booster object

Parameters:

booster (object of type xgboost.Booster) – Python handle to XGBoost model

Returns:

model – Loaded model

Return type:

Model

treelite.frontend.from_xgboost_json(model_json_str, *, allow_unknown_field=False)

Load a tree ensemble model from a string containing XGBoost JSON

Parameters:
  • model_json_str (bytes | bytearray | str) – A string specifying an XGBoost model in the XGBoost JSON format

  • allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys

Returns:

model – Loaded model

Return type:

Model

treelite.frontend.from_lightgbm(booster)

Load a tree ensemble model from a LightGBM Booster object

Parameters:

booster (object of type lightgbm.Booster) – Python handle to LightGBM model

Returns:

model – Loaded model

Return type:

Model

Scikit-learn importer

Model loader to ingest scikit-learn models into Treelite

Functions:

import_model(sklearn_model)

Load a tree ensemble model from a scikit-learn model object

import_model_with_model_builder(sklearn_model)

This function was removed in Treelite 4.0; please use import_model() instead.

treelite.sklearn.import_model(sklearn_model)

Load a tree ensemble model from a scikit-learn model object

Note

For IsolationForest, the loaded model will calculate the outlier score using the standardized ratio as proposed in the original reference, which matches _compute_chunked_score_samples() but differs slightly from decision_function().

Parameters:

sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier / HistGradientBoostingRegressor / HistGradientBoostingClassifier / IsolationForest) – Python handle to scikit-learn model

Returns:

model – Loaded model

Return type:

Model

Example

import sklearn.datasets
import sklearn.ensemble
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)

import treelite.sklearn
model = treelite.sklearn.import_model(clf)

Note

This function does not yet support categorical splits in HistGradientBoostingRegressor and HistGradientBoostingClassifier. If you are using either estimator type, make sure that all test nodes have numerical test conditions.

treelite.sklearn.import_model_with_model_builder(sklearn_model)

This function was removed in Treelite 4.0; please use import_model() instead.

Model builder

Treelite Model builder class

Classes:

Metadata(num_feature, task_type, ...)

Metadata object, consisting of metadata information about the model at large.

TreeAnnotation(num_tree, target_id, class_id)

Annotation for individual trees.

PostProcessorFunc(name[, sigmoid_alpha, ratio_c])

Specification for postprocessor of prediction outputs

ModelBuilder(*, threshold_type, ...[, ...])

Model builder class, to iteratively build a tree ensemble model.

class treelite.model_builder.Metadata(num_feature, task_type, average_tree_output, num_target, num_class, leaf_vector_shape)

Metadata object, consisting of metadata information about the model at large.

Parameters:
  • num_feature (int) – Number of features used in the model. We assume that all feature indices are between 0 and num_feature - 1.

  • task_type (str) – Task type. Can be one of kBinaryClf, kRegressor, kMultiClf, kLearningToRank, or kIsolationForest.

  • average_tree_output (bool) – Whether to average outputs of trees

  • num_target (int) – Number of targets

  • num_class (List[int]) – Number of classes. num_class[i] is the number of classes of target i.

  • leaf_vector_shape (Tuple[int, int]) – Shape of the output from each leaf node

Methods:

asdict()

Convert to dictionary

asdict()

Convert to dictionary

Return type:

Dict[str, Any]

class treelite.model_builder.TreeAnnotation(num_tree, target_id, class_id)

Annotation for individual trees. Use this object to look up which target and class each tree is associated with.

The output of each target / class is obtained by summing the outputs of all trees that are associated with that target / class. target_id[i] indicates the target the i-th tree is associated with. (-1 indicates that the tree is a multi-target tree, whose output gets counted for all targets.) class_id[i] indicates the class the i-th tree is associated with. (-1 indicates that the tree’s output gets counted for all classes.)

Parameters:
  • num_tree (int) – Number of trees

  • target_id (List[int]) – Target that each tree is associated with (see explanation above)

  • class_id (List[int]) – Class that each tree is associated with (see explanation above)

Methods:

asdict()

Convert to dictionary

asdict()

Convert to dictionary

Return type:

Dict[str, Any]
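
The target_id / class_id convention described for TreeAnnotation can be illustrated with a small pure-Python sketch. It does not call Treelite and is not how Treelite computes predictions internally; the function name sum_tree_outputs and the flat row-major layout of each tree's output are assumptions made for illustration, and every target is assumed to have the same number of classes.

```python
def sum_tree_outputs(leaf_outputs, target_id, class_id, num_target, num_class):
    """Accumulate per-tree outputs into a num_target x num_class score grid.

    leaf_outputs[i] is a flat list holding the i-th tree's output: one value
    per (target, class) cell that the tree contributes to, in row-major order.
    """
    scores = [[0.0] * num_class for _ in range(num_target)]
    for out, t, c in zip(leaf_outputs, target_id, class_id):
        # -1 means "counted for all targets" / "counted for all classes".
        targets = range(num_target) if t == -1 else [t]
        classes = range(num_class) if c == -1 else [c]
        values = iter(out)
        for ti in targets:
            for ci in classes:
                scores[ti][ci] += next(values)
    return scores


# Three trees each tied to one class, plus a fourth tree (class_id == -1)
# whose vector output is counted for all three classes.
scores = sum_tree_outputs(
    leaf_outputs=[[0.5], [1.0], [-0.25], [1.0, 1.0, 1.0]],
    target_id=[0, 0, 0, 0],
    class_id=[0, 1, 2, -1],
    num_target=1,
    num_class=3,
)
print(scores)  # [[1.5, 2.0, 0.75]]
```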

class treelite.model_builder.PostProcessorFunc(name, sigmoid_alpha=1.0, ratio_c=1.0)

Specification for postprocessor of prediction outputs

Parameters:
  • name (str) – Name of the postprocessor. Consult List of postprocessor functions for the list of available postprocessor functions.

  • sigmoid_alpha (float) – Scaling parameter for sigmoid function sigmoid(x) = 1 / (1 + exp(-alpha * x)). This parameter is applicable only when name="sigmoid" or name="multiclass_ova". It must be strictly positive.

  • ratio_c (float) – Scaling parameter for exponential standard ratio transformation expstdratio(x) = exp2(-x / c). This parameter is applicable only when name="exponential_standard_ratio".

Methods:

asdict()

Convert to dictionary

asdict()

Convert to dictionary

Return type:

Dict[str, Any]
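
The two formulas quoted in the parameter descriptions above can be written out directly. This is a plain-Python sketch of the math only, not Treelite's implementation; the function names are chosen for illustration, and no attempt is made to guard against floating-point overflow for extreme inputs.

```python
import math


def sigmoid(x, sigmoid_alpha=1.0):
    """sigmoid(x) = 1 / (1 + exp(-alpha * x)), with alpha strictly positive."""
    if sigmoid_alpha <= 0.0:
        raise ValueError("sigmoid_alpha must be strictly positive")
    return 1.0 / (1.0 + math.exp(-sigmoid_alpha * x))


def exponential_standard_ratio(x, ratio_c=1.0):
    """expstdratio(x) = exp2(-x / c)."""
    return 2.0 ** (-x / ratio_c)


print(sigmoid(0.0))                                  # 0.5
print(exponential_standard_ratio(2.0, ratio_c=2.0))  # 0.5
```

A larger sigmoid_alpha makes the sigmoid steeper around zero, while a larger ratio_c makes the exponential standard ratio decay more slowly.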

class treelite.model_builder.ModelBuilder(*, threshold_type, leaf_output_type, metadata, tree_annotation, postprocessor, base_scores, attributes=None)

Model builder class, to iteratively build a tree ensemble model.

Note

The model builder object must be accessed by only a single thread. To build multiple trees in parallel, create multiple builder objects and use model concatenation (concatenate()).

Parameters:
  • threshold_type (str) – Type of thresholds in the tree model

  • leaf_output_type (str) – Type of leaf outputs in the tree model

  • metadata (Metadata) – Model metadata

  • tree_annotation (TreeAnnotation) – Annotation for individual trees

  • postprocessor (PostProcessorFunc) – Postprocessor for prediction outputs

  • base_scores (List[float]) – Baseline scores for targets and classes, before adding tree outputs. Also known as the intercept.

  • attributes (Dict[Any, Any] | None) – Arbitrary JSON object, to be stored in the “attributes” field in the model object.

Attributes:

handle

Access the handle to the associated C++ object

Methods:

start_tree()

Start a new tree

end_tree()

End the current tree

start_node(node_key)

Start a new node

end_node()

End the current node

numerical_test(feature_id, threshold, ...)

Declare the current node as a numerical test node, where the test is of form [feature value] [op] [threshold].

categorical_test(feature_id, default_left, ...)

Declare the current node as a categorical test node, where the test is of form [feature value] in [category list].

leaf(leaf_value)

Declare the current node as a leaf node

gain(gain)

Specify the gain (loss reduction) resulting from the current split.

data_count(data_count)

Specify the number of data points (samples) that are mapped to the current node.

sum_hess(sum_hess)

Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.

commit()

Conclude model building and obtain the final model object.

property handle

Access the handle to the associated C++ object

start_tree()

Start a new tree

end_tree()

End the current tree

start_node(node_key)

Start a new node

Parameters:

node_key (int) – Integer key that uniquely identifies the node

end_node()

End the current node

numerical_test(feature_id, threshold, default_left, opname, left_child_key, right_child_key)

Declare the current node as a numerical test node, where the test is of form [feature value] [op] [threshold]. Data points for which the test evaluates to True will be mapped to the left child node; all other data points (for which the test evaluates to False) will be mapped to the right child node.

Parameters:
  • feature_id (int) – Feature ID

  • threshold (float) – Threshold

  • default_left (bool) – Whether the missing value should be mapped to the left child

  • opname (str) – Comparison operator

  • left_child_key (int) – Integer key that uniquely identifies the left child node.

  • right_child_key (int) – Integer key that uniquely identifies the right child node.
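
The routing rule for a numerical test node can be sketched in plain Python. This only illustrates the semantics stated above (test True goes left, test False goes right, missing values follow default_left); it does not call Treelite. The helper name route_numerical and the set of operator strings in the lookup table are assumptions, so consult the Treelite documentation for the operators that opname actually accepts.

```python
import math
import operator

# Assumed operator names, for illustration only.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def route_numerical(value, *, threshold, opname, default_left,
                    left_child_key, right_child_key):
    """Return the key of the child node that a feature value is mapped to."""
    if value is None or math.isnan(value):
        # Missing value: follow the default direction.
        return left_child_key if default_left else right_child_key
    # Test is [feature value] [op] [threshold]; True maps to the left child.
    if _OPS[opname](value, threshold):
        return left_child_key
    return right_child_key


node = dict(threshold=5.0, opname="<", default_left=True,
            left_child_key=1, right_child_key=2)
print(route_numerical(3.0, **node))           # 1
print(route_numerical(7.0, **node))           # 2
print(route_numerical(float("nan"), **node))  # 1
```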

categorical_test(feature_id, default_left, category_list, category_list_right_child, left_child_key, right_child_key)

Declare the current node as a categorical test node, where the test is of form [feature value] in [category list].

Parameters:
  • feature_id (int) – Feature ID

  • default_left (bool) – Whether the missing value should be mapped to the left child

  • category_list (List[int]) – List of categories to be tested for match

  • category_list_right_child (bool) – Whether the data points for which the test evaluates to True should be mapped to the right child or the left child.

  • left_child_key (int) – Integer key that uniquely identifies the left child node.

  • right_child_key (int) – Integer key that uniquely identifies the right child node.
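
The interaction between category_list and category_list_right_child is easiest to see in a small sketch. The code below is plain Python written for illustration, not Treelite code: route_categorical is a hypothetical helper, and it assumes the feature value is either missing (None or NaN) or a number that can be read as an integer category.

```python
import math


def route_categorical(value, *, category_list, category_list_right_child,
                      default_left, left_child_key, right_child_key):
    """Return the key of the child node that a categorical value is mapped to."""
    if value is None or math.isnan(value):
        # Missing value: follow the default direction.
        return left_child_key if default_left else right_child_key
    # Test is [feature value] in [category list].
    matched = int(value) in category_list
    # category_list_right_child decides which side a match is mapped to.
    go_right = matched if category_list_right_child else not matched
    return right_child_key if go_right else left_child_key


node = dict(category_list=[1, 3], default_left=False,
            left_child_key=1, right_child_key=2)
print(route_categorical(3.0, category_list_right_child=False, **node))  # 1
print(route_categorical(3.0, category_list_right_child=True, **node))   # 2
print(route_categorical(0.0, category_list_right_child=False, **node))  # 2
```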

leaf(leaf_value)

Declare the current node as a leaf node

Parameters:

leaf_value (float | Sequence[float]) – Value of leaf output

gain(gain)

Specify the gain (loss reduction) resulting from the current split.

Parameters:

gain (float) – Gain (loss reduction)

data_count(data_count)

Specify the number of data points (samples) that are mapped to the current node.

Parameters:

data_count (int) – Number of data points

sum_hess(sum_hess)

Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.

Parameters:

sum_hess (float) – Weighted sample count or the sum of Hessians

commit()

Conclude model building and obtain the final model object.

Returns:

model – Finished model

Return type:

Model

Model builder (Legacy)

class treelite.ModelBuilder(num_feature, num_class=1, average_tree_output=False, threshold_type='float32', leaf_output_type='float32', *, pred_transform='identity', sigmoid_alpha=1.0, ratio_c=1.0, global_bias=0.0)

Legacy model builder class. New code should use the new model builder treelite.model_builder.ModelBuilder instead.

This module is meant to enable existing code using the old model builder API to continue functioning. Users are highly encouraged to migrate to the new model builder API, to take advantage of new features including the support for multi-target tree models.

Parameters:
  • num_feature (int) – Number of features used in model being built. We assume that all feature indices are between 0 and num_feature - 1.

  • num_class (int) – Number of output groups; >1 indicates multiclass classification

  • average_tree_output (bool) – Whether the model is a random forest; True indicates a random forest and False indicates gradient boosted trees

  • threshold_type (str) – Type of thresholds in the tree model

  • leaf_output_type (str) – Type of leaf outputs in the tree model

  • pred_transform (str) – Postprocessor for prediction outputs

  • sigmoid_alpha (float) – Scaling parameter for sigmoid function sigmoid(x) = 1 / (1 + exp(-alpha * x)). This parameter is applicable only when pred_transform="sigmoid" or pred_transform="multiclass_ova". It must be strictly positive.

  • ratio_c (float) – Scaling parameter for exponential standard ratio transformation expstdratio(x) = exp2(-x / c). This parameter is applicable only when pred_transform="exponential_standard_ratio".

  • global_bias (float) – Global bias of the model. Predicted margin scores of all instances will be adjusted by the global bias.

Classes:

Node()

A node in a tree

Tree([threshold_type, leaf_output_type])

A decision tree in a tree ensemble builder

Methods:

insert(index, tree)

Insert a tree at specified location in the ensemble

append(tree)

Add a tree at the end of the ensemble

commit()

Finalize the ensemble model

class Node

A node in a tree

Methods:

set_root()

Set the node as the root

set_leaf_node(leaf_value[, leaf_value_type])

Set the node as a leaf node

set_numerical_test_node(feature_id, opname, ...)

Set the node as a test node with numerical split.

set_categorical_test_node(feature_id, ...)

Set the node as a test node with categorical split.

set_root()

Set the node as the root

set_leaf_node(leaf_value, leaf_value_type='float32')

Set the node as a leaf node

Parameters:
  • leaf_value (float | List[float]) – Usually a single leaf value (weight) of the leaf node. For multi-class random forest classifier, leaf_value should be a list of leaf weights.

  • leaf_value_type (str) – Data type used for leaf_value

set_numerical_test_node(feature_id, opname, threshold, default_left, left_child_key, right_child_key, threshold_type='float32')

Set the node as a test node with numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either left or right child would be taken.

Parameters:
  • feature_id (int) – Feature index

  • opname (str) – Binary operator to use in the test

  • threshold (float) – Threshold value

  • default_left (bool) – Default direction for missing values (True for left; False for right)

  • left_child_key (int) – Unique integer key to identify the left child node

  • right_child_key (int) – Unique integer key to identify the right child node

  • threshold_type (str) – Data type for threshold value (e.g. 'float32')

set_categorical_test_node(feature_id, left_categories, default_left, left_child_key, right_child_key)

Set the node as a test node with categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to n-1, where n is the number of categories in that particular feature.

Parameters:
  • feature_id (int) – Feature index

  • left_categories (List[int]) – List of categories belonging to the left child.

  • default_left (bool) – Default direction for missing values (True for left; False for right)

  • left_child_key (int) – Unique integer key to identify the left child node

  • right_child_key (int) – Unique integer key to identify the right child node

class Tree(threshold_type='float32', leaf_output_type='float32')

A decision tree in a tree ensemble builder

Parameters:
  • threshold_type (str) – Type of thresholds in the tree model

  • leaf_output_type (str) – Type of leaf outputs in the tree model

insert(index, tree)

Insert a tree at specified location in the ensemble

Parameters:
  • index (int) – Index of the element before which to insert the tree

  • tree (Tree) – Tree to be inserted

append(tree)

Add a tree at the end of the ensemble

Parameters:

tree (Tree) – Tree to be added

commit()

Finalize the ensemble model

Returns:

model – Finished model

Return type:

Model

Model class

class treelite.Model(*, handle=None)

Decision tree ensemble model

Parameters:

handle (Optional[Any]) – Handle to C++ object

Attributes:

handle

Access the handle to the associated C++ object

num_tree

Number of decision trees in the model

num_feature

Number of features used in the model

input_type

Input type

output_type

Output type

Methods:

concatenate(model_objs)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

load(filename, model_format[, ...])

Deprecated; please use load_xgboost_model() instead.

from_xgboost(booster)

Deprecated; please use treelite.frontend.from_xgboost() instead.

from_xgboost_json(model_json_str, *[, ...])

Deprecated; please use treelite.frontend.from_xgboost_json() instead.

from_lightgbm(booster)

Deprecated; please use treelite.frontend.from_lightgbm() instead.

dump_as_json(*[, pretty_print])

Dump the model as a JSON string.

get_header_accessor()

Obtain accessor for fields in the header.

get_tree_accessor(tree_id)

Obtain accessor for fields in a tree.

serialize(filename)

Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation.

serialize_bytes()

Serialize (persist) the model to a byte sequence, using a fast binary representation.

deserialize(filename)

Deserialize (recover) the model from a checkpoint file on disk.

deserialize_bytes(model_bytes)

Deserialize (recover) the model from a byte sequence.

property handle

Access the handle to the associated C++ object

property num_tree: int

Number of decision trees in the model

property num_feature: int

Number of features used in the model

property input_type: str

Input type

property output_type: str

Output type

classmethod concatenate(model_objs)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

Parameters:

model_objs (List[Model]) – List of Model objects

Returns:

model – Concatenated model

Return type:

Model

Example

concatenated_model = treelite.Model.concatenate([model1, model2, model3])

classmethod load(filename, model_format, allow_unknown_field=False)

Deprecated; please use load_xgboost_model() instead. Load a tree ensemble model from a file.

Parameters:
  • filename (str) – Path to model file

  • model_format (str) – Model file format. Must be "xgboost", "xgboost_json", or "lightgbm"

  • allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys. This flag is only applicable if model_format="xgboost_json"

Returns:

model – Loaded model

Return type:

Model

classmethod from_xgboost(booster)

Deprecated; please use treelite.frontend.from_xgboost() instead. Load a tree ensemble model from an XGBoost Booster object.

Parameters:

booster (object of type xgboost.Booster) – Python handle to XGBoost model

Returns:

model – Loaded model

Return type:

Model

classmethod from_xgboost_json(model_json_str, *, allow_unknown_field=False)

Deprecated; please use treelite.frontend.from_xgboost_json() instead. Load a tree ensemble model from a string containing XGBoost JSON.

Parameters:
  • model_json_str (bytes | bytearray | str) – A string specifying an XGBoost model in the XGBoost JSON format

  • allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys

Returns:

model – Loaded model

Return type:

Model

classmethod from_lightgbm(booster)

Deprecated; please use treelite.frontend.from_lightgbm() instead. Load a tree ensemble model from a LightGBM Booster object.

Parameters:

booster (object of type lightgbm.Booster) – Python handle to LightGBM model

Returns:

model – Loaded model

Return type:

Model

dump_as_json(*, pretty_print=True)

Dump the model as a JSON string. This is useful for inspecting details of the tree ensemble model.

Parameters:

pretty_print (bool) – Whether to pretty-print the JSON string. Set this to False to make the string compact.

Returns:

json_str – JSON string representing the model

Return type:

str

get_header_accessor()

Obtain accessor for fields in the header. See Field accessors (Advanced) for more details.

Return type:

HeaderAccessor

get_tree_accessor(tree_id)

Obtain accessor for fields in a tree. See Field accessors (Advanced) for more details.

Parameters:

tree_id (int) – ID of the tree

Return type:

TreeAccessor

serialize(filename)

Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation. To recover the model from the checkpoint, use the deserialize() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:

filename (str | Path) – Path to checkpoint

serialize_bytes()

Serialize (persist) the model to a byte sequence, using a fast binary representation. To recover the model from the byte sequence, use the deserialize_bytes() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Return type:

bytes

classmethod deserialize(filename)

Deserialize (recover) the model from a checkpoint file on disk. It is expected that the file was generated by a call to the serialize() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:

filename (str | Path) – Path to checkpoint

Returns:

model – Recovered model

Return type:

Model

classmethod deserialize_bytes(model_bytes)

Deserialize (recover) the model from a byte sequence. It is expected that the byte sequence was generated by a call to the serialize_bytes() method.

Note

Notes on forward and backward compatibility

Please see Notes on Serialization.

Parameters:

model_bytes (bytes) – Byte sequence representing the serialized model

Returns:

model – Recovered model

Return type:

Model

Field accessors (Advanced)

Using field accessors, users can query and modify the value of fields in a Model object. See Editing tree models (Advanced) for more details.

Note

Modifying a field is an unsafe operation

Treelite does not prevent users from assigning an invalid value to a field. Setting an invalid value may cause undefined behavior. Always consult the model spec to carefully examine model invariants and constraints on fields. For example, most tree fields must have an array of length num_nodes.

class treelite.model.HeaderAccessor(model)

Accessor for fields in the header

Parameters:

model (Model) – The model object

Methods:

get_field(name)

Get a field

set_field(name, value)

Set a field

get_field(name)

Get a field

Parameters:

name (str) – Name of the field. Consult the model spec for the list of fields.

Returns:

field – Value in the field (str for a string field, np.ndarray for other fields)

Return type:

numpy.ndarray or str

set_field(name, value)

Set a field

Parameters:
  • name (str) – Name of the field. Consult the model spec for the list of fields.

  • value (ndarray | str) – New value for the field (str for a string field, np.ndarray for other fields)

class treelite.model.TreeAccessor(model, *, tree_id)

Accessor for fields in a tree

Parameters:
  • model (Model) – The model object

  • tree_id (int) – ID of the tree

Methods:

get_field(name)

Get a field

set_field(name, value)

Set a field

get_field(name)

Get a field

Parameters:

name (str) – Name of the field. Consult the model spec for the list of fields.

Returns:

field – Value in the field

Return type:

numpy.ndarray

set_field(name, value)

Set a field

Parameters:
  • name (str) – Name of the field. Consult the model spec for the list of fields.

  • value (ndarray) – New value for the field