Treelite API
API of Treelite Python package.
Model loaders
Functions to load and build model objects
Functions:
load_xgboost_model_legacy_binary(filename) | Load a tree ensemble model from XGBoost model, stored using the legacy binary format.
load_xgboost_model(filename, *[, ...]) | Load a tree ensemble model from XGBoost model, stored using JSON or UBJSON format.
load_lightgbm_model(filename) | Load a tree ensemble model from a LightGBM model file.
from_xgboost(booster) | Load a tree ensemble model from an XGBoost Booster object
from_xgboost_json(model_json_str, *[, ...]) | Load a tree ensemble model from a string containing XGBoost JSON
from_xgboost_ubjson(model_ubjson_str, *[, ...]) | Load an XGBoost model from a byte sequence containing UBJSON
from_lightgbm(booster) | Load a tree ensemble model from a LightGBM Booster object
- treelite.frontend.load_xgboost_model_legacy_binary(filename)
Load a tree ensemble model from XGBoost model, stored using the legacy binary format. Note: new XGBoost models should be stored in the JSON format, to take advantage of the latest functionalities of XGBoost.
- Parameters:
filename – Path to the model file
- Returns:
model – Loaded model
- Return type:
Model
Example
xgb_model = treelite.frontend.load_xgboost_model_legacy_binary("xgboost_model.model")
- treelite.frontend.load_xgboost_model(filename, *, format_choice='use_suffix', allow_unknown_field=False)
Load a tree ensemble model from XGBoost model, stored using JSON or UBJSON format.
- Parameters:
filename – Path to the model file
format_choice (str) – Method to select the model format:
use_suffix (default): Use the suffix of the file name (also known as file extension) to detect the format. Files whose names end with .json will be parsed as JSON; all other files will be parsed as UBJSON.
inspect: Inspect the first few bytes of the file to heuristically determine whether the file is JSON or UBJSON.
ubjson: Parse the file as UBJSON.
json: Parse the file as JSON.
allow_unknown_field (bool) – Whether to allow extra fields with unrecognized keys
- Returns:
model – Loaded model
- Return type:
Model
Example
xgb_model = treelite.frontend.load_xgboost_model("xgboost_model.json")
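The format selection rules for format_choice can be sketched in plain Python. This is only an illustration of the documented behavior, not Treelite's implementation: the helper name choose_format is hypothetical, and the inspect mode is left out because its byte-level heuristic is not described here.

```python
def choose_format(filename: str, format_choice: str = "use_suffix") -> str:
    """Return "json" or "ubjson" for a model file (hypothetical helper)."""
    if format_choice in ("json", "ubjson"):
        # The caller named the format explicitly; the file name is ignored.
        return format_choice
    if format_choice == "use_suffix":
        # Names ending with .json are JSON; everything else is UBJSON.
        return "json" if filename.endswith(".json") else "ubjson"
    # "inspect" needs to read the file and is not covered by this sketch.
    raise ValueError(f"format_choice not covered by this sketch: {format_choice!r}")


print(choose_format("xgboost_model.json"))            # json
print(choose_format("xgboost_model.ubj"))             # ubjson
print(choose_format("xgboost_model.json", "ubjson"))  # ubjson
```

Note that with use_suffix a file with any suffix other than .json, including no suffix at all, is treated as UBJSON, so passing format_choice explicitly may be safer for unusual file names.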
- treelite.frontend.load_lightgbm_model(filename)
Load a tree ensemble model from a LightGBM model file.
- Parameters:
filename – Path to the model file
- Returns:
model – Loaded model
- Return type:
Model
Example
lgb_model = treelite.frontend.load_lightgbm_model("lightgbm_model.txt")
- treelite.frontend.from_xgboost(booster)
Load a tree ensemble model from an XGBoost Booster object
- Parameters:
booster (object of type xgboost.Booster) – Python handle to XGBoost model
- Returns:
model – Loaded model
- Return type:
Model
- treelite.frontend.from_xgboost_json(model_json_str, *, allow_unknown_field=False)
Load a tree ensemble model from a string containing XGBoost JSON
- treelite.frontend.from_xgboost_ubjson(model_ubjson_str, *, allow_unknown_field=False)
Load an XGBoost model from a byte sequence containing UBJSON
- treelite.frontend.from_lightgbm(booster)
Load a tree ensemble model from a LightGBM Booster object
- Parameters:
booster (object of type lightgbm.Booster) – Python handle to LightGBM model
- Returns:
model – Loaded model
- Return type:
Model
Scikit-learn importer
Model loader to ingest scikit-learn models into Treelite
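Functions:
import_model(sklearn_model) | Load a tree ensemble model from a scikit-learn model object
export_model(model) | Export a model as a scikit-learn RandomForest.
import_model_with_model_builder(sklearn_model) | This function was removed in Treelite 4.0; please use import_model() instead.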
- treelite.sklearn.import_model(sklearn_model)
Load a tree ensemble model from a scikit-learn model object
Note
For IsolationForest, the loaded model will calculate the outlier score using the standardized ratio as proposed in the original reference, which matches score_samples() but is a bit different from decision_function().
More precisely, the following relation holds:
treelite.gtil.predict(tl_model, X) == -clf.score_samples(X)
# clf is an IsolationForest
# tl_model is a Treelite representation of clf
- Parameters:
sklearn_model (object of type RandomForestRegressor / RandomForestClassifier / ExtraTreesRegressor / ExtraTreesClassifier / GradientBoostingRegressor / GradientBoostingClassifier / HistGradientBoostingRegressor / HistGradientBoostingClassifier / IsolationForest) – Python handle to scikit-learn model
- Returns:
model – Loaded model
- Return type:
Model
Example
import sklearn.datasets
import sklearn.ensemble
import treelite.sklearn

X, y = sklearn.datasets.load_diabetes(return_X_y=True)
clf = sklearn.ensemble.RandomForestRegressor(n_estimators=10)
clf.fit(X, y)
model = treelite.sklearn.import_model(clf)
- treelite.sklearn.export_model(model)
Export a model as a scikit-learn RandomForest.
Note
Currently only random forests can be exported as scikit-learn model objects. Support for gradient boosted trees and other kinds of tree models will be added in the future.
- Parameters:
model (Model) – Treelite model to export
- Returns:
sklearn_model – Scikit-learn model
- Return type:
object of type RandomForestRegressor / RandomForestClassifier / GradientBoostingRegressor / GradientBoostingClassifier
- treelite.sklearn.import_model_with_model_builder(sklearn_model)
This function was removed in Treelite 4.0; please use import_model() instead.
Model builder
Treelite Model builder class
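Classes:
Metadata(num_feature, task_type, ...) | Metadata object, consisting of metadata information about the model at large.
TreeAnnotation(num_tree, target_id, class_id) | Annotation for individual trees.
PostProcessorFunc(name[, sigmoid_alpha, ratio_c]) | Specification for postprocessor of prediction outputs
ModelBuilder(*, threshold_type, ...) | Model builder class, to iteratively build a tree ensemble model.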
- class treelite.model_builder.Metadata(num_feature, task_type, average_tree_output, num_target, num_class, leaf_vector_shape)
Metadata object, consisting of metadata information about the model at large.
- Parameters:
num_feature (int) – Number of features used in the model. We assume that all feature indices are between 0 and num_feature - 1.
task_type (str) – Task type. Can be one of kBinaryClf, kRegressor, kMultiClf, kLearningToRank, or kIsolationForest.
average_tree_output (bool) – Whether to average outputs of trees
num_target (int) – Number of targets
num_class (List[int]) – Number of classes. num_class[i] is the number of classes of target i.
leaf_vector_shape (Tuple[int, int]) – Shape of the output from each leaf node
Methods:
asdict() | Convert to dictionary
- class treelite.model_builder.TreeAnnotation(num_tree, target_id, class_id)
Annotation for individual trees. Use this object to look up which target and class each tree is associated with.
The output of each target / class is obtained by summing the outputs of all trees that are associated with that target / class. target_id[i] indicates the target the i-th tree is associated with. (-1 indicates that the tree is a multi-target tree, whose output gets counted for all targets.) class_id[i] indicates the class the i-th tree is associated with. (-1 indicates that the tree’s output gets counted for all classes.)
- Parameters:
Methods:
asdict() | Convert to dictionary
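The role of target_id and class_id can be illustrated with a small pure-Python sketch. It is not Treelite code: the function name accumulate is hypothetical, and it assumes each tree's output is a nested list that is either [[value]] for a tree tied to one target and class, or spans every target or class for which the corresponding ID is -1.

```python
def accumulate(tree_outputs, target_id, class_id, num_target, num_class):
    """Sum per-tree outputs into out[target][class] (illustrative sketch).

    num_class[t] is the number of classes of target t, as in Metadata.
    """
    out = [[0.0] * num_class[t] for t in range(num_target)]
    for leaf, t_id, c_id in zip(tree_outputs, target_id, class_id):
        # -1 means the tree's output is counted for all targets / classes.
        targets = range(num_target) if t_id == -1 else [t_id]
        for t in targets:
            classes = range(num_class[t]) if c_id == -1 else [c_id]
            for c in classes:
                row = t if t_id == -1 else 0
                col = c if c_id == -1 else 0
                out[t][c] += leaf[row][col]
    return out


# One target with three classes: three per-class trees plus one tree
# whose output vector is counted for all classes (class_id == -1).
scores = accumulate(
    tree_outputs=[[[0.5]], [[-1.0]], [[2.0]], [[0.25, 0.25, 0.5]]],
    target_id=[0, 0, 0, 0],
    class_id=[0, 1, 2, -1],
    num_target=1,
    num_class=[3],
)
print(scores)  # [[0.75, -0.75, 2.5]]
```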
- class treelite.model_builder.PostProcessorFunc(name, sigmoid_alpha=1.0, ratio_c=1.0)
Specification for postprocessor of prediction outputs
- Parameters:
name (str) – Name of the postprocessor. Consult List of postprocessor functions for the list of available postprocessor functions.
sigmoid_alpha (float) – Scaling parameter for sigmoid function sigmoid(x) = 1 / (1 + exp(-alpha * x)). This parameter is applicable only when name="sigmoid" or name="multiclass_ova". It must be strictly positive.
ratio_c (float) – Scaling parameter for exponential standard ratio transformation expstdratio(x) = exp2(-x / c). This parameter is applicable only when name="exponential_standard_ratio".
Methods:
asdict() | Convert to dictionary
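The two formulas quoted for sigmoid_alpha and ratio_c can be written out directly. The sketch below only evaluates the formulas as stated in the parameter descriptions; the function names mirror the formulas and are not part of the Treelite API.

```python
import math


def sigmoid(x: float, alpha: float = 1.0) -> float:
    """sigmoid(x) = 1 / (1 + exp(-alpha * x)); alpha must be strictly positive."""
    if alpha <= 0:
        raise ValueError("alpha must be strictly positive")
    return 1.0 / (1.0 + math.exp(-alpha * x))


def expstdratio(x: float, c: float = 1.0) -> float:
    """expstdratio(x) = exp2(-x / c), i.e. 2 raised to the power -x / c."""
    return 2.0 ** (-x / c)


print(sigmoid(0.0))           # 0.5
print(expstdratio(1.0))       # 0.5
print(expstdratio(2.0, 2.0))  # 0.5
```

A larger alpha makes the sigmoid steeper around zero, while a larger c makes the exponential standard ratio decay more slowly as x grows.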
- class treelite.model_builder.ModelBuilder(*, threshold_type, leaf_output_type, metadata, tree_annotation, postprocessor, base_scores, attributes=None)
Model builder class, to iteratively build a tree ensemble model.
Note
The model builder object must be accessed by only a single thread. To build multiple trees in parallel, create multiple builder objects and use model concatenation (concatenate()).
- Parameters:
threshold_type (str) – Type of thresholds in the tree model
leaf_output_type (str) – Type of leaf outputs in the tree model
metadata (Metadata) – Model metadata
tree_annotation (TreeAnnotation) – Annotation for individual trees
postprocessor (PostProcessorFunc) – Postprocessor for prediction outputs
base_scores (List[float]) – Baseline scores for targets and classes, before adding tree outputs. Also known as the intercept.
attributes (Dict[Any, Any] | None) – Arbitrary JSON object, to be stored in the “attributes” field in the model object.
Attributes:
handle | Access the handle to the associated C++ object
Methods:
start_tree() | Start a new tree
end_tree() | End the current tree
start_node(node_key) | Start a new node
end_node() | End the current node
numerical_test(feature_id, threshold, *, ...) | Declare the current node as a numerical test node, where the test is of form [feature value] [op] [threshold].
categorical_test(feature_id, *, ...) | Declare the current node as a categorical test node, where the test is of form [feature value] in [category list].
leaf(leaf_value) | Declare the current node as a leaf node
gain(gain) | Specify the gain (loss reduction) that results from the current split.
data_count(data_count) | Specify the number of data points (samples) that are mapped to the current node.
sum_hess(sum_hess) | Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.
commit() | Conclude model building and obtain the final model object.
- property handle
Access the handle to the associated C++ object
- start_tree()
Start a new tree
- end_tree()
End the current tree
- start_node(node_key)
Start a new node
- Parameters:
node_key (int) – Integer key that uniquely identifies the node
- end_node()
End the current node
- numerical_test(feature_id, threshold, *, default_left, opname, left_child_key, right_child_key)
Declare the current node as a numerical test node, where the test is of form [feature value] [op] [threshold]. Data points for which the test evaluates to True will be mapped to the left child node; all other data points (for which the test evaluates to False) will be mapped to the right child node.
- Parameters:
feature_id (int) – Feature ID
threshold (float) – Threshold
default_left (bool) – Whether the missing value should be mapped to the left child
opname (str) – Comparison operator
left_child_key (int) – Integer key that uniquely identifies the left child node.
right_child_key (int) – Integer key that uniquely identifies the right child node.
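The routing rule for a numerical test node can be sketched as follows. This is an illustration of the semantics described above, not Treelite's implementation: next_child is a hypothetical helper, missing values are assumed to be represented as None or NaN, and the set of operator strings in OPS is an assumption.

```python
import math
import operator

# Assumed operator names; consult the Treelite documentation for the
# strings actually accepted by opname.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def next_child(value, *, threshold, opname, default_left,
               left_child_key, right_child_key):
    """Return the key of the child that a feature value is routed to."""
    if value is None or math.isnan(value):
        # Missing value: follow the default direction.
        return left_child_key if default_left else right_child_key
    # Test is [feature value] [op] [threshold]; True goes left, False goes right.
    if OPS[opname](value, threshold):
        return left_child_key
    return right_child_key


node = dict(threshold=5.0, opname="<", default_left=True,
            left_child_key=1, right_child_key=2)
print(next_child(3.0, **node))           # 1
print(next_child(7.0, **node))           # 2
print(next_child(float("nan"), **node))  # 1
```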
- categorical_test(feature_id, *, default_left, category_list, category_list_right_child, left_child_key, right_child_key)
Declare the current node as a categorical test node, where the test is of form [feature value] in [category list].
- Parameters:
feature_id (int) – Feature ID
default_left (bool) – Whether the missing value should be mapped to the left child
category_list (List[int]) – List of categories to be tested for match
category_list_right_child (bool) – Whether the data points for which the test evaluates to True should be mapped to the right child or the left child.
left_child_key (int) – Integer key that uniquely identifies the left child node.
right_child_key (int) – Integer key that uniquely identifies the right child node.
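The interaction between category_list and category_list_right_child can be sketched in the same way. The helper next_child_categorical is hypothetical; the sketch assumes that data points for which the test evaluates to False go to the child opposite to the one selected by category_list_right_child, and that missing values are None or NaN.

```python
import math


def next_child_categorical(value, *, category_list, category_list_right_child,
                           default_left, left_child_key, right_child_key):
    """Return the key of the child that a categorical value is routed to."""
    if value is None or math.isnan(value):
        # Missing value: follow the default direction.
        return left_child_key if default_left else right_child_key
    matched = int(value) in category_list
    # A match goes right exactly when category_list_right_child is True;
    # a non-match goes to the opposite child.
    go_right = matched == category_list_right_child
    return right_child_key if go_right else left_child_key


node = dict(category_list=[1, 3], category_list_right_child=False,
            default_left=True, left_child_key=1, right_child_key=2)
print(next_child_categorical(3, **node))  # 1 (matched, sent left)
print(next_child_categorical(2, **node))  # 2 (not matched, sent right)
```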
- leaf(leaf_value)
Declare the current node as a leaf node
- gain(gain)
Specify the gain (loss reduction) that results from the current split.
- Parameters:
gain (float) – Gain (loss reduction)
- data_count(data_count)
Specify the number of data points (samples) that are mapped to the current node.
- Parameters:
data_count (int) – Number of data points
- sum_hess(sum_hess)
Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.
- Parameters:
sum_hess (float) – Weighted sample count or the sum of Hessians
- commit()
Conclude model building and obtain the final model object.
- Returns:
model – Finished model
- Return type:
Model
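Putting the builder methods together, a single-split tree (a "stump") might be built as sketched below. The call sequence follows the method descriptions above; the helper names make_builder and add_stump are hypothetical, and the string values "float32", "identity" and "<" are assumptions about accepted values rather than something this page confirms for the new builder. Running make_builder requires Treelite 4.0 or later.

```python
def make_builder():
    """Create a ModelBuilder for a one-tree, single-target regressor."""
    # Imported lazily so that add_stump() can be read and reused on its own.
    from treelite.model_builder import (
        Metadata,
        ModelBuilder,
        PostProcessorFunc,
        TreeAnnotation,
    )

    return ModelBuilder(
        threshold_type="float32",      # assumed type name
        leaf_output_type="float32",    # assumed type name
        metadata=Metadata(
            num_feature=2,
            task_type="kRegressor",
            average_tree_output=False,
            num_target=1,
            num_class=[1],
            leaf_vector_shape=(1, 1),
        ),
        tree_annotation=TreeAnnotation(num_tree=1, target_id=[0], class_id=[0]),
        postprocessor=PostProcessorFunc(name="identity"),  # assumed name
        base_scores=[0.0],
    )


def add_stump(builder, *, feature_id=0, threshold=5.0,
              left_value=-1.0, right_value=1.0):
    """Add one tree with a root test node (key 0) and two leaves (keys 1, 2)."""
    builder.start_tree()
    builder.start_node(0)
    builder.numerical_test(
        feature_id=feature_id,
        threshold=threshold,
        default_left=True,
        opname="<",                    # assumed operator name
        left_child_key=1,
        right_child_key=2,
    )
    builder.end_node()
    builder.start_node(1)
    builder.leaf(left_value)
    builder.end_node()
    builder.start_node(2)
    builder.leaf(right_value)
    builder.end_node()
    builder.end_tree()
```

With Treelite installed, the two helpers would be combined as builder = make_builder(), then add_stump(builder), then model = builder.commit(). Each start_node() is paired with an end_node(), and the child keys passed to numerical_test() must match the keys later given to start_node().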
Model builder (Legacy)
- class treelite.ModelBuilder(*, num_feature, num_class=1, average_tree_output=False, threshold_type='float32', leaf_output_type='float32', pred_transform='identity', sigmoid_alpha=1.0, ratio_c=1.0, global_bias=0.0)
Legacy model builder class. New code should use the new model builder treelite.model_builder.ModelBuilder instead.
This module is meant to enable existing code using the old model builder API to continue functioning. Users are highly encouraged to migrate to the new model builder API, to take advantage of new features including the support for multi-target tree models.
- Parameters:
num_feature (int) – Number of features used in model being built. We assume that all feature indices are between 0 and num_feature - 1.
num_class (int) – Number of output groups; >1 indicates multiclass classification
average_tree_output (bool) – Whether the model is a random forest; True indicates a random forest and False indicates gradient boosted trees
threshold_type (str) – Type of thresholds in the tree model
leaf_output_type (str) – Type of leaf outputs in the tree model
pred_transform (str) – Postprocessor for prediction outputs
sigmoid_alpha (float) – Scaling parameter for sigmoid function sigmoid(x) = 1 / (1 + exp(-alpha * x)). This parameter is applicable only when pred_transform="sigmoid" or pred_transform="multiclass_ova". It must be strictly positive.
ratio_c (float) – Scaling parameter for exponential standard ratio transformation expstdratio(x) = exp2(-x / c). This parameter is applicable only when pred_transform="exponential_standard_ratio".
global_bias (float) – Global bias of the model. Predicted margin scores of all instances will be adjusted by the global bias.
Classes:
Node() | A node in a tree
Tree([threshold_type, leaf_output_type]) | A decision tree in a tree ensemble builder
Methods:
insert(index, tree) | Insert a tree at specified location in the ensemble
append(tree) | Add a tree at the end of the ensemble
commit() | Finalize the ensemble model
- class Node
A node in a tree
Methods:
set_root() | Set the node as the root
set_leaf_node(leaf_value[, leaf_value_type]) | Set the node as a leaf node
set_numerical_test_node(feature_id, opname, ...) | Set the node as a test node with numerical split.
set_categorical_test_node(feature_id, ...) | Set the node as a test node with categorical split.
- set_root()
Set the node as the root
- set_leaf_node(leaf_value, leaf_value_type='float32')
Set the node as a leaf node
- set_numerical_test_node(feature_id, opname, threshold, *, default_left, left_child_key, right_child_key, threshold_type='float32')
Set the node as a test node with numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either the left or the right child is taken.
- Parameters:
feature_id (int) – Feature index
opname (str) – Binary operator to use in the test
threshold (float) – Threshold value
default_left (bool) – Default direction for missing values (True for left; False for right)
left_child_key (int) – Unique integer key to identify the left child node
right_child_key (int) – Unique integer key to identify the right child node
threshold_type (str) – Data type for threshold value (e.g. ‘float32’)
- set_categorical_test_node(feature_id, left_categories, *, default_left, left_child_key, right_child_key)
Set the node as a test node with categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to n-1, where n is the number of categories in that particular feature.
- Parameters:
feature_id (int) – Feature index
left_categories (List[int]) – List of categories belonging to the left child.
default_left (bool) – Default direction for missing values (True for left; False for right)
left_child_key (int) – Unique integer key to identify the left child node
right_child_key (int) – Unique integer key to identify the right child node
- class Tree(threshold_type='float32', leaf_output_type='float32')
A decision tree in a tree ensemble builder
- insert(index, tree)
Insert a tree at specified location in the ensemble
Model class
- class treelite.Model(*, handle=None)
Decision tree ensemble model
- Parameters:
handle (Optional[Any]) – Handle to C++ object
Attributes:
handle | Access the handle to the associated C++ object
num_tree | Number of decision trees in the model
num_feature | Number of features used in the model
input_type | Input type
output_type | Output type
Methods:
concatenate(model_objs) | Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.
load(filename, model_format[, ...]) | Deprecated; please use load_xgboost_model(), load_xgboost_model_legacy_binary(), or load_lightgbm_model() instead.
from_xgboost(booster) | Deprecated; please use from_xgboost() instead.
from_xgboost_json(model_json_str, *[, ...]) | Deprecated; please use from_xgboost_json() instead.
from_lightgbm(booster) | Deprecated; please use from_lightgbm() instead.
dump_as_json(*[, pretty_print]) | Dump the model as a JSON string.
get_tree_depth() | Query the depth of each tree.
get_header_accessor() | Obtain accessor for fields in the header.
get_tree_accessor(tree_id) | Obtain accessor for fields in a tree.
serialize(filename) | Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation.
serialize_bytes() | Serialize (persist) the model to a byte sequence, using a fast binary representation.
deserialize(filename) | Deserialize (recover) the model from a checkpoint file on disk.
deserialize_bytes(model_bytes) | Deserialize (recover) the model from a byte sequence.
- property handle
Access the handle to the associated C++ object
- classmethod concatenate(model_objs)
Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.
- Parameters:
model_objs – List of Model objects to concatenate
- Returns:
model – Concatenated model
- Return type:
Model
Example
concatenated_model = treelite.Model.concatenate([model1, model2, model3])
- classmethod load(filename, model_format, allow_unknown_field=False)
Deprecated; please use load_xgboost_model(), load_xgboost_model_legacy_binary(), or load_lightgbm_model() instead.
Load a tree ensemble model from a file.
- Parameters:
- Returns:
model – Loaded model
- Return type:
Model
- classmethod from_xgboost(booster)
Deprecated; please use from_xgboost() instead. Load a tree ensemble model from an XGBoost Booster object.
- Parameters:
booster (object of type xgboost.Booster) – Python handle to XGBoost model
- Returns:
model – Loaded model
- Return type:
Model
- classmethod from_xgboost_json(model_json_str, *, allow_unknown_field=False)
Deprecated; please use from_xgboost_json() instead. Load a tree ensemble model from a string containing XGBoost JSON.
- classmethod from_lightgbm(booster)
Deprecated; please use from_lightgbm() instead. Load a tree ensemble model from a LightGBM Booster object.
- Parameters:
booster (object of type lightgbm.Booster) – Python handle to LightGBM model
- Returns:
model – Loaded model
- Return type:
Model
- dump_as_json(*, pretty_print=True)
Dump the model as a JSON string. This is useful for inspecting details of the tree ensemble model.
- get_header_accessor()
Obtain accessor for fields in the header. See Field accessors (Advanced) for more details.
- Return type:
HeaderAccessor
- get_tree_accessor(tree_id)
Obtain accessor for fields in a tree. See Field accessors (Advanced) for more details.
- Parameters:
tree_id (int) – ID of the tree
- Return type:
TreeAccessor
- serialize(filename)
Serialize (persist) the model to a checkpoint file on disk, using a fast binary representation. To recover the model from the checkpoint, use the deserialize() method.
- serialize_bytes()
Serialize (persist) the model to a byte sequence, using a fast binary representation. To recover the model from the byte sequence, use the deserialize_bytes() method.
- Return type:
bytes
- classmethod deserialize(filename)
Deserialize (recover) the model from a checkpoint file on disk. It is expected that the file was generated by a call to the serialize() method.
- classmethod deserialize_bytes(model_bytes)
Deserialize (recover) the model from a byte sequence. It is expected that the byte sequence was generated by a call to the serialize_bytes() method.
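A byte-sequence round trip combined with dump_as_json() could look like the sketch below. The helper roundtrip_and_inspect is hypothetical; it uses only the serialize_bytes(), deserialize_bytes() and dump_as_json() methods documented above, and it assumes that the string returned by dump_as_json() parses with the standard json module.

```python
import json


def roundtrip_and_inspect(model):
    """Round-trip a model through bytes and return (restored model, parsed dump)."""
    # Persist the model to an in-memory byte sequence.
    model_bytes = model.serialize_bytes()
    # deserialize_bytes() is a classmethod; for a Treelite model,
    # type(model) is treelite.Model.
    restored = type(model).deserialize_bytes(model_bytes)
    # Parse the JSON dump so the restored model can be inspected as a dict.
    dumped = json.loads(restored.dump_as_json(pretty_print=False))
    return restored, dumped
```

The parsed dump is convenient for inspection in tests or notebooks, while the byte sequence itself is what would be stored or sent between processes.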
Field accessors (Advanced)
Using field accessors, users can query and modify the value of fields in a Model object. See Editing tree models (Advanced) for more details.
Note
Modifying a field is an unsafe operation
Treelite does not prevent users from assigning an invalid value to a field. Setting an invalid value may cause undefined behavior. Always consult the model spec to carefully examine model invariants and constraints on fields. For example, most tree fields must have an array of length num_nodes.
- class treelite.model.HeaderAccessor(model)
Accessor for fields in the header
- Parameters:
model (Model) – The model object
Methods:
get_field(name) | Get a field
set_field(name, value) | Set a field
- get_field(name)
Get a field
- Parameters:
name (str) – Name of the field. Consult the model spec for the list of fields.
- Returns:
field – Value in the field (str for a string field, np.ndarray for other fields)
- Return type:
numpy.ndarray or str
- set_field(name, value)
Set a field
- Parameters:
name (str) – Name of the field. Consult the model spec for the list of fields.
value (ndarray | str) – New value for the field (str for a string field, np.ndarray for other fields)
- Return type:
None
- class treelite.model.TreeAccessor(model, *, tree_id)
Accessor for fields in a tree
Methods:
get_field(name) | Get a field
set_field(name, value) | Set a field
- get_field(name)
Get a field
- Parameters:
name (str) – Name of the field. Consult the model spec for the list of fields.
- Returns:
field – Value in the field
- Return type:
numpy.ndarray
- set_field(name, value)
Set a field
- Parameters:
name (str) – Name of the field. Consult the model spec for the list of fields.
value (ndarray) – New value for the field
- Return type:
None