Treelite C API
Treelite exposes a set of C functions to enable interfacing with a variety of languages. This page will be most useful for:
those writing a new language binding (glue code).
those wanting to incorporate functions of Treelite into their own native libraries.
We recommend the Python API for everyday uses.
Note
Use of C and C++ in Treelite
The core logic of Treelite is written in C++ to take advantage of higher-level abstractions. We provide a C-only interface here, because many more programming languages can bind to C than to C++. See this page for more details.
Data matrix interface
Use the following functions to load and manipulate data from a variety of sources.
-
int TreeliteDMatrixCreateFromCSR(const void *data, const char *data_type, const uint32_t *col_ind, const size_t *row_ptr, size_t num_row, size_t num_col, DMatrixHandle *out)
create DMatrix from a (in-memory) CSR matrix
- Parameters:
data – feature values
data_type – Type of data elements
col_ind – feature indices
row_ptr – pointer to row headers
num_row – number of rows
num_col – number of columns
out – the created DMatrix
- Returns:
0 for success, -1 for failure
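The three CSR arrays (data, col_ind, row_ptr) can be hard to picture from the parameter list alone. The following is a minimal sketch, not part of Treelite, that builds them from a row-major dense matrix. The helper name dense_to_csr is hypothetical, and the sketch assumes the usual CSR convention in which row_ptr has num_row + 1 entries and float is the element type. It omits zero entries; which entries to omit is the caller's decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a row-major dense matrix to CSR arrays.
 * The caller provides data and col_ind with room for num_row * num_col
 * entries, and row_ptr with room for num_row + 1 entries.
 * Returns the number of stored entries. */
size_t dense_to_csr(const float *dense, size_t num_row, size_t num_col,
                    float *data, uint32_t *col_ind, size_t *row_ptr) {
  assert(dense && data && col_ind && row_ptr);
  size_t nnz = 0;
  row_ptr[0] = 0;
  for (size_t i = 0; i < num_row; ++i) {
    for (size_t j = 0; j < num_col; ++j) {
      float v = dense[i * num_col + j];
      if (v != 0.0f) {          /* this sketch stores only nonzero entries */
        data[nnz] = v;          /* feature value */
        col_ind[nnz] = (uint32_t)j; /* feature index */
        ++nnz;
      }
    }
    row_ptr[i + 1] = nnz; /* row i occupies [row_ptr[i], row_ptr[i + 1]) */
  }
  return nnz;
}
```

The resulting arrays have the types expected by the data, col_ind and row_ptr parameters above, with data_type naming the element type of data.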
-
int TreeliteDMatrixCreateFromMat(const void *data, const char *data_type, size_t num_row, size_t num_col, const void *missing_value, DMatrixHandle *out)
create DMatrix from a (in-memory) dense matrix
- Parameters:
data – feature values
data_type – Type of data elements
num_row – number of rows
num_col – number of columns
missing_value – value to represent missing value
out – the created DMatrix
- Returns:
0 for success, -1 for failure
-
int TreeliteDMatrixGetDimension(DMatrixHandle handle, size_t *out_num_row, size_t *out_num_col, size_t *out_nelem)
get dimensions of a DMatrix
- Parameters:
handle – handle to DMatrix
out_num_row – used to set number of rows
out_num_col – used to set number of columns
out_nelem – used to set number of nonzero entries
- Returns:
0 for success, -1 for failure
-
int TreeliteDMatrixFree(DMatrixHandle handle)
delete DMatrix from memory
- Parameters:
handle – handle to DMatrix
- Returns:
0 for success, -1 for failure
Branch annotator interface
Use the following functions to annotate branches in decision trees.
-
int TreeliteAnnotateBranch(ModelHandle model, DMatrixHandle dmat, int nthread, int verbose, AnnotationHandle *out)
annotate branches in a given model using frequency patterns in the training data.
- Parameters:
model – model to annotate
dmat – training data matrix
nthread – number of threads to use
verbose – whether to produce extra messages
out – used to save handle for the created annotation
- Returns:
0 for success, -1 for failure
-
int TreeliteAnnotationSave(AnnotationHandle handle, const char *path)
save branch annotation to a JSON file
- Parameters:
handle – annotation to save
path – path to JSON file
- Returns:
0 for success, -1 for failure
-
int TreeliteAnnotationFree(AnnotationHandle handle)
delete branch annotation from memory
- Parameters:
handle – annotation to remove
- Returns:
0 for success, -1 for failure
Compiler interface
Use the following functions to produce optimized prediction subroutines (in C) from a given decision tree ensemble.
-
int TreeliteCompilerCreateV2(const char *name, const char *params_json_str, CompilerHandle *out)
Create a compiler with a given name.
- Parameters:
name – name of compiler
params_json_str – JSON string representing the parameters for the compiler
out – created compiler
- Returns:
0 for success, -1 for failure
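Because params_json_str is a plain JSON string, it can be assembled with standard C formatting functions. The sketch below is illustrative only: the helper name make_compiler_params is hypothetical, and the parameter names parallel_comp and quantize are assumptions that should be checked against the compiler parameter documentation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a JSON object holding two compiler parameters.
 * The keys "parallel_comp" and "quantize" are assumed names.
 * Returns 0 on success, -1 if buf is too small. */
int make_compiler_params(char *buf, size_t buf_len, int parallel_comp,
                         int quantize) {
  assert(buf != NULL);
  int n = snprintf(buf, buf_len, "{\"parallel_comp\": %d, \"quantize\": %d}",
                   parallel_comp, quantize);
  /* snprintf returns the length it wanted to write; detect truncation */
  return (n >= 0 && (size_t)n < buf_len) ? 0 : -1;
}
```

The buffer filled this way can then be passed as the params_json_str argument.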
-
int TreeliteCompilerGenerateCodeV2(CompilerHandle compiler, ModelHandle model, const char *dirpath)
Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c).
Usage example:
TreeliteCompilerGenerateCodeV2(compiler, model, "./my/model");
// files to generate: ./my/model/header.h, ./my/model/main.c
// if parallel compilation is enabled:
//   ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c,
//   ./my/model/tu1.c, and so forth
- Parameters:
compiler – handle for compiler
model – handle for tree ensemble model
dirpath – directory to store header and source files
- Returns:
0 for success, -1 for failure
-
int TreeliteCompilerFree(CompilerHandle handle)
delete compiler from memory
- Parameters:
handle – compiler to remove
- Returns:
0 for success, -1 for failure
Model loader interface
Use the following functions to load decision tree ensemble models from a file. Treelite supports multiple model file formats.
-
int TreeliteLoadLightGBMModel(const char *filename, ModelHandle *out)
Deprecated. Please use TreeliteLoadLightGBMModelEx instead.
-
int TreeliteLoadLightGBMModelEx(const char *filename, const char *config_json, ModelHandle *out)
Load a model file generated by LightGBM (Microsoft/LightGBM). The model file must contain a decision tree ensemble.
- Parameters:
filename – name of model file
config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser
out – loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadXGBoostModel(const char *filename, ModelHandle *out)
Deprecated. Please use TreeliteLoadXGBoostModelEx instead.
-
int TreeliteLoadXGBoostModelEx(const char *filename, const char *config_json, ModelHandle *out)
Load a model file generated by XGBoost (dmlc/xgboost). The model file must contain a decision tree ensemble.
- Parameters:
filename – name of model file
config_json – JSON string consisting of key-value pairs; used for configuring the model parser
out – loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadXGBoostJSON(const char *filename, ModelHandle *out)
Deprecated. Please use TreeliteLoadXGBoostJSONEx instead.
-
int TreeliteLoadXGBoostJSONEx(const char *filename, const char *config_json, ModelHandle *out)
Load a JSON model file generated by XGBoost (dmlc/xgboost). The model file must contain a decision tree ensemble.
- Parameters:
filename – name of model file
config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser
out – loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadXGBoostJSONString(const char *json_str, size_t length, ModelHandle *out)
Deprecated. Please use TreeliteLoadXGBoostJSONStringEx instead.
-
int TreeliteLoadXGBoostJSONStringEx(const char *json_str, size_t length, const char *config_json, ModelHandle *out)
Load a model stored as a JSON string by XGBoost (dmlc/xgboost). The model JSON must contain a decision tree ensemble.
- Parameters:
json_str – the string containing the JSON model specification
length – the length of the JSON string
config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser
out – loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadXGBoostModelFromMemoryBuffer(const void *buf, size_t len, ModelHandle *out)
Deprecated. Please use TreeliteLoadXGBoostModelFromMemoryBufferEx instead.
-
int TreeliteLoadXGBoostModelFromMemoryBufferEx(const void *buf, size_t len, const char *config_json, ModelHandle *out)
Load an XGBoost model from a memory buffer.
- Parameters:
buf – memory buffer
len – size of memory buffer
config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser
out – loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadLightGBMModelFromString(const char *model_str, ModelHandle *out)
Deprecated. Please use TreeliteLoadLightGBMModelFromStringEx instead.
-
int TreeliteLoadLightGBMModelFromStringEx(const char *model_str, const char *config_json, ModelHandle *out)
Load a LightGBM model from a string. The string should be created with the model_to_string() method in LightGBM.
- Parameters:
model_str – the model string
config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser
out – loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteBuildModelFromJSONString(const char *json_str, const char *config_json, ModelHandle *out)
Construct a new Treelite model from a JSON string.
- Parameters:
json_str – JSON string
config_json – Configuration to use when parsing the JSON string. Configuration should be a JSON object consisting of key-value pairs.
out – Constructed model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadSKLearnRandomForestRegressor(int n_estimators, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)
Load a scikit-learn random forest regressor model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note that this function can also be used to load an ensemble of extremely randomized trees (sklearn.ensemble.ExtraTreesRegressor).
- Parameters:
n_estimators – number of trees in the random forest
n_features – number of features in the training data
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.
impurity – impurity[i][k] stores the impurity measure (gini, entropy etc) associated with node k of the i-th tree.
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
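To make the meaning of the per-tree arrays concrete, here is a minimal sketch of how a prediction could be computed directly from them. It is not Treelite code. It assumes scikit-learn's conventions: node 0 is the root, a leaf is marked by children_left[k] == -1, and a sample goes left when its feature value is less than or equal to the threshold. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Walk one tree stored in scikit-learn's array layout for a single row x.
 * Assumption: children_left[k] == -1 marks a leaf. */
double predict_one_tree(const int64_t *children_left,
                        const int64_t *children_right, const int64_t *feature,
                        const double *threshold, const double *value,
                        const double *x) {
  int64_t k = 0; /* node 0 is the root */
  while (children_left[k] != -1) {
    k = (x[feature[k]] <= threshold[k]) ? children_left[k] : children_right[k];
  }
  return value[k]; /* leaf output */
}

/* Average the per-tree outputs, as a random forest regressor does. */
double predict_forest(int n_estimators, const int64_t **children_left,
                      const int64_t **children_right, const int64_t **feature,
                      const double **threshold, const double **value,
                      const double *x) {
  assert(n_estimators > 0);
  double sum = 0.0;
  for (int i = 0; i < n_estimators; ++i) {
    sum += predict_one_tree(children_left[i], children_right[i], feature[i],
                            threshold[i], value[i], x);
  }
  return sum / n_estimators;
}
```

The array-of-arrays parameters mirror the const int64_t ** and const double ** arguments of the loader above, so the same buffers could be inspected this way before being handed to Treelite.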
-
int TreeliteLoadSKLearnIsolationForest(int n_estimators, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, const double ratio_c, ModelHandle *out)
Load a scikit-learn isolation forest model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.
- Parameters:
n_estimators – number of trees in the isolation forest
n_features – number of features in the training data
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
value – value[i][k] stores the expected isolation depth of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.
impurity – not used, but must be passed as array of arrays for each tree and node.
ratio_c – standardizing constant to use for calculation of the anomaly score.
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadSKLearnRandomForestClassifier(int n_estimators, int n_features, int n_classes, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)
Load a scikit-learn random forest classifier model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note that this function can also be used to load an ensemble of extremely randomized trees (sklearn.ensemble.ExtraTreesClassifier).
- Parameters:
n_estimators – number of trees in the random forest
n_features – number of features in the training data
n_classes – number of classes in the target variable
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.
impurity – impurity[i][k] stores the impurity measure (gini, entropy etc) associated with node k of the i-th tree.
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadSKLearnGradientBoostingRegressor(int n_iter, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)
Load a scikit-learn gradient boosting regressor model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.
- Parameters:
n_iter – Number of boosting iterations
n_features – number of features in the training data
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.
impurity – impurity[i][k] stores the impurity measure (gini, entropy etc) associated with node k of the i-th tree.
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadSKLearnGradientBoostingClassifier(int n_iter, int n_features, int n_classes, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)
Load a scikit-learn gradient boosting classifier model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.
- Parameters:
n_iter – Number of boosting iterations
n_features – number of features in the training data
n_classes – number of classes in the target variable
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.
impurity – impurity[i][k] stores the impurity measure (gini, entropy etc) associated with node k of the i-th tree.
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteLoadSKLearnHistGradientBoostingRegressor(int n_iter, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const int8_t **default_left, const double **value, const int64_t **n_node_samples, const double **gain, const double *baseline_prediction, ModelHandle *out)
Load a scikit-learn HistGradientBoostingRegressor model from a collection of arrays. Unlike other scikit-learn models, this model class natively handles missing values like XGBoost does.
- Parameters:
n_iter – Number of boosting iterations
n_features – Number of features in the training data
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
default_left – default_left[i][k] indicates how the missing value should be handled at node k of the i-th tree. This flag is defined if node k is an internal (non-leaf) node. If True, the missing value will be associated with the left child; if False, the missing value will be associated with the right child.
value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
gain – gain[i][k] stores the gain (reduction of the loss function) associated with node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
baseline_prediction – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,)
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
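The roles of default_left and baseline_prediction can be illustrated with a short sketch that computes a margin score from the arrays. This is not Treelite code and rests on several assumptions: missing values are encoded as NaN, a leaf is marked by children_left[k] == -1, a present value goes left when it is less than or equal to the threshold, and the baseline is a single scalar (shape (1,)). The function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Walk one tree, routing missing (NaN) values according to default_left. */
double predict_tree_with_missing(const int64_t *children_left,
                                 const int64_t *children_right,
                                 const int64_t *feature,
                                 const double *threshold,
                                 const int8_t *default_left,
                                 const double *value, const double *x) {
  int64_t k = 0; /* node 0 is the root */
  while (children_left[k] != -1) { /* assumed leaf marker */
    double v = x[feature[k]];
    int go_left = isnan(v) ? (default_left[k] != 0) : (v <= threshold[k]);
    k = go_left ? children_left[k] : children_right[k];
  }
  return value[k];
}

/* Sum the tree outputs and add the baseline to obtain the margin score. */
double predict_margin(int n_iter, const int64_t **children_left,
                      const int64_t **children_right, const int64_t **feature,
                      const double **threshold, const int8_t **default_left,
                      const double **value, double baseline, const double *x) {
  assert(x != 0);
  double margin = baseline;
  for (int i = 0; i < n_iter; ++i) {
    margin += predict_tree_with_missing(children_left[i], children_right[i],
                                        feature[i], threshold[i],
                                        default_left[i], value[i], x);
  }
  return margin;
}
```

Any post-processing (link) function would be applied to the returned margin afterwards; it is left out of this sketch.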
-
int TreeliteLoadSKLearnHistGradientBoostingClassifier(int n_iter, int n_features, int n_classes, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const int8_t **default_left, const double **value, const int64_t **n_node_samples, const double **gain, const double *baseline_prediction, ModelHandle *out)
Load a scikit-learn HistGradientBoostingClassifier model from a collection of arrays. Unlike other scikit-learn models, this model class natively handles missing values like XGBoost does.
- Parameters:
n_iter – Number of boosting iterations
n_features – Number of features in the training data
n_classes – Number of classes in the target variable
node_count – node_count[i] stores the number of nodes in the i-th tree
children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
default_left – default_left[i][k] indicates how the missing value should be handled at node k of the i-th tree. This flag is defined if node k is an internal (non-leaf) node. If True, the missing value will be associated with the left child; if False, the missing value will be associated with the right child.
value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.
n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.
gain – gain[i][k] stores the gain (reduction of the loss function) associated with node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.
baseline_prediction – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,) for binary classification; (n_classes,) for multi-class classification
out – pointer to store the loaded model
- Returns:
0 for success, -1 for failure
-
int TreeliteQueryNumTree(ModelHandle handle, size_t *out)
Query the number of trees in the model.
- Parameters:
handle – model to query
out – number of trees
- Returns:
0 for success, -1 for failure
-
int TreeliteQueryNumFeature(ModelHandle handle, size_t *out)
Query the number of features used in the model.
- Parameters:
handle – model to query
out – number of features
- Returns:
0 for success, -1 for failure
-
int TreeliteQueryNumClass(ModelHandle handle, size_t *out)
Query the number of classes of the model (1 if the model is a binary classifier or a regressor).
- Parameters:
handle – model to query
out – number of output groups
- Returns:
0 for success, -1 for failure
-
int TreeliteSetTreeLimit(ModelHandle handle, size_t limit)
Keep the first N trees of the model. The limit must be smaller than the number of trees.
- Parameters:
handle – model
limit – number of trees to keep
- Returns:
0 for success, -1 for failure
-
int TreeliteSerializeModel(const char *filename, ModelHandle handle)
Deprecated. Please use TreeliteSerializeModelToFile instead.
-
int TreeliteDeserializeModel(const char *filename, ModelHandle *out)
Deprecated. Please use TreeliteDeserializeModelFromFile instead.
-
int TreeliteSerializeModelToFile(ModelHandle handle, const char *filename)
Serialize (persist) a model object to disk.
- Parameters:
handle – Handle to the model object
filename – Name of the file to which to serialize the model. The file will be using a binary format that’s optimized to store the Treelite model object efficiently.
- Returns:
0 for success, -1 for failure
-
int TreeliteDeserializeModelFromFile(const char *filename, ModelHandle *out)
Deserialize (load) a model object from disk.
- Parameters:
filename – Name of the file from which to deserialize the model. The file should be created by a call to TreeliteSerializeModelToFile.
out – Handle to the model object
- Returns:
0 for success, -1 for failure
-
int TreeliteSerializeModelToBytes(ModelHandle handle, const char **out_bytes, size_t *out_bytes_len)
Serialize (persist) a model object to a byte sequence.
- Parameters:
handle – Handle to the model object
out_bytes – Byte sequence containing serialized model
out_bytes_len – Length of out_bytes
- Returns:
0 for success, -1 for failure
-
int TreeliteDeserializeModelFromBytes(const char *bytes, size_t bytes_len, ModelHandle *out)
Deserialize (load) a model object from a byte sequence.
- Parameters:
bytes – Byte sequence containing serialized model. The string should be created by a call to TreeliteSerializeModelToBytes.
bytes_len – Length of bytes
out – Handle to the model object
- Returns:
0 for success, -1 for failure
-
int TreeliteConcatenateModelObjects(const ModelHandle *objs, size_t len, ModelHandle *out)
Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.
- Parameters:
objs – Pointer to the beginning of the list of model objects
len – Number of model objects
out – Used to save the concatenated model
-
int TreeliteDumpAsJSON(ModelHandle handle, int pretty_print, const char **out_json_str)
Dump a model object as a JSON string.
- Parameters:
handle – The handle to the model object
pretty_print – Whether to pretty-print JSON string (0 for false, != 0 for true)
out_json_str – The JSON string
- Returns:
0 for success, -1 for failure
-
int TreeliteFreeModel(ModelHandle handle)
delete model from memory
- Parameters:
handle – model to remove
- Returns:
0 for success, -1 for failure
Model builder interface
Use the following functions to incrementally build decision tree ensemble models.
-
int TreeliteTreeBuilderCreateValue(const void *init_value, const char *type, ValueHandle *out)
Create a new Value object. Some model builder API functions accept this Value type to accommodate values of multiple types.
- Parameters:
init_value – pointer to the value to be stored
type – Type of the value to be stored
out – newly created Value object
- Returns:
0 for success; -1 for failure
-
int TreeliteTreeBuilderDeleteValue(ValueHandle handle)
Delete a Value object from memory.
- Parameters:
handle – pointer to the Value object to be deleted
- Returns:
0 for success; -1 for failure
-
int TreeliteCreateTreeBuilder(const char *threshold_type, const char *leaf_output_type, TreeBuilderHandle *out)
Create a new tree builder.
- Parameters:
threshold_type – Type of thresholds in numerical splits. All thresholds in a given model must have the same type.
leaf_output_type – Type of leaf outputs. All leaf outputs in a given model must have the same type.
out – newly created tree builder
- Returns:
0 for success; -1 for failure
-
int TreeliteDeleteTreeBuilder(TreeBuilderHandle handle)
Delete a tree builder from memory.
- Parameters:
handle – tree builder to remove
- Returns:
0 for success; -1 for failure
-
int TreeliteTreeBuilderCreateNode(TreeBuilderHandle handle, int node_key)
Create an empty node within a tree.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the new node
- Returns:
0 for success; -1 for failure
-
int TreeliteTreeBuilderDeleteNode(TreeBuilderHandle handle, int node_key)
Remove a node from a tree.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the node to be removed
- Returns:
0 for success; -1 for failure
-
int TreeliteTreeBuilderSetRootNode(TreeBuilderHandle handle, int node_key)
Set a node as the root of a tree.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the root node
- Returns:
0 for success; -1 for failure
-
int TreeliteTreeBuilderSetNumericalTestNode(TreeBuilderHandle handle, int node_key, unsigned feature_id, const char *opname, ValueHandle threshold, int default_left, int left_child_key, int right_child_key)
Turn an empty node into a test node with numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either left or right child would be taken.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the node being modified; this node needs to be empty
feature_id – id of feature
opname – binary operator to use in the test
threshold – threshold value
default_left – default direction for missing values
left_child_key – unique integer key to identify the left child node
right_child_key – unique integer key to identify the right child node
- Returns:
0 for success; -1 for failure
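The test [feature value] OP [threshold] together with default_left can be sketched as a small evaluation function. This is an illustration rather than Treelite's implementation. It assumes that the operator strings are "==", "<", "<=", ">" and ">=", that a true test selects the left child, and that a missing feature value is encoded as NaN. The function name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Returns 1 to take the left child, 0 to take the right child,
 * or -1 if opname is not one of the assumed operator strings. */
int eval_numerical_test(double fvalue, const char *opname, double threshold,
                        int default_left) {
  assert(opname != NULL);
  if (isnan(fvalue)) return default_left != 0; /* missing: default direction */
  if (strcmp(opname, "==") == 0) return fvalue == threshold;
  if (strcmp(opname, "<") == 0) return fvalue < threshold;
  if (strcmp(opname, "<=") == 0) return fvalue <= threshold;
  if (strcmp(opname, ">") == 0) return fvalue > threshold;
  if (strcmp(opname, ">=") == 0) return fvalue >= threshold;
  return -1;
}
```

For example, a node with opname "<" and threshold 2.0 would send a feature value of 1.0 to the left child and a value of 2.0 to the right child under these assumptions.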
-
int TreeliteTreeBuilderSetCategoricalTestNode(TreeBuilderHandle handle, int node_key, unsigned feature_id, const unsigned int *left_categories, size_t left_categories_len, int default_left, int left_child_key, int right_child_key)
Turn an empty node into a test node with categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to (n-1), where n is the number of categories in that particular feature. Let’s assume n <= 64.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the node being modified; this node needs to be empty
feature_id – id of feature
left_categories – list of categories belonging to the left child
left_categories_len – length of left_categories
default_left – default direction for missing values
left_child_key – unique integer key to identify the left child node
right_child_key – unique integer key to identify the right child node
- Returns:
0 for success; -1 for failure
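With at most 64 categories, the list of left-side categories fits in a single 64-bit mask. The sketch below shows one way such a list could be represented and queried. It is only an illustration of the idea and is not taken from Treelite's implementation; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a list of left-side categories into a 64-bit mask.
 * Categories outside 0..63 are ignored in this sketch. */
uint64_t make_category_mask(const unsigned int *left_categories,
                            size_t left_categories_len) {
  assert(left_categories != NULL || left_categories_len == 0);
  uint64_t mask = 0;
  for (size_t i = 0; i < left_categories_len; ++i) {
    if (left_categories[i] < 64) {
      mask |= (uint64_t)1 << left_categories[i];
    }
  }
  return mask;
}

/* Returns 1 if the category belongs to the left child, 0 otherwise. */
int category_goes_left(uint64_t mask, unsigned int category) {
  return category < 64 && ((mask >> category) & 1u);
}
```

A categorical test then reduces to a single bit lookup per sample, with default_left deciding the direction when the feature value is missing.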
-
int TreeliteTreeBuilderSetLeafNode(TreeBuilderHandle handle, int node_key, ValueHandle leaf_value)
Turn an empty node into a leaf node.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the node being modified; this node needs to be empty
leaf_value – leaf value (weight) of the leaf node
- Returns:
0 for success; -1 for failure
-
int TreeliteTreeBuilderSetLeafVectorNode(TreeBuilderHandle handle, int node_key, const ValueHandle *leaf_vector, size_t leaf_vector_len)
Turn an empty node into a leaf vector node. The leaf vector (a collection of multiple leaf weights per leaf node) is useful for multi-class random forest classifiers.
- Parameters:
handle – tree builder
node_key – unique integer key to identify the node being modified; this node needs to be empty
leaf_vector – leaf vector of the leaf node
leaf_vector_len – length of leaf_vector
- Returns:
0 for success; -1 for failure
-
int TreeliteCreateModelBuilder(int num_feature, int num_class, int average_tree_output, const char *threshold_type, const char *leaf_output_type, ModelBuilderHandle *out)
Create a new model builder.
- Parameters:
num_feature – number of features used in model being built. We assume that all feature indices are between 0 and (num_feature - 1).
num_class – number of output groups. Set to 1 for binary classification and regression; >1 for multiclass classification
average_tree_output – whether the outputs from the trees should be averaged (!=0 yes, =0 no)
threshold_type – Type of thresholds in numerical splits. All thresholds in a given model must have the same type.
leaf_output_type – Type of leaf outputs. All leaf outputs in a given model must have the same type.
out – newly created model builder
- Returns:
0 for success; -1 for failure
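The effect of the average_tree_output flag can be pictured with a small sketch. The function `combine_tree_outputs` is hypothetical and only illustrates the arithmetic for a single output group (sum of per-tree outputs, optionally divided by the number of trees); it is not Treelite's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not a Treelite function): combine per-tree
 * outputs for one data row. When average_tree_output is nonzero the sum
 * is divided by the number of trees; otherwise the plain sum is returned. */
double combine_tree_outputs(const double *tree_outputs, size_t num_tree,
                            int average_tree_output) {
  assert(tree_outputs != NULL || num_tree == 0);
  double sum = 0.0;
  for (size_t i = 0; i < num_tree; ++i) {
    sum += tree_outputs[i];
  }
  if (average_tree_output != 0 && num_tree > 0) {
    return sum / (double)num_tree;
  }
  return sum;
}
```

Averaging is the usual choice for random forests, whereas gradient boosted trees are typically summed.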
-
int TreeliteModelBuilderSetModelParam(ModelBuilderHandle handle, const char *name, const char *value)
Set a model parameter.
- Parameters:
handle – model builder
name – name of parameter
value – value of parameter
- Returns:
0 for success; -1 for failure
-
int TreeliteDeleteModelBuilder(ModelBuilderHandle handle)
Delete a model builder from memory.
- Parameters:
handle – model builder to remove
- Returns:
0 for success; -1 for failure
-
int TreeliteModelBuilderInsertTree(ModelBuilderHandle handle, TreeBuilderHandle tree_builder, int index)
Insert a tree at specified location.
- Parameters:
handle – model builder
tree_builder – builder for the tree to be inserted. The tree must not be part of any other existing tree ensemble. Note: The tree_builder argument will become unusable after the tree insertion. Should you want to modify the tree afterwards, call TreeliteModelBuilderGetTree() to get a fresh handle to the tree.
index – index of the element before which to insert the tree; use -1 to insert at the end
- Returns:
index of the new tree within the ensemble; -1 for failure
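The index convention of TreeliteModelBuilderInsertTree can be sketched as follows. The helper `resolve_insert_position` is hypothetical, and rejecting out-of-range indices with -1 is an assumption made for this sketch, not documented Treelite behavior.

```c
#include <assert.h>

/* Hypothetical helper (not a Treelite function): map the `index` argument
 * to the position the new tree would occupy in an ensemble that currently
 * holds num_tree trees. -1 means "append at the end". Out-of-range values
 * yield -1 here, which is an assumption of this sketch. */
int resolve_insert_position(int num_tree, int index) {
  assert(num_tree >= 0);
  if (index == -1) {
    return num_tree; /* insert at the end */
  }
  if (index < 0 || index > num_tree) {
    return -1; /* assumed failure case */
  }
  return index; /* insert before the element currently at `index` */
}
```

For an ensemble of three trees, index 0 places the new tree first, while both -1 and 3 place it last.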
-
int TreeliteModelBuilderGetTree(ModelBuilderHandle handle, int index, TreeBuilderHandle *out)
Get a reference to a tree in the ensemble.
- Parameters:
handle – model builder
index – index of the tree in the ensemble
out – used to save reference to the tree
- Returns:
0 for success; -1 for failure
-
int TreeliteModelBuilderDeleteTree(ModelBuilderHandle handle, int index)
Remove a tree from the ensemble.
- Parameters:
handle – model builder
index – index of the tree that would be removed
- Returns:
0 for success; -1 for failure
-
int TreeliteModelBuilderCommitModel(ModelBuilderHandle handle, ModelHandle *out)
Finalize the model and produce the in-memory representation.
- Parameters:
handle – model builder
out – used to save handle to in-memory representation of the finished model
- Returns:
0 for success; -1 for failure
Predictor interface
Use the following functions to load compiled prediction subroutines from shared libraries and to make predictions.
-
int TreelitePredictorLoad(const char *library_path, int num_worker_thread, PredictorHandle *out)
Load prediction code into memory. This function assumes that the prediction code has already been compiled into a dynamic shared library object (.so/.dll/.dylib).
- Parameters:
library_path – path to library object file containing prediction code
num_worker_thread – number of worker threads (-1 to use max number)
out – handle to predictor
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorPredictBatch(PredictorHandle handle, DMatrixHandle batch, int verbose, int pred_margin, PredictorOutputHandle out_result, size_t *out_result_size)
Make predictions on a batch of data rows (synchronously). This function internally divides the workload among all worker threads.
Note. This function does not allocate the result vector. Use TreeliteCreatePredictorOutputVector() convenience function to allocate the vector of the right length and type.
Note. To access the element values from the output vector, you should cast the opaque handle (PredictorOutputHandle type) to an appropriate pointer LeafOutputType*, where the type is either float, double, or uint32_t. So carry out the following steps:
Call TreelitePredictorQueryLeafOutputType() to obtain the type of the leaf output. It will return a string (“float32”, “float64”, or “uint32”) representing the type.
Depending on the type string, cast the output handle to float*, double*, or uint32_t*.
Now access the array with the casted pointer. The array’s length is given by TreelitePredictorQueryResultSize().
- Parameters:
handle – predictor
batch – the data matrix containing a batch of rows
verbose – whether to produce extra messages
pred_margin – whether to produce raw margin scores instead of transformed probabilities
out_result – Resulting output vector. This pointer must point to an array of length TreelitePredictorQueryResultSize() and of type TreelitePredictorQueryLeafOutputType().
out_result_size – used to save length of the output vector, which is guaranteed to be less than or equal to TreelitePredictorQueryResultSize()
- Returns:
0 for success, -1 for failure
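The casting steps described in the note above can be sketched as a small helper. The function `read_output_element` is hypothetical; it assumes the caller has already obtained the type string from TreelitePredictorQueryLeafOutputType() and knows that index i is within the length reported for the vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the typedef listed under "Handle types". */
typedef void *PredictorOutputHandle;

/* Hypothetical helper (not a Treelite function): read element i of an
 * output vector as a double, casting the opaque handle according to the
 * leaf output type string. Returns 0 for success, -1 for an unknown type. */
int read_output_element(PredictorOutputHandle output,
                        const char *leaf_output_type, size_t i,
                        double *value) {
  assert(output != NULL && leaf_output_type != NULL && value != NULL);
  if (strcmp(leaf_output_type, "float32") == 0) {
    *value = (double)((const float *)output)[i];
  } else if (strcmp(leaf_output_type, "float64") == 0) {
    *value = ((const double *)output)[i];
  } else if (strcmp(leaf_output_type, "uint32") == 0) {
    *value = (double)((const uint32_t *)output)[i];
  } else {
    return -1; /* type string not covered by this sketch */
  }
  return 0;
}
```

Dispatching on the type string once per batch, rather than once per element, would be the more efficient design in a real binding; the per-element form is used here only for brevity.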
-
int TreeliteCreatePredictorOutputVector(PredictorHandle handle, DMatrixHandle batch, PredictorOutputHandle *out_output_vector)
Convenience function to allocate an output vector that is able to hold the prediction result for a given data matrix. The vector’s length will be identical to TreelitePredictorQueryResultSize() and its type will be identical to TreelitePredictorQueryLeafOutputType(). To prevent memory leak, make sure to de-allocate the vector with TreeliteDeletePredictorOutputVector().
Note. To access the element values from the output vector, you should cast the opaque handle (PredictorOutputHandle type) to an appropriate pointer LeafOutputType*, where the type is either float, double, or uint32_t. So carry out the following steps:
Call TreelitePredictorQueryLeafOutputType() to obtain the type of the leaf output. It will return a string (“float32”, “float64”, or “uint32”) representing the type.
Depending on the type string, cast the output handle to float*, double*, or uint32_t*.
Now access the array with the casted pointer. The array’s length is given by TreelitePredictorQueryResultSize().
- Parameters:
handle – predictor
batch – the data matrix containing a batch of rows
out_output_vector – Handle to the newly allocated output vector.
- Returns:
0 for success, -1 for failure
-
int TreeliteDeletePredictorOutputVector(PredictorHandle handle, PredictorOutputHandle output_vector)
De-allocate an output vector.
- Parameters:
handle – predictor
output_vector – Output vector to delete from memory
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryResultSize(PredictorHandle handle, DMatrixHandle batch, size_t *out)
Given a batch of data rows, query the necessary size of array to hold predictions for all data points.
- Parameters:
handle – predictor
batch – the data matrix containing a batch of rows
out – used to store the length of prediction array
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryNumClass(PredictorHandle handle, size_t *out)
Get the number of classes in the loaded model. The number is 1 for most tasks; it is greater than 1 for multiclass classification.
- Parameters:
handle – predictor
out – number of classes
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryNumFeature(PredictorHandle handle, size_t *out)
Get the width (number of features) of each instance used to train the loaded model.
- Parameters:
handle – predictor
out – number of features
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryPredTransform(PredictorHandle handle, const char **out)
Get the name of the post-prediction transformation used to train the loaded model.
- Parameters:
handle – predictor
out – name of post prediction transformation
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQuerySigmoidAlpha(PredictorHandle handle, float *out)
Get alpha value of sigmoid transformation used to train the loaded model.
- Parameters:
handle – predictor
out – alpha value of sigmoid transformation
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryRatioC(PredictorHandle handle, float *out)
Get the C value of the exponential standard ratio transformation used to train the loaded model.
- Parameters:
handle – predictor
out – C value of transformation
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryGlobalBias(PredictorHandle handle, float *out)
Get the global bias, which adjusts predicted margin scores.
- Parameters:
handle – predictor
out – global bias value
- Returns:
0 for success, -1 for failure
-
int TreelitePredictorQueryThresholdType(PredictorHandle handle, const char **out)
Get the type of thresholds used by the loaded model, as a string.
-
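int TreelitePredictorQueryLeafOutputType(PredictorHandle handle, const char **out)
Get the type of leaf outputs produced by the loaded model, as a string (“float32”, “float64”, or “uint32”).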
-
int TreelitePredictorFree(PredictorHandle handle)
delete predictor from memory
- Parameters:
handle – predictor to remove
- Returns:
0 for success, -1 for failure
General Tree Inference Library (GTIL)
-
int TreeliteGTILParseConfig(const char *config_json, GTILConfigHandle *out)
Load a configuration for GTIL predictor from a JSON string.
- Parameters:
config_json – a JSON string with the following fields:
“nthread” (optional): Number of threads used for initializing DMatrix. Set <= 0 to use all CPU cores.
“predict_type” (required): Must be one of the following.
“default”: Sum over trees and apply post-processing
“raw”: Sum over trees, but don’t apply post-processing; get raw margin scores instead.
“leaf_id”: Output one (integer) leaf ID per tree.
“score_per_tree”: Output one or more margin scores per tree.
out – Parsed configuration
- Returns:
0 for success; -1 for failure
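A configuration string with the fields listed above could be assembled as in the following sketch. The helper `format_gtil_config` is hypothetical; it only builds the JSON text that would then be passed as config_json, and it rejects any predict_type other than the four values documented above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Treelite function): write a GTIL config JSON
 * string into buf. Returns 0 for success, -1 if predict_type is not one of
 * the documented values or if buf is too small. */
int format_gtil_config(char *buf, size_t buf_size, int nthread,
                       const char *predict_type) {
  static const char *const allowed[] = {"default", "raw", "leaf_id",
                                        "score_per_tree"};
  assert(buf != NULL && predict_type != NULL);
  int known = 0;
  for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); ++i) {
    if (strcmp(predict_type, allowed[i]) == 0) {
      known = 1;
    }
  }
  if (!known) {
    return -1;
  }
  int n = snprintf(buf, buf_size,
                   "{\"nthread\": %d, \"predict_type\": \"%s\"}", nthread,
                   predict_type);
  if (n < 0 || (size_t)n >= buf_size) {
    return -1; /* encoding error or output truncated */
  }
  return 0;
}
```

Validating the predict_type before calling the parser keeps the failure local to the caller, though the parser's own return code should still be checked.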
-
int TreeliteGTILDeleteConfig(GTILConfigHandle handle)
Delete a GTIL configuration from memory.
- Parameters:
handle – Handle to the GTIL configuration to be deleted
- Returns:
0 for success; -1 for failure
-
int TreeliteGTILGetPredictOutputSize(ModelHandle model, size_t num_row, size_t *out)
Deprecated. Please use TreeliteGTILGetPredictOutputSizeEx instead.
-
int TreeliteGTILGetPredictOutputSizeEx(ModelHandle model, size_t num_row, GTILConfigHandle config, size_t *out)
Given a batch of data rows, query the necessary size of array to hold predictions for all data points.
- Parameters:
model – Treelite Model object
num_row – Number of rows in the input
config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.
out – Size of output buffer that should be allocated
- Returns:
0 for success; -1 for failure
-
int TreeliteGTILPredict(ModelHandle model, const float *input, size_t num_row, float *output, int nthread, int pred_transform, size_t *out_result_size)
Deprecated. Please use TreeliteGTILPredictEx instead.
-
int TreeliteGTILPredictEx(ModelHandle model, const float *input, size_t num_row, float *output, GTILConfigHandle config, size_t *out_result_size, size_t *out_result_ndim, size_t **out_result_shape)
Predict with a 2D dense array.
- Parameters:
model – Treelite Model object
input – The 2D data array, laid out in row-major layout
num_row – Number of rows in the data matrix.
output – Pointer to buffer to store the output. Call TreeliteGTILGetPredictOutputSizeEx to get the amount of buffer you should allocate for this parameter.
config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.
out_result_size – Size of output. This could be smaller than TreeliteGTILGetPredictOutputSizeEx but could never be larger than TreeliteGTILGetPredictOutputSizeEx.
out_result_ndim – Number of dimensions in the output array.
out_result_shape – Pointer to an array containing dimensions of the prediction output. This array shall have [out_result_ndim] elements. The product of the elements shall be equal to out_result_size.
- Returns:
0 for success; -1 for failure
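Two details in the parameter list lend themselves to a short sketch: the row-major layout of input, and the rule that the product of the out_result_shape elements equals out_result_size. The helpers `row_major_at` and `shape_product` are hypothetical and do not call into Treelite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read the entry at (row, col) of a dense 2D array
 * stored in row-major order with num_col columns. */
float row_major_at(const float *input, size_t num_col, size_t row,
                   size_t col) {
  assert(input != NULL && col < num_col);
  return input[row * num_col + col];
}

/* Hypothetical helper: multiply the ndim entries of a shape array. The
 * result can be compared against the reported output size as a sanity
 * check. An empty shape yields 1. */
size_t shape_product(const size_t *shape, size_t ndim) {
  assert(shape != NULL || ndim == 0);
  size_t product = 1;
  for (size_t i = 0; i < ndim; ++i) {
    product *= shape[i];
  }
  return product;
}
```

For a 2-by-3 input, the entry at row 1, column 0 sits at offset 3 in the flat array, and a result shape of {2, 3} corresponds to an output size of 6.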
Handle types
Treelite uses C++ classes to define its internal data structures. In order to pass C++ objects to C functions, opaque handles are used. Opaque handles are void* pointers that store raw memory addresses.
-
typedef void *ModelHandle
handle to a decision tree ensemble model
-
typedef void *TreeBuilderHandle
handle to tree builder class
-
typedef void *ModelBuilderHandle
handle to ensemble builder class
-
typedef void *AnnotationHandle
handle to branch annotation data
-
typedef void *CompilerHandle
handle to compiler class
-
typedef void *ValueHandle
handle to a polymorphic value type, used in the model builder API
-
typedef void *GTILConfigHandle
handle to a configuration of GTIL predictor
-
typedef void *PredictorHandle
handle to predictor class
-
typedef void *PredictorOutputHandle
handle to output from predictor
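The opaque-handle pattern itself can be sketched in plain C. Everything in this example (`CounterHandle`, `struct Counter`, and the three functions) is hypothetical and stands in for a C++ class hidden behind a void* handle; the return convention (0 for success, -1 for failure) follows the one used throughout this page.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque handle, analogous to the typedefs above. */
typedef void *CounterHandle;

/* Internal type; callers only ever see the void* handle. */
struct Counter {
  int value;
};

/* Create an object and hand its address back through an out parameter. */
int CounterCreate(int start, CounterHandle *out) {
  assert(out != NULL);
  struct Counter *counter = malloc(sizeof *counter);
  if (counter == NULL) {
    return -1;
  }
  counter->value = start;
  *out = counter;
  return 0;
}

/* Cast the handle back to the internal type before using it. */
int CounterIncrement(CounterHandle handle, int *out_value) {
  if (handle == NULL || out_value == NULL) {
    return -1;
  }
  struct Counter *counter = (struct Counter *)handle;
  counter->value += 1;
  *out_value = counter->value;
  return 0;
}

/* Release the object; the handle must not be used afterwards. */
int CounterFree(CounterHandle handle) {
  free(handle);
  return 0;
}
```

Because every handle is a plain void*, the compiler cannot catch a handle of one kind being passed where another is expected, so callers need to keep track of which handle belongs to which function family.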