Treelite C API

Treelite exposes a set of C functions to enable interfacing with a variety of languages. This page will be most useful for:

  • those writing a new language binding (glue code).

  • those wanting to incorporate functions of Treelite into their own native libraries.

We recommend the Python API for everyday uses.

Note

Use of C and C++ in Treelite

The core logic of Treelite is written in C++ to take advantage of higher-level abstractions. We provide a C-only interface here because many more programming languages can bind to C than to C++. See this page for more details.

Data matrix interface

Use the following functions to load and manipulate data from a variety of sources.

int TreeliteDMatrixCreateFromCSR(const void *data, const char *data_type, const uint32_t *col_ind, const size_t *row_ptr, size_t num_row, size_t num_col, DMatrixHandle *out)

create DMatrix from a (in-memory) CSR matrix

Parameters:
  • data – feature values

  • data_type – Type of data elements

  • col_ind – feature indices

  • row_ptr – pointer to row headers

  • num_row – number of rows

  • num_col – number of columns

  • out – the created DMatrix

Returns:

0 for success, -1 for failure
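
The CSR arguments follow the usual compressed sparse row layout. As a minimal sketch, the helper below builds data, col_ind, and row_ptr from a dense row-major matrix, using the element types from the signature above. The helper name dense_to_csr is hypothetical and not part of Treelite, and the sketch assumes that row_ptr holds num_row + 1 offsets, as in the standard CSR convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Treelite): convert a dense row-major
   matrix into CSR arrays, skipping every entry equal to `skip`.
   The caller provides the output buffers:
     data, col_ind : capacity num_row * num_col
     row_ptr       : capacity num_row + 1
   Returns the number of stored entries. */
static size_t dense_to_csr(const float *dense, size_t num_row, size_t num_col,
                           float skip, float *data, uint32_t *col_ind,
                           size_t *row_ptr) {
  size_t nnz = 0;
  row_ptr[0] = 0;
  for (size_t i = 0; i < num_row; ++i) {
    for (size_t j = 0; j < num_col; ++j) {
      float v = dense[i * num_col + j];
      if (v != skip) {
        data[nnz] = v;               /* feature value */
        col_ind[nnz] = (uint32_t)j;  /* feature index */
        ++nnz;
      }
    }
    /* Entries of row i live in data[row_ptr[i] .. row_ptr[i + 1]) */
    row_ptr[i + 1] = nnz;
  }
  return nnz;
}
```

The three arrays could then be passed to TreeliteDMatrixCreateFromCSR together with num_row and num_col. The data_type string for float elements (likely "float32") should be checked against the Treelite version in use.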

int TreeliteDMatrixCreateFromMat(const void *data, const char *data_type, size_t num_row, size_t num_col, const void *missing_value, DMatrixHandle *out)

create DMatrix from a (in-memory) dense matrix

Parameters:
  • data – feature values

  • data_type – Type of data elements

  • num_row – number of rows

  • num_col – number of columns

  • missing_value – value to represent missing value

  • out – the created DMatrix

Returns:

0 for success, -1 for failure

int TreeliteDMatrixGetDimension(DMatrixHandle handle, size_t *out_num_row, size_t *out_num_col, size_t *out_nelem)

get dimensions of a DMatrix

Parameters:
  • handle – handle to DMatrix

  • out_num_row – used to set number of rows

  • out_num_col – used to set number of columns

  • out_nelem – used to set number of nonzero entries

Returns:

0 for success, -1 for failure

int TreeliteDMatrixFree(DMatrixHandle handle)

delete DMatrix from memory

Parameters:

handle – handle to DMatrix

Returns:

0 for success, -1 for failure
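
Every function on this page reports its status as 0 for success and -1 for failure, so glue code usually wraps each call in a small check. The following is a minimal sketch of that pattern. The functions step_ok and step_fail are hypothetical stand-ins for real API calls, and the macro name CHECK_RC is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for C API calls; each follows the documented
   convention: 0 for success, -1 for failure. */
static int step_ok(void) { return 0; }
static int step_fail(void) { return -1; }

/* Return -1 from the enclosing function as soon as a call fails. */
#define CHECK_RC(call)                               \
  do {                                               \
    if ((call) != 0) {                               \
      fprintf(stderr, "call failed: %s\n", #call);   \
      return -1;                                     \
    }                                                \
  } while (0)

/* Run a short sequence of calls, optionally injecting a failure. */
static int run_pipeline(int inject_failure) {
  CHECK_RC(step_ok());
  if (inject_failure) {
    CHECK_RC(step_fail());
  }
  CHECK_RC(step_ok());
  return 0;
}
```

A real binding would also free any handles created before the failing call, for example with TreeliteDMatrixFree.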

Branch annotator interface

Use the following functions to annotate branches in decision trees.

int TreeliteAnnotateBranch(ModelHandle model, DMatrixHandle dmat, int nthread, int verbose, AnnotationHandle *out)

annotate branches in a given model using frequency patterns in the training data.

Parameters:
  • model – model to annotate

  • dmat – training data matrix

  • nthread – number of threads to use

  • verbose – whether to produce extra messages

  • out – used to save handle for the created annotation

Returns:

0 for success, -1 for failure

int TreeliteAnnotationSave(AnnotationHandle handle, const char *path)

save branch annotation to a JSON file

Parameters:
  • handle – annotation to save

  • path – path to JSON file

Returns:

0 for success, -1 for failure

int TreeliteAnnotationFree(AnnotationHandle handle)

delete branch annotation from memory

Parameters:

handle – annotation to remove

Returns:

0 for success, -1 for failure

Compiler interface

Use the following functions to produce an optimized prediction subroutine (in C) from a given decision tree ensemble.

int TreeliteCompilerCreateV2(const char *name, const char *params_json_str, CompilerHandle *out)

Create a compiler with a given name.

Parameters:
  • name – name of compiler

  • params_json_str – JSON string representing the parameters for the compiler

  • out – created compiler

Returns:

0 for success, -1 for failure
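
The params_json_str argument is a plain JSON object encoded as a C string. As a hedged sketch, the helper below formats such a string with snprintf. The helper name format_compiler_params is hypothetical, and the parameter keys parallel_comp and quantize are assumptions that should be checked against the compiler parameter list of the Treelite version in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Treelite): write a JSON object with two
   compiler parameters into buf. The keys "parallel_comp" and "quantize" are
   assumptions. Returns the number of characters written (excluding the
   terminating NUL), or -1 if the buffer is too small. */
static int format_compiler_params(char *buf, size_t cap, int parallel_comp,
                                  int quantize) {
  int n = snprintf(buf, cap, "{\"parallel_comp\": %d, \"quantize\": %d}",
                   parallel_comp, quantize);
  if (n < 0 || (size_t)n >= cap) {
    return -1;  /* encoding error or truncated output */
  }
  return n;
}
```

The resulting buffer could be passed as params_json_str to TreeliteCompilerCreateV2, alongside a compiler name and an output handle.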

int TreeliteCompilerGenerateCodeV2(CompilerHandle compiler, ModelHandle model, const char *dirpath)

Generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c).

Usage example:

TreeliteCompilerGenerateCodeV2(compiler, model, "./my/model");
// files to generate: ./my/model/header.h, ./my/model/main.c
// if parallel compilation is enabled:
// ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c,
// ./my/model/tu1.c, and so forth

Parameters:
  • compiler – handle for compiler

  • model – handle for tree ensemble model

  • dirpath – directory to store header and source files

Returns:

0 for success, -1 for failure

int TreeliteCompilerFree(CompilerHandle handle)

delete compiler from memory

Parameters:

handle – compiler to remove

Returns:

0 for success, -1 for failure

Model loader interface

Use the following functions to load decision tree ensemble models from a file. Treelite supports multiple model file formats.

int TreeliteLoadLightGBMModel(const char *filename, ModelHandle *out)

Deprecated. Please use TreeliteLoadLightGBMModelEx instead.

int TreeliteLoadLightGBMModelEx(const char *filename, const char *config_json, ModelHandle *out)

Load a model file generated by LightGBM (Microsoft/LightGBM). The model file must contain a decision tree ensemble.

Parameters:
  • filename – name of model file

  • config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostModel(const char *filename, ModelHandle *out)

Deprecated. Please use TreeliteLoadXGBoostModelEx instead.

int TreeliteLoadXGBoostModelEx(const char *filename, const char *config_json, ModelHandle *out)

Load a model file generated by XGBoost (dmlc/xgboost). The model file must contain a decision tree ensemble.

Parameters:
  • filename – name of model file

  • config_json – JSON string consisting of key-value pairs; used for configuring the model parser

  • out – loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostJSON(const char *filename, ModelHandle *out)

Deprecated. Please use TreeliteLoadXGBoostJSONEx instead.

int TreeliteLoadXGBoostJSONEx(const char *filename, const char *config_json, ModelHandle *out)

Load a JSON model file generated by XGBoost (dmlc/xgboost). The model file must contain a decision tree ensemble.

Parameters:
  • filename – name of model file

  • config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostJSONString(const char *json_str, size_t length, ModelHandle *out)

Deprecated. Please use TreeliteLoadXGBoostJSONStringEx instead.

int TreeliteLoadXGBoostJSONStringEx(const char *json_str, size_t length, const char *config_json, ModelHandle *out)

Load a model stored as a JSON string by XGBoost (dmlc/xgboost). The model JSON must contain a decision tree ensemble.

Parameters:
  • json_str – the string containing the JSON model specification

  • length – the length of the JSON string

  • config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostModelFromMemoryBuffer(const void *buf, size_t len, ModelHandle *out)

Deprecated. Please use TreeliteLoadXGBoostModelFromMemoryBufferEx instead.

int TreeliteLoadXGBoostModelFromMemoryBufferEx(const void *buf, size_t len, const char *config_json, ModelHandle *out)

Load an XGBoost model from a memory buffer.

Parameters:
  • buf – memory buffer

  • len – size of memory buffer

  • config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadLightGBMModelFromString(const char *model_str, ModelHandle *out)

Deprecated. Please use TreeliteLoadLightGBMModelFromStringEx instead.

int TreeliteLoadLightGBMModelFromStringEx(const char *model_str, const char *config_json, ModelHandle *out)

Load a LightGBM model from a string. The string should be created with the model_to_string() method in LightGBM.

Parameters:
  • model_str – the model string

  • config_json – null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – loaded model

Returns:

0 for success, -1 for failure

int TreeliteBuildModelFromJSONString(const char *json_str, const char *config_json, ModelHandle *out)

Construct a new Treelite model from a JSON string.

Parameters:
  • json_str – JSON string

  • config_json – Configuration to use when parsing the JSON string. Configuration should be a JSON object consisting of key-value pairs.

  • out – Constructed model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnRandomForestRegressor(int n_estimators, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)

Load a scikit-learn random forest regressor model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note that this function can also be used to load an ensemble of extremely randomized trees (sklearn.ensemble.ExtraTreesRegressor).

Parameters:
  • n_estimators – number of trees in the random forest

  • n_features – number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure
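
The array-of-arrays arguments are indexed as array[i][k], where i selects the tree and k selects the node. The following sketch walks such arrays directly to show how they fit together. The function names predict_tree and predict_forest are hypothetical, and the sketch assumes scikit-learn's conventions: a leaf is marked by children_left[k] == -1, and a sample goes left when its feature value is less than or equal to the threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Walk one tree stored in scikit-learn's array layout.
   Assumptions: leaves have children_left[k] == -1, and a sample goes to the
   left child when x[feature[k]] <= threshold[k]. */
static double predict_tree(const int64_t *children_left,
                           const int64_t *children_right,
                           const int64_t *feature, const double *threshold,
                           const double *value, const double *x) {
  int64_t k = 0;  /* start at the root node */
  while (children_left[k] != -1) {
    k = (x[feature[k]] <= threshold[k]) ? children_left[k] : children_right[k];
  }
  return value[k];  /* leaf output */
}

/* Average the outputs of all trees; array[i] holds the data of the i-th tree. */
static double predict_forest(int n_estimators, const int64_t **children_left,
                             const int64_t **children_right,
                             const int64_t **feature, const double **threshold,
                             const double **value, const double *x) {
  double sum = 0.0;
  for (int i = 0; i < n_estimators; ++i) {
    sum += predict_tree(children_left[i], children_right[i], feature[i],
                        threshold[i], value[i], x);
  }
  return sum / n_estimators;
}
```

The same outer arrays (one pointer per tree) are what the loader functions in this section expect for children_left, children_right, feature, threshold, and value.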

int TreeliteLoadSKLearnIsolationForest(int n_estimators, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, const double ratio_c, ModelHandle *out)

Load a scikit-learn isolation forest model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.

Parameters:
  • n_estimators – number of trees in the isolation forest

  • n_features – number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the expected isolation depth of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – not used, but must be passed as array of arrays for each tree and node.

  • ratio_c – standardizing constant to use for calculation of the anomaly score.

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnRandomForestClassifier(int n_estimators, int n_features, int n_classes, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)

Load a scikit-learn random forest classifier model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note that this function can also be used to load an ensemble of extremely randomized trees (sklearn.ensemble.ExtraTreesClassifier).

Parameters:
  • n_estimators – number of trees in the random forest

  • n_features – number of features in the training data

  • n_classes – number of classes in the target variable

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnGradientBoostingRegressor(int n_iter, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)

Load a scikit-learn gradient boosting regressor model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnGradientBoostingClassifier(int n_iter, int n_features, int n_classes, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const double **value, const int64_t **n_node_samples, const double **weighted_n_node_samples, const double **impurity, ModelHandle *out)

Load a scikit-learn gradient boosting classifier model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – number of features in the training data

  • n_classes – number of classes in the target variable

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnHistGradientBoostingRegressor(int n_iter, int n_features, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const int8_t **default_left, const double **value, const int64_t **n_node_samples, const double **gain, const double *baseline_prediction, ModelHandle *out)

Load a scikit-learn HistGradientBoostingRegressor model from a collection of arrays. Unlike other scikit-learn models, this model class natively handles missing values like XGBoost does.

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – Number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • default_left – default_left[i][k] indicates how the missing value should be handled at node k of the i-th tree. This flag is defined if node k is an internal (non-leaf) node. If True, the missing value will be associated with the left child; if False, the missing value will be associated with the right child.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • gain – gain[i][k] stores the gain (reduction of the loss function) associated with node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • baseline_prediction – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,)

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnHistGradientBoostingClassifier(int n_iter, int n_features, int n_classes, const int64_t *node_count, const int64_t **children_left, const int64_t **children_right, const int64_t **feature, const double **threshold, const int8_t **default_left, const double **value, const int64_t **n_node_samples, const double **gain, const double *baseline_prediction, ModelHandle *out)

Load a scikit-learn HistGradientBoostingClassifier model from a collection of arrays. Unlike other scikit-learn models, this model class natively handles missing values like XGBoost does.

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – Number of features in the training data

  • n_classes – Number of classes in the target variable

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • default_left – default_left[i][k] indicates how the missing value should be handled at node k of the i-th tree. This flag is defined if node k is an internal (non-leaf) node. If True, the missing value will be associated with the left child; if False, the missing value will be associated with the right child.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • gain – gain[i][k] stores the gain (reduction of the loss function) associated with node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • baseline_prediction – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,) for binary classification; (n_classes,) for multi-class classification

  • out – pointer to store the loaded model

Returns:

0 for success, -1 for failure

int TreeliteQueryNumTree(ModelHandle handle, size_t *out)

Query the number of trees in the model.

Parameters:
  • handle – model to query

  • out – number of trees

Returns:

0 for success, -1 for failure

int TreeliteQueryNumFeature(ModelHandle handle, size_t *out)

Query the number of features used in the model.

Parameters:
  • handle – model to query

  • out – number of features

Returns:

0 for success, -1 for failure

int TreeliteQueryNumClass(ModelHandle handle, size_t *out)

Query the number of classes of the model (1 if the model is a binary classifier or a regressor).

Parameters:
  • handle – model to query

  • out – number of output groups

Returns:

0 for success, -1 for failure

int TreeliteSetTreeLimit(ModelHandle handle, size_t limit)

Keep the first N trees of the model. The limit must be smaller than the number of trees.

Parameters:
  • handle – model

  • limit – number of trees to keep

Returns:

0 for success, -1 for failure

int TreeliteSerializeModel(const char *filename, ModelHandle handle)

Deprecated. Please use TreeliteSerializeModelToFile instead.

int TreeliteDeserializeModel(const char *filename, ModelHandle *out)

Deprecated. Please use TreeliteDeserializeModelFromFile instead.

int TreeliteSerializeModelToFile(ModelHandle handle, const char *filename)

Serialize (persist) a model object to disk.

Parameters:
  • handle – Handle to the model object

  • filename – Name of the file to which to serialize the model. The file will be using a binary format that’s optimized to store the Treelite model object efficiently.

Returns:

0 for success, -1 for failure

int TreeliteDeserializeModelFromFile(const char *filename, ModelHandle *out)

Deserialize (load) a model object from disk.

Parameters:
  • filename – Name of the file from which to deserialize the model. The file should be created by a call to TreeliteSerializeModelToFile.

  • out – Handle to the model object

Returns:

0 for success, -1 for failure

int TreeliteSerializeModelToBytes(ModelHandle handle, const char **out_bytes, size_t *out_bytes_len)

Serialize (persist) a model object to a byte sequence.

Parameters:
  • handle – Handle to the model object

  • out_bytes – Byte sequence containing serialized model

  • out_bytes_len – Length of out_bytes

Returns:

0 for success, -1 for failure

int TreeliteDeserializeModelFromBytes(const char *bytes, size_t bytes_len, ModelHandle *out)

Deserialize (load) a model object from a byte sequence.

Parameters:
  • bytes – Byte sequence containing serialized model. The byte sequence should be created by a call to TreeliteSerializeModelToBytes.

  • bytes_len – Length of bytes

  • out – Handle to the model object

Returns:

0 for success, -1 for failure

int TreeliteConcatenateModelObjects(const ModelHandle *objs, size_t len, ModelHandle *out)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

Parameters:
  • objs – Pointer to the beginning of the list of model objects

  • len – Number of model objects

  • out – Used to save the concatenated model

int TreeliteDumpAsJSON(ModelHandle handle, int pretty_print, const char **out_json_str)

Dump a model object as a JSON string.

Parameters:
  • handle – The handle to the model object

  • pretty_print – Whether to pretty-print JSON string (0 for false, != 0 for true)

  • out_json_str – The JSON string

Returns:

0 for success, -1 for failure

int TreeliteFreeModel(ModelHandle handle)

delete model from memory

Parameters:

handle – model to remove

Returns:

0 for success, -1 for failure

Model builder interface

Use the following functions to incrementally build decision tree ensemble models.

int TreeliteTreeBuilderCreateValue(const void *init_value, const char *type, ValueHandle *out)

Create a new Value object. Some model builder API functions accept this Value type to accommodate values of multiple types.

Parameters:
  • init_value – pointer to the value to be stored

  • type – Type of the value to be stored

  • out – newly created Value object

Returns:

0 for success; -1 for failure

int TreeliteTreeBuilderDeleteValue(ValueHandle handle)

Delete a Value object from memory.

Parameters:

handle – pointer to the Value object to be deleted

Returns:

0 for success; -1 for failure

int TreeliteCreateTreeBuilder(const char *threshold_type, const char *leaf_output_type, TreeBuilderHandle *out)

Create a new tree builder.

Parameters:
  • threshold_type – Type of thresholds in numerical splits. All thresholds in a given model must have the same type.

  • leaf_output_type – Type of leaf outputs. All leaf outputs in a given model must have the same type.

  • out – newly created tree builder

Returns:

0 for success; -1 for failure

int TreeliteDeleteTreeBuilder(TreeBuilderHandle handle)

Delete a tree builder from memory.

Parameters:

handle – tree builder to remove

Returns:

0 for success; -1 for failure

int TreeliteTreeBuilderCreateNode(TreeBuilderHandle handle, int node_key)

Create an empty node within a tree.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the new node

Returns:

0 for success; -1 for failure

int TreeliteTreeBuilderDeleteNode(TreeBuilderHandle handle, int node_key)

Remove a node from a tree.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the node to be removed

Returns:

0 for success; -1 for failure

int TreeliteTreeBuilderSetRootNode(TreeBuilderHandle handle, int node_key)

Set a node as the root of a tree.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the root node

Returns:

0 for success; -1 for failure

int TreeliteTreeBuilderSetNumericalTestNode(TreeBuilderHandle handle, int node_key, unsigned feature_id, const char *opname, ValueHandle threshold, int default_left, int left_child_key, int right_child_key)

Turn an empty node into a test node with numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either left or right child would be taken.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the node being modified; this node needs to be empty

  • feature_id – id of feature

  • opname – binary operator to use in the test

  • threshold – threshold value

  • default_left – default direction for missing values

  • left_child_key – unique integer key to identify the left child node

  • right_child_key – unique integer key to identify the right child node

Returns:

0 for success; -1 for failure
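
To make the test form [feature value] OP [threshold] concrete, the sketch below evaluates such a test for one feature value. The function name numerical_test_goes_left is hypothetical. The sketch assumes that a true test result selects the left child, that a missing value is represented as NaN and follows default_left, and that the operator strings are "==", "<", "<=", ">", and ">=".

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper (not part of Treelite): evaluate a numerical test.
   Returns 1 if the left child is taken, 0 if the right child is taken,
   and -1 if opname is not one of the assumed operator strings.
   Assumptions: a missing value is encoded as NaN and bypasses the test,
   following default_left; a true test result selects the left child. */
static int numerical_test_goes_left(double fvalue, const char *opname,
                                    double threshold, int default_left) {
  if (isnan(fvalue)) {
    return default_left != 0;
  }
  if (strcmp(opname, "==") == 0) return fvalue == threshold;
  if (strcmp(opname, "<") == 0) return fvalue < threshold;
  if (strcmp(opname, "<=") == 0) return fvalue <= threshold;
  if (strcmp(opname, ">") == 0) return fvalue > threshold;
  if (strcmp(opname, ">=") == 0) return fvalue >= threshold;
  return -1;  /* unrecognized operator */
}
```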

int TreeliteTreeBuilderSetCategoricalTestNode(TreeBuilderHandle handle, int node_key, unsigned feature_id, const unsigned int *left_categories, size_t left_categories_len, int default_left, int left_child_key, int right_child_key)

Turn an empty node into a test node with categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to (n-1), where n is the number of categories in that particular feature. Let’s assume n <= 64.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the node being modified; this node needs to be empty

  • feature_id – id of feature

  • left_categories – list of categories belonging to the left child

  • left_categories_len – length of left_categories

  • default_left – default direction for missing values

  • left_child_key – unique integer key to identify the left child node

  • right_child_key – unique integer key to identify the right child node

Returns:

0 for success; -1 for failure
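
The n <= 64 bound means that a set of left-side categories fits into a single 64-bit word. This page does not say how Treelite stores the category list internally, so the sketch below is only one way to picture the left_categories argument: each category c sets bit c of a mask, and a membership check decides the direction. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack left-child categories into a 64-bit mask.
 * Bit c is set when category c belongs to the left child.
 * Returns 0 on success, -1 if a category is >= 64 and cannot be stored. */
int build_left_category_mask(const unsigned int *left_categories,
                             size_t left_categories_len,
                             uint64_t *out_mask) {
  uint64_t mask = 0;
  for (size_t i = 0; i < left_categories_len; ++i) {
    if (left_categories[i] >= 64) {
      return -1;
    }
    mask |= (uint64_t)1 << left_categories[i];
  }
  *out_mask = mask;
  return 0;
}

/* Hypothetical helper: 1 if the category is in the left set, 0 otherwise. */
int category_goes_left(uint64_t mask, unsigned int category) {
  if (category >= 64) {
    return 0;
  }
  return (int)((mask >> category) & 1u);
}
```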

int TreeliteTreeBuilderSetLeafNode(TreeBuilderHandle handle, int node_key, ValueHandle leaf_value)

Turn an empty node into a leaf node.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the node being modified; this node needs to be empty

  • leaf_value – leaf value (weight) of the leaf node

Returns:

0 for success; -1 for failure

int TreeliteTreeBuilderSetLeafVectorNode(TreeBuilderHandle handle, int node_key, const ValueHandle *leaf_vector, size_t leaf_vector_len)

Turn an empty node into a leaf vector node. The leaf vector (a collection of multiple leaf weights per leaf node) is useful for multi-class random forest classifiers.

Parameters:
  • handle – tree builder

  • node_key – unique integer key to identify the node being modified; this node needs to be empty

  • leaf_vector – leaf vector of the leaf node

  • leaf_vector_len – length of leaf_vector

Returns:

0 for success; -1 for failure

int TreeliteCreateModelBuilder(int num_feature, int num_class, int average_tree_output, const char *threshold_type, const char *leaf_output_type, ModelBuilderHandle *out)

Create a new model builder.

Parameters:
  • num_feature – number of features used in model being built. We assume that all feature indices are between 0 and (num_feature - 1).

  • num_class – number of output groups. Set to 1 for binary classification and regression; >1 for multiclass classification

  • average_tree_output – whether the outputs from the trees should be averaged (!=0 yes, =0 no)

  • threshold_type – Type of thresholds in numerical splits. All thresholds in a given model must have the same type.

  • leaf_output_type – Type of leaf outputs. All leaf outputs in a given model must have the same type.

  • out – newly created model builder

Returns:

0 for success; -1 for failure
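
To make the effect of average_tree_output concrete, the sketch below combines per-tree scalar outputs into a single score. It is an illustration under assumptions, not Treelite code: the helper name combine_tree_outputs is hypothetical, and it assumes that the un-averaged result is a plain sum over trees.

```c
#include <stddef.h>

/* Hypothetical helper: combine one output per tree into a single score.
 * average_tree_output follows the convention above (!=0 yes, =0 no). */
double combine_tree_outputs(const double *tree_outputs, size_t num_tree,
                            int average_tree_output) {
  double sum = 0.0;
  for (size_t i = 0; i < num_tree; ++i) {
    sum += tree_outputs[i];
  }
  if (average_tree_output != 0 && num_tree > 0) {
    sum /= (double)num_tree;  /* random-forest style averaging */
  }
  return sum;
}
```

Boosted ensembles typically sum their trees, while random forests typically average them, which is why the flag is chosen once per model.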

int TreeliteModelBuilderSetModelParam(ModelBuilderHandle handle, const char *name, const char *value)

Set a model parameter.

Parameters:
  • handle – model builder

  • name – name of parameter

  • value – value of parameter

Returns:

0 for success; -1 for failure

int TreeliteDeleteModelBuilder(ModelBuilderHandle handle)

Delete a model builder from memory.

Parameters:

handle – model builder to remove

Returns:

0 for success; -1 for failure

int TreeliteModelBuilderInsertTree(ModelBuilderHandle handle, TreeBuilderHandle tree_builder, int index)

Insert a tree at specified location.

Parameters:
  • handle – model builder

  • tree_builder – builder for the tree to be inserted. The tree must not be part of any other existing tree ensemble. Note: The tree_builder argument will become unusable after the tree insertion. Should you want to modify the tree afterwards, use the GetTree(*) method to get a fresh handle to the tree.

  • index – index of the element before which to insert the tree; use -1 to insert at the end

Returns:

index of the new tree within the ensemble; -1 for failure
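
Unlike most functions on this page, this one returns a position on success, so a caller should test for a negative value rather than for a nonzero one. The sketch below mimics the index convention (insert before index, -1 to append) on a toy array. The toy_ensemble type and toy_insert_tree function are hypothetical stand-ins and do not call Treelite.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TREES 8

/* Hypothetical stand-in for an ensemble: an ordered list of tree IDs. */
struct toy_ensemble {
  int trees[MAX_TREES];
  size_t count;
};

/* Insert tree_id before position index; index == -1 appends at the end.
 * Returns the position of the new tree, or -1 on failure. */
int toy_insert_tree(struct toy_ensemble *e, int tree_id, int index) {
  if (e->count >= MAX_TREES) {
    return -1;  /* no room left */
  }
  if (index == -1) {
    index = (int)e->count;
  }
  if (index < 0 || (size_t)index > e->count) {
    return -1;  /* position out of range */
  }
  /* Shift the tail one slot to the right to make room. */
  memmove(&e->trees[index + 1], &e->trees[index],
          (e->count - (size_t)index) * sizeof(int));
  e->trees[index] = tree_id;
  e->count++;
  return index;
}
```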

int TreeliteModelBuilderGetTree(ModelBuilderHandle handle, int index, TreeBuilderHandle *out)

Get a reference to a tree in the ensemble.

Parameters:
  • handle – model builder

  • index – index of the tree in the ensemble

  • out – used to save reference to the tree

Returns:

0 for success; -1 for failure

int TreeliteModelBuilderDeleteTree(ModelBuilderHandle handle, int index)

Remove a tree from the ensemble.

Parameters:
  • handle – model builder

  • index – index of the tree that would be removed

Returns:

0 for success; -1 for failure

int TreeliteModelBuilderCommitModel(ModelBuilderHandle handle, ModelHandle *out)

finalize the model and produce the in-memory representation

Parameters:
  • handle – model builder

  • out – used to save handle to in-memory representation of the finished model

Returns:

0 for success; -1 for failure

Predictor interface

Use the following functions to load compiled prediction subroutines from shared libraries and to make predictions.

int TreelitePredictorLoad(const char *library_path, int num_worker_thread, PredictorHandle *out)

Load prediction code into memory. This function assumes that the prediction code has already been compiled into a dynamic shared library object (.so/.dll/.dylib).

Parameters:
  • library_path – path to library object file containing prediction code

  • num_worker_thread – number of worker threads (-1 to use max number)

  • out – handle to predictor

Returns:

0 for success, -1 for failure

int TreelitePredictorPredictBatch(PredictorHandle handle, DMatrixHandle batch, int verbose, int pred_margin, PredictorOutputHandle out_result, size_t *out_result_size)

Make predictions on a batch of data rows (synchronously). This function internally divides the workload among all worker threads.

Note. This function does not allocate the result vector. Use TreeliteCreatePredictorOutputVector() convenience function to allocate the vector of the right length and type.

Note. To access the element values from the output vector, you should cast the opaque handle (PredictorOutputHandle type) to an appropriate pointer LeafOutputType*, where the type is either float, double, or uint32_t. So carry out the following steps:

  1. Call TreelitePredictorQueryLeafOutputType() to obtain the type of the leaf output. It will return a string (“float32”, “float64”, or “uint32”) representing the type.

  2. Depending on the type string, cast the output handle to float*, double*, or uint32_t*.

  3. Now access the array through the cast pointer. The array’s length is given by TreelitePredictorQueryResultSize().

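Parameters:
  • handle – predictor

  • batch – the data matrix containing a batch of rows

  • verbose – whether to produce extra messages

  • pred_margin – whether to produce raw margin scores instead of transformed probabilities

  • out_result – output vector to fill; it must already be allocated with the length given by TreelitePredictorQueryResultSize() and the type given by TreelitePredictorQueryLeafOutputType()

  • out_result_size – used to save the length of the output vector

Returns: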

0 for success, -1 for failure
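
The three casting steps in the note above can be sketched as a small dispatch on the type string. The helper below does not call Treelite: output stands in for a PredictorOutputHandle, leaf_output_type stands in for the string reported by TreelitePredictorQueryLeafOutputType(), and the name read_output_element is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read element i of an output buffer as a double.
 * The buffer is interpreted according to the type string
 * ("float32", "float64", or "uint32").
 * Returns 0 on success, -1 if the type string is not recognized. */
int read_output_element(const void *output, const char *leaf_output_type,
                        size_t i, double *out_value) {
  if (strcmp(leaf_output_type, "float32") == 0) {
    *out_value = (double)((const float *)output)[i];
  } else if (strcmp(leaf_output_type, "float64") == 0) {
    *out_value = ((const double *)output)[i];
  } else if (strcmp(leaf_output_type, "uint32") == 0) {
    *out_value = (double)((const uint32_t *)output)[i];
  } else {
    return -1;
  }
  return 0;
}
```

The caller remains responsible for keeping i below the length reported for the output vector; the helper has no way to check it.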

int TreeliteCreatePredictorOutputVector(PredictorHandle handle, DMatrixHandle batch, PredictorOutputHandle *out_output_vector)

Convenience function to allocate an output vector that is able to hold the prediction result for a given data matrix. The vector’s length will be identical to TreelitePredictorQueryResultSize() and its type will be identical to TreelitePredictorQueryLeafOutputType(). To prevent memory leak, make sure to de-allocate the vector with TreeliteDeletePredictorOutputVector().

Note. To access the element values from the output vector, you should cast the opaque handle (PredictorOutputHandle type) to an appropriate pointer LeafOutputType*, where the type is either float, double, or uint32_t. So carry out the following steps:

  1. Call TreelitePredictorQueryLeafOutputType() to obtain the type of the leaf output. It will return a string (“float32”, “float64”, or “uint32”) representing the type.

  2. Depending on the type string, cast the output handle to float*, double*, or uint32_t*.

  3. Now access the array through the cast pointer. The array’s length is given by TreelitePredictorQueryResultSize().

Parameters:
  • handle – predictor

  • batch – the data matrix containing a batch of rows

  • out_output_vector – Handle to the newly allocated output vector.

Returns:

0 for success, -1 for failure

int TreeliteDeletePredictorOutputVector(PredictorHandle handle, PredictorOutputHandle output_vector)

De-allocate an output vector.

Parameters:
  • handle – predictor

  • output_vector – Output vector to delete from memory

Returns:

0 for success, -1 for failure

int TreelitePredictorQueryResultSize(PredictorHandle handle, DMatrixHandle batch, size_t *out)

Given a batch of data rows, query the necessary size of array to hold predictions for all data points.

Parameters:
  • handle – predictor

  • batch – the data matrix containing a batch of rows

  • out – used to store the length of prediction array

Returns:

0 for success, -1 for failure

int TreelitePredictorQueryNumClass(PredictorHandle handle, size_t *out)

Get the number of classes in the loaded model. The number is 1 for most tasks; it is greater than 1 for multiclass classification.

Parameters:
  • handle – predictor

  • out – used to store the number of classes

Returns:

0 for success, -1 for failure

int TreelitePredictorQueryNumFeature(PredictorHandle handle, size_t *out)

Get the width (number of features) of each instance used to train the loaded model.

Parameters:
  • handle – predictor

  • out – number of features

Returns:

0 for success, -1 for failure

int TreelitePredictorQueryPredTransform(PredictorHandle handle, const char **out)

Get name of post prediction transformation used to train the loaded model.

Parameters:
  • handle – predictor

  • out – name of post prediction transformation

Returns:

0 for success, -1 for failure

int TreelitePredictorQuerySigmoidAlpha(PredictorHandle handle, float *out)

Get alpha value of sigmoid transformation used to train the loaded model.

Parameters:
  • handle – predictor

  • out – alpha value of sigmoid transformation

Returns:

0 for success, -1 for failure

int TreelitePredictorQueryRatioC(PredictorHandle handle, float *out)

Get c value of exponential standard ratio transformation used to train the loaded model.

Parameters:
  • handle – predictor

  • out – C value of transformation

Returns:

0 for success, -1 for failure

int TreelitePredictorQueryGlobalBias(PredictorHandle handle, float *out)

Get the global bias, which adjusts predicted margin scores.

Parameters:
  • handle – predictor

  • out – global bias value

Returns:

0 for success, -1 for failure

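int TreelitePredictorQueryThresholdType(PredictorHandle handle, const char **out)

Get the type of the thresholds in the loaded model, as a string.

int TreelitePredictorQueryLeafOutputType(PredictorHandle handle, const char **out)

Get the type of the leaf outputs in the loaded model, as a string (“float32”, “float64”, or “uint32”).

int TreelitePredictorFree(PredictorHandle handle)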

delete predictor from memory

Parameters:

handle – predictor to remove

Returns:

0 for success, -1 for failure

General Tree Inference Library (GTIL)

int TreeliteGTILParseConfig(const char *config_json, GTILConfigHandle *out)

Load a configuration for GTIL predictor from a JSON string.

Parameters:
  • config_json – a JSON string with the following fields:

    • “nthread” (optional): Number of threads used for initializing DMatrix. Set <= 0 to use all CPU cores.

    • “predict_type” (required): Must be one of the following.

      • “default”: Sum over trees and apply post-processing

      • “raw”: Sum over trees, but don’t apply post-processing; get raw margin scores instead.

      • “leaf_id”: Output one (integer) leaf ID per tree.

      • “score_per_tree”: Output one or more margin scores per tree.

  • out – Parsed configuration

Returns:

0 for success; -1 for failure
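
A configuration string with the fields listed above can be assembled with ordinary string formatting. The sketch below builds such a string and rejects predict_type values that are not in the list. The helper name build_gtil_config_json is hypothetical, and the function only produces the text; passing it to TreeliteGTILParseConfig is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a GTIL config JSON string into buf.
 * Returns 0 on success, -1 if predict_type is not one of the documented
 * values or if buf is too small to hold the result. */
int build_gtil_config_json(char *buf, size_t buf_size,
                           const char *predict_type, int nthread) {
  static const char *const valid[] = {
    "default", "raw", "leaf_id", "score_per_tree"
  };
  int ok = 0;
  for (size_t i = 0; i < sizeof(valid) / sizeof(valid[0]); ++i) {
    if (strcmp(predict_type, valid[i]) == 0) {
      ok = 1;
    }
  }
  if (!ok) {
    return -1;
  }
  int n = snprintf(buf, buf_size,
                   "{\"predict_type\": \"%s\", \"nthread\": %d}",
                   predict_type, nthread);
  /* snprintf reports the length it wanted; a value >= buf_size means
   * the output was truncated. */
  return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```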

int TreeliteGTILDeleteConfig(GTILConfigHandle handle)

Delete a GTIL configuration from memory.

Parameters:

handle – Handle to the GTIL configuration to be deleted

Returns:

0 for success; -1 for failure

int TreeliteGTILGetPredictOutputSize(ModelHandle model, size_t num_row, size_t *out)

Deprecated. Please use TreeliteGTILGetPredictOutputSizeEx instead.

int TreeliteGTILGetPredictOutputSizeEx(ModelHandle model, size_t num_row, GTILConfigHandle config, size_t *out)

Given a batch of data rows, query the necessary size of array to hold predictions for all data points.

Parameters:
  • model – Treelite Model object

  • num_row – Number of rows in the input

  • config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.

  • out – Size of output buffer that should be allocated

Returns:

0 for success; -1 for failure

int TreeliteGTILPredict(ModelHandle model, const float *input, size_t num_row, float *output, int nthread, int pred_transform, size_t *out_result_size)

Deprecated. Please use TreeliteGTILPredictEx instead.

int TreeliteGTILPredictEx(ModelHandle model, const float *input, size_t num_row, float *output, GTILConfigHandle config, size_t *out_result_size, size_t *out_result_ndim, size_t **out_result_shape)

Predict with a 2D dense array.

Parameters:
  • model – Treelite Model object

  • input – The 2D data array, laid out in row-major layout

  • num_row – Number of rows in the data matrix.

  • output – Pointer to buffer to store the output. Call TreeliteGTILGetPredictOutputSizeEx to get the amount of buffer you should allocate for this parameter.

  • config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.

  • out_result_size – Size of output. This may be smaller than the size reported by TreeliteGTILGetPredictOutputSizeEx, but never larger.

  • out_result_ndim – Number of dimensions in the output array.

  • out_result_shape – Pointer to an array containing dimensions of the prediction output. This array shall have [out_result_ndim] elements. The product of the elements shall be equal to out_result_size.

Returns:

0 for success; -1 for failure
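
Two conventions in the parameter list above lend themselves to a short sketch: the row-major layout of input, and the rule that the product of the entries of out_result_shape equals out_result_size. The helpers below are hypothetical and do not call Treelite. The num_col argument is an assumption: TreeliteGTILPredictEx takes no column count, so the width of each row presumably comes from the model.

```c
#include <stddef.h>

/* Hypothetical helper: element (row, col) of a row-major matrix
 * that has num_col columns per row. */
float row_major_at(const float *input, size_t num_col,
                   size_t row, size_t col) {
  return input[row * num_col + col];
}

/* Hypothetical helper: 1 if the product of the ndim shape entries
 * equals result_size, 0 otherwise. */
int shape_matches_size(const size_t *shape, size_t ndim,
                       size_t result_size) {
  size_t product = 1;
  for (size_t i = 0; i < ndim; ++i) {
    product *= shape[i];
  }
  return product == result_size;
}
```

A check such as shape_matches_size can serve as a sanity test after a prediction call, before the output buffer is reinterpreted as a multi-dimensional array.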

Handle types

Treelite uses C++ classes to define its internal data structures. In order to pass C++ objects to C functions, opaque handles are used. Opaque handles are void* pointers that store raw memory addresses.

typedef void *ModelHandle

handle to a decision tree ensemble model

typedef void *TreeBuilderHandle

handle to tree builder class

typedef void *ModelBuilderHandle

handle to ensemble builder class

typedef void *AnnotationHandle

handle to branch annotation data

typedef void *CompilerHandle

handle to compiler class

typedef void *ValueHandle

handle to a polymorphic value type, used in the model builder API

typedef void *GTILConfigHandle

handle to a configuration of GTIL predictor

typedef void *PredictorHandle

handle to predictor class

typedef void *PredictorOutputHandle

handle to output from predictor
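
The opaque-handle pattern described at the start of this section can be sketched in plain C. The example below is a generic illustration, not Treelite code: CounterHandle, struct counter and the three Counter functions are hypothetical names. It follows the same conventions as the functions on this page: results are written through an out pointer, and the return value is 0 for success or -1 for failure.

```c
#include <stdlib.h>

/* Hypothetical handle type: callers see only a void* address. */
typedef void *CounterHandle;

/* The real object; in a library this would stay private. */
struct counter {
  int value;
};

/* Allocate the object and hand its address back as an opaque handle. */
int CounterCreate(int start, CounterHandle *out) {
  struct counter *c = malloc(sizeof *c);
  if (c == NULL) {
    return -1;
  }
  c->value = start;
  *out = c;
  return 0;
}

/* Convert the handle back to the real type before using it. */
int CounterIncrement(CounterHandle handle, int *out_value) {
  if (handle == NULL) {
    return -1;
  }
  struct counter *c = handle;
  c->value += 1;
  *out_value = c->value;
  return 0;
}

/* Release the object; the handle must not be used afterwards. */
int CounterFree(CounterHandle handle) {
  free(handle);
  return 0;
}
```

Because a handle is just an address, nothing stops a caller from passing one kind of handle where another is expected, so each handle should only be given to the functions documented for it.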