Treelite C API

Treelite exposes a set of C functions to enable interfacing with a variety of languages. This page will be most useful for:

  • those writing a new language binding (glue code).

  • those wanting to incorporate functions of Treelite into their own native libraries.

We recommend the Python API for everyday use.

Note

Use of C and C++ in Treelite

The core logic of Treelite is written in C++ to take advantage of higher-level abstractions. We provide a C-only interface here, because many more programming languages can bind to C than to C++. See this page for more details.

Model loader interface

Use the following functions to load decision tree ensemble models from a file. Treelite supports multiple model file formats.

int TreeliteLoadXGBoostModelLegacyBinary(char const *filename, char const *config_json, TreeliteModelHandle *out)

Load a model file generated by XGBoost (dmlc/xgboost), stored in the legacy binary format.

Parameters:
  • filename – Name of model file

  • config_json – JSON string consisting of key-value pairs; used for configuring the model parser

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostModelLegacyBinaryFromMemoryBuffer(void const *buf, size_t len, char const *config_json, TreeliteModelHandle *out)

Load an XGBoost model from a memory buffer using the legacy binary format.

Parameters:
  • buf – Memory buffer

  • len – Size of memory buffer

  • config_json – Null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostModel(char const *filename, char const *config_json, TreeliteModelHandle *out)

Load a model file generated by XGBoost (dmlc/xgboost), stored in the JSON format.

Parameters:
  • filename – Name of model file

  • config_json – Null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadXGBoostModelFromString(char const *json_str, size_t length, char const *config_json, TreeliteModelHandle *out)

Load an XGBoost model from a JSON string.

Parameters:
  • json_str – JSON string containing the XGBoost model

  • length – Length of the JSON string

  • config_json – Null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadLightGBMModel(char const *filename, char const *config_json, TreeliteModelHandle *out)

Load a model file generated by LightGBM (Microsoft/LightGBM). The model file must contain a decision tree ensemble.

Parameters:
  • filename – Name of model file

  • config_json – Null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadLightGBMModelFromString(char const *model_str, char const *config_json, TreeliteModelHandle *out)

Load a LightGBM model from a string. The string should be created with the model_to_string() method in LightGBM.

Parameters:
  • model_str – Model string

  • config_json – Null-terminated JSON string consisting of key-value pairs; used for configuring the model parser

  • out – Loaded model

Returns:

0 for success, -1 for failure

Model loader interface for scikit-learn models

Use the following functions to load decision tree ensemble models from a scikit-learn model object.

int TreeliteLoadSKLearnRandomForestRegressor(int n_estimators, int n_features, int n_targets, int64_t const *node_count, int64_t const **children_left, int64_t const **children_right, int64_t const **feature, double const **threshold, double const **value, int64_t const **n_node_samples, double const **weighted_n_node_samples, double const **impurity, TreeliteModelHandle *out)

Load a scikit-learn RandomForestRegressor model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note that this function can also be used to load an ensemble of extremely randomized trees (sklearn.ensemble.ExtraTreesRegressor).

Parameters:
  • n_estimators – Number of trees in the random forest

  • n_features – Number of features in the training data

  • n_targets – Number of targets (outputs)

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • out – Loaded model

Returns:

0 for success, -1 for failure
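To make the array-of-arrays layout concrete, the sketch below encodes one three-node tree in the children_left, children_right, feature, threshold and value arrays and walks it for a single input row. It does not call Treelite. It assumes a single target and two scikit-learn conventions that this page does not state: a child ID of -1 marks a leaf, and a row goes left when its feature value is less than or equal to the threshold. The names walk_tree and example_predict and all numbers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Treelite): evaluate tree i of an ensemble
 * stored in the array-of-arrays layout, for a single-target regressor.
 * Assumed scikit-learn conventions: node 0 is the root, child ID -1 marks a
 * leaf, and a row goes left when row[feature] <= threshold. */
static double walk_tree(int i, int64_t const *node_count,
                        int64_t const **children_left,
                        int64_t const **children_right,
                        int64_t const **feature, double const **threshold,
                        double const **value, double const *row) {
  int64_t k = 0;
  for (;;) {
    assert(k >= 0 && k < node_count[i]);
    if (children_left[i][k] == -1) {
      return value[i][k]; /* leaf: value[i][k] is the leaf output */
    }
    k = (row[feature[i][k]] <= threshold[i][k]) ? children_left[i][k]
                                                : children_right[i][k];
  }
}

/* One tree with three nodes: node 0 splits on feature 1 at 0.5;
 * node 1 (left child) and node 2 (right child) are leaves. */
static int64_t const kLeft0[] = {1, -1, -1};
static int64_t const kRight0[] = {2, -1, -1};
static int64_t const kFeature0[] = {1, -2, -2};
static double const kThreshold0[] = {0.5, -2.0, -2.0};
static double const kValue0[] = {0.0, 10.0, 20.0};

/* Outer arrays: one entry per tree (n_estimators = 1 here). */
static int64_t const kNodeCount[] = {3};
static int64_t const *kChildrenLeft[] = {kLeft0};
static int64_t const *kChildrenRight[] = {kRight0};
static int64_t const *kFeature[] = {kFeature0};
static double const *kThreshold[] = {kThreshold0};
static double const *kValue[] = {kValue0};

/* Evaluate the single example tree for one row with two features. */
static double example_predict(double const *row) {
  return walk_tree(0, kNodeCount, kChildrenLeft, kChildrenRight, kFeature,
                   kThreshold, kValue, row);
}
```

The outer arrays (kChildrenLeft and so on) have the pointer types that the loader expects, so the same variables could be handed to the loader together with the remaining arrays, which this sketch leaves out.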

int TreeliteLoadSKLearnIsolationForest(int n_estimators, int n_features, int64_t const *node_count, int64_t const **children_left, int64_t const **children_right, int64_t const **feature, double const **threshold, double const **value, int64_t const **n_node_samples, double const **weighted_n_node_samples, double const **impurity, double ratio_c, TreeliteModelHandle *out)

Load a scikit-learn IsolationForest model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail.

Parameters:
  • n_estimators – Number of trees in the isolation forest

  • n_features – Number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the expected isolation depth of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – Not used, but must be passed as array of arrays for each tree and node.

  • ratio_c – Standardizing constant to use for calculation of the anomaly score.

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnRandomForestClassifier(int n_estimators, int n_features, int n_targets, int32_t const *n_classes, int64_t const *node_count, int64_t const **children_left, int64_t const **children_right, int64_t const **feature, double const **threshold, double const **value, int64_t const **n_node_samples, double const **weighted_n_node_samples, double const **impurity, TreeliteModelHandle *out)

Load a scikit-learn RandomForestClassifier model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note that this function can also be used to load an ensemble of extremely randomized trees (sklearn.ensemble.ExtraTreesClassifier).

Parameters:
  • n_estimators – Number of trees in the random forest

  • n_features – Number of features in the training data

  • n_targets – Number of targets (outputs)

  • n_classes – n_classes[i] stores the number of classes in the i-th target

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnGradientBoostingRegressor(int n_iter, int n_features, int64_t const *node_count, int64_t const **children_left, int64_t const **children_right, int64_t const **feature, double const **threshold, double const **value, int64_t const **n_node_samples, double const **weighted_n_node_samples, double const **impurity, double const *base_scores, TreeliteModelHandle *out)

Load a scikit-learn GradientBoostingRegressor model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note: GradientBoostingRegressor does not support multiple targets (outputs).

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – Number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • base_scores – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,)

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnGradientBoostingClassifier(int n_iter, int n_features, int n_classes, int64_t const *node_count, int64_t const **children_left, int64_t const **children_right, int64_t const **feature, double const **threshold, double const **value, int64_t const **n_node_samples, double const **weighted_n_node_samples, double const **impurity, double const *base_scores, TreeliteModelHandle *out)

Load a scikit-learn GradientBoostingClassifier model from a collection of arrays. Refer to https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html to learn the meaning of the arrays in detail. Note: GradientBoostingClassifier does not support multiple targets (outputs).

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – Number of features in the training data

  • n_classes – Number of classes in the target variable

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • children_left – children_left[i][k] stores the ID of the left child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • children_right – children_right[i][k] stores the ID of the right child node of node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • feature – feature[i][k] stores the ID of the feature used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • threshold – threshold[i][k] stores the threshold used in the binary tree split at node k of the i-th tree. This is only defined if node k is an internal (non-leaf) node.

  • value – value[i][k] stores the leaf output of node k of the i-th tree. This is only defined if node k is a leaf node.

  • n_node_samples – n_node_samples[i][k] stores the number of data samples associated with node k of the i-th tree.

  • weighted_n_node_samples – weighted_n_node_samples[i][k] stores the sum of weighted data samples associated with node k of the i-th tree.

  • impurity – impurity[i][k] stores the impurity measure (Gini, entropy, etc.) associated with node k of the i-th tree.

  • base_scores – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (n_classes,)

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnHistGradientBoostingRegressor(int n_iter, int n_features, int64_t const *node_count, void const **nodes, int expected_sizeof_node_struct, uint32_t n_categorical_splits, uint32_t const **raw_left_cat_bitsets, uint32_t const *known_cat_bitsets, uint32_t const *known_cat_bitsets_offset_map, int32_t const *features_map, int64_t const **categories_map, double const *base_scores, TreeliteModelHandle *out)

Load a scikit-learn HistGradientBoostingRegressor model from a collection of arrays. Note: HistGradientBoostingRegressor does not support multiple targets (outputs).

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – Number of features in the training data

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • nodes – nodes[i][k] stores the k-th node of the i-th tree.

  • expected_sizeof_node_struct – Expected size of Node struct, in bytes

  • n_categorical_splits – n_categorical_splits[i] stores the number of categorical splits in the i-th tree.

  • raw_left_cat_bitsets – raw_left_cat_bitsets[i][k] stores the bitmaps for node k of tree i. The bitmaps are used to represent categorical tests. Shape of raw_left_cat_bitsets[i]: (n_categorical_splits, 8)

  • known_cat_bitsets – Bitsets representing the list of known categories per categorical feature. Shape: (n_categorical_features, 8)

  • known_cat_bitsets_offset_map – Map from an original feature index to the corresponding index in the known_cat_bitsets array. Shape: (n_features,)

  • features_map – Mapping to re-order features. This is needed because the HistGradientBoosting estimator internally re-orders features using ColumnTransformer so that the categorical features come before the numerical features.

  • categories_map – Mapping to transform categorical features. This is needed because the HistGradientBoosting estimator embeds an OrdinalEncoder. categories_map[i] represents the mapping for the i-th categorical feature.

  • base_scores – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,)

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteLoadSKLearnHistGradientBoostingClassifier(int n_iter, int n_features, int n_classes, int64_t const *node_count, void const **nodes, int expected_sizeof_node_struct, uint32_t n_categorical_splits, uint32_t const **raw_left_cat_bitsets, uint32_t const *known_cat_bitsets, uint32_t const *known_cat_bitsets_offset_map, int32_t const *features_map, int64_t const **categories_map, double const *base_scores, TreeliteModelHandle *out)

Load a scikit-learn HistGradientBoostingClassifier model from a collection of arrays. Note: HistGradientBoostingClassifier does not support multiple targets (outputs).

Parameters:
  • n_iter – Number of boosting iterations

  • n_features – Number of features in the training data

  • n_classes – Number of classes in the target variable

  • node_count – node_count[i] stores the number of nodes in the i-th tree

  • nodes – nodes[i][k] stores the k-th node of the i-th tree.

  • expected_sizeof_node_struct – Expected size of Node struct, in bytes

  • n_categorical_splits – n_categorical_splits[i] stores the number of categorical splits in the i-th tree.

  • raw_left_cat_bitsets – raw_left_cat_bitsets[i][k] stores the bitmaps for node k of tree i. The bitmaps are used to represent categorical tests. Shape of raw_left_cat_bitsets[i]: (n_categorical_splits, 8)

  • known_cat_bitsets – Bitsets representing the list of known categories per categorical feature. Shape: (n_categorical_features, 8)

  • known_cat_bitsets_offset_map – Map from an original feature index to the corresponding index in the known_cat_bitsets array. Shape: (n_features,)

  • features_map – Mapping to re-order features. This is needed because the HistGradientBoosting estimator internally re-orders features using ColumnTransformer so that the categorical features come before the numerical features.

  • categories_map – Mapping to transform categorical features. This is needed because the HistGradientBoosting estimator embeds an OrdinalEncoder. categories_map[i] represents the mapping for the i-th categorical feature.

  • base_scores – Baseline predictions for outputs. At prediction, margin scores will be adjusted by this amount before applying the post-processing (link) function. Required shape: (1,) for binary classification; (n_classes,) for multi-class classification

  • out – Loaded model

Returns:

0 for success, -1 for failure
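The bitset arguments of the two HistGradientBoosting loaders have a trailing dimension of 8: each bitset is eight uint32_t words, that is 256 bits, one per category value. The sketch below shows how a category could be stored in and looked up from such a bitset. It assumes the bit ordering used inside scikit-learn (bit c % 32 of word c / 32), which this page does not spell out. bitset_set and bitset_contains are illustrative names, not Treelite functions.

```c
#include <assert.h>
#include <stdint.h>

enum { kBitsetWords = 8 }; /* each bitset row holds 8 x 32 = 256 bits */

/* Hypothetical helper: mark `category` as present in the bitset. */
static void bitset_set(uint32_t bitset[kBitsetWords], uint32_t category) {
  assert(category < 32u * kBitsetWords);
  bitset[category / 32u] |= (uint32_t)1 << (category % 32u);
}

/* Hypothetical helper: return 1 if `category` is present, 0 otherwise.
 * Categories that do not fit in 256 bits are reported as absent. */
static int bitset_contains(uint32_t const bitset[kBitsetWords],
                           uint32_t category) {
  if (category >= 32u * kBitsetWords) {
    return 0;
  }
  return (int)((bitset[category / 32u] >> (category % 32u)) & 1u);
}
```

With this layout, row r of an array of shape (n, 8) starts at element 8 * r of the flat uint32_t buffer.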

Model builder interface

Use the following functions to incrementally build decision tree ensemble models.

int TreeliteGetModelBuilder(char const *json_str, TreeliteModelBuilderHandle *out)

Initialize a model builder object from a JSON string.

The JSON string must contain all relevant metadata, including:

  • threshold_type: Type of thresholds in the tree model

  • leaf_output_type: Type of leaf outputs in the tree model

  • metadata: Model metadata, consisting of the following subfields:

    • num_feature: Number of features

    • task_type: Task type

    • average_tree_output: Whether to average outputs of trees

    • num_target: Number of targets

    • num_class: Number of classes. num_class[i] is the number of classes of target i.

    • leaf_vector_shape: Shape of the output from each leaf node

  • tree_annotation: Annotation for individual trees, consisting of the following subfields:

    • num_tree: Number of trees

    • target_id: Target that each tree is associated with

    • class_id: Class that each tree is associated with

  • postprocessor: Postprocessor for prediction outputs, consisting of the following subfields:

    • name: Name of postprocessor

    • config_json: Optional JSON string to configure the postprocessor

  • base_scores: Baseline scores for targets and classes, before adding tree outputs. Also known as the intercept.

  • attributes: Arbitrary JSON object, to be stored in the “attributes” field in the model object.

Parameters:
  • json_str – JSON string containing relevant metadata.

  • out – Model builder object

Returns:

0 for success, -1 for failure
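As a rough illustration of how the metadata fields listed above fit together, the following sketch formats a JSON string for a one-tree, single-target regressor. The field names are taken from the list; the concrete values ("float32", "kRegressor", "identity", the array shapes, the empty attributes object) are assumptions made for this example and should be checked against the Treelite version in use. format_builder_metadata is a hypothetical helper, not part of the C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write builder metadata for a one-tree, single-target
 * regressor into buf. Returns the string length, or -1 if buf is too small.
 * All values below are illustrative assumptions, not documented defaults. */
static int format_builder_metadata(char *buf, size_t cap, int num_feature) {
  assert(buf != NULL && num_feature > 0);
  int n = snprintf(
      buf, cap,
      "{\"threshold_type\": \"float32\", \"leaf_output_type\": \"float32\", "
      "\"metadata\": {\"num_feature\": %d, \"task_type\": \"kRegressor\", "
      "\"average_tree_output\": false, \"num_target\": 1, "
      "\"num_class\": [1], \"leaf_vector_shape\": [1, 1]}, "
      "\"tree_annotation\": {\"num_tree\": 1, \"target_id\": [0], "
      "\"class_id\": [0]}, "
      "\"postprocessor\": {\"name\": \"identity\"}, "
      "\"base_scores\": [0.0], \"attributes\": {}}",
      num_feature);
  if (n < 0 || (size_t)n >= cap) {
    return -1; /* formatting error or truncated output */
  }
  return n;
}
```

The resulting null-terminated string is the kind of value that would be passed as json_str.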

int TreeliteDeleteModelBuilder(TreeliteModelBuilderHandle model_builder)

Delete model builder object from memory.

Parameters:

model_builder – Model builder object to be deleted

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderStartTree(TreeliteModelBuilderHandle model_builder)

Start a new tree.

Parameters:

model_builder – Model builder object

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderEndTree(TreeliteModelBuilderHandle model_builder)

End the current tree.

Parameters:

model_builder – Model builder object

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderStartNode(TreeliteModelBuilderHandle model_builder, int node_key)

Start a new node.

Parameters:
  • model_builder – Model builder object

  • node_key – Integer key that uniquely identifies the node.

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderEndNode(TreeliteModelBuilderHandle model_builder)

End the current node.

Parameters:

model_builder – Model builder object

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderNumericalTest(TreeliteModelBuilderHandle model_builder, int32_t split_index, double threshold, int default_left, char const *cmp, int left_child_key, int right_child_key)

Declare the current node as a numerical test node, where the test is of the form [feature value] [cmp] [threshold]. Data points for which the test evaluates to True will be mapped to the left child node; all other data points (for which the test evaluates to False) will be mapped to the right child node.

Parameters:
  • model_builder – Model builder object

  • split_index – Feature ID

  • threshold – Threshold

  • default_left – Whether missing values should be mapped to the left child

  • cmp – Comparison operator

  • left_child_key – Integer key that uniquely identifies the left child node.

  • right_child_key – Integer key that uniquely identifies the right child node.

Returns:

0 for success, -1 for failure
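The routing rule of a numerical test node can be sketched in a few lines of plain C. The function below is a hypothetical illustration, not Treelite code. It assumes that a missing value is represented as NaN and that the comparison operator is spelled as one of "<", "<=", "==", ">", ">="; neither detail is stated on this page.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper: decide where a data point goes at a numerical test
 * node. Returns 1 for the left child, 0 for the right child, and -1 if the
 * operator string is not recognized.
 * Assumptions: NaN encodes a missing value; operator spellings as below. */
static int numerical_test_goes_left(double fvalue, char const *cmp,
                                    double threshold, int default_left) {
  assert(cmp != NULL);
  if (isnan(fvalue)) {
    return default_left != 0; /* missing value: follow the default direction */
  }
  if (strcmp(cmp, "<") == 0) return fvalue < threshold;
  if (strcmp(cmp, "<=") == 0) return fvalue <= threshold;
  if (strcmp(cmp, "==") == 0) return fvalue == threshold;
  if (strcmp(cmp, ">") == 0) return fvalue > threshold;
  if (strcmp(cmp, ">=") == 0) return fvalue >= threshold;
  return -1;
}
```

A test that evaluates to True sends the data point to the left child, so with cmp set to "<" and a threshold of 2.0, a feature value of 1.0 goes left and 2.0 goes right.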

int TreeliteModelBuilderCategoricalTest(TreeliteModelBuilderHandle model_builder, int32_t split_index, int default_left, uint32_t const *category_list, size_t category_list_len, int category_list_right_child, int left_child_key, int right_child_key)

Declare the current node as a categorical test node, where the test is of the form [feature value] ∈ [category list].

Parameters:
  • model_builder – Model builder object

  • split_index – Feature ID

  • default_left – Whether missing values should be mapped to the left child

  • category_list – List of categories to be tested for match

  • category_list_len – Length of category_list

  • category_list_right_child – Whether the data points for which the test evaluates to True should be mapped to the right child or the left child.

  • left_child_key – Integer key that uniquely identifies the left child node.

  • right_child_key – Integer key that uniquely identifies the right child node.

Returns:

0 for success, -1 for failure
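The interaction between category_list and category_list_right_child is easy to get backwards, so here is a small sketch of the routing rule in plain C. It is a hypothetical illustration rather than Treelite code, and it reads a nonzero category_list_right_child as "matching data points go to the right child", which is an interpretation of the parameter description above. Missing values are not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decide where a data point goes at a categorical test
 * node. Returns 1 for the left child and 0 for the right child.
 * Assumption: nonzero category_list_right_child sends matches to the right. */
static int categorical_test_goes_left(uint32_t category,
                                      uint32_t const *category_list,
                                      size_t category_list_len,
                                      int category_list_right_child) {
  assert(category_list != NULL || category_list_len == 0);
  int match = 0;
  for (size_t i = 0; i < category_list_len; ++i) {
    if (category_list[i] == category) {
      match = 1; /* the test [feature value] in [category list] is True */
      break;
    }
  }
  return category_list_right_child ? !match : match;
}
```

With the list {1, 4, 7}, category 4 goes left when category_list_right_child is 0 and right when it is 1; category 5 does the opposite.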

int TreeliteModelBuilderLeafScalar(TreeliteModelBuilderHandle model_builder, double leaf_value)

Declare the current node as a leaf node with a scalar output.

Parameters:
  • model_builder – Model builder object

  • leaf_value – Value of leaf output

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderLeafVectorFloat32(TreeliteModelBuilderHandle model_builder, float const *leaf_vector, size_t leaf_vector_len)

Declare the current node as a leaf node with a vector output (float32).

Parameters:
  • model_builder – Model builder object

  • leaf_vector – Value of leaf output

  • leaf_vector_len – Length of leaf_vector

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderLeafVectorFloat64(TreeliteModelBuilderHandle model_builder, double const *leaf_vector, size_t leaf_vector_len)

Declare the current node as a leaf node with a vector output (float64).

Parameters:
  • model_builder – Model builder object

  • leaf_vector – Value of leaf output

  • leaf_vector_len – Length of leaf_vector

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderGain(TreeliteModelBuilderHandle model_builder, double gain)

Specify the gain (loss reduction) resulting from the current split.

Parameters:
  • model_builder – Model builder object

  • gain – Gain (loss reduction)

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderDataCount(TreeliteModelBuilderHandle model_builder, uint64_t data_count)

Specify the number of data points (samples) that are mapped to the current node.

Parameters:
  • model_builder – Model builder object

  • data_count – Number of data points

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderSumHess(TreeliteModelBuilderHandle model_builder, double sum_hess)

Specify the weighted sample count or the sum of Hessians for the data points that are mapped to the current node.

Parameters:
  • model_builder – Model builder object

  • sum_hess – Weighted sample count or the sum of Hessians

Returns:

0 for success, -1 for failure

int TreeliteModelBuilderCommitModel(TreeliteModelBuilderHandle model_builder, TreeliteModelHandle *out)

Conclude model building and obtain the final model object.

Parameters:
  • model_builder – Model builder object

  • out – Final model object

Model manager interface

int TreeliteDumpAsJSON(TreeliteModelHandle handle, int pretty_print, char const **out_json_str)

Dump a model object as a JSON string.

Parameters:
  • handle – The handle to the model object

  • pretty_print – Whether to pretty-print the JSON string (0 for false, nonzero for true)

  • out_json_str – The JSON string

Returns:

0 for success, -1 for failure

int TreeliteGetInputType(TreeliteModelHandle model, char const **out_str)

Query the input type of a Treelite model object.

Parameters:
  • model – Treelite Model object

  • out_str – String representation of input type

Returns:

0 for success; -1 for failure

int TreeliteGetOutputType(TreeliteModelHandle model, char const **out_str)

Query the output type of a Treelite model object.

Parameters:
  • model – Treelite Model object

  • out_str – String representation of output type

Returns:

0 for success; -1 for failure

int TreeliteQueryNumTree(TreeliteModelHandle model, size_t *out)

Query the number of trees in the model.

Parameters:
  • model – Model to query

  • out – Number of trees

Returns:

0 for success, -1 for failure

int TreeliteQueryNumFeature(TreeliteModelHandle model, int *out)

Query the number of features used in the model.

Parameters:
  • model – Model to query

  • out – Number of features

Returns:

0 for success, -1 for failure

int TreeliteConcatenateModelObjects(TreeliteModelHandle const *objs, size_t len, TreeliteModelHandle *out)

Concatenate multiple model objects into a single model object by copying all member trees into the destination model object.

Parameters:
  • objs – Pointer to the beginning of the list of model objects

  • len – Number of model objects

  • out – Used to save the concatenated model

int TreeliteFreeModel(TreeliteModelHandle handle)

Delete model from memory.

Parameters:

handle – Model to remove

Returns:

0 for success, -1 for failure

Serializer

int TreeliteSerializeModelToFile(TreeliteModelHandle handle, char const *filename)

Serialize (persist) a model object to disk.

Parameters:
  • handle – Handle to the model object

  • filename – Name of the file to which to serialize the model. The file uses a binary format optimized for storing the Treelite model object efficiently.

Returns:

0 for success, -1 for failure

int TreeliteDeserializeModelFromFile(char const *filename, TreeliteModelHandle *out)

Deserialize (load) a model object from disk.

Parameters:
  • filename – Name of the file from which to deserialize the model. The file should be created by a call to TreeliteSerializeModelToFile.

  • out – Handle to the model object

Returns:

0 for success, -1 for failure

int TreeliteSerializeModelToBytes(TreeliteModelHandle handle, char const **out_bytes, size_t *out_bytes_len)

Serialize (persist) a model object to a byte sequence.

Parameters:
  • handle – Handle to the model object

  • out_bytes – Byte sequence containing serialized model

  • out_bytes_len – Length of out_bytes

Returns:

0 for success, -1 for failure

int TreeliteDeserializeModelFromBytes(char const *bytes, size_t bytes_len, TreeliteModelHandle *out)

Deserialize (load) a model object from a byte sequence.

Parameters:
  • bytes – Byte sequence containing the serialized model. The byte sequence should be created by a call to TreeliteSerializeModelToBytes.

  • bytes_len – Length of bytes

  • out – Loaded model

Returns:

0 for success, -1 for failure

int TreeliteSerializeModelToPyBuffer(TreeliteModelHandle handle, TreelitePyBufferFrame **out_frames, size_t *out_num_frames)

Serialize a model object using the Python buffer protocol (PEP 3118).

Parameters:
  • handle – Handle to the model object

  • out_frames – Pointer to buffer frames

  • out_num_frames – Number of buffer frames

Returns:

0 for success, -1 for failure

int TreeliteDeserializeModelFromPyBuffer(TreelitePyBufferFrame *frames, size_t num_frames, TreeliteModelHandle *out)

Deserialize a model object using the Python buffer protocol (PEP 3118).

Parameters:
  • frames – Buffer frames

  • num_frames – Number of buffer frames

  • out – Loaded model

Returns:

0 for success, -1 for failure

Getters and setters for the model object

int TreeliteGetHeaderField(TreeliteModelHandle model, char const *name, TreelitePyBufferFrame *out_frame)

Get a field in the header.

This function returns the requested field using the Python buffer protocol (PEP 3118).

Parameters:
  • model – Treelite Model object

  • name – Name of the field

  • out_frame – Buffer frame representing the requested field

Returns:

0 for success; -1 for failure

int TreeliteGetTreeField(TreeliteModelHandle model, uint64_t tree_id, char const *name, TreelitePyBufferFrame *out_frame)

Get a field in a tree.

This function returns the requested field using the Python buffer protocol (PEP 3118).

Parameters:
  • model – Treelite Model object

  • tree_id – ID of the tree

  • name – Name of the field

  • out_frame – Buffer frame representing the requested field

Returns:

0 for success; -1 for failure

int TreeliteSetHeaderField(TreeliteModelHandle model, char const *name, TreelitePyBufferFrame frame)

Set a field in the header.

This function accepts the field’s new value using the Python buffer protocol (PEP 3118).

Parameters:
  • model – Treelite Model object

  • name – Name of the field

  • frame – Buffer frame representing the new value for the field

Returns:

0 for success; -1 for failure

int TreeliteSetTreeField(TreeliteModelHandle model, uint64_t tree_id, char const *name, TreelitePyBufferFrame frame)

Set a field in a tree.

This function accepts the field’s new value using the Python buffer protocol (PEP 3118).

Parameters:
  • model – Treelite Model object

  • tree_id – ID of the tree

  • name – Name of the field

  • frame – Buffer frame representing the new value for the field

Returns:

0 for success; -1 for failure

General Tree Inference Library (GTIL)

int TreeliteGTILParseConfig(char const *config_json, TreeliteGTILConfigHandle *out)

Load a configuration for GTIL predictor from a JSON string.

Parameters:
  • config_json – a JSON string with the following fields:

    • "nthread" (optional): Number of threads used for initializing DMatrix. Set <= 0 to use all CPU cores.

    • "predict_type" (required): Must be one of the following.

      • "default": Sum over trees and apply post-processing

      • "raw": Sum over trees, but don’t apply post-processing; get raw margin scores instead.

      • "leaf_id": Output one (integer) leaf ID per tree.

      • "score_per_tree": Output one or more margin scores per tree.

  • out – Parsed configuration

Returns:

0 for success; -1 for failure
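
The configuration string described above can be assembled with standard C string formatting. The sketch below is a minimal example; make_gtil_config_json is a hypothetical helper, not part of Treelite. It accepts only the four predict_type values listed above and writes a JSON object containing both fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Treelite: write a GTIL configuration JSON
 * string into buf. Returns 0 on success, -1 if predict_type is not one of the
 * documented values or if buf is too small. */
int make_gtil_config_json(char *buf, size_t len, int nthread,
                          char const *predict_type) {
  static char const *const kAllowed[] = {"default", "raw", "leaf_id",
                                         "score_per_tree"};
  int ok = 0;
  int n;
  size_t i;
  assert(buf != NULL && predict_type != NULL);
  for (i = 0; i < sizeof(kAllowed) / sizeof(kAllowed[0]); ++i) {
    if (strcmp(predict_type, kAllowed[i]) == 0) {
      ok = 1;
    }
  }
  if (!ok) {
    return -1;
  }
  n = snprintf(buf, len, "{\"nthread\": %d, \"predict_type\": \"%s\"}",
               nthread, predict_type);
  /* snprintf reports the length it wanted; anything >= len was truncated. */
  return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

The resulting string would then be passed as config_json to TreeliteGTILParseConfig, and the returned handle released with TreeliteGTILDeleteConfig once prediction is finished.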

int TreeliteGTILDeleteConfig(TreeliteGTILConfigHandle handle)

Delete a GTIL configuration from memory.

Parameters:

handle – Handle to the GTIL configuration to be deleted

Returns:

0 for success; -1 for failure

int TreeliteGTILGetOutputShape(TreeliteModelHandle model, uint64_t num_row, TreeliteGTILConfigHandle config, uint64_t const **out, uint64_t *out_ndim)

Query the shape of the output array needed to hold predictions for an input with num_row rows.

Parameters:
  • model – Treelite Model object

  • num_row – Number of rows in the input

  • config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.

  • out – Array of dimensions

  • out_ndim – Number of dimensions in out

Returns:

0 for success; -1 for failure
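
Once the shape array and its dimension count are known, the number of output elements is the product of the dimensions. The sketch below shows that calculation; shape_num_elements is a hypothetical helper, not part of Treelite, and it does not check for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Treelite: number of elements in an array
 * with the given shape. An empty shape (ndim == 0) yields 1. */
uint64_t shape_num_elements(uint64_t const *shape, uint64_t ndim) {
  uint64_t n = 1;
  uint64_t i;
  assert(shape != NULL || ndim == 0);
  for (i = 0; i < ndim; ++i) {
    n *= shape[i];
  }
  return n;
}
```

Multiply the result by the size of the output element type to obtain the number of bytes to allocate for the output buffer.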

int TreeliteGTILPredict(TreeliteModelHandle model, void const *input, char const *input_type, uint64_t num_row, void *output, TreeliteGTILConfigHandle config)

Predict with a 2D dense array.

Parameters:
  • model – Treelite Model object

  • input – The 2D data array, laid out in row-major layout

  • input_type – Data type of the data matrix

  • num_row – Number of rows in the data matrix.

  • output – Pointer to the buffer that receives the output. Call TreeliteGTILGetOutputShape to determine how large a buffer to allocate for this parameter.

  • config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.

Returns:

0 for success; -1 for failure
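
In a row-major layout, the elements of each row are contiguous, so element (row, col) of a matrix with num_col columns sits at offset row * num_col + col. The sketch below illustrates this indexing; dense_at is a hypothetical helper, and the float element type is an assumption made for the example (the actual element type is whatever input_type specifies).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Treelite: read element (row, col) from a
 * row-major matrix that has num_col columns. */
float dense_at(float const *input, size_t num_col, size_t row, size_t col) {
  assert(input != NULL && col < num_col);
  return input[row * num_col + col];
}
```

For example, a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6} has rows {1, 2, 3} and {4, 5, 6}.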

int TreeliteGTILPredictSparse(TreeliteModelHandle model, void const *data, char const *input_type, uint64_t const *col_ind, uint64_t const *row_ptr, uint64_t num_row, void *output, TreeliteGTILConfigHandle config)

Predict with sparse data with CSR (compressed sparse row) layout.

In the CSR layout, data[row_ptr[i]:row_ptr[i+1]] stores the nonzero entries of row i, and col_ind[row_ptr[i]:row_ptr[i+1]] stores the corresponding column indices.

Parameters:
  • model – Treelite Model object

  • data – Nonzero elements in the data matrix

  • input_type – Data type of the data matrix

  • col_ind – Feature indices. col_ind[i] indicates the feature index associated with data[i].

  • row_ptr – Pointer to row headers. Length is num_row + 1.

  • num_row – Number of rows in the data matrix.

  • output – Pointer to buffer to store the output. Call TreeliteGTILGetOutputShape to get the amount of buffer you should allocate for this parameter.

  • config – Configuration of GTIL predictor. Set this by calling TreeliteGTILParseConfig.

Returns:

0 for success; -1 for failure
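
To make the CSR layout concrete, the sketch below converts a small row-major dense matrix into the three arrays data, col_ind, and row_ptr. dense_to_csr is a hypothetical helper, not part of Treelite, and the float element type is an assumption for the example. This page does not state whether entries omitted from the CSR arrays are interpreted as zeros or as missing values, so verify that before dropping zeros from real inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Treelite: convert a row-major dense matrix
 * to CSR. data and col_ind need room for num_row * num_col entries, and
 * row_ptr needs room for num_row + 1 entries.
 * Returns the number of nonzero entries written. */
uint64_t dense_to_csr(float const *dense, uint64_t num_row, uint64_t num_col,
                      float *data, uint64_t *col_ind, uint64_t *row_ptr) {
  uint64_t nnz = 0;
  uint64_t i, j;
  assert(dense != NULL && data != NULL && col_ind != NULL && row_ptr != NULL);
  row_ptr[0] = 0;
  for (i = 0; i < num_row; ++i) {
    for (j = 0; j < num_col; ++j) {
      float v = dense[i * num_col + j];
      if (v != 0.0f) {
        data[nnz] = v;    /* value of the nonzero entry */
        col_ind[nnz] = j; /* its column (feature) index */
        ++nnz;
      }
    }
    row_ptr[i + 1] = nnz; /* row i occupies [row_ptr[i], row_ptr[i+1]) */
  }
  return nnz;
}
```

For the 2 x 3 matrix with rows {0, 5, 0} and {7, 0, 9}, this yields data = {5, 7, 9}, col_ind = {1, 0, 2}, and row_ptr = {0, 1, 3}.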

Handle types

Treelite uses C++ classes to define its internal data structures. To pass C++ objects across the C interface, it uses opaque handles: void* pointers that hold the raw memory address of the underlying object.

typedef void *TreeliteModelHandle

Handle to a decision tree ensemble model.

typedef void *TreeliteModelBuilderHandle

Handle to a model builder object.

typedef void *TreeliteGTILConfigHandle

Handle to a configuration of GTIL predictor.
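
The opaque-handle pattern can be sketched in a few lines of C. Everything in the example below (CounterHandle, struct Counter, and the three functions) is hypothetical and unrelated to Treelite's internals. It only illustrates how a library can hand out a void* while keeping the underlying type private, following the same convention of returning 0 for success and -1 for failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handle type: callers see only a void pointer. */
typedef void *CounterHandle;

/* Private to the library; callers never touch this type directly. */
struct Counter {
  int value;
};

/* Create an object and return its address through an opaque handle. */
int CounterCreate(CounterHandle *out) {
  struct Counter *c;
  assert(out != NULL);
  c = (struct Counter *)malloc(sizeof(*c));
  if (c == NULL) {
    return -1;
  }
  c->value = 0;
  *out = c;
  return 0;
}

/* Cast the handle back to the private type before using it. */
int CounterIncrement(CounterHandle handle, int *out_value) {
  struct Counter *c = (struct Counter *)handle;
  if (c == NULL || out_value == NULL) {
    return -1;
  }
  c->value += 1;
  *out_value = c->value;
  return 0;
}

/* Release the object; the handle must not be used afterwards. */
int CounterDelete(CounterHandle handle) {
  free(handle);
  return 0;
}
```

Because the handle is an untyped pointer, the compiler cannot detect a handle of one kind being passed where another is expected, so callers need to keep track of which handle came from which function.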