Treelite exposes a set of C functions to enable interfacing with a variety of languages.
We recommend the Python API for everyday uses.
Note
Use of C and C++ in treelite
The core logic of treelite is written in C++ to take advantage of higher-level abstractions. We provide a C-only interface here, as many more programming languages bind with C than with C++. See this page for more details.
Use the following functions to load and manipulate data from a variety of sources.
TreeliteDMatrixCreateFromFile(const char *path, const char *format, int nthread, int verbose, DMatrixHandle *out)
create DMatrix from a file
path
: file path
format
: file format
nthread
: number of threads to use
verbose
: whether to produce extra messages
out
: the created DMatrix
TreeliteDMatrixCreateFromCSR(const float *data, const unsigned *col_ind, const size_t *row_ptr, size_t num_row, size_t num_col, DMatrixHandle *out)
create DMatrix from a (in-memory) CSR matrix
data
: feature values
col_ind
: feature indices
row_ptr
: pointer to row headers
num_row
: number of rows
num_col
: number of columns
out
: the created DMatrix
TreeliteDMatrixCreateFromMat(const float *data, size_t num_row, size_t num_col, float missing_value, DMatrixHandle *out)
create DMatrix from a (in-memory) dense matrix
data
: feature values
num_row
: number of rows
num_col
: number of columns
missing_value
: value to represent missing value
out
: the created DMatrix
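The relationship between the dense and CSR inputs above can be illustrated with a small conversion routine. This is a minimal sketch, not Treelite's own code: the helper names is_missing and dense_to_csr are hypothetical, the dense matrix is assumed to be row-major, and dropping every entry equal to missing_value is an assumption about how the sentinel is meant to be interpreted.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: nonzero when v should be treated as missing.
 * NaN never compares equal to itself, so a NaN sentinel needs isnan(). */
int is_missing(float v, float missing_value) {
  if (isnan(missing_value)) return isnan(v);
  return v == missing_value;
}

/* Hypothetical helper: converts a row-major dense matrix into the three
 * CSR arrays (data, col_ind, row_ptr).
 * data and col_ind need room for num_row * num_col entries in the worst
 * case; row_ptr needs num_row + 1 entries.
 * Returns the number of stored (non-missing) entries. */
size_t dense_to_csr(const float *dense, size_t num_row, size_t num_col,
                    float missing_value, float *data, unsigned *col_ind,
                    size_t *row_ptr) {
  size_t nnz = 0;
  row_ptr[0] = 0;
  for (size_t i = 0; i < num_row; ++i) {
    for (size_t j = 0; j < num_col; ++j) {
      float v = dense[i * num_col + j];
      if (!is_missing(v, missing_value)) {
        data[nnz] = v;               /* feature value */
        col_ind[nnz] = (unsigned)j;  /* feature index of that value */
        ++nnz;
      }
    }
    row_ptr[i + 1] = nnz;  /* row i occupies [row_ptr[i], row_ptr[i + 1]) */
  }
  return nnz;
}
```

The resulting arrays have the same types as the data, col_ind and row_ptr arguments of TreeliteDMatrixCreateFromCSR.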
TreeliteDMatrixGetDimension(DMatrixHandle handle, size_t *out_num_row, size_t *out_num_col, size_t *out_nelem)
get dimensions of a DMatrix
handle
: handle to DMatrix
out_num_row
: used to set number of rows
out_num_col
: used to set number of columns
out_nelem
: used to set number of nonzero entries
TreeliteDMatrixGetPreview(DMatrixHandle handle, const char **out_preview)
produce a human-readable preview of a DMatrix. Will print the first and last 25 non-zero entries, along with their locations.
handle
: handle to DMatrix
out_preview
: used to save the address of the string literal
TreeliteDMatrixGetArrays(DMatrixHandle handle, const float **out_data, const uint32_t **out_col_ind, const size_t **out_row_ptr)
extract three arrays (data, col_ind, row_ptr) that define a DMatrix.
handle
: handle to DMatrix
out_data
: used to save pointer to array containing feature values
out_col_ind
: used to save pointer to array containing feature indices
out_row_ptr
: used to save pointer to array containing pointers to row headers
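Once the three arrays are available, a single entry can be read back by scanning the slice of one row. The sketch below assumes the conventional CSR layout, in which row i occupies positions row_ptr[i] up to (but not including) row_ptr[i + 1]; csr_lookup and its fallback parameter are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the stored value at (row, col), or
 * fallback when that entry is not stored in the CSR arrays. */
float csr_lookup(const float *data, const uint32_t *col_ind,
                 const size_t *row_ptr, size_t row, uint32_t col,
                 float fallback) {
  /* Scan only the entries that belong to the requested row. */
  for (size_t k = row_ptr[row]; k < row_ptr[row + 1]; ++k) {
    if (col_ind[k] == col) return data[k];
  }
  return fallback;
}
```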
TreeliteDMatrixFree(DMatrixHandle handle)
delete DMatrix from memory
handle
: handle to DMatrix
Use the following functions to annotate branches in decision trees.
TreeliteAnnotateBranch(ModelHandle model, DMatrixHandle dmat, int nthread, int verbose, AnnotationHandle *out)
annotate branches in a given model using frequency patterns in the training data.
model
: model to annotate
dmat
: training data matrix
nthread
: number of threads to use
verbose
: whether to produce extra messages
out
: used to save handle for the created annotation
TreeliteAnnotationLoad(const char *path, AnnotationHandle *out)
load branch annotation from a JSON file
path
: path to JSON file
out
: used to save handle for the loaded annotation
TreeliteAnnotationSave(AnnotationHandle handle, const char *path)
save branch annotation to a JSON file
handle
: annotation to save
path
: path to JSON file
TreeliteAnnotationFree(AnnotationHandle handle)
delete branch annotation from memory
handle
: annotation to remove
Use the following functions to produce optimized prediction subroutines (in C) from a given decision tree ensemble.
TreeliteCompilerCreate(const char *name, CompilerHandle *out)
create a compiler with a given name
name
: name of compiler
out
: created compiler
TreeliteCompilerSetParam(CompilerHandle handle, const char *name, const char *value)
set a parameter for a compiler
handle
: compiler
name
: name of parameter
value
: value of parameter
TreeliteCompilerGenerateCode(CompilerHandle compiler, ModelHandle model, int verbose, const char *dirpath)
generate prediction code from a tree ensemble model. The code will be C99 compliant. One header file (.h) will be generated, along with one or more source files (.c).
Usage example:
TreeliteCompilerGenerateCode(compiler, model, 1, "./my/model");
// files to generate: ./my/model/header.h, ./my/model/main.c
// if parallel compilation is enabled:
// ./my/model/header.h, ./my/model/main.c, ./my/model/tu0.c,
// ./my/model/tu1.c, and so forth
compiler
: handle for compiler
model
: handle for tree ensemble model
verbose
: whether to produce extra messages
dirpath
: directory to store header and source files
TreeliteCompilerFree(CompilerHandle handle)
delete compiler from memory
handle
: compiler to remove
Use the following functions to load decision tree ensemble models from a file. Treelite supports multiple model file formats.
TreeliteLoadLightGBMModel(const char *filename, ModelHandle *out)
load a model file generated by LightGBM (Microsoft/LightGBM). The model file must contain a decision tree ensemble.
filename
: name of model file
out
: loaded model
TreeliteLoadXGBoostModel(const char *filename, ModelHandle *out)
load a model file generated by XGBoost (dmlc/xgboost). The model file must contain a decision tree ensemble.
filename
: name of model file
out
: loaded model
TreeliteLoadXGBoostModelFromMemoryBuffer(const void *buf, size_t len, ModelHandle *out)
load an XGBoost model from a memory buffer.
buf
: memory buffer
len
: size of memory buffer
out
: loaded model
TreeliteLoadProtobufModel(const char *filename, ModelHandle *out)
load a model in Protocol Buffers format. Protocol Buffers (google/protobuf) is a language- and platform-neutral mechanism for serializing structured data. See tree.proto for format spec.
filename
: name of model file
out
: loaded model
TreeliteExportXGBoostModel(const char *filename, ModelHandle model, const char *name_obj)
(EXPERIMENTAL FEATURE) export a model in XGBoost format. The exported model can be read by XGBoost (dmlc/xgboost).
filename
: name of model file
model
: model to export
name_obj
: name of objective function
TreeliteFreeModel(ModelHandle handle)
delete model from memory
handle
: model to remove
Use the following functions to incrementally build decision tree ensemble models.
TreeliteCreateTreeBuilder(TreeBuilderHandle *out)
Create a new tree builder.
out
: newly created tree builder
TreeliteDeleteTreeBuilder(TreeBuilderHandle handle)
Delete a tree builder from memory.
handle
: tree builder to remove
TreeliteTreeBuilderCreateNode(TreeBuilderHandle handle, int node_key)
Create an empty node within a tree.
handle
: tree builder
node_key
: unique integer key to identify the new node
TreeliteTreeBuilderDeleteNode(TreeBuilderHandle handle, int node_key)
Remove a node from a tree.
handle
: tree builder
node_key
: unique integer key to identify the node to be removed
TreeliteTreeBuilderSetRootNode(TreeBuilderHandle handle, int node_key)
Set a node as the root of a tree.
handle
: tree builder
node_key
: unique integer key to identify the root node
TreeliteTreeBuilderSetNumericalTestNode(TreeBuilderHandle handle, int node_key, unsigned feature_id, const char *opname, float threshold, int default_left, int left_child_key, int right_child_key)
Turn an empty node into a test node with numerical split. The test is in the form [feature value] OP [threshold]. Depending on the result of the test, either left or right child would be taken.
handle
: tree builder
node_key
: unique integer key to identify the node being modified; this node needs to be empty
feature_id
: id of feature
opname
: binary operator to use in the test
threshold
: threshold value
default_left
: default direction for missing values
left_child_key
: unique integer key to identify the left child node
right_child_key
: unique integer key to identify the right child node
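How a numerical test node might be evaluated for one feature value can be sketched as follows. Several points here are assumptions rather than facts stated above: that a missing value is encoded as NaN, that a test evaluating to true selects the left child, and that opname is spelled as one of "<", "<=", "==", ">" or ">=". The function name numerical_test_goes_left is hypothetical.

```c
#include <math.h>
#include <string.h>

/* Hypothetical helper: evaluates [feature value] OP [threshold].
 * Returns 1 to take the left child, 0 to take the right child,
 * or -1 if opname is not one of the operators assumed here. */
int numerical_test_goes_left(float fvalue, const char *opname,
                             float threshold, int default_left) {
  /* Assumption: a missing feature value is encoded as NaN and follows
   * the default direction instead of being compared. */
  if (isnan(fvalue)) return default_left != 0;
  if (strcmp(opname, "<") == 0) return fvalue < threshold;
  if (strcmp(opname, "<=") == 0) return fvalue <= threshold;
  if (strcmp(opname, "==") == 0) return fvalue == threshold;
  if (strcmp(opname, ">") == 0) return fvalue > threshold;
  if (strcmp(opname, ">=") == 0) return fvalue >= threshold;
  return -1;
}
```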
TreeliteTreeBuilderSetCategoricalTestNode(TreeBuilderHandle handle, int node_key, unsigned feature_id, const unsigned int *left_categories, size_t left_categories_len, int default_left, int left_child_key, int right_child_key)
Turn an empty node into a test node with categorical split. A list defines all categories that would be classified as the left side. Categories are integers ranging from 0 to (n-1), where n is the number of categories in that particular feature. Let’s assume n <= 64.
handle
: tree builder
node_key
: unique integer key to identify the node being modified; this node needs to be empty
feature_id
: id of feature
left_categories
: list of categories belonging to the left child
left_categories_len
: length of left_categories
default_left
: default direction for missing values
left_child_key
: unique integer key to identify the left child node
right_child_key
: unique integer key to identify the right child node
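With at most 64 categories, the list of left-side categories fits into a single 64-bit bitmap, one bit per category. The sketch below shows that encoding; it is offered as one plausible reading of the n <= 64 limit, not as a description of how Treelite stores the list internally. build_left_bitmap and category_goes_left are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sets bit c for every category c in the list. */
uint64_t build_left_bitmap(const unsigned int *left_categories,
                           size_t left_categories_len) {
  uint64_t bitmap = 0;
  for (size_t i = 0; i < left_categories_len; ++i) {
    assert(left_categories[i] < 64); /* relies on n <= 64 */
    bitmap |= UINT64_C(1) << left_categories[i];
  }
  return bitmap;
}

/* Hypothetical helper: returns 1 if the category belongs to the left
 * child, 0 otherwise. */
int category_goes_left(uint64_t bitmap, unsigned int category) {
  assert(category < 64);
  return (int)((bitmap >> category) & 1u);
}
```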
TreeliteTreeBuilderSetLeafNode(TreeBuilderHandle handle, int node_key, float leaf_value)
Turn an empty node into a leaf node.
handle
: tree builder
node_key
: unique integer key to identify the node being modified; this node needs to be empty
leaf_value
: leaf value (weight) of the leaf node
TreeliteTreeBuilderSetLeafVectorNode(TreeBuilderHandle handle, int node_key, const float *leaf_vector, size_t leaf_vector_len)
Turn an empty node into a leaf vector node. The leaf vector (a collection of multiple leaf weights per leaf node) is useful for multi-class random forest classifiers.
handle
: tree builder
node_key
: unique integer key to identify the node being modified; this node needs to be empty
leaf_vector
: leaf vector of the leaf node
leaf_vector_len
: length of leaf_vector
TreeliteCreateModelBuilder(int num_feature, int num_output_group, int random_forest_flag, ModelBuilderHandle *out)
Create a new model builder.
num_feature
: number of features used in model being built. We assume that all feature indices are between 0 and (num_feature - 1).
num_output_group
: number of output groups. Set to 1 for binary classification and regression; >1 for multiclass classification
random_forest_flag
: whether the model is a random forest. Set to 0 if the model is gradient boosted trees. Any nonzero value shall indicate that the model is a random forest.
out
: newly created model builder
TreeliteModelBuilderSetModelParam(ModelBuilderHandle handle, const char *name, const char *value)
Set a model parameter.
handle
: model builder
name
: name of parameter
value
: value of parameter
TreeliteDeleteModelBuilder(ModelBuilderHandle handle)
Delete a model builder from memory.
handle
: model builder to remove
TreeliteModelBuilderInsertTree(ModelBuilderHandle handle, TreeBuilderHandle tree_builder, int index)
Insert a tree at specified location.
handle
: model builder
tree_builder
: builder for the tree to be inserted. The tree must not be part of any other existing tree ensemble. Note: The tree_builder argument will become unusable after the tree insertion. Should you want to modify the tree afterwards, use the GetTree(*) method to get a fresh handle to the tree.
index
: index of the element before which to insert the tree; use -1 to insert at the end
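The meaning of the index argument (insert before the given element, with -1 appending at the end) can be shown on a plain array. This is only an illustration of that indexing convention: the int elements stand in for trees, insert_before is a hypothetical name, and accepting an index equal to the current length as another way to append is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts value before position index in items;
 * index == -1 appends at the end.
 * Returns 0 on success, -1 if the array is full or index is out of range. */
int insert_before(int *items, size_t *len, size_t capacity, int index,
                  int value) {
  size_t pos;
  if (*len >= capacity) return -1;
  if (index == -1) {
    pos = *len;                    /* append */
  } else if (index < 0 || (size_t)index > *len) {
    return -1;                     /* no such position */
  } else {
    pos = (size_t)index;
  }
  /* Shift the tail one slot to the right to open a gap at pos. */
  memmove(&items[pos + 1], &items[pos], (*len - pos) * sizeof(int));
  items[pos] = value;
  ++*len;
  return 0;
}
```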
TreeliteModelBuilderGetTree(ModelBuilderHandle handle, int index, TreeBuilderHandle *out)
Get a reference to a tree in the ensemble.
handle
: model builder
index
: index of the tree in the ensemble
out
: used to save reference to the tree
TreeliteModelBuilderDeleteTree(ModelBuilderHandle handle, int index)
Remove a tree from the ensemble.
handle
: model builder
index
: index of the tree that would be removed
TreeliteModelBuilderCommitModel(ModelBuilderHandle handle, ModelHandle *out)
finalize the model and produce the in-memory representation
handle
: model builder
out
: used to save handle to in-memory representation of the finished model
Use the following functions to load compiled prediction subroutines from shared libraries and to make predictions.
TreeliteAssembleSparseBatch(const float *data, const uint32_t *col_ind, const size_t *row_ptr, size_t num_row, size_t num_col, CSRBatchHandle *out)
assemble a sparse batch
data
: feature values
col_ind
: feature indices
row_ptr
: pointer to row headers
num_row
: number of data rows in the batch
num_col
: number of columns (features) in the batch
out
: handle to sparse batch
TreeliteDeleteSparseBatch(CSRBatchHandle handle)
delete a sparse batch from memory
handle
: sparse batch
TreeliteAssembleDenseBatch(const float *data, float missing_value, size_t num_row, size_t num_col, DenseBatchHandle *out)
assemble a dense batch
data
: feature values
missing_value
: value to represent the missing value
num_row
: number of data rows in the batch
num_col
: number of columns (features) in the batch
out
: handle to dense batch
TreeliteDeleteDenseBatch(DenseBatchHandle handle)
delete a dense batch from memory
handle
: dense batch
TreeliteBatchGetDimension(void *handle, int batch_sparse, size_t *out_num_row, size_t *out_num_col)
get dimensions of a batch
handle
: a batch of rows (must be of type SparseBatch or DenseBatch)
batch_sparse
: whether the batch is sparse (1) or dense (0)
out_num_row
: used to set number of rows
out_num_col
: used to set number of columns
TreelitePredictorLoad(const char *library_path, int num_worker_thread, int include_master_thread, PredictorHandle *out)
load prediction code into memory. This function assumes that the prediction code has already been compiled into a dynamic shared library object (.so/.dll/.dylib).
library_path
: path to library object file containing prediction code
num_worker_thread
: number of worker threads (-1 to use max number)
include_master_thread
: whether to assign workload to the master thread. If not, only worker threads will be assigned work.
out
: handle to predictor
TreelitePredictorPredictBatch(PredictorHandle handle, void *batch, int batch_sparse, int verbose, int pred_margin, float *out_result, size_t *out_result_size)
Make predictions on a batch of data rows (synchronously). This function internally divides the workload among all worker threads.
handle
: predictor
batch
: a batch of rows (must be of type SparseBatch or DenseBatch)
batch_sparse
: whether batch is sparse (1) or dense (0)
verbose
: whether to produce extra messages
pred_margin
: whether to produce raw margin scores instead of transformed probabilities
out_result
: resulting output vector; use TreelitePredictorQueryResultSize() to allocate sufficient space
out_result_size
: used to save length of the output vector, which is guaranteed to be less than or equal to TreelitePredictorQueryResultSize()
TreelitePredictorQueryResultSize(PredictorHandle handle, void *batch, int batch_sparse, size_t *out)
Given a batch of data rows, query the necessary size of array to hold predictions for all data points.
handle
: predictor
batch
: a batch of rows (must be of type SparseBatch or DenseBatch)
batch_sparse
: whether batch is sparse (1) or dense (0)
out
: used to store the length of prediction array
TreelitePredictorQueryNumOutputGroup(PredictorHandle handle, size_t *out)
Get the number of output groups in the loaded model. The number is 1 for most tasks; it is greater than 1 for multiclass classification.
handle
: predictor
out
: used to store the number of output groups
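For a multiclass model, the output vector has to be interpreted per row. The sketch below picks the highest-scoring output group for one row. It rests on an assumption that is not stated above: that out_result stores num_output_group consecutive scores for each row (row-major layout). The function name argmax_group is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the largest score for the
 * given row. Assumption: scores for a row are stored contiguously,
 * num_output_group values per row. */
size_t argmax_group(const float *out_result, size_t num_output_group,
                    size_t row) {
  const float *scores = out_result + row * num_output_group;
  size_t best = 0;
  for (size_t g = 1; g < num_output_group; ++g) {
    if (scores[g] > scores[best]) best = g;
  }
  return best;
}
```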
TreelitePredictorFree(PredictorHandle handle)
delete predictor from memory
handle
: predictor to remove
Treelite uses C++ classes to define its internal data structures. In order to pass C++ objects to C functions, opaque handles are used. Opaque handles are void* pointers that store raw memory addresses.
DMatrixHandle
: handle to a data matrix
ModelHandle
: handle to a decision tree ensemble model
TreeBuilderHandle
: handle to tree builder class
ModelBuilderHandle
: handle to ensemble builder class
AnnotationHandle
: handle to branch annotation data
CompilerHandle
: handle to compiler class
PredictorHandle
: handle to predictor class
CSRBatchHandle
: handle to batch of sparse data rows
DenseBatchHandle
: handle to batch of dense data rows
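The opaque-handle pattern described above can be sketched in plain C. Everything in this example is hypothetical: Counter, CounterHandle and the four functions are invented for illustration, a C struct stands in for a C++ class, and returning 0 on success and -1 on failure is an assumed convention rather than something this page specifies.

```c
#include <stdlib.h>

/* Hypothetical internal type. Callers never see its layout; they only
 * hold a CounterHandle. */
typedef struct {
  long count;
} Counter;

/* The handle is a void* that stores the raw address of a Counter. */
typedef void *CounterHandle;

/* Creates a counter and hands its address back through out. */
int CounterCreate(CounterHandle *out) {
  Counter *c = (Counter *)malloc(sizeof(Counter));
  if (c == NULL) return -1;
  c->count = 0;
  *out = (CounterHandle)c;
  return 0;
}

/* Each function casts the handle back to the real type before use. */
int CounterIncrement(CounterHandle handle) {
  if (handle == NULL) return -1;
  ((Counter *)handle)->count += 1;
  return 0;
}

int CounterGet(CounterHandle handle, long *out) {
  if (handle == NULL || out == NULL) return -1;
  *out = ((Counter *)handle)->count;
  return 0;
}

/* Releases the object; the handle must not be used afterwards. */
int CounterFree(CounterHandle handle) {
  free(handle);
  return 0;
}
```

The create, use and free functions mirror the shape of the pairs listed on this page, such as TreeliteCompilerCreate and TreeliteCompilerFree.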