General Tree Inference Library (GTIL)

GTIL is a reference implementation of a prediction runtime for all Treelite models. It has the following goals:

  • Universal coverage: GTIL shall support all tree ensemble models that can be represented as Treelite objects.

  • Accessible code: GTIL should be written in an easy-to-read style that can be understood by a first-time contributor. We prefer code legibility to performance optimization.

  • Correct output: As a reference implementation, GTIL should produce correct prediction outputs.

Functions:

predict(model, data, *[, nthread, pred_margin])

Predict with a Treelite model using the General Tree Inference Library (GTIL).

predict_leaf(model, data, *[, nthread])

Predict with a Treelite model, outputting the leaf node's ID for each row.

predict_per_tree(model, data, *[, nthread])

Predict with a Treelite model and output prediction of each tree.

treelite.gtil.predict(model, data, *, nthread=-1, pred_margin=None)

Predict with a Treelite model using the General Tree Inference Library (GTIL).

Parameters:
  • model (Model object) – Treelite model object

  • data (numpy.ndarray) – 2D NumPy array with which to run prediction

  • nthread (int, optional) – Number of CPU cores to use in prediction. If <= 0, use all CPU cores.

  • pred_margin (bool, optional (defaults to False)) – Whether to produce raw margin scores. If pred_margin=True, post-processing is skipped and raw margin scores are returned.

Returns:

prediction – Prediction output. Expected output dimensions:

  • (num_row,) for regressors and binary classifiers

  • (num_row, num_class) for multi-class classifiers (See Notes for a special case.)

Return type:

numpy.ndarray

Notes

The output has shape (num_row,) if the model is a multi-class classifier with task_type="MultiClfGrovePerClass" and pred_transform="max_index".
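To illustrate what pred_margin controls, here is a minimal sketch, assuming a hypothetical binary classifier whose trees sum to a margin score and whose post-processing step is the sigmoid function (the margin values below are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    """Post-processing step a binary classifier might use."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical summed margin scores for three rows, shape (num_row,)
margin = np.array([-1.2, 0.0, 2.5])

# pred_margin=True: raw margin scores are returned as-is
raw = margin

# pred_margin=False (default): post-processing is applied,
# turning margins into probabilities in [0, 1]
prob = sigmoid(margin)
```

A margin of 0.0 maps to a probability of 0.5 under the sigmoid, which is why raw scores and probabilities agree in sign but not in scale.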

treelite.gtil.predict_leaf(model, data, *, nthread=-1)

Predict with a Treelite model, outputting the leaf node’s ID for each row.

Parameters:
  • model (Model object) – Treelite model object

  • data (numpy.ndarray) – 2D NumPy array with which to run prediction

  • nthread (int, optional) – Number of CPU cores to use in prediction. If <= 0, use all CPU cores.

Returns:

prediction – Prediction output. Expected output dimensions: (num_row, num_tree)

Return type:

numpy.ndarray

Notes

Treelite assigns a unique integer ID to every node in the tree, including leaf nodes as well as internal nodes. It does so by traversing the tree breadth-first. So, for example, the root node is assigned ID 0, and the two nodes at depth=1 are assigned IDs 1 and 2, respectively. Call treelite.Model.dump_as_json() to obtain the ID of every tree node.
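The breadth-first ID scheme described above can be sketched on a toy tree (the node names and structure here are illustrative, not part of the Treelite API):

```python
from collections import deque

# Toy tree: each node maps to its list of children (empty for leaves)
#        A
#       / \
#      B   C
#     / \
#    D   E
children = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}

def assign_bfs_ids(root):
    """Assign integer IDs by breadth-first traversal, as Treelite does."""
    ids, queue, next_id = {}, deque([root]), 0
    while queue:
        node = queue.popleft()
        ids[node] = next_id
        next_id += 1
        queue.extend(children[node])
    return ids

ids = assign_bfs_ids("A")
# Root gets ID 0; the two nodes at depth 1 get IDs 1 and 2
```

With this numbering, predict_leaf returns, for each row and each tree, the ID of the leaf the row lands in.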

treelite.gtil.predict_per_tree(model, data, *, nthread=-1)

Predict with a Treelite model and output prediction of each tree. This function computes one or more margin scores per tree.

Parameters:
  • model (Model object) – Treelite model object

  • data (numpy.ndarray) – 2D NumPy array with which to run prediction

  • nthread (int, optional) – Number of CPU cores to use in prediction. If <= 0, use all CPU cores.

Returns:

prediction – Prediction output. Expected output dimensions:

  • (num_row, num_tree) for regressors, binary classifiers, and multi-class classifiers with task_type="MultiClfGrovePerClass"

  • (num_row, num_tree, num_class) for multi-class classifiers with task_type="kMultiClfProbDistLeaf"

Return type:

numpy.ndarray
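The relationship between predict_per_tree and predict can be sketched for the regressor case: since the ensemble is additive, summing the per-tree margin scores over the tree axis yields the overall margin score. The array below is random stand-in data, not output from an actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
num_row, num_tree = 4, 5

# Stand-in for predict_per_tree output on a regressor,
# shape (num_row, num_tree): one margin score per tree per row
per_tree = rng.normal(size=(num_row, num_tree))

# For an additive ensemble, summing over the tree axis gives the
# overall margin score, i.e. what predict(..., pred_margin=True)
# would return, shape (num_row,)
overall = per_tree.sum(axis=1)
```

This is one way predict_per_tree is useful in practice: inspecting how much each tree contributes to the final score.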