Our previous article on decision trees dealt with techniques to speed up the training process, though the performance-critical component of a machine learning pipeline is often the prediction side. Training takes place offline, whereas predictions are often in the hot path - consider ranking documents in response to a user query, à la Google or Bing. Many candidate documents need to be scored as quickly as possible, and the top k results returned to the user.
Here, we'll focus on three methods to improve the performance of evaluating an ensemble of decision trees - a structure that encompasses random forests, gradient boosted decision trees, AdaBoost, and so on:
- Recursive tree walking (naive)
- Flattening the decision tree (flattened)
- Compiling the tree to machine code (compiled)
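To make the first two strategies concrete, here is a minimal C++ sketch of pointer-chasing versus flattened evaluation. This is illustrative only; the `Node` and `FlatNode` types and the function names are hypothetical, not the repository's actual implementation:

```cpp
#include <vector>

// A hypothetical regression tree node; the repository's actual types differ.
struct Node {
    int feature;         // feature index tested at this node
    double threshold;    // go left if x[feature] < threshold
    double value;        // prediction stored at a leaf
    Node* left = nullptr;
    Node* right = nullptr;
    bool isLeaf() const { return left == nullptr; }
};

// Naive strategy: walk the tree by chasing child pointers. Each step is a
// dependent load that may miss cache, since nodes are scattered on the heap.
double evalNaive(const Node* n, const std::vector<double>& x) {
    while (!n->isLeaf()) {
        n = (x[n->feature] < n->threshold) ? n->left : n->right;
    }
    return n->value;
}

// Flattened strategy: store nodes contiguously and replace pointers with
// array indices, so the traversal stays within one cache-friendly buffer.
struct FlatNode {
    int feature;
    double threshold;
    double value;
    int left = -1;       // index of left child; -1 marks a leaf
    int right = -1;
};

double evalFlat(const std::vector<FlatNode>& nodes, const std::vector<double>& x) {
    int i = 0;
    while (nodes[i].left != -1) {
        const FlatNode& n = nodes[i];
        i = (x[n.feature] < n.threshold) ? n.left : n.right;
    }
    return nodes[i].value;
}
```

An ensemble prediction is just a (possibly weighted) sum or average of per-tree predictions, so the per-tree walk dominates evaluation cost. Roughly speaking, the compiled strategy goes one step further: each tree's comparisons are emitted as straight-line code and compiled to machine code ahead of time, eliminating the interpretive traversal loop entirely.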
We'll show that choosing the right strategy can speed up evaluation by more than 2x - a very significant improvement when predictions sit in the hot path.
All code (implementation, drivers, and analysis scripts) is available on GitHub in the decisiontrees-performance repository.