Regression trees and quantitative investing
Sophisticated models are essential for data-driven investment management.
Active investment managers aim to outperform their benchmarks by identifying a small subset of promising investments from a much larger universe of possibilities. To do this without getting overwhelmed, they need tools to help filter out the majority of irrelevant opportunities.
Traditional stockpickers often rely on heuristics, company visits, and simple factor screens to narrow their focus. In contrast, systematic or quantitative investors can use more sophisticated tools such as regression trees to make forecasts on individual stocks that guide investment decision-making.
The notion of regression trees derives from the 1984 book “Classification and Regression Trees” by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. A regression tree produces a forecast for a given company by asking a series of yes/no questions about the company’s characteristics. Importantly, each question is chosen based on the answer to the previous one, allowing the model to adapt its analysis to the specific characteristics of each company.
For example, the tree might first ask whether a company has recently issued shares and debt. If so, the next question might be about market sentiment, evaluating whether issuance could be justified by growth and investor optimism. If not, the next question might be about valuation instead. This branching structure enables the model to focus on the most relevant characteristics for each company — such as sentiment characteristics for growth companies — and downplay features that are less important to the outperformance potential of that type of company.
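The branching described in this example can be sketched in a few lines of Python. This is an illustration only, not a fitted model: the questions, the thresholds, the characteristic names (recent_issuance, sentiment, earnings_yield), and the leaf forecasts are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    forecast: float  # hypothetical forecast of relative return, made up for this sketch


@dataclass
class Question:
    text: str
    test: Callable[[dict], bool]        # returns True for "yes", False for "no"
    if_yes: Union["Question", Leaf]
    if_no: Union["Question", Leaf]


def predict(node, company):
    """Walk from the root to a leaf, recording each question and its answer."""
    path = []
    while isinstance(node, Question):
        answer = node.test(company)
        path.append((node.text, answer))
        node = node.if_yes if answer else node.if_no
    return node.forecast, path


# Hand-written tree mirroring the example: issuers are asked about sentiment,
# non-issuers about valuation. Thresholds and forecasts are illustrative assumptions.
tree = Question(
    "Recently issued shares and debt?",
    lambda c: c["recent_issuance"],
    if_yes=Question("Sentiment strong?", lambda c: c["sentiment"] > 0.5,
                    Leaf(0.02), Leaf(-0.03)),
    if_no=Question("Valuation cheap?", lambda c: c["earnings_yield"] > 0.08,
                   Leaf(0.03), Leaf(0.0)),
)

issuer = {"recent_issuance": True, "sentiment": 0.8, "earnings_yield": 0.02}
non_issuer = {"recent_issuance": False, "sentiment": 0.1, "earnings_yield": 0.10}

issuer_forecast, issuer_path = predict(tree, issuer)              # follows the sentiment branch
non_issuer_forecast, non_issuer_path = predict(tree, non_issuer)  # follows the valuation branch
```

Note that the issuer is never asked about valuation and the non-issuer is never asked about sentiment: the second question depends entirely on the first answer.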
This adaptability is a key strength of regression trees. Unlike simpler linear models, which apply the same weighting scheme across all companies, regression trees tailor their analysis to different segments of the market. As a result, they can identify attractive investment opportunities across diverse types of companies – value, growth, quality, or otherwise – without forcing them into a one-size-fits-all framework.
Regression trees are machine learning models trained on large amounts of historical data to find combinations of company characteristics that have, in the past, signaled outperformance. The diverse range of investment opportunities they uncover helps support a more broadly diversified, risk-managed portfolio.
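To make the idea of training concrete, here is a minimal sketch of how a single question could be chosen from data. It assumes the squared-error criterion commonly used for regression trees; the article does not say which criterion is used in practice. The six companies and their returns are invented toy numbers, and best_split is a hypothetical helper name.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared errors around the group's own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, features):
    """Try every feature and every midpoint threshold; keep the lowest-error question."""
    best = None
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best["error"]:
                best = {
                    "feature": feature,
                    "threshold": threshold,
                    "error": error,
                    "left_forecast": mean(left),    # forecast when the answer is "no"
                    "right_forecast": mean(right),  # forecast when the answer is "yes"
                }
    return best


# Toy history (made up): subsequent returns line up with sentiment, not earnings yield.
rows = [
    {"sentiment": 0.1, "earnings_yield": 0.05},
    {"sentiment": 0.2, "earnings_yield": 0.09},
    {"sentiment": 0.3, "earnings_yield": 0.04},
    {"sentiment": 0.7, "earnings_yield": 0.08},
    {"sentiment": 0.8, "earnings_yield": 0.03},
    {"sentiment": 0.9, "earnings_yield": 0.07},
]
targets = [-0.02, -0.01, -0.03, 0.04, 0.03, 0.05]

split = best_split(rows, targets, ["sentiment", "earnings_yield"])
print(split["feature"], split["threshold"])
```

On this toy data the search settles on a sentiment question, because that split separates the historical winners from the losers most cleanly. A full tree would repeat the same search within each resulting group.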
Forests of trees
The regression tree offers an effective and transparent way of identifying a diverse set of investment opportunities. The tree conducts one line of questioning, as if acting as a single expert investment analyst, ultimately arriving at a forecast for each stock.
A single tree is limited in the number of questions it can ask about a particular stock: each additional question splits the historical data into smaller groups, and forecasts based on too few observations become unreliable. Yet being able to ask more questions often produces more robust forecasts, particularly if the questioners represent a diverse range of views and experiences. If multiple analysts, especially those with different backgrounds, all agree that a stock is an attractive investment based on their own independent analyses, you can place greater confidence in that forecast.
This simple insight is the basis for combining multiple regression trees into a forest. To improve forecast robustness and accuracy, a forest model combines the predictions of many trees, effectively assembling a team of expert analysts whose perspectives differ because each tree is trained on a different view of the historical data. Because the trees pursue varied lines of questioning, the forest evaluates each company’s outperformance potential more thoroughly and produces more robust predictions.
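A minimal sketch of the forest idea follows. It assumes bootstrap resampling (bagging) as the way each tree receives a different view of the history, and it uses one-question trees to stay short; the article does not specify either choice. The data are invented, and fit_stump, fit_forest, and forest_predict are hypothetical names.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(rows, targets, features):
    """Fit a one-question tree by squared error and return it as a predict function."""
    best = None
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best[0]:
                best = (error, feature, threshold, mean(left), mean(right))
    if best is None:
        # The resample contained no feature variation: fall back to the sample mean.
        fallback = mean(targets)
        return lambda row: fallback
    _, feature, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, features, n_trees=50, seed=0):
    """Train each tree on its own bootstrap resample of the history."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx],
                               features))
    return trees


def forest_predict(trees, row):
    """The forest's forecast is the average of the individual tree forecasts."""
    return mean([tree(row) for tree in trees])


# Toy history (made up for illustration).
rows = [
    {"sentiment": 0.1, "earnings_yield": 0.05},
    {"sentiment": 0.2, "earnings_yield": 0.09},
    {"sentiment": 0.3, "earnings_yield": 0.04},
    {"sentiment": 0.7, "earnings_yield": 0.08},
    {"sentiment": 0.8, "earnings_yield": 0.03},
    {"sentiment": 0.9, "earnings_yield": 0.07},
]
targets = [-0.02, -0.01, -0.03, 0.04, 0.03, 0.05]

trees = fit_forest(rows, targets, ["sentiment", "earnings_yield"])
high = forest_predict(trees, {"sentiment": 0.85, "earnings_yield": 0.05})
low = forest_predict(trees, {"sentiment": 0.15, "earnings_yield": 0.05})
```

Because every tree sees a slightly different sample, the trees choose slightly different thresholds and leaf forecasts, and averaging smooths out the quirks of any one of them. Random forests typically also restrict each tree to a random subset of characteristics, which this sketch omits for brevity.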
Forest models preserve what makes regression trees so useful — transparency, adaptability, and the ability to identify diverse investment opportunities — while enhancing the quality of their forecasts to help seek improved investment outcomes.
For more on decision trees and their role in investment management, see the white papers “Classification and regression trees” and “Using multiple decision trees for stock selection.”