Chris Cormack and David Kelly

Stuck in the Tail

Regulatory requirements such as IFRS 9, together with the desire of asset and treasury managers to improve their investment decisions and risk indicators for low-probability-of-default portfolios, have created demand for better forecasting methods. This note highlights some of the research performed in our Quant Labs that shows a demonstrable improvement on previous studies.

The problem with low probability defaults, such as those of sovereigns, is that by their very definition they do not happen very often, and, unlike corporate and retail credit, there are no more than two hundred issuers. Traditional statistical techniques that heroically link disparate probability distributions to create some narrative around contagion are hampered by fragmented market data concentrated in the highly rated issuers such as the G7.

The input data used in these traditional models is a summary of how the network of investors views a particular credit. It is the edited highlights: an aggregate of the detailed due diligence completed by individuals across a number of institutions. The market data also expresses a consensus on the risk-return trade-off for where the marginal dollar of investment should be applied. This marginal dollar of risk, for example, drives the overbuying and retrenchment of bond investors in external emerging markets debt.

Governments fail for many reasons, but there are patterns of behaviour that any analyst would watch to see history repeating itself. Examples include: material expansion of external debt, autocratic or corrupt government, low GDP per capita, a palm tree on the national flag. OK, not the last one, although it was a criterion used by one senior credit officer in the 1990s. The key point is that there are a number of non-market but still measurable attributes that can be combined to automate the due diligence process a traditional country credit officer would complete.

The advantage of using a model that leverages a wide set of data and calibrates well to historical market and downgrade events is that it is much more consistent and avoids officers “going local” and losing sight of how each country sits alongside its peers.

The Quant Foundry Labs division was approached by a Tier 1 client to introduce modelling techniques that improve the predictability of low probability defaults while reducing the burden on traditional credit risk management work. We used a combination of traditional subject matter knowledge and AI tools that together provide a demonstrably more powerful means of assessing events and finding complex probabilistic dependencies within the data.

The use of more powerful machine learning techniques comes with a challenge of model transparency. To address this, we have developed a suite of tools to provide insight into the model’s choices and the influence of its inputs. Together the predictive machine learning algorithm and the diagnostic tools give rise to a powerful model combination that enables both portfolio managers and regulatory risk teams to assess the risks from low default portfolios (LDPs) both at a point in time and through the credit cycle.

Quant Foundry Approach

If there were a buzzword hit parade for 2018, AI and ML would certainly be in the top three. There is much hype, even though artificial intelligence has been a topic of conversation for decades – Forbidden Planet, C-3PO, HAL? Massively improved computation and a tsunami of available data have enabled AI and ML to flourish, and powerful tools have been democratised so that anyone with some technical background can use them.

Our Quant Foundry Labs division has embraced these techniques but also appreciates that they are still just tools that need to be treated with the same discipline that we now apply to pricing and risk models. We apply the following approach to all of our AI projects: –

Industry Knowledge – Every project has to start with industry knowledge: understand the problem we have been asked to solve, articulate the desired outcome and the available data, and design a solution using a blend of traditional and AI techniques

Implementation – Detailed understanding of how to combine traditional and AI techniques to build out a solution, with quality checks on the gap-filled input data, the appropriateness of the feature set, code testing and deployment

Quality Framework – Documentation of the solution design and implementation approach, preparation of input data tests, performance results and explanatory artefacts, and articulation of model limitations and points of instability

Model Scope

The goal of our model is to predict rating transition probabilities as a point-in-time forecast, and to leverage this model to forecast rating transitions up to four quarters ahead and beyond to address some of the challenges of IFRS 9. The predictive power of our model lies in the fact that it uses advanced machine learning techniques to “learn” the most probable rating given different economic and financial indicators. The model is trained and calibrated using a large set of historical economic and financial data collected across many different countries: –

Classifier – We designed a classifier that, given several economic and financial parameters at a given quarter, predicts the rating at that quarter. Several machine learning algorithms and feature sets were tested in order to maximise the classification accuracy.

Forecast Algorithm – We implemented an algorithm that forecasts the economic and financial parameters needed by the classifier to predict the ratings. We explored both traditional econometric models such as ARIMA (Auto Regressive Integrated Moving Average), and more innovative AI techniques such as standard Recurrent Neural Networks and LSTM (Long Short Term Memory) models.

Forward Evolution – Given the feature predictions and their uncertainties we simulate possible evolution paths, then run each path through the classifier. This returns a set of rating probability distributions, as sketched below.
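
The three components above map naturally onto a small orchestration layer. The sketch below is an illustrative assumption of how they could fit together, not our production code: simulate_paths stands in for the forecast and forward-evolution steps, classifier for the trained rating classifier, and predict_proba follows the scikit-learn convention.

```python
def rating_distribution(classifier, simulate_paths, history,
                        n_quarters=4, n_paths=100):
    """Return a (n_quarters, n_ratings) matrix of average rating probabilities."""
    # 1) forecast + forward evolution: simulated Feature Set paths
    paths = simulate_paths(history, n_quarters, n_paths)   # (paths, quarters, features)
    # 2) classify every simulated country-quarter
    probs = classifier.predict_proba(paths.reshape(-1, paths.shape[-1]))
    # 3) average over simulated paths to get per-quarter rating distributions
    return probs.reshape(n_paths, n_quarters, -1).mean(axis=0)
```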

Classifier and the Secret Sauce

AI, as with all models, cannot operate as an island. It is not credible or possible for a model to dredge the entire universe of data and come up with a Feature Set that drives the algorithm. The secret sauce here is industry knowledge and an understanding of the dynamics and red flags of government failure. Getting this step wrong will allow the AI model to adhere to the time-honoured principle of “garbage in – garbage out”. The definition of the Feature Set that contributes to the performance of the Classifier follows three steps: –

Feature Set selection is a very important process for enhancing the performance of a classifier. A higher number of features doesn’t necessarily mean higher accuracy and better results, although it will make the model slower. Not all countries have data on all the features; if we want to include more features in the model, we have to keep in mind that this always comes at the cost of dropping some countries for which not all the necessary features are available. We need to apply judgement to find a compromise.

The exclusion of critical features due to poor feature selection can introduce significant bias into the classifier. We noticed that some features are critical for the classification of medium/lower rated countries but not for highly rated countries, and vice versa. We need to be sure that we are including the most important ones across the whole range of ratings. We looked at the feature importance scores reported by the classifier after the training phase for feedback on our selection: –

Expert Selection – We included different kinds of indicators: economic data, market data and governance indicators. We looked for historical time series with the highest available frequency and time coverage. We put each time series of raw data onto a coherent and uniform time grid with monthly and quarterly frequency.

Feature Engineering – We enhanced the predictive power of our model by building additional features that combine or convert the available ones. These included percentages, ratios, time decay, changes, spreads and basis. We also subtracted globally co-ordinated trends such as the 2008 meltdown; a sketch of this kind of feature construction follows this list.

Final Feature Set – We completed several iterative tests to select the best performing set of features for the model using a set of custom-built explanatory tools to highlight feature significance and explanatory power.
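
A minimal sketch of the feature construction described in the second step, assuming hypothetical column names for the raw indicators (external_debt, gdp, fx_reserves and so on); the production Feature Set is of course broader.

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """df: one country's quarterly raw indicators, indexed by quarter-end date."""
    out = pd.DataFrame(index=df.index)
    out["debt_to_gdp"] = df["external_debt"] / df["gdp"]            # ratio
    out["reserves_chg_yoy"] = df["fx_reserves"].pct_change(4)       # year-on-year change
    out["spread_vs_ust"] = df["bond_yield"] - df["ust_10y_yield"]   # spread
    # strip a globally co-ordinated trend (e.g. the 2008 meltdown) by removing
    # a cross-country average growth series supplied as a 'global_factor' column
    out["gdp_growth_ex_global"] = df["gdp_growth"] - df["global_factor"]
    return out.dropna()
```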

Gradient Learning Curve

At this point we have historical data from several countries based on the Feature Set, as well as each issuer’s credit rating. We now need to introduce our enhanced gradient boosting algorithm (GB) to enable us to forecast the future rating of each country. The objective of our chosen approach is to learn from history to minimise the difference between the predicted rating based on historical data and the realised rating, while overcoming considerable variance in the quality of the historical dataset.

GB is an ensemble technique that combines multiple weak learners, each based on a single decision tree, to gain higher predictive power. The key advantage of a gradient boosting algorithm lies in the fact that it learns from its errors.

A GB algorithm creates trees sequentially: at each iteration it adds new decision trees, focusing attention on the data points misclassified by the previous set of trees. In this way the predictive loss is iteratively decreased. The process continues until no further improvements can be made. The name “gradient boosting” refers to the fact that this technique uses a gradient descent algorithm to minimise the loss when adding new trees.

So a trained GB model can be described as the aggregation of weak learners, where each added tree has been optimised. The condition reached at the final nodes, such as default, can be converted into a probability of that event happening.

We can now provide a probability estimate that each issuer will land on each rating from AAA to D during each quarter of the coming year, and weight each member of the Feature Set by the significance given to it.
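
To make the mechanics concrete, here is a minimal sketch using scikit-learn’s GradientBoostingClassifier on synthetic data as a stand-in for the enhanced GB algorithm and the real Feature Set; the data, labels and hyper-parameters are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy stand-in data: 1,000 country-quarters, 8 features, 4 coarse rating buckets.
X = rng.normal(size=(1000, 8))
y = np.digitize(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000),
                bins=[-1.0, 0.0, 1.0])

model = GradientBoostingClassifier(
    n_estimators=300,   # trees added sequentially, each correcting earlier errors
    learning_rate=0.05,
    max_depth=3,        # weak learners: shallow decision trees
)
model.fit(X[:800], y[:800])

proba = model.predict_proba(X[800:])      # probability of each rating bucket
importance = model.feature_importances_   # split-based feature importance scores
```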

Validate for Stability

At this stage we should be able to claim victory and show results that showcase the model’s predictive power. What we don’t know is how the model will perform going forward, under different conditions. We need to make sure the rating output does not flip-flop based on small changes in the Feature Set.
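
One simple way to quantify this, sketched below under the assumption of the toy model and feature matrix from the earlier snippet, is to perturb each feature by a small fraction of its historical volatility and measure how often the predicted rating flips.

```python
import numpy as np

def flip_rate(model, X, scale=0.05, n_trials=50, seed=1):
    """Fraction of predictions that change when features are perturbed slightly."""
    rng = np.random.default_rng(seed)
    base = model.predict(X)
    flips = 0.0
    for _ in range(n_trials):
        # perturb each feature by a small multiple of its cross-sample volatility
        noisy = X + rng.normal(scale=scale * X.std(axis=0), size=X.shape)
        flips += np.mean(model.predict(noisy) != base)
    return flips / n_trials
```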

Feature Set Forecast

The feature forecast algorithm consists of a typical principal component transformation of the data, followed by an auto-regressive technique to create a four-quarter evolution of the Feature Set. We then assume that the uncertainties are Gaussian, draw 100 possible new levels of the Feature Set, and reapply our trained GB model to see how the predicted ratings change.
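
A minimal sketch of this step, assuming a hand-rolled AR(1) per principal component as a stand-in for the auto-regressive technique used in practice; the simulated paths feed straight into the classifier via the orchestration sketch shown earlier.

```python
import numpy as np
from sklearn.decomposition import PCA

def simulate_feature_paths(history, n_quarters=4, n_paths=100, seed=0):
    """history: (n_obs, n_features) past Feature Set values for one country."""
    rng = np.random.default_rng(seed)
    pca = PCA(n_components=min(5, history.shape[1]))
    z = pca.fit_transform(history)                 # principal component scores

    paths = np.empty((n_paths, n_quarters, z.shape[1]))
    for j in range(z.shape[1]):
        x, y = z[:-1, j], z[1:, j]
        phi = (x @ y) / (x @ x)                    # least-squares AR(1) coefficient
        sigma = np.std(y - phi * x)                # Gaussian residual volatility
        level = np.full(n_paths, z[-1, j])
        for q in range(n_quarters):
            level = phi * level + rng.normal(scale=sigma, size=n_paths)
            paths[:, q, j] = level

    # map the simulated components back to the original Feature Set space
    flat = pca.inverse_transform(paths.reshape(-1, z.shape[1]))
    return flat.reshape(n_paths, n_quarters, history.shape[1])
```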

Visualising

In machine learning, one of the biggest challenges is model explanation and interpretability.
Some machine learning techniques are intrinsically more transparent (Naive Bayes, Decision Trees, Random Forests). Others, such as Neural Networks, are trickier to decipher, in particular when it comes to understanding the importance of each member of the Feature Set. Just because a feature is used deeper in a tree does not necessarily mean its importance is lower than that of features used at higher levels. We deploy visualisation tools that give less importance to features near the “root of the tree” and higher credence to those near the leaves, together with a visualisation algorithm that looks at the average difference in predictions over all orderings of the features, which gives a more consistent result.
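
Averaging the difference in predictions over all orderings of the features is the definition of a Shapley-value attribution. The note does not name the tooling we built, but a sketch of the same idea using the open-source shap package looks like this, assuming a trained tree ensemble that its TreeExplainer supports.

```python
import shap  # open-source Shapley-value explainer for tree ensembles

explainer = shap.TreeExplainer(model)    # model: a supported tree-based classifier (assumption)
shap_values = explainer.shap_values(X)   # per-feature contribution for every row
shap.summary_plot(shap_values, X)        # global view of feature influence
```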

Now for the Results

We have described how we trained and calibrated each step of our model. During this process we held out all the data relating to the last available year. We test the predictive performance of the classifier both when it knows the true feature evolution, by un-blinding the test data set, and when it uses the forecast features. The resulting confusion matrix lets us compare the predictions made using the true values of the features with those made using the predicted values. The high concentration on the diagonal shows the overall model accuracy and the enhanced predictive power around transitions.
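
For the mechanics of the comparison, here is a minimal confusion-matrix sketch, reusing the toy model and hold-out split from the earlier gradient boosting snippet rather than the real rating data.

```python
from sklearn.metrics import confusion_matrix

# Hold-out evaluation: rows are true rating buckets, columns are predicted ones,
# so mass on the diagonal corresponds to correctly classified country-quarters.
y_pred = model.predict(X[800:])
print(confusion_matrix(y[800:], y_pred))
```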

The graph below shows the strength of the model’s predictability for highly rated, and thus low probability of default, countries, with impressive performance. For the next phase of this model’s development we plan to capture the dynamics of cross-over movements, in particular for Southern European members of the Eurozone that do not control their domestic currency.

Conclusion

The use of a blend of traditional modelling approaches and AI has improved the predictability of low probability default models, where the prevailing data tends to be a challenge. AI demonstrably provides an efficient way of conducting non-linear regression across historical data where normal distribution mapping techniques fail.

The critical element of this exercise is the need for industry knowledge of the key drivers of a typical government default, and an understanding of how these risk factors interplay in the choice of the gradient boosting AI model. Testing the model’s stability is central to any development and we certainly take that approach. Finally, the methods used to visualise the results are integral to the methodology stack for this type of model and need to be included in any validation, to make sure the recipients understand what is going on under the hood when making risk decisions.

At the Quant Foundry, we are very excited by this approach as the results are a step improvement on what has gone before, and we look forward to discussing it with our collaborators in the data vendor world as well as those that address this challenge in the banks.
