Date: Wednesday, May 28
Room: H-103
Time: 14:00
Area: Business Intelligence
Presenter:
David Diaz, FEN
Abstract:
This study empirically evaluates the performance of Large Language Models (LLMs) in predicting credit risk for retail banking in Chile, comparing their effectiveness to that of traditional machine learning models. A variety of LLM configurations were tested, including models with and without fine-tuning, different chunk sizes, and several prompt engineering strategies: credit analyst role-play, chain-of-thought reasoning, emotional stimuli, "take a breather" prompts, and example-based learning (one-shot and few-shot). The analysis compared open-source models such as Llama 3 with commercial models such as GPT-3.5 and GPT-4.0. Results indicate that fine-tuned LLMs can achieve predictive accuracy comparable to that of traditional models such as logistic regression and ensemble methods such as LightGBM. The top-performing fine-tuned GPT-3.5, GPT-4.0, and Llama 3 configurations achieved AUROC values near 80%, closely matching the best-performing LightGBM benchmark.
A stability test was conducted to assess the consistency of predictions, a property that is crucial for credit risk applications. LLMs with a temperature setting of zero demonstrated high stability, producing consistent results across repeated queries. Higher temperature settings introduced variability, especially in default predictions, underscoring the importance of controlling this parameter for reliable results.
Additionally, the LLMs were evaluated for their ability to explain their predictions. Textual explanations provided by one of the best-performing LLMs were blindly reviewed by a credit risk expert, who rated them with an average score of 5.5/7. This result shows promise for explainability but also reveals occasional inconsistencies: some explanations omitted relevant variables or provided justifications that did not fully align with the underlying data. These findings suggest that, with fine-tuning and careful configuration, LLMs can complement traditional models by offering competitive predictive performance and enhanced transparency in financial applications, particularly in credit risk management.