Skip to content

FinBERT Research Project

Period: 06/2022 - 08/2022
Project Name: Evaluation of FinBERT Performance on Multi-Class ESG Classification Task based on the MSCI Framework
GitHub: Go to GitHub

Machine Learning Data Collection Data Preprocessing Python Tensorflow PyTorch Keras Naïve Bayes Logistic Regression SVM Random Forest MLP CNN LSTM Bi-LSTM GRU BERT FinBERT ESG

What's the Project For?

  • Project of Undergraduate Research Opportunities Program (UROP) in HKUST - Deep Learning in NLP.
  • Evaluate the performance of FinBERT, the Large Language Model that adapts to the financial domain, on multi-class classification of ESG categories based on the MSCI framework.

Project Screenshot

Responsibilities

  • Performed data collection and data preprocessing for machine learning by executing ESG labeling.
  • Fine-tuned 11 machine learning models for performance evaluation using Tensorflow and PyTorch, which include:
    1. Naïve Bayes
    2. Logistic Regression
    3. Linear Support Vector Machine (SVM)
    4. Random Forest
    5. Multi-Layer Perceptron (MLP)
    6. Convolutional Neural Network (CNN)
    7. Long Short-Term Memory (LSTM)
    8. Bi-directional Long Short-Term Memory (Bi-LSTM)
    9. Gated Recurrent Unit (GRU)
    10. Bidirectional Encoder Representations from Transformers (BERT)
    11. FinBERT - A Large Language Model (LLM) newly developed for financial context

Most Challenging Part of the Project?

The most challenging part is to prepare the code for BERT fine-tuning. It is because I have never learnt about BERT model before, and I have never used PyTorch before too, it had taken me some time to understand the process of loading the model and tokenizer, as well as defining training arguments for fine-tuning.

The problem was solved unexpectedly easy because supervisor had provided a sample code of fine-tuning FinBERT model, and in the end, only a few amendment in the sample code is needed for fine-tuning BERT. Although the solution was simple, but it has led me to do a lot of research on the BERT model, and gain some basic knowledge about it. It was a rewarding learning process and I enjoyed a lot!