[B] Exploring the Uncertainty of Learned Associations in LLMs

Master Assignment

Type: Master EE/CS/ITC

Period: TBD

Student: (Unassigned)

If you are interested, please contact:

Background:

How certain are LLMs about their learned associations? Broad questions about what models such as GPT learn have been approached from many directions. Works such as [1] have studied how information flows through the network by measuring the direct, indirect, and total effects that changes to the input have on the output. Others have studied syntactic agreement mechanisms in LLMs [2], and the effect of erasing [3] or corrupting [4] information.
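
As a rough illustration of this style of analysis, the sketch below measures how a simple prompt-level intervention (swapping the subject of a factual statement) shifts a model's next-token distribution. This is a simplified stand-in for the embedding-level interventions used in [1, 4]; the model name, prompts, and target token are illustrative assumptions, not part of the referenced works.

```python
# Minimal sketch (assumptions: illustrative model name, prompts, and target).
# Measures the "total effect" of corrupting the subject of a factual prompt
# on the probability of the expected answer token, in the spirit of [1, 4].
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # any open-source causal LM could be used
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

clean = "The Eiffel Tower is located in the city of"
corrupt = "The Colosseum is located in the city of"  # subject swapped

@torch.no_grad()
def next_token_probs(text):
    ids = tok(text, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]          # logits at the last position
    return torch.softmax(logits, dim=-1)

p_clean, p_corrupt = next_token_probs(clean), next_token_probs(corrupt)
target = tok(" Paris", add_special_tokens=False).input_ids[0]

# Drop in the target token's probability caused by the intervention.
print(p_clean[target].item() - p_corrupt[target].item())
```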

More recently, [5] explored the confidence of LLMs' predictions based on the consistency of their responses. However, advances towards quantifying the uncertainty of the learned representations themselves remain sparse. The open question of this project is: how can we quantify the certainty of the feature associations that LLMs learn?
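
To make the consistency-based notion of confidence concrete, the following sketch samples several answers to the same question and uses the agreement ratio of the most frequent answer as a crude confidence score. It loosely follows the idea evaluated in [5]; the model name, prompt, and sampling settings are illustrative assumptions.

```python
# Minimal sketch (assumptions: illustrative model, prompt, and hyperparameters).
# Consistency-based confidence: sample k answers and measure agreement.
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # any open-source causal LM could be used
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def sample_answers(prompt, k=10, max_new_tokens=5):
    ids = tok(prompt, return_tensors="pt").input_ids
    outs = model.generate(ids, do_sample=True, temperature=0.7,
                          max_new_tokens=max_new_tokens,
                          num_return_sequences=k,
                          pad_token_id=tok.eos_token_id)
    # Strip the prompt tokens and keep only the generated continuation.
    return [tok.decode(o[ids.shape[1]:], skip_special_tokens=True).strip()
            for o in outs]

answers = sample_answers("Q: What is the capital of France?\nA:")
answer, freq = Counter(answers).most_common(1)[0]
confidence = freq / len(answers)   # agreement ratio as a crude confidence
print(answer, confidence)
```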

Objectives:

You will work with open-source LLMs (e.g. OpenLLaMA, GPT-J), investigating and adapting methods that can provide uncertainty estimates for their learned representations (e.g. [6]).
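
One possible starting point, loosely inspired by the post-hoc approach of [6], is sketched below: a small probabilistic head is trained on top of frozen LLM hidden states to predict a distribution over the representation, so that the predicted variance can serve as an uncertainty estimate. The module, loss, and feature dimensions are illustrative assumptions, not the method of [6] itself.

```python
# Minimal sketch (assumption: a BayesCap-style [6] post-hoc uncertainty head
# trained on frozen LLM features; names and sizes are illustrative).
import torch
import torch.nn as nn

class UncertaintyCap(nn.Module):
    """Predicts a Gaussian (mean, log-variance) over a frozen feature vector."""
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())
        self.mu = nn.Linear(hidden, dim)        # reconstructed feature
        self.log_var = nn.Linear(hidden, dim)   # per-dimension uncertainty

    def forward(self, h):
        z = self.backbone(h)
        return self.mu(z), self.log_var(z)

def gaussian_nll(mu, log_var, target):
    # Heteroscedastic negative log-likelihood: large predicted variance
    # down-weights the reconstruction error but is itself penalised.
    return (0.5 * torch.exp(-log_var) * (target - mu) ** 2 + 0.5 * log_var).mean()

# Usage sketch: h stands in for frozen hidden states extracted from an LLM.
h = torch.randn(8, 4096)            # batch of frozen features (e.g. GPT-J size)
cap = UncertaintyCap(dim=4096)
mu, log_var = cap(h)
loss = gaussian_nll(mu, log_var, h)  # only the cap is trained; the LLM stays frozen
loss.backward()
print(loss.item(), log_var.exp().mean().item())
```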

Your profile:

You are a graduate student who is enthusiastic about discovering methods for model interpretability. You have previous experience developing projects with DL frameworks. You are also keen to explore new research directions and to apply, test, and analyse the outcomes of your ideas.

Related works:

  1. Vig, J., Gehrmann, S., Belinkov, Y., Qian, S., Nevo, D., Singer, Y. and Shieber, S., 2020. Investigating gender bias in language models using causal mediation analysis. Advances in Neural Information Processing Systems, 33, pp. 12388-12401.
  2. Finlayson, M., Mueller, A., Gehrmann, S., Shieber, S., Linzen, T. and Belinkov, Y., 2021. Causal analysis of syntactic agreement mechanisms in neural language models. arXiv preprint arXiv:2106.06087.
  3. Feder, A., Oved, N., Shalit, U. and Reichart, R., 2021. CausaLM: Causal model explanation through counterfactual language models. Computational Linguistics, 47(2), pp. 333-386.
  4. Meng, K., Bau, D., Andonian, A. and Belinkov, Y., 2022. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35, pp. 17359-17372.
  5. Xiong, M., Hu, Z., Lu, X., Li, Y., Fu, J., He, J. and Hooi, B., 2023. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs. arXiv preprint arXiv:2306.13063.
  6. Upadhyay, U., Karthik, S., Chen, Y., Mancini, M. and Akata, Z., 2022. BayesCap: Bayesian identity cap for calibrated uncertainty in frozen neural networks. In European Conference on Computer Vision (pp. 299-317). Cham: Springer Nature Switzerland.