Pretrained language models, particularly large language models (LLMs) like ChatGPT,
have achieved significant success across various NLP tasks. However, there is considerable
evidence that these models tend to adopt the cultural biases inherent in the datasets used for
training, thereby unintentionally reinforcing biased patterns and potentially causing harm.
This project aims to investigate these biases across different categories, such as racial
and gender biases, in both Estonian and English language models.
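As a concrete starting point, the sketch below shows one common template-based probing setup, assuming the Hugging Face transformers library; the English model, templates, and target words are illustrative placeholders rather than a fixed experimental design, and an analogous probe could use an Estonian checkpoint (e.g., an EstBERT model) with Estonian templates.

from transformers import pipeline

# Illustrative fill-mask probe; the model, templates, and target words are
# assumptions for this sketch, not the project's fixed experimental setup.
fill_en = pipeline("fill-mask", model="bert-base-cased")

templates = [
    "The [MASK] worked as a nurse.",
    "The [MASK] worked as an engineer.",
]

for template in templates:
    # Restrict predictions to two gendered fillers and compare their scores;
    # a consistent skew across many such templates points to a gendered association.
    for pred in fill_en(template, targets=["man", "woman"]):
        print(template, pred["token_str"], round(pred["score"], 4))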
[1] E. Kaukonen, A. Sabir, R. Sharma, How Aunt-Like Are You? Exploring Gender Bias in the Genderless Estonian Language: A Case Study. NoDaLiDa (2025)
Human values, as described in Schwartz’s theory, strongly shape how people interact with digital technologies.
For example, concerns about privacy influence how users manage their online data, the desire for freedom shapes
their engagement with open platforms, and the need for accessibility determines whether technologies can be used
by people with diverse abilities. These values are often reflected in online discussions, with platforms such as
Reddit providing a rich source of conversations where they are openly articulated. Prior research has shown that
mining Reddit can yield valuable insights for software requirements, demonstrating its potential as a source of
user feedback. Yet, despite their importance for attracting and retaining users, such values remain underrepresented
in software artefacts.
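A minimal sketch of how value mentions might be surfaced automatically is given below, assuming the Hugging Face transformers zero-shot classification pipeline; the example comments, model choice, and threshold are illustrative assumptions, and in the project the texts would come from mined Reddit discussions.

from transformers import pipeline

# Hypothetical sketch: map user comments to Schwartz's ten basic values with
# zero-shot classification. The comments and the MNLI model choice are
# illustrative; the project would instead use mined Reddit threads.
SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

comments = [
    "I stopped using the app because it uploads my contacts without asking.",
    "Please add a high-contrast mode, the current theme is unreadable for me.",
]

for comment in comments:
    result = classifier(comment, candidate_labels=SCHWARTZ_VALUES, multi_label=True)
    # Keep values whose entailment score clears a tunable threshold.
    values = [label for label, score in zip(result["labels"], result["scores"]) if score > 0.5]
    print(comment, "->", values)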
[1] S. H. Schwartz, An Overview of the Schwartz Theory of Basic Values. Online Readings in Psychology and Culture (2012)
[2] T. Iqbal et al., Mining Reddit as a New Source for Software Requirements. IEEE International Requirements Engineering Conference (2021)
[3] A. Nurwidyantoro et al., Human values in software development artefacts: A case study on issue discussions in three Android applications. Information and Software Technology (2021)
Evaluating LLMs for App Feature Extraction in App Reviews Using Semantic Similarity Metrics
Description: Users frequently submit feedback through reviews on app marketplaces, discussing new features or expressing opinions about existing ones. Automatically extracting app features from these reviews is essential for conducting feature-level sentiment analysis (also known as “aspect-based sentiment analysis”), which helps developers improve their apps based on user feedback.
Recently, large language models (LLMs) have been applied to extract app features from user reviews in zero-shot and few-shot settings [1]. Traditional evaluation metrics rely on exact or partial string matches, but these can be overly rigid: LLMs often correct typos in user reviews and generate feature words that are semantically related to, but lexically different from, the ground-truth features (e.g., “picture” instead of “photo”). Semantic similarity-based evaluation provides a more flexible and meaningful way to credit such outputs.
This project explores the use of semantic similarity-based metrics to evaluate the accuracy of LLMs in extracting app features, offering a more nuanced assessment than traditional exact or partial match criteria.
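A minimal sketch of such a metric is shown below, assuming the sentence-transformers library; the embedding model and the 0.7 similarity threshold are illustrative choices to be tuned, not values prescribed by [1].

from sentence_transformers import SentenceTransformer, util

# Sketch of a semantic-match metric: a predicted feature counts as correct
# if its embedding is close enough to some gold feature.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_prf(predicted, gold, threshold=0.7):
    pred_emb = model.encode(predicted, convert_to_tensor=True)
    gold_emb = model.encode(gold, convert_to_tensor=True)
    sim = util.cos_sim(pred_emb, gold_emb)  # |predicted| x |gold| cosine matrix
    tp_pred = sum(1 for row in sim if row.max() >= threshold)    # predictions with a close gold match
    tp_gold = sum(1 for col in sim.T if col.max() >= threshold)  # gold features that were recovered
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "picture quality" can now match "photo quality" even though the strings differ.
print(semantic_prf(["picture quality", "dark mode"], ["photo quality", "night mode", "login"]))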
[1] F. A. Shah, A. Sabir, R. Sharma, D. Pfahl, How Effectively Do LLMs Extract Feature-Sentiment Pairs from App Reviews? REFSQ, Lecture Notes in Computer Science, vol. 15588, Springer (2025)
Cross-cultural research in perception and cognition has demonstrated that individuals
from different backgrounds process information in distinct ways, with East Asians tending toward holistic perspectives and Westerners
favoring more analytical approaches. These cultural patterns raise important questions for computational models that are trained primarily
on linguistic data. Vision-Language Models (VLMs), in particular, learn to connect textual and visual information, and their outputs may
reflect not only structural properties of language but also culturally embedded modes of reasoning. When restricted to English, however,
such models are trained within a predominantly Western linguistic and cultural context, which may influence their attentional patterns and
descriptive tendencies.
This project aims to examine whether VLMs trained predominantly on English text exhibit cultural biases consistent with analytic perceptual styles, and how these biases manifest in image description tasks. The goal is to systematically analyze whether culturally grounded attentional patterns surface in the English-language descriptions each model produces, evaluate their implications for fairness and inclusivity, and establish a foundation for understanding how cultural cognition is implicitly reproduced in large-scale multimodal training.
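One possible starting point is sketched below, assuming a BLIP captioning checkpoint from Hugging Face transformers; the image file and the focal/context word lists are hypothetical placeholders for the stimuli and annotation scheme the project would define.

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Minimal sketch assuming a BLIP captioning checkpoint and a local test image
# ("aquarium.jpg"); the word lists are illustrative stand-ins for a proper
# coding scheme of focal-object versus background/context mentions.
FOCAL_TERMS = {"fish", "goldfish"}
CONTEXT_TERMS = {"water", "rock", "rocks", "plant", "plants", "bubbles", "tank", "background"}

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("aquarium.jpg")  # assumed test image
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)

tokens = [t.strip(".,") for t in caption.lower().split()]
focal = sum(t in FOCAL_TERMS for t in tokens)
context = sum(t in CONTEXT_TERMS for t in tokens)
print(caption)
print("focal-object mentions:", focal, "| context mentions:", context)

Aggregating such counts over a culturally balanced image set would give a first, rough signal of whether a model's descriptions lean toward the focal object or its context.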
[1] T. Masuda, Attending holistically versus analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology (2001)