Department of English Philology (Departamento de Filología Inglesa)

Permanent URI for this community: https://hdl.handle.net/10953/32

This community collects the documents produced by the Department of English Philology that meet the copyright requirements for open-access dissemination.

Automatic lexical collocate extraction for corpus-based ontology building and refinement: A FunGramKB case study of the THEFT conceptual scenario
(John Benjamins, 2021-12-15) Fernández-Martínez, Nicolás José; Felices-Lago, Ángel
Traditional corpus-based methods rely on manual inspection and extraction of lexical collocates in the study of selection preferences, which is a very costly, labor-intensive, and time-consuming task. Devising automatic methods for lexical collocate extraction becomes necessary to handle this task and the immensity of corpora available. With a view to leveraging the Sketch Engine platform and in-built corpora, we propose a working prototype of a Lexical Collocate Extractor (LeCoExt) command-line tool that mines lexical collocates from all types of verbs according to their syntactic constituents and Collocate Frequency Score (CFS). This might be the first tool that performs comprehensive corpus-based studies of the selection preferences of individual verbs or groups of verbs by exploiting the capabilities offered by Sketch Engine. This tool might facilitate the task of extracting rich lexico-semantic knowledge from diverse corpora in a few seconds and just a click away. We test its performance for ontology building and refinement, building on a previous detailed analysis of stealing verbs carried out by Fernández-Martínez & Faber (2020). We show how the proposed tool is used to extract conceptual-cognitive knowledge from the THEFT scenario and implement it into FunGramKB Core Ontology through the creation and modification of theft-related conceptual units.
Introducing the NLP task of negative attitudinal function identification
(Sociedad Española para el Procesamiento del Lenguaje Natural, 2024) Fernández-Martínez, Nicolás José
On social media, users often express emotions, judgments, and evaluations on various social and private topics, detectable through automated methods. While NLP tasks like emotion detection and dialogue act classification focus on identifying emotions and intentions in texts, little attention has been paid to the attitudinal function of a text, such as expressing dislike, disagreement, pessimism, disapproval, etc. Our main contribution introduces the NLP task of negative attitudinal function identification, going beyond emotion detection and dialogue classification by focusing on users’ intent and the expression of negative emotions, and negative ethical and aesthetic evaluations. We present a basic synthetic dataset for negative attitudinal functions built with foreign language teaching and learning resources. The dataset was used to develop negative attitudinal function models with supervised approaches, which were compared against other baseline models based on social media emotion detection datasets whose emotion categories were mapped to negative attitudinal functions. Our models, though not consistently outperforming baselines due to the qualitative differences of the tasks, use of out-of-domain data, and labeling issues found in the emotion detection datasets, exhibit promising capabilities with unseen data and in multilingual contexts.
Knowledge-based rules for the extraction of complex, fine-grained locative references from tweets
(Asociación Española de Lingüística Aplicada, 2020-07-14) Fernández-Martínez, Nicolás José
The automatic analysis of user-generated text content from social media involves the challenge of extracting the locative references mentioned in microtexts, so that their geographic coordinates can be identified and the locations can be pinpointed on a map in geolocation systems. The goal of this article is to describe a knowledge-based model that captures a wide variety of locative references, ranging from geopolitical entities and natural landforms to points of interest and traffic ways, from English and Spanish tweets.
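Taxonomía de funciones comunicativas negativas para su identificación automática en el contexto de las ciudades inteligentes [Taxonomy of negative communicative functions for their automatic identification in the context of smart cities]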
(Tirant lo Blanch, 2024-01-26) Fernández-Martínez, Nicolás José
The FGLOCTweet Corpus: An English tweet-based corpus for fine-grained location-detection tasks
(Spanish Association for Corpus Linguistics, 2022-01) Fernández-Martínez, Nicolás José
Location detection in social-media microtexts is an important natural language processing task for emergency-based contexts where locative references are identified in text data. Spatial information obtained from texts is essential to understand where an incident happened, where people are in need of help and/or which areas have been affected. This information contributes to raising emergency situation awareness, which is then passed on to emergency responders and competent authorities to act as quickly as possible. Annotated text data are necessary for building and evaluating location-detection systems. The problem is that available corpora of tweets for location-detection tasks are either lacking or, at best, annotated with coarse-grained location types (e.g. cities, towns, countries, some buildings, etc.). To bridge this gap, we present our semi-automatically annotated corpus, the Fine-Grained LOCation Tweet Corpus (FGLOCTweet Corpus), an English tweet-based corpus for fine-grained location-detection tasks, including fine-grained locative references (i.e. geopolitical entities, natural landforms, points of interest and traffic ways) together with their surrounding locative markers (i.e. direction, distance, movement or time). It includes annotated tweet data for training and evaluation purposes, which can be used to advance research in location detection, as well as in the study of the linguistic representation of place or of the microtext genre of social media.
Who stole what from whom? A corpus-based, cross-linguistic study of English and Spanish verbs of stealing
(John Benjamins, 2020-01-01) Fernández-Martínez, Nicolás José; Faber, Pamela
Drawing on the Lexical Grammar Model, Frame Semantics and Corpus Pattern Analysis, we analyze and contrast verbs of stealing in English and Spanish from a lexico-semantic perspective. This involves looking at the lexical collocates and their corresponding semantic categories that fill the argument slots of verbs of stealing. Our corpus search is performed with the Word Sketch tool on Sketch Engine. To the best of our knowledge, no study has yet taken advantage of the Word Sketch tool in the study of the selection preferences of verbs of stealing, let alone a semantic, cross-linguistic study of those verbs. Our findings reveal that English and Spanish verbs of stealing map out the same underlying semantic space. This shared conceptual layer can thus be incorporated into an ontology based on deep semantics, which could in turn enhance NLP tasks such as word sense disambiguation, machine translation, semantic tagging, and semantic parsing.

RUJA: Repositorio Institucional de Producción Científica
