RUJA: Institutional Repository of Scientific Output


The FGLOCTweet Corpus: An English tweet-based corpus for fine-grained location-detection tasks

dc.contributor.author: Fernández-Martínez, Nicolás José
dc.date.accessioned: 2025-01-30T15:47:52Z
dc.date.available: 2025-01-30T15:47:52Z
dc.date.issued: 2022-01
dc.description.abstract: Location detection in social-media microtexts is an important natural language processing task for emergency-based contexts where locative references are identified in text data. Spatial information obtained from texts is essential to understand where an incident happened, where people are in need of help and/or which areas have been affected. This information contributes to raising emergency situation awareness, which is then passed on to emergency responders and competent authorities to act as quickly as possible. Annotated text data are necessary for building and evaluating location-detection systems. The problem is that available corpora of tweets for location-detection tasks are either lacking or, at best, annotated with coarse-grained location types (e.g. cities, towns, countries, some buildings, etc.). To bridge this gap, we present our semi-automatically annotated corpus, the Fine-Grained LOCation Tweet Corpus (FGLOCTweet Corpus), an English tweet-based corpus for fine-grained location-detection tasks, including fine-grained locative references (i.e. geopolitical entities, natural landforms, points of interest and traffic ways) together with their surrounding locative markers (i.e. direction, distance, movement or time). It includes annotated tweet data for training and evaluation purposes, which can be used to advance research in location detection, as well as in the study of the linguistic representation of place or of the microtext genre of social media.
dc.identifier.citation: Fernández-Martínez, N. J. (2022). The FGLOCTweet Corpus: An English tweet-based corpus for fine-grained location-detection tasks. Research in Corpus Linguistics, 10(1), 117–133. https://doi.org/10.32714/ricl.10.01.06
dc.identifier.issn: 2243-4712
dc.identifier.other: 10.32714/ricl.10.01.06
dc.identifier.uri: https://hdl.handle.net/10953/4579
dc.language.iso: eng
dc.publisher: Spanish Association for Corpus Linguistics
dc.relation.ispartof: Research in Corpus Linguistics [2022]; [10(1)]; [117-133]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: corpus for training and evaluating models
dc.subject: fine-grained locations
dc.subject: location detection
dc.subject: locative references
dc.subject: tweets
dc.title: The FGLOCTweet Corpus: An English tweet-based corpus for fine-grained location-detection tasks
dc.type: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: The FGLOCTweet Corpus An English tweet-based corpus for fine-grained location-detection tasks.pdf
Size: 320.57 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.98 KB
Format: Item-specific license agreed upon to submission

Collections