Creating datasets of moth morphology and behaviour from textual sources with large language models

Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center for Information and Communication Technologies), University of Granada, Spain
UK Centre for Ecology and Hydrology, United Kingdom
*Corresponding author: bortiz@ugr.es
An example of our method

Abstract

The integration of language models into ecological workflows is opening new possibilities for automated species monitoring. Classification systems are especially relevant in this context, as the high volume of data generated by automated systems requires efficient tools to support expert curators. Multimodal approaches, which incorporate textual information alongside visual or acoustic data, have shown potential to improve classification performance and interpretability. However, for many insect taxa, structured and usable textual descriptions remain scarce or difficult to access. In this work, we present a tool for retrieving and merging textual information about moth species from official repositories and citable sources. The resulting descriptions can be used to enrich multimodal classification models across different taxonomic levels or to build structured databases for species comparison and discovery.
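As a rough illustration of the merging step described above, the sketch below groups text snippets from several sources into one record per species and keeps the source label attached to each retained text. It is only a minimal sketch under stated assumptions, not the tool presented in this work: the `SpeciesRecord` schema, the `merge_descriptions` function, the trait names, and the input snippets are all hypothetical, and duplicates are detected by simple case- and whitespace-insensitive matching.

```python
from dataclasses import dataclass, field


@dataclass
class SpeciesRecord:
    """Merged textual description for one species (hypothetical schema)."""
    species: str
    # Maps a trait name (e.g. "morphology") to a list of (text, source) pairs.
    traits: dict = field(default_factory=dict)


def _normalize(text):
    """Collapse whitespace and lowercase so trivially different copies match."""
    return " ".join(text.split()).lower()


def merge_descriptions(snippets):
    """Group (species, trait, text, source) snippets into one record per species.

    Texts that are identical after normalization are kept only once per
    species and trait; each retained text stays paired with its source.
    """
    records = {}
    for species, trait, text, source in snippets:
        record = records.setdefault(species, SpeciesRecord(species))
        entries = record.traits.setdefault(trait, [])
        if all(_normalize(kept) != _normalize(text) for kept, _ in entries):
            entries.append((text.strip(), source))
    return records


# Placeholder inputs: names, texts, and source labels are made up for illustration.
snippets = [
    ("Genus species", "morphology", "Placeholder text A.", "source_1"),
    ("Genus species", "morphology", "placeholder  text a.", "source_2"),
    ("Genus species", "behaviour", "Placeholder text B.", "source_2"),
]
merged = merge_descriptions(snippets)
```

Keeping the source next to each text is one way to keep merged descriptions citable; a real pipeline would likely need a more careful notion of duplication than exact matching after normalization.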

Acknowledgment and funding

This publication is based upon work from COST Action InsectAI CA22129, supported by COST (European Cooperation in Science and Technology). The authors would also like to acknowledge the support of the University of Granada Mobility plans programme and the UK Centre for Ecology and Hydrology.