KennethEnevoldsen committed on
Commit
8c88b10
1 Parent(s): 04cad87

Update readme

Files changed (1)
  1. README.md +35 -0
README.md ADDED
---
language:
- da
- nn
- nb
---

## Citation

When referring to this work, please use the following citation:

```bibtex
@inproceedings{al-laith-etal-2024-development,
    title = "Development and Evaluation of Pre-trained Language Models for Historical {D}anish and {N}orwegian Literary Texts",
    author = "Al-Laith, Ali  and
      Conroy, Alexander  and
      Bjerring-Hansen, Jens  and
      Hershcovich, Daniel",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.431",
    pages = "4811--4819",
    abstract = "We develop and evaluate the first pre-trained language models specifically tailored for historical Danish and Norwegian texts. Three models are trained on a corpus of 19th-century Danish and Norwegian literature: two directly on the corpus with no prior pre-training, and one with continued pre-training. To evaluate the models, we utilize an existing sentiment classification dataset, and additionally introduce a new annotated word sense disambiguation dataset focusing on the concept of fate. Our assessment reveals that the model employing continued pre-training outperforms the others in two downstream NLP tasks on historical texts. Specifically, we observe substantial improvement in sentiment classification and word sense disambiguation compared to models trained on contemporary texts. These results highlight the effectiveness of continued pre-training for enhancing performance across various NLP tasks in historical text analysis.",
}
```
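
## Usage

A minimal usage sketch, assuming this repository hosts one of the pre-trained masked language models described in the paper above. The repo ID below is a placeholder, not this repository's actual ID:

```python
# Hedged sketch: load the model with Hugging Face transformers and predict a
# masked token. MODEL_ID is a placeholder -- substitute the real repo ID.
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "namespace/model-name"  # placeholder, not the actual repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Mask one token in a Danish example sentence.
text = f"Skæbnen var ham {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
logits = model(**inputs).logits

# Locate the masked position and decode its highest-scoring prediction.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
top_id = int(logits[0, mask_pos].argmax(-1))
print(tokenizer.decode([top_id]))
```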