---
tags:
- spacy
- arxiv:2408.06930
- medical
language:
- nl
license: cc-by-sa-4.0
model-index:
- name: Echocardiogram_SpanCategorizer_rv_dil
  results: 
  - task: 
      type: token-classification
    dataset:
      type: test
      name: "internal test set"
    metrics:
    - name: "Weighted f1"
      type: f1
      value: 0.901
      verified: false 
    - name: "Weighted precision"
      type: precision
      value: 0.926
      verified: false
    - name: "Weighted recall"
      type: recall
      value: 0.877
      verified: false
    
pipeline_tag: token-classification
metrics:
- f1
- precision
- recall
---

# Description
This is a spaCy SpanCategorizer model trained from scratch on Dutch echocardiogram reports sourced from electronic health records. The publication describing the span classification task is available at https://arxiv.org/abs/2408.06930, and the config file used to train the model is available at https://github.com/umcu/echolabeler.

# Minimum working example
```python
# Notebook (Jupyter/IPython) syntax; in a regular shell, drop the leading "!".
!pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-rv-dil/resolve/main/nl_Echocardiogram_SpanCategorizer_rv_dil-any-py3-none-any.whl
```
```python
import spacy
nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_rv_dil")
```
```python
# Run the pipeline on a Dutch echocardiogram report.
prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, normale dimensies LV en RV, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe M.I.")
# Predicted spans are stored under the "sc" key; their scores are in the span group's attrs["scores"].
for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
    print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
```

# Label Scheme

<details>

<summary>View label scheme (5 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`spancat`** | `rv_dil_normal`, `rv_dil_severe`, `rv_dil_mild`, `rv_dil_moderate`, `rv_dil_present` |

</details>
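
To check that an installed pipeline exposes the labels listed in the table, the labels can be compared as sets. This is a minimal sketch: `EXPECTED_LABELS` is copied from the table above, while the helper name `missing_labels` is hypothetical.

```python
# Labels copied from the label scheme table in this model card.
EXPECTED_LABELS = {
    "rv_dil_normal",
    "rv_dil_severe",
    "rv_dil_mild",
    "rv_dil_moderate",
    "rv_dil_present",
}


def missing_labels(found_labels, expected=EXPECTED_LABELS):
    """Return the expected labels that are absent from found_labels, sorted alphabetically."""
    return sorted(set(expected) - set(found_labels))
```

With the pipeline loaded as `nlp`, `missing_labels(nlp.get_pipe("spancat").labels)` is expected to return an empty list if the installed model matches this table.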


# Intended use
The model was developed for span classification on Dutch clinical text. Because it is a domain-specific model trained on medical data, it is intended for Dutch medical NLP tasks.

# Data
The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The training data was anonymized before starting the training procedure.

| Feature | Description |
| --- | --- |
| **Name** | `Echocardiogram_SpanCategorizer_rv_dil` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.7.4,<3.8.0` |
| **Default Pipeline** | `tok2vec`, `spancat` |
| **Components** | `tok2vec`, `spancat` |
| **License** | `cc-by-sa-4.0` |
| **Author** | Bauke Arends |
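
The spaCy constraint in the table (`>=3.7.4,<3.8.0`) can be checked before loading the model. The sketch below is one simple way to do so using only the standard library; `is_supported_spacy` is a hypothetical helper name, and the check deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re


def is_supported_spacy(version):
    """Return True if version falls in the range >=3.7.4,<3.8.0 stated in the table.

    Only the leading major.minor.patch numbers are compared; any suffix is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        return False
    parts = tuple(int(group) for group in match.groups())
    return (3, 7, 4) <= parts < (3, 8, 0)
```

In practice this could be called as `is_supported_spacy(spacy.__version__)` after `import spacy`.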

# Contact
If you run into problems with this model, please open an issue on our GitHub repository: https://github.com/umcu/echolabeler/issues

# Usage
If you use the model in your work, please cite the following reference: https://doi.org/10.48550/arXiv.2408.06930

# References
Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024). Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification. arXiv:2408.06930. https://arxiv.org/abs/2408.06930