clarine committed
Commit
44a46d4
1 Parent(s): d4a1a0e

Add metrics on Polish datasets

Files changed (1)
  1. README.md +30 -3
README.md CHANGED
@@ -38,9 +38,10 @@ Besides the aforementioned languages, basic support can be expected for addition
 
  ## Scores
 
- | Metric              | Value |
- |:--------------------|------:|
- | Relevance (NDCG@10) | 0.480 |
+ | Metric                      | Value |
+ |:----------------------------|------:|
+ | English Relevance (NDCG@10) | 0.474 |
+ | Polish Relevance (NDCG@10)  | 0.380 |
 
  Note that the relevance score is computed as an average over 14 retrieval datasets (see
  [details below](#evaluation-metrics)).
@@ -93,6 +94,8 @@ can be around 0.5 to 1 GiB depending on the used GPU.
 
  ### Evaluation Metrics
 
+ #### English
+
  To determine the relevance score, we averaged the results that we obtained when evaluating on the datasets of the
  [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
 
@@ -115,6 +118,30 @@ To determine the relevance score, we averaged the results that we obtained when
  | TREC-COVID        | 0.651 |
  | Webis-Touche-2020 | 0.312 |
 
+ #### Polish
+
+ This model has Polish capabilities, which are evaluated on a subset of
+ the [PIRB benchmark](https://github.com/sdadas/pirb), with BM25 as the first-stage retrieval.
+
+
+ | Dataset       | NDCG@10 |
+ |:--------------|--------:|
+ | Average       |   0.380 |
+ |               |         |
+ | arguana-pl    |   0.285 |
+ | dbpedia-pl    |   0.283 |
+ | fiqa-pl       |   0.223 |
+ | hotpotqa-pl   |   0.603 |
+ | msmarco-pl    |   0.259 |
+ | nfcorpus-pl   |   0.293 |
+ | nq-pl         |   0.355 |
+ | quora-pl      |   0.613 |
+ | scidocs-pl    |   0.128 |
+ | scifact-pl    |   0.581 |
+ | trec-covid-pl |   0.560 |
+
+ #### Other languages
+
  We evaluated the model on the datasets of the [MIRACL benchmark](https://github.com/project-miracl/miracl) to test its multilingual capabilities. Note that not all training languages are part of the benchmark, so we only report the metrics for the existing languages.
 
  | Language | NDCG@10 |
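For reference, the "Average" row of the Polish table is the unweighted mean of the eleven per-dataset NDCG@10 values (the README states the English figure of 0.474 is likewise an average over the 14 BEIR datasets). A minimal Python sketch of that aggregation, with the scores copied from the diff and rounding to three decimals assumed:

```python
# Reproduce the "Average" row of the Polish table above: the unweighted mean
# of the per-dataset NDCG@10 scores (rounding to three decimals is assumed).
ndcg_at_10 = {
    "arguana-pl": 0.285,
    "dbpedia-pl": 0.283,
    "fiqa-pl": 0.223,
    "hotpotqa-pl": 0.603,
    "msmarco-pl": 0.259,
    "nfcorpus-pl": 0.293,
    "nq-pl": 0.355,
    "quora-pl": 0.613,
    "scidocs-pl": 0.128,
    "scifact-pl": 0.581,
    "trec-covid-pl": 0.560,
}

average = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(f"Polish relevance (NDCG@10): {average:.3f}")  # prints 0.380
```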
 
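The Polish section above mentions BM25 as the first-stage retrieval, i.e. a retrieve-then-rerank setup: BM25 selects candidate passages per query and the model only rescores those candidates before NDCG@10 is computed. The sketch below illustrates that pattern with the `rank_bm25` package; the `rerank` function is a hypothetical placeholder rather than this model's actual API, and the real protocol is implemented in the PIRB harness linked above.

```python
# Sketch of the retrieve-then-rerank pattern implied by "BM25 as the
# first-stage retrieval": BM25 picks candidates per query, the model then
# rescores only those candidates. rerank() is a hypothetical placeholder,
# not this model's API; the actual protocol lives in the PIRB harness
# (https://github.com/sdadas/pirb). Requires the external rank_bm25 package.
from rank_bm25 import BM25Okapi


def tokenize(text: str) -> list[str]:
    """Very naive whitespace tokenizer used for both stages of this sketch."""
    return text.lower().replace(".", "").split()


corpus = [
    "Warszawa to stolica Polski.",
    "Kraków leży nad Wisłą.",
    "Bałtyk to morze na północ od Polski.",
]
bm25 = BM25Okapi([tokenize(doc) for doc in corpus])


def rerank(query: str, passages: list[str]) -> list[float]:
    """Hypothetical stand-in scorer: token overlap between query and passage."""
    q_tokens = set(tokenize(query))
    return [float(len(q_tokens & set(tokenize(p)))) for p in passages]


query = "stolica Polski"

# First stage: BM25 scores the whole corpus; keep the top-k candidates.
bm25_scores = bm25.get_scores(tokenize(query))
top_k = sorted(range(len(corpus)), key=lambda i: bm25_scores[i], reverse=True)[:2]
candidates = [corpus[i] for i in top_k]

# Second stage: rescore only the candidates; NDCG@10 is then computed from
# this reranked list against the dataset's relevance judgments (qrels).
reranked = sorted(zip(candidates, rerank(query, candidates)),
                  key=lambda pair: pair[1], reverse=True)
print([passage for passage, _ in reranked])
```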