metadata
language: []
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:1267
  - loss:CoSENTLoss
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets: []
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: Give me suggestions for a high-quality DSLR camera
    sentences:
      - faq query
      - subscription query
      - faq query
  - source_sentence: Aidez-moi à configurer une nouvelle adresse e-mail
    sentences:
      - order query
      - faq query
      - feedback query
  - source_sentence: Как я могу изменить адрес доставки?
    sentences:
      - support query
      - product query
      - product query
  - source_sentence: ساعدني في حذف الملفات الغير مرغوب فيها من هاتفي
    sentences:
      - technical support query
      - product recommendation
      - faq query
  - source_sentence: Envoyez-moi la politique de garantie de ce produit
    sentences:
      - faq query
      - account query
      - faq query
pipeline_tag: sentence-similarity
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: MiniLM dev
          type: MiniLM-dev
        metrics:
          - type: pearson_cosine
            value: 0.6538226572138826
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6336766646599241
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.5799895241429639
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.5525776786782183
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.5732001104236694
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.5394971970682657
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6359725423136287
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6237936341101822
            name: Spearman Dot
          - type: pearson_max
            value: 0.6538226572138826
            name: Pearson Max
          - type: spearman_max
            value: 0.6336766646599241
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: MiniLM test
          type: MiniLM-test
        metrics:
          - type: pearson_cosine
            value: 0.6682368113711722
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6222011918428743
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.5714617063306076
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.5481366191719228
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.5726946277850402
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.549312247309557
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6396412507506479
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6107388175009413
            name: Spearman Dot
          - type: pearson_max
            value: 0.6682368113711722
            name: Pearson Max
          - type: spearman_max
            value: 0.6222011918428743
            name: Spearman Max

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
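
The Pooling module above has only pooling_mode_mean_tokens enabled, so each sentence embedding is the average of its token embeddings. A minimal pure-Python sketch of that idea follows. The function name mean_pool and the toy vectors are illustrative assumptions, not part of the library, and the sketch assumes padding tokens are excluded through the attention mask.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy input: three token vectors of dimension 2 (the real model uses 384).
# The last vector is padding and must not influence the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```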

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("philipp-zettl/MiniLM-similarity-small")
# Run inference
sentences = [
    'Envoyez-moi la politique de garantie de ce produit',
    'faq query',
    'account query',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
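
The widget examples pair a user query with short intent labels such as "faq query", which suggests ranking candidate labels by their similarity to the query. The sketch below shows that ranking step in pure Python, assuming cosine similarity is the scoring function. The helper names cosine_similarity and rank_labels are hypothetical, and the 3-dimensional vectors are made-up stand-ins for the 384-dimensional output of model.encode.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_labels(query_vec: List[float], label_vecs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in label_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up vectors for illustration only; real ones come from model.encode(...).
query = [1.0, 0.0, 1.0]
labels = {
    "faq query": [1.0, 0.0, 0.9],
    "account query": [0.0, 1.0, 0.1],
}
ranking = rank_labels(query, labels)
print(ranking[0][0])  # faq query
```

With real embeddings, the same ranking can be read off one row of the matrix returned by model.similarity.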

Evaluation

Metrics

Semantic Similarity (dataset: MiniLM-dev)

| Metric | Value |
|---|---|
| pearson_cosine | 0.6538 |
| spearman_cosine | 0.6337 |
| pearson_manhattan | 0.58 |
| spearman_manhattan | 0.5526 |
| pearson_euclidean | 0.5732 |
| spearman_euclidean | 0.5395 |
| pearson_dot | 0.636 |
| spearman_dot | 0.6238 |
| pearson_max | 0.6538 |
| spearman_max | 0.6337 |

Semantic Similarity (dataset: MiniLM-test)

| Metric | Value |
|---|---|
| pearson_cosine | 0.6682 |
| spearman_cosine | 0.6222 |
| pearson_manhattan | 0.5715 |
| spearman_manhattan | 0.5481 |
| pearson_euclidean | 0.5727 |
| spearman_euclidean | 0.5493 |
| pearson_dot | 0.6396 |
| spearman_dot | 0.6107 |
| pearson_max | 0.6682 |
| spearman_max | 0.6222 |
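
The metrics above are correlations between the model's similarity scores and the gold scores. The sketch below is a rough pure-Python illustration of how Pearson and Spearman correlation can be computed. The predicted and gold values are invented for the example, and the helper names are assumptions. The last lines check that the reported pearson_max on the dev set equals the largest of the four Pearson values listed for it.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: predicted cosine similarities vs. gold scores.
predicted = [0.9, 0.2, 0.7, 0.4]
gold = [1.0, 0.0, 1.0, 0.0]
rho = spearman(predicted, gold)
print(round(rho, 4))

# Reported dev-set Pearson values, copied from the table above.
dev_pearson = {"cosine": 0.6538, "manhattan": 0.58, "euclidean": 0.5732, "dot": 0.636}
print(max(dev_pearson.values()))  # 0.6538
```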

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,267 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    | | sentence1 | sentence2 | score |
    |---|---|---|---|
    | type | string | string | float |
    | details | min: 6 tokens, mean: 10.77 tokens, max: 18 tokens | min: 4 tokens, mean: 5.31 tokens, max: 6 tokens | min: 0.0, mean: 0.67, max: 1.0 |
  • Samples:
    | sentence1 | sentence2 | score |
    |---|---|---|
    | Get information on the next art exhibition | product query | 0.0 |
    | Show me how to update my profile | product query | 0.0 |
    | Покажите мне доступные варианты полетов в Турцию | faq query | 0.0 |
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
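
To make the loss parameters concrete, here is a pure-Python sketch of a CoSENT-style ranking loss with scale 20.0 over pairwise cosine similarities. It is an illustration written from the CoSENT reference cited at the end of this card, not the library's code. The function name cosent_loss and the toy inputs are assumptions. The loss penalizes every case where a pair with a lower gold score receives a higher cosine similarity than a pair with a higher gold score.

```python
import math
from typing import List


def cosent_loss(cos_sims: List[float], labels: List[float], scale: float = 20.0) -> float:
    """log(1 + sum of exp(scale * (cos_i - cos_j)) over all i, j with labels[i] < labels[j])."""
    terms = [0.0]  # exp(0) contributes the constant 1 inside the log
    for i in range(len(cos_sims)):
        for j in range(len(cos_sims)):
            if labels[i] < labels[j]:
                terms.append(scale * (cos_sims[i] - cos_sims[j]))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Two sentence pairs: the first has gold score 1.0, the second 0.0.
well_ordered = cosent_loss([0.9, 0.1], [1.0, 0.0])   # similarities agree with the labels
mis_ordered = cosent_loss([0.1, 0.9], [1.0, 0.0])    # similarities contradict the labels
print(well_ordered < mis_ordered)  # True
```

Only the ordering of the similarities matters here, which is consistent with the card reporting Spearman (rank) correlation alongside Pearson.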
    

Evaluation Dataset

Unnamed Dataset

  • Size: 159 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    | | sentence1 | sentence2 | score |
    |---|---|---|---|
    | type | string | string | float |
    | details | min: 6 tokens, mean: 10.65 tokens, max: 17 tokens | min: 4 tokens, mean: 5.35 tokens, max: 6 tokens | min: 0.0, mean: 0.67, max: 1.0 |
  • Samples:
    | sentence1 | sentence2 | score |
    |---|---|---|
    | Sende mir die Bestellbestätigung per E-Mail | order query | 0.0 |
    | How do I add a new payment method? | faq query | 1.0 |
    | No puedo conectar mi impresora, ¿puedes ayudarme? | support query | 1.0 |
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | loss | MiniLM-dev_spearman_cosine | MiniLM-test_spearman_cosine |
|---|---|---|---|---|---|
| 0.0629 | 10 | 6.2479 | 2.5890 | 0.1448 | - |
| 0.1258 | 20 | 4.3549 | 2.2787 | 0.1965 | - |
| 0.1887 | 30 | 3.5969 | 2.0104 | 0.2599 | - |
| 0.2516 | 40 | 2.4979 | 1.7269 | 0.3357 | - |
| 0.3145 | 50 | 2.5551 | 1.5747 | 0.4439 | - |
| 0.3774 | 60 | 3.1446 | 1.4892 | 0.4750 | - |
| 0.4403 | 70 | 2.1353 | 1.5305 | 0.4662 | - |
| 0.5031 | 80 | 2.9341 | 1.3718 | 0.4848 | - |
| 0.5660 | 90 | 2.8709 | 1.2469 | 0.5316 | - |
| 0.6289 | 100 | 2.1367 | 1.2558 | 0.5436 | - |
| 0.6918 | 110 | 2.2735 | 1.2939 | 0.5392 | - |
| 0.7547 | 120 | 2.8646 | 1.1206 | 0.5616 | - |
| 0.8176 | 130 | 3.3204 | 1.0213 | 0.5662 | - |
| 0.8805 | 140 | 0.8989 | 0.9866 | 0.5738 | - |
| 0.9434 | 150 | 0.0057 | 0.9961 | 0.5674 | - |
| 1.0063 | 160 | 0.0019 | 1.0111 | 0.5674 | - |
| 1.0692 | 170 | 0.4617 | 1.0275 | 0.5747 | - |
| 1.1321 | 180 | 0.0083 | 1.0746 | 0.5732 | - |
| 1.1950 | 190 | 0.5048 | 1.0968 | 0.5753 | - |
| 1.2579 | 200 | 0.0002 | 1.0840 | 0.5738 | - |
| 1.3208 | 210 | 0.07 | 1.0364 | 0.5753 | - |
| 1.3836 | 220 | 0.0 | 0.9952 | 0.5750 | - |
| 1.4465 | 230 | 0.0 | 0.9922 | 0.5744 | - |
| 1.5094 | 240 | 0.0 | 0.9923 | 0.5726 | - |
| 1.0126 | 250 | 0.229 | 0.9930 | 0.5729 | - |
| 1.0755 | 260 | 2.2061 | 0.9435 | 0.5880 | - |
| 1.1384 | 270 | 2.7711 | 0.8892 | 0.6078 | - |
| 1.2013 | 280 | 0.7528 | 0.8886 | 0.6148 | - |
| 1.2642 | 290 | 0.386 | 0.8927 | 0.6162 | - |
| 1.3270 | 300 | 0.8902 | 0.8710 | 0.6267 | - |
| 1.3899 | 310 | 0.9534 | 0.8429 | 0.6337 | - |
| 1.4403 | 318 | - | - | - | 0.6222 |

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}