---
library_name: transformers
license: mit
language:
- en
pipeline_tag: object-detection
---
This model is a fine-tuned version of microsoft/conditional-detr-resnet-50.

You can find details of the model in the [fashion-visual-search](https://github.com/yainage90/fashion-visual-search) repository.

This model was trained on a combination of two datasets: [modanet](https://github.com/eBay/modanet) and [fashionpedia](https://fashionpedia.github.io/home/).

The labels are ['bag', 'bottom', 'dress', 'hat', 'shoes', 'outer', 'top']
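
A minimal inference sketch using the generic `transformers` object-detection classes is shown below. It is an illustration under stated assumptions, not the author's reference code: `MODEL_ID` is a hypothetical placeholder to replace with this repository's actual id, the helper names `detect_fashion_items` and `keep_labels` are made up for this example, and the checkpoint's config is assumed to carry an `id2label` mapping for the labels listed above.

```python
# Labels as listed in this model card.
LABELS = ['bag', 'bottom', 'dress', 'hat', 'shoes', 'outer', 'top']

# Hypothetical placeholder: replace with the actual Hub id of this model.
MODEL_ID = "your-namespace/your-model-id"


def detect_fashion_items(image_path, model_id=MODEL_ID, threshold=0.5):
    """Run detection on one image and return a list of dicts (label, score, box)."""
    # Heavy dependencies are imported lazily so the helpers below stay importable.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForObjectDetection

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForObjectDetection.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]

    # Assumes the checkpoint config provides id2label for the classes above.
    return [
        {
            "label": model.config.id2label[label.item()],
            "score": score.item(),
            "box": box.tolist(),  # [x_min, y_min, x_max, y_max] in pixels
        }
        for score, label, box in zip(
            result["scores"], result["labels"], result["boxes"]
        )
    ]


def keep_labels(detections, wanted):
    """Keep only detections whose label is in `wanted` (a subset of LABELS)."""
    unknown = set(wanted) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [d for d in detections if d["label"] in wanted]
```

For example, `keep_labels(detect_fashion_items("sample_image.png"), {"bag", "shoes"})` would return only the bag and shoe detections. The `threshold` of 0.5 is an arbitrary starting point and may need tuning.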

The best score, mAP 0.7542, was reached at epoch 96 of 100. Because the best result came so close to the end of training, there may still be a little room for performance improvement.

![sample_image](sample_image.png)