---
license: cc-by-nc-4.0
---

# RAVE Models

This is a collection of [RAVE](https://github.com/acids-ircam/RAVE) models trained by the [Intelligent Instruments Lab](https://iil.is) for various projects. 

For a full description, see our [blog post](https://iil.is/news/ravemodels). For more about RAVE, see the original [paper](https://arxiv.org/abs/2111.05011) from IRCAM.

Most of these models are encoder-decoder only, with no prior. All of them use the `--causal` mode and are exported for streaming inference with [nn~](https://github.com/acids-ircam/nn_tilde), [NN.ar](https://github.com/elgiano/nn.ar) or [rave-supercollider](https://github.com/victor-shepardson/rave-supercollider).

The `checkpoints/` directory contains some complete checkpoints, which can be used with our [fork of RAVE](https://github.com/victor-shepardson/RAVE) to speed up training by transfer learning.
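
Most filenames in this collection appear to follow the pattern `<name>_b<block size>_r<sample rate>_z<latent dimensions>.ts`, matching the "Model:" line given for each entry. The following is a minimal sketch, not part of RAVE or of this repository, that reads those values from a filename and derives the duration of one block. The names `ModelInfo` and `parse_model_filename` are hypothetical, and the sketch assumes that "block size" means the number of audio samples per latent frame. Actual latency in a host such as nn~ also depends on its own buffering.

```python
import re
from typing import NamedTuple, Optional


class ModelInfo(NamedTuple):
    """Values read from a model filename (hypothetical helper type)."""
    name: str
    block_size: int   # assumed: audio samples per latent frame
    sample_rate: int  # Hz
    latent_dims: int

    @property
    def block_ms(self) -> float:
        """Duration of one block of audio in milliseconds."""
        return 1000.0 * self.block_size / self.sample_rate

    @property
    def latent_rate_hz(self) -> float:
        """Latent frames per second under the assumption above."""
        return self.sample_rate / self.block_size


# Accepts both `_r48000` and the `_sr48000` variant used by one filename.
_PATTERN = re.compile(
    r"^(?P<name>.+)_b(?P<block>\d+)_s?r(?P<rate>\d+)_z(?P<latents>\d+)\.ts$"
)


def parse_model_filename(filename: str) -> Optional[ModelInfo]:
    """Return the parsed fields, or None if the name does not fit the pattern."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return ModelInfo(
        name=match["name"],
        block_size=int(match["block"]),
        sample_rate=int(match["rate"]),
        latent_dims=int(match["latents"]),
    )


if __name__ == "__main__":
    info = parse_model_filename("guitar_iil_b2048_r48000_z16.ts")
    print(info, round(info.block_ms, 2), info.latent_rate_hz)
```

Filenames that do not fit the pattern, such as `crozzoli_bigensemblesmusic_18d.ts`, return `None`, so the "Model:" line remains the reference for those entries.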

Citation:

```
@misc {intelligent_instruments_lab_2023,
	author       = { {Intelligent Instruments Lab} },
	title        = { rave-models (Revision ad15daf) },
	year         = 2023,
	url          = { https://huggingface.co/Intelligent-Instruments-Lab/rave-models },
	doi          = { 10.57967/hf/1235 },
	publisher    = { Hugging Face }
}
```

## Musical Instruments

### guitar_iil_b2048_r48000_z16.ts

Dataset: [IILGuitarTimbre](https://github.com/Intelligent-Instruments-Lab/IILGuitarTimbre), a timbre-oriented collection of plucking, strumming, striking, scraping and more recorded dry from an electric guitar.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### sax_soprano_franziskaschroeder_b2048_r48000_z20.ts

Dataset: Soprano sax improvisation by [Franziska Schroeder](https://improvisationai.wordpress.com/).

Model: modified RAVE v1, 48kHz, block size 2048, 20 latent dimensions.

### organ_archive_b2048_r48000_z16.ts

Dataset: various recordings of organ music sourced from archive.org. Small amounts of voice and other instruments were included, and vinyl record noises are prominent.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### organ_bach_b2048_sr48000_z16.ts

Dataset: various recordings of J.S. Bach music for church organ.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### mrp_strengjavera_b2048_r44100_z16.ts

Dataset: [magnetic resonator piano](https://andrewmcpherson.org/project/mrp) controlled by [artificial life](https://github.com/Intelligent-Instruments-Lab/iil-python-tools/tree/ja-dev/tolvera), as part of the generative installation Strengjavera by Jack Armitage, premiered at AIMC 2023. See [paper](https://aimc2023.pubpub.org/pub/83k6upv8) and [Zenodo](https://zenodo.org/records/8329855) for citation.

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

## Voice

### voice_vocalset_b2048_r48000_z16.ts

Dataset: [VocalSet](https://zenodo.org/record/1193957) singing voice dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_hifitts_b2048_r48000_z16.ts

Dataset: [Hi-Fi TTS](https://www.openslr.org/109/) audiobooks dataset.

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### voice_jvs_b2048_r44100_z16.ts

Dataset: [Hi-Fi TTS](https://www.openslr.org/109/) speaker 9017 (John Van Stan).

Model: RAVE v3, 44.1kHz, block size 2048, 16 latent dimensions.

### voice_vctk_b2048_r44100_z22.ts

Dataset: [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443) multispeaker read speech dataset.

Model: RAVE v3, 44.1kHz, block size 2048, 22 latent dimensions.

### voice_multivoice_b2048_r48000_z11.ts

Dataset: combination of speaking and singing voice datasets: [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443), [VocalSet](https://zenodo.org/record/1193957), [Children's Song Dataset](https://zenodo.org/records/4785016), [NUS-48E](https://ieeexplore.ieee.org/document/6694316/), [attHACK](https://arxiv.org/abs/2004.04410).

Model: RAVE v3 with spectral discriminator, 48kHz, block size 2048, 11 latent dimensions.

## Birds

### birds_motherbird_b2048_r48000_z16.ts

This model of bird sounds was curated by Manuel Cherep, Jessica Shand and Jack Armitage for their piece Motherbird, performed at TENOR 2023 in Boston, May 2023.

Dataset: bird sounds.

Model: RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### birds_pluma_b2048_r48000_z12.ts

This model of bird sounds was curated by Giacomo Lepri for his instrument *[Pluma](http://www.giacomolepri.com/pluma)*.

Dataset: bird sounds.

Model: modified RAVE v1, 48kHz, block size 2048, 12 latent dimensions.

## *Pond Brain* Marine Sounds

These models of marine sounds were trained for [Jenna Sutela](https://jennasutela.com/)'s *Pond Brain* installations at [Copenhagen Contemporary](https://copenhagencontemporary.org/en/yet-it-moves-read-online/) and the [Helsinki Biennial](https://helsinkibiennaali.fi/en/artist/jenna-sutela/).

Caution: these decoders sometimes produce a loud chirp on first initialization.

### water_pondbrain_b2048_r48000_z16.ts

Dataset: water recordings from freesound.org.
<details>
<summary>list of freesound users</summary>
inspectorj, inchadney, aesqe, vonfleisch, javetakami, atomediadesign, kolezan, zabuhailo, zaziesound, repdac3, al_sub, lgarrett, uzbazur, lydmakeren, frenkfurth, edo333, boredtoinsanity, owl, kaydinhamby, tliedes, ilmari_freesound, manoslindos, l3ardoc, alexbuk, s-light
</details>

Model: modified RAVE v1, 48kHz, block size 2048, 16 latent dimensions.

### humpbacks_pondbrain_b2048_r48000_z20.ts

Dataset: humpback whale recordings from the [Watkins database](https://cis.whoi.edu/science/B/whalesounds/index.cfm), [MBARI](https://freesound.org/people/MBARI_MARS/), and BBC.

Model: modified RAVE v1, 48kHz, block size 2048, 20 latent dimensions.

### marinemammals_pondbrain_b2048_r48000_z20.ts

Dataset: various marine mammal sounds from [NOAA](https://www.fisheries.noaa.gov/national/science-data/sounds-ocean-mammals), the [Watkins database](https://cis.whoi.edu/science/B/whalesounds/index.cfm), freesound users `felixblume` and `geraldfiebig`, and sound effects databases.

Model: modified RAVE v1, 48kHz, block size 2048, 20 latent dimensions.


## *Thales* magnets_b2048_r48000_z8.ts

Dataset: One-hour recording of magnets of different dimensions hitting each other or scratching wooden and metallic surfaces. Used for [Thales](https://iil.is/pdf/2023_nime_privato_et_al_thales.pdf), a musical instrument based on magnets.

Model: RAVE v1, 48kHz, block size 2048, 8 latent dimensions.


## *Crozzoli's Music* crozzoli_bigensemblesmusic_18d.ts

Dataset: Six recordings of long contemporary compositions for large electronic and acoustic ensembles.

Model: RAVE v3, 48kHz, block size 2048, 18 latent dimensions.


## *Aulus-les-Bains Dawn Chorus @ CAMP* birds_dawnchorus_b2048_r48000_z8.ts

Dataset: ~230 minutes of dawn chorus recorded by Gregory White at Aulus-les-Bains as part of a residency at CAMPfr.com.

Model: RAVE v3, 48kHz, block size 2048, 8 latent dimensions.