Music style translation aims to generate variations of existing pieces of music by altering the style-related characteristics of the original piece while content, such as the melody, remains unchanged. These alterations could involve timbre translation, re-harmonization, or music rearrangement. Previous studies have achieved promising results utilizing time-frequency and symbolic music representations. Music style translation on raw audio has also been investigated and applied to single-instrument pieces. Although processing raw audio is more challenging, it provides richer information about timbres, dynamics, and articulations. We introduce Music-STAR, the first audio-based translation system that translates the existing instruments in a piece into a set of target instruments without using source separation. To conduct our experiments, we use the StarNet dataset, which includes strings-piano and vibraphone-clarinet mixtures alongside their stems. We also compare the performance of our model to baseline approaches performed by single-instrument translation and separation-translation pipelines.
The following table includes randomly chosen samples from the StarNet dataset and from some well-known songs: ten pieces performed by strings-piano and clarinet-vibraphone combinations.
| Name | Clarinet | Vibraphone | Clarinet-Vibraphone | String | Piano | Strings-Piano |
|---|---|---|---|---|---|---|
The following tables include the results obtained by applying single-instrument translation and then mixing the outputs.
Clarinet-Vibraphone to Strings-Piano:
| Name | Input Clarinet | Input Vibraphone | Target String | Target Piano | Target Mixture | Gold Standard |
|---|---|---|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
| Name | Input String | Input Piano | Target Clarinet | Target Vibraphone | Target Mixture | Gold Standard |
|---|---|---|---|---|---|---|
The following tables include the results obtained by applying the separation-translation pipeline and then mixing the outputs.
Clarinet-Vibraphone to Strings-Piano:
| Name | Input Mixture | Isolated Clarinet | Isolated Vibraphone | Target String | Target Piano | Target Mixture | Gold Standard |
|---|---|---|---|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
| Name | Input Mixture | Isolated String | Isolated Piano | Target Clarinet | Target Vibraphone | Target Mixture | Gold Standard |
|---|---|---|---|---|---|---|---|
The following tables include the results obtained by applying the embedding-supervised method and then mixing the outputs.
Clarinet-Vibraphone to Strings-Piano:
| Name | Input Mixture | Target String | Target Piano | Target Mixture | Gold Standard |
|---|---|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
| Name | Input Mixture | Target Clarinet | Target Vibraphone | Target Mixture | Gold Standard |
|---|---|---|---|---|---|
The following tables include the results obtained by applying stem-supervised Music-STAR.
Clarinet-Vibraphone to Strings-Piano:
| Name | Input Mixture | Target String | Target Piano | Target Mixture | Gold Standard |
|---|---|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
| Name | Input Mixture | Target Clarinet | Target Vibraphone | Target Mixture | Gold Standard |
|---|---|---|---|---|---|
The following tables include the results obtained by applying mixture-supervised Music-STAR.
Clarinet-Vibraphone to Strings-Piano:
| Name | Input Mixture | Target Mixture | Gold Standard |
|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
| Name | Input Mixture | Target Mixture | Gold Standard |
|---|---|---|---|
The following tables include the final results obtained by applying all five methods.
Clarinet-Vibraphone to Strings-Piano:
| Name | Single-Instrument | Separation-Translation | Embedding-supervised | Stem-supervised Music-STAR | Mixture-supervised Music-STAR |
|---|---|---|---|---|---|
Strings-Piano to Clarinet-Vibraphone:
| Name | Single-Instrument | Separation-Translation | Embedding-supervised | Stem-supervised Music-STAR | Mixture-supervised Music-STAR |
|---|---|---|---|---|---|
Below you can see the results of subjective evaluation as bar charts. Each method was ranked by the participants from 1 to 5 (1 being the best) in terms of content preservation, style fit, and audio quality.
The following table shows the signal-to-distortion ratio (SDR), the metric used to evaluate how well Demucs separates the stems of the Strings-Piano and Clarinet-Vibraphone mixtures. The overall SDR is 7.36. You can listen to the isolated audio tracks in the Separation-Translation Pipeline section.
| Name | Clarinet (Clarinet-Vibraphone) | Vibraphone (Clarinet-Vibraphone) | Strings (Strings-Piano) | Piano (Strings-Piano) |
|---|---|---|---|---|
| Pirates of Caribbean Theme | 9.210 | 4.523 | 6.722 | 3.117 |
| My Heart Will Go on | 7.081 | 8.698 | 3.439 | 4.223 |
| Beethoven's String | 10.423 | 4.815 | 8.649 | 3.896 |
| Moonlight Sonata | 9.015 | 10.227 | 11.048 | 8.953 |
| Fur Elise | 3.647 | 3.529 | 3.672 | 3.101 |
| Brahms's Clarinet | 10.071 | 8.622 | 3.053 | 12.397 |
| Beethoven's Piano | 8.906 | 14.570 | 5.271 | 8.280 |
| Dvorak's String | 7.880 | 10.780 | 5.228 | 8.231 |
| Romeo and Juliet | 5.831 | 6.675 | 0.529 | 6.929 |
| Nuvole Bianche | 16.948 | 12.318 | 8.370 | 5.600 |
| Average SDR | 8.901 | 8.476 | 5.598 | 6.473 |
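
For readers unfamiliar with the metric, here is a minimal sketch of how an SDR value can be computed and how the overall figure relates to the table above. It assumes the plain energy-ratio definition, SDR = 10·log10(‖s‖² / ‖s − ŝ‖²), which is not necessarily the exact variant used for the reported numbers. The function name `sdr_db` and the toy signals are hypothetical; only the four per-stem averages are taken from the table.

```python
import math


def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in decibels (plain energy-ratio form).

    Assumption: this is the textbook definition, not necessarily the
    exact evaluation variant behind the numbers reported in the table.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, nothing to recover
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example (hypothetical signals): an estimate that is 10% too quiet.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]
toy_sdr = sdr_db(reference, estimate)

# Per-stem average SDRs copied from the last row of the table.
average_sdr = {
    "Clarinet": 8.901,
    "Vibraphone": 8.476,
    "Strings": 5.598,
    "Piano": 6.473,
}

# The overall SDR quoted in the text is the mean of the four stem averages.
overall_sdr = sum(average_sdr.values()) / len(average_sdr)
print(f"overall SDR: {overall_sdr:.2f}")
```

Averaging the four per-stem values reproduces the overall SDR of 7.36 quoted in the text.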