SingleVC performs any-to-one (A2O) voice conversion (VC) through a self-supervised task (Xi → X̂si → X̂i), where X̂si is PSDR-processed speech with pitch shift s. More details can be accessed here.
This page provides converted speech samples. The pretrained model was trained on speaker p249 (female, 22.5 minutes of speech) from the VCTK corpus.
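The self-supervised task described above (Xi → X̂si → X̂i) can be illustrated with a rough sketch of how one training pair might be built: the original utterance Xi is pitch-shifted by s to give X̂si, and the model is then presumably asked to reconstruct Xi from it. The exact PSDR procedure is not described on this page, so the code below uses naive resampling as a stand-in (which also changes duration, unlike a proper pitch shifter). The function names `pitch_shift_by_resampling` and `make_training_pair`, the `max_shift` range, and the semitone unit for s are all assumptions made for illustration, not part of SingleVC.

```python
import math
import random


def pitch_shift_by_resampling(samples, semitones):
    """Naively shift pitch by resampling with linear interpolation.

    Stand-in for PSDR (assumption): reading the waveform faster raises the
    pitch but also shortens the signal, which a real pitch shifter avoids.
    """
    if not samples:
        return []
    ratio = 2.0 ** (semitones / 12.0)  # frequency scaling factor
    out_len = max(1, int(len(samples) / ratio))
    shifted = []
    for n in range(out_len):
        pos = n * ratio               # fractional read position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        shifted.append((1.0 - frac) * samples[i] + frac * nxt)
    return shifted


def make_training_pair(samples, max_shift=4, rng=random):
    """Return (shifted input, reconstruction target, shift s).

    Hypothetical helper: s is drawn uniformly from [-max_shift, max_shift]
    semitones; the target is the unmodified original waveform.
    """
    s = rng.randint(-max_shift, max_shift)
    return pitch_shift_by_resampling(samples, s), samples, s


# A 100 Hz sine sampled at 8 kHz stands in for a source utterance Xi.
x_i = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
x_shifted, target, s = make_training_pair(x_i, rng=random.Random(0))
```

Here `x_shifted` plays the role of X̂si and `target` the role of Xi. In a real pipeline a duration-preserving pitch shifter would replace the resampling step.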
p249_samples
We also need a small plastic snake and a big toy frog for the kids.
The weaknesses are few.
In fact, they have the opposite effect.
Utterance
Source
p249_004
p249_155
p249_316
VCTK
F1, Ask her to bring these things with her from the store.
F2, She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.
M1, Please call Stella.
M2, He should have asked for a second opinion.
Utterance
Source
Convert
F1_p310_002
F2_p240_005
M1_p374_001
M2_p245_062
LibriSpeech
F1, The visit went off successfully, as was to have been expected.
F2, “He’s Gilbert Blythe,” said Marilla contentedly.
M1, All judgements do not require examination, that is, investigation into the grounds of their truth.
M2, And always that same pretext is offered–it looks like the thing.
Utterance
Source
Convert
F1_225_131256_000006_000002
F2_188_135249_000012_000000
M1_296_129659_000004_000005
M2_272_130225_000010_000007
VCC2020
F1, If not, it’s about time somebody did.
F2, The figures are adjusted for seasonal variation.
M1, The trip was a disaster.
M2, Sometimes, it helps to take a step back.
Utterance
Source
Convert
F1_TEF1_E10061
F2_SEF2_E10066
M1_SEM1_E10033
M2_TEM2_E20042
LJSpeech
F1, especially as no more time is occupied, or cost incurred, in casting, setting, or printing beautiful letters