Everything sound & ear training related

SoundGym

A U
Sep 22, 2023
To all musicians out there!
We have two dry vocals.
The first one we call Deep.
The second one we call Thin.
One of the voices is a real person; the other is AI.

Debate:

https://drive.google.com/file/d/10oPQH4vIq1WBxrufuiFcHT8-8N1J5fSg/view?usp=sharing

In this second example we have @Alexandra Escalle vs. my trained AI voice:

https://drive.google.com/file/d/11euFSBwLn_MOeTqjtP22qcPE5ZlVsoCz/view?usp=sharing

PS: The AI is the same as in the first example
Alexandra Escalle
Sep 22, 2023
That's not kiiiind 😭😂
A U
Sep 22, 2023
One of the voices is copying the other.
Which one's which, and why do you think so?
Nicholas Stolarczuk
Sep 22, 2023
Feels like a trick question because one sounds obviously more processed than the other.
Colin Aiken
Sep 23, 2023
Interesting.
Christian Piller
Sep 23, 2023
Interesting question. The second one obviously has quite recognizable pitch correction on it, so I would go for that one as the AI voice. But I really don't know. Both sound correct in articulation...

mhmm.... take the red or the green pill ...
A U
Sep 23, 2023
If it were that simple, I wouldn't have posted in the first place.
Pay attention to the details.
Robert Skelton
Sep 23, 2023
Does it matter? We are fooked either way.
Robert Skelton
Sep 23, 2023
Let me add to this. Music should be a journey with the soul and the talent imbued in you, a brilliant creator. If you embrace AI as the inevitable future, then not only have you turned your back on your abilities, you have embraced the ultimate thief of self-fulfillment. Choose wisely; the shiny things are placed before the trap. I have a feeling we may have to pull the plug soon. The big red stop button is on a wall and within reach. Only at the precipice, though... huh.
Wannes Breyne
Sep 23, 2023
Considering the amount of processing and pitch correction on the second voice, I'll go for that one.
Cuantas Vacas
Sep 23, 2023
If AI is as advanced as it promises, I'd say you could have her sing in a way that needs no pitch correction. I guess pitch control is much easier to achieve artificially than convincing articulation, and both articulate quite well... So I go with Ms. Deep as the AI. 💻
Marc-André Bleau
Sep 23, 2023
It's not dry; there is reverb on one of the vocals. To compare them, let's put them in the same atmosphere. Without that, it's not fair.
Tairone Magalhaes
Sep 23, 2023
I agree with the other comments: it is hard to compare the voices when there is a lot of processing applied to them. Anyway, here are my impressions:

- In terms of EQ, the first voice has no high-frequency content; it sounds like a low-pass filter was applied at ~8 kHz or maybe even lower. That is pretty common in AI-generated audio, because many models are trained at lower sampling rates to reduce computational cost (although this has changed a lot lately). This characteristic might therefore bias the listener to perceive it as the AI voice.

- The second voice spans a wider frequency range, which makes it feel more natural in terms of EQ. However, there is a pronounced chorus-like effect applied to it (I would guess an ensemble processor), which might fool a listener into thinking it was artificially generated. There is also some sibilance and a substantial amount of reverb (it could be DSP or just natural room ambience).

Although those differences make it harder to judge which one was artificially generated, my immediate impression was that the first voice was obviously generated by AI. The EQ difference probably plays a significant part in my guess, but the decisive characteristics are some weird artifacts present in it, especially at 0:57 (a strange breathing sound) and 1:00 (an unnaturally sustained vowel). I could be wrong, because those artifacts sound a bit like typical auto-tune artifacts, but I still think they were AI-generated.
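The low-pass observation above is something you can check numerically rather than by ear. Below is a minimal NumPy sketch (not a tool from this thread; the function name is illustrative, and synthetic noise stands in for the actual vocal files) that estimates how much of a signal's spectral energy sits at or above ~8 kHz. A recording band-limited by a low model sampling rate should score near zero.

```python
import numpy as np

def high_band_energy_ratio(x, sr, cutoff=8000.0):
    """Fraction of x's spectral energy at or above `cutoff` Hz."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    total = spectrum.sum()
    return 0.0 if total == 0 else spectrum[freqs >= cutoff].sum() / total

# Demo: white noise vs. the same noise band-limited below 8 kHz
# (crude FFT brick-wall filter, just to exercise the metric).
sr = 44100
rng = np.random.default_rng(0)
full = rng.standard_normal(sr)                     # 1 s of white noise
spec = np.fft.rfft(full)
spec[np.fft.rfftfreq(sr, d=1.0 / sr) >= 8000.0] = 0.0
limited = np.fft.irfft(spec, n=sr)

print(high_band_energy_ratio(full, sr))      # large fraction
print(high_band_energy_ratio(limited, sr))   # essentially zero
```

Run on the two vocals (decoded to mono float arrays at the same sample rate), a large gap in this ratio would support the low-pass hypothesis.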

Now, to make the comparison fair I would suggest:
- matching the tonal characteristics of both voices with some basic EQ, if the AI model cannot produce high frequencies;
- not applying any extra effects;
- not recording the natural voice in an environment with significant ambience;
- preventing sibilance or pops from affecting the listener's judgement.

With that, I believe you can obtain a fairer comparison between the voices. 😉
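The EQ-matching suggestion can also be roughed out offline. This is a hedged sketch in plain NumPy (not any specific plugin; the function name and smoothing width are illustrative assumptions) of crude spectral matching: it scales one signal's smoothed magnitude spectrum toward another's, so both takes share the same broad tonal balance before listening.

```python
import numpy as np

def match_spectrum(x, ref, smooth_bins=101):
    """Scale x's smoothed magnitude spectrum toward ref's (crude EQ match).
    Assumes x and ref are 1-D float arrays at the same sample rate."""
    n = len(x)
    X = np.fft.rfft(x)
    R = np.fft.rfft(ref, n=n)
    kernel = np.ones(smooth_bins) / smooth_bins   # moving-average smoother
    eps = 1e-12                                   # avoid division by zero
    mx = np.convolve(np.abs(X), kernel, mode="same") + eps
    mr = np.convolve(np.abs(R), kernel, mode="same") + eps
    return np.fft.irfft(X * (mr / mx), n=n)

# Demo: match bright white noise to a take band-limited below 8 kHz.
sr = 44100
rng = np.random.default_rng(1)
bright = rng.standard_normal(sr)                  # full-band "take"
spec = np.fft.rfft(rng.standard_normal(sr))
spec[np.fft.rfftfreq(sr, d=1.0 / sr) >= 8000.0] = 0.0
dull = np.fft.irfft(spec, n=sr)                   # band-limited "take"
matched = match_spectrum(bright, dull)            # bright now rolls off too
```

Smoothing the spectra first matters: matching bin-by-bin would copy the reference's fine detail (and noise) instead of just its broad tonal tilt.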
A U
Sep 23, 2023
Cymatics advertised it as a dry vocal.
They probably used Auto-Tune or something similar during recording.
The Deep voice is the AI, as @Cuantas Vacas said; the Thin one is the real person.
In the second example, the AI sings in French just as well as Alexandra does 😄
A U
Sep 23, 2023
Hi Tairone, and thanks for your awesomely explained POV.
The AI was trained by me on a real person's dry vocals.
However, she doesn't have a high-quality mic, which explains the lack of high frequencies.
The AI quality could be better if I were to pay a bit more.
So far I'm impressed with what the technology can do.
Maik Frenkel
Sep 23, 2023
Tairone Magalhaes, I see it exactly the same way; thanks, you spared me a huge wall of text ;-)
Tairone Magalhaes
Sep 23, 2023
I wrote my first comment based on the first excerpt only. I just realized there is a second excerpt that makes the comparison a bit fairer (it could still use some EQ matching, though).

So, in the second example I believe you inverted the order: the first voice is probably Alexandra's, while the second was generated by AI.
Tairone Magalhaes
Sep 23, 2023
I agree that the technology is impressive. It's just a matter of time until it becomes impossible to tell the difference by ear. In the future we will probably have to rely on AI-based forensic analysis tools for audio, since AI-generated audio will only become more convincing.