Edit. To all musicians out there! We have two dry vocals. The first one we call Deep; the second one we call Thin. One of the voices is a real person, the other is AI.
Interesting question. Obviously the second one has quite recognizable pitch correction on it, so I would go for that one as the AI voice. But I really don't know. Both sound correct in articulation...
Let me add to this. Music should be a journey with the soul and the talent imbued in you, a brilliant creator. If you embrace AI as the inevitable future, then not only have you turned your back on your abilities, but you have embraced the ultimate thief of self-fulfillment. Choose wisely; the shiny things are placed before the trap. I have a feeling we may have to pull the plug soon. The big red stop button is on a wall and within reach. Only at the precipice, though... huh.
If AI is as advanced as it promises, I'd say you could have her singing in a way that no pitch correction is needed. I guess pitch control is much easier to achieve artificially than convincing articulation, and both articulate quite well... So I'll go with Ms. Deep as the AI. 💻
I agree with the other comments: it is hard to compare the voices when there is a lot of processing applied to them. Anyway, here are my impressions:
- In terms of EQ-ing, the first voice has no high-frequency content; it sounds like a low-pass filter was applied to it at ~8 kHz or maybe even lower. That is pretty common in AI-generated audio, because many models are trained at lower sampling rates to reduce computational cost (although this has changed a lot lately). This characteristic might therefore bias the listener to perceive it as the AI voice (a quick way to check this numerically is sketched right after this list).
- The second voice spans a wider frequency range, which makes it feel more natural in terms of EQ-ing. However, there is a pronounced chorusy effect applied to it (I would guess it is an ensemble processor), which might fool a listener into thinking it was artificially generated. There is also some sibilance and a substantial amount of reverb (it could be DSP or just natural room ambience).
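If you want to check the low-pass claim numerically instead of by ear, here is a minimal Python sketch; the filenames `deep.wav` and `thin.wav` are just my assumption for local copies of the two clips:

```python
# Rough bandwidth check: estimate where each vocal's spectrum rolls off.
# "deep.wav" / "thin.wav" are hypothetical filenames for the two clips.
import numpy as np
import soundfile as sf

def spectral_rolloff(path, percentile=0.99):
    """Return the frequency below which `percentile` of the total energy lies."""
    audio, sr = sf.read(path)
    if audio.ndim > 1:                            # fold stereo to mono
        audio = audio.mean(axis=1)
    spectrum = np.abs(np.fft.rfft(audio)) ** 2    # power spectrum
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    cumulative = np.cumsum(spectrum)
    cutoff_idx = np.searchsorted(cumulative, percentile * cumulative[-1])
    return freqs[cutoff_idx]

for name in ("deep.wav", "thin.wav"):
    print(f"{name}: energy rolls off near {spectral_rolloff(name) / 1000:.1f} kHz")
```

A voice generated by a model running at a 16 kHz sample rate should roll off near 8 kHz, while a full-band recording usually extends well past 15 kHz.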
Although those differences make it harder to compare and judge which one was artificially generated, my immediate impression was that the first voice was obviously generated by AI. The EQ difference probably plays a significant part in my guess, but the decisive characteristics are some weird artifacts present in it, especially at 0:57 (weird breathing sound) and 1:00 (unnatural sustained vowel sound). I could be wrong, because those artifacts sound a bit like the typical artifacts produced by auto-tune plugins, but I still think they were AI-generated.
Now, to make the comparison fair I would suggest:
- matching the tonal characteristics of both voices with some basic EQ, if the AI model is not able to produce high frequencies (rough sketch below);
- not applying any extra effects;
- not recording the natural voice in an environment with significant ambience;
- keeping sibilance and pops from affecting the listener's judgement.
With that, I believe you can obtain a fairer comparison between the voices. 😉
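For the EQ-matching step, one rough approach is to derive a per-bin gain curve from the ratio of the two clips' average STFT magnitudes and apply it to the brighter voice. This is just a sketch: the filenames, the 2048-sample window, and the +12 dB gain cap are my assumptions, and both files are assumed to share the same sample rate.

```python
# Crude spectral matching: EQ the full-band vocal toward the band-limited
# one so the brightness difference stops biasing the comparison.
# "real.wav" / "ai.wav" are assumed filenames, not from the original post.
import numpy as np
import soundfile as sf
from scipy.signal import stft, istft

def load_mono(path):
    audio, sr = sf.read(path)
    return (audio.mean(axis=1) if audio.ndim > 1 else audio), sr

source, sr = load_mono("real.wav")   # voice to be EQ-matched
target, _ = load_mono("ai.wav")      # voice whose tonal balance we match

nperseg = 2048
_, _, S = stft(source, fs=sr, nperseg=nperseg)
_, _, T = stft(target, fs=sr, nperseg=nperseg)

# Per-bin gain from the ratio of average magnitudes, with a floor to
# avoid divide-by-zero in empty bins.
eps = 1e-10
gain = (np.abs(T).mean(axis=1) + eps) / (np.abs(S).mean(axis=1) + eps)
gain = np.clip(gain, 0.0, 4.0)       # cap boosts at ~+12 dB

_, matched = istft(S * gain[:, None], fs=sr, nperseg=nperseg)
sf.write("real_matched.wav", matched / np.max(np.abs(matched)), sr)
```

Capping the gain matters: in bins where the band-limited voice has almost no energy, an unclipped ratio would apply an enormous boost and just amplify noise.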
Cymatics advertised it as a dry vocal. They probably used autotune or something similar during recording. The Deep voice is the AI; as @Cuantas Vacas said, the Thin one is the real person. In the second example, the AI sings French just as well as Alexandra does 😄
Hi Tairone, and thanks for your awesomely explained POV. The AI is trained by me using a real person's dry vocals. However, she doesn't have a high-quality mic, which explains the lack of high frequencies. The AI quality could be better if I were to pay a bit more. So far I'm impressed with what the technology can do.
I wrote my first comment based on the first excerpt only. Just realized there is a second excerpt that makes the comparison a bit fairer (could still use some EQ matching though).
So, in the second example I believe you inverted the order. The first voice is probably Alexandra's, while the second is generated by AI.
I agree that the technology is impressive. It's just a matter of time until it becomes impossible to tell the difference by ear. We will probably have to rely on AI-based tools for forensic analysis of audio in the future, since AI-generated audio will only become more convincing.