I'm on a roll today

More Posts from Skelerose and Others

1 year ago
Studio: MADHOUSE – Birdy The Mighty (1996) Birdy’s Facial Expressions

1 year ago
Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
Ars Technica
OpenAI could be fined up to $150,000 for each piece of infringing content.

Well, this would be interesting...

5 months ago
Playing around with lines today
10 months ago
BARI BARI DENSETSU (バリバリ伝説) 1987 | Dir. Osamu Uemura
1 year ago

A quick summary of the last 16 months

8 months ago

No, I didn't "forget to pack a toothbrush and a phone charger", it's called on-site procurement. Solid Snake does it too.

1 year ago
A long infographic with visual aids starting with the conversation:
"Is Miku AI?"
"No."
"Are vocal synths ethical?"
"Yes."
"How so?"
First section is Compensation.
Hatsune Miku is made out of recordings by Saki Fujita. Saki Fujita is contracted to record Miku samples, and is paid for her work.
Section: Recording Method. 
This is why Miku is not AI: Saki Fujita records from a list of sounds. It's necessary to have at least one recording per sound Miku should be able to sing. (Visual aid has examples of these sounds, such as "kaka".)
She can also sing the recording list a second time in a different octave, so that she sounds more natural. 
Section: Labelling. 
The samples Saki Fujita sang are then labelled with what sound they make. These sounds are then reproduced by the engine. This is how vocal synth software such as VOCALOID and UTAU works. This model is called "concatenative". (Visual aid shows how "kaka" is split into "k" and "a", which is how it looks in the VOCALOID software.)
Section: User interfacing.
These voicebanks are very flat. Users must adjust the vocals themselves in order to produce singing. This is referred to as "tuning". If you listen to "Tuning BLANK in the style of Vocaloid producers", you can see there are countless ways to tune Hatsune Miku. It is considered a form of artistic expression. 
Compare Scratchin' Melodii's original songs to the updated versions. This is the result of hiring an experienced Vocaloid tuner.
Question: How do AI Vocal Synths work?
Answer: They are actually extremely similar!
Section: Compensation.
Let's use the Synthesizer V Studio library "Solaria". Solaria is made out of recordings by Emma Rowley. Emma Rowley is contracted to record Solaria samples, and paid for her work.
Section: Recording.
Emma Rowley then records several hours of singing data. This is the substance of the library.
Section: Base model.
The AI needs a base to understand what it's interpreting. Unlike images, there is a large amount of volunteer voice data out there. It's typically assumed that base models are trained ethically. (Visual aid shows Dreamtonics, the developer company behind Synthesizer V, asking a university "Can I use this voice data you made for TTS research?" and a person replying "Hi! Here are a few hours of singing data you can use for voice technology.")
Section: Labelling.
Labelling is also the same. The singing is broken up into phonemes the engine will interpret. 
Header Section: Deep Learning.
In casual speech, "AI" refers to computer learning/sorting algorithms. "Diffusion" AI is the result of a DNN (Deep Neural Network). This is the most drastic difference between concatenative and AI voicebanks.
Section: Teaching the base model.
The computer must be taught what the sounds are. The concept it builds is the "base model". (Visual guide is a cartoon of two computers talking. "Here's a British man saying 'bath'." "Added to my concept of 'a'." "Here's a Japanese girl saying 'baka'." "Added to my concept of 'a'.")
Section: Training the voice model.
Emma Rowley's recordings are then made into a reference point. This ensures it will only render based on what it knows about Emma Rowley's singing. (Visual aid is a similar cartoon where a person talks to a computer while handing it a drive. Computer: "Now that I know what 'a' is, how should it sound?" Person: "I've labelled every time Emma Rowley says 'a'. Use this!")
Section: Diffusion.
The Solaria model uses everything it learned from Emma Rowley's recordings and the base model to determine how 'a' sounds based on what note it's sung on, what's next to it, etcetera.
Section: Interfacing.
Tuners have been mixed on this; it sounds much clearer, yet the AI also has voice pitch models, so there's not as much of an incentive to develop your own personal flair.
Question: Are voice changers ethical?
Answer: Oh geez.
Section: ARE they ethical?
We don't need to break this down a third time. Voice changers are the generative AI of voice synthesis. They require a lot less work from both the developer and the user: a simple application of everything the machine knows onto a piece of audio. What is the range of ethics here?
Vocaloid 6 is packaged with a voice changer. It is only for AI libraries, voiced by people who agreed to this and were compensated. This is definitely ethical.
If you bought Hatsune Miku, you're nominally permitted to use the results as you see fit. Is tuning Miku and then creating a voice changer of her singing ethical? I genuinely don't know.
There's also a question of art. If you were to project the voice actor onto your own personal tuning work, isn't that still artistic expression? A voice is different from an art style. Where is human expression being interrupted by automation? I can't make an explainer for those subjective concepts.
I hope you're now educated enough to think on it yourself. End of image transcription.

A lot of people try to explain this without knowing anything about how voice synthesis works, so here's my breakdown on No, Hatsune Miku Is Not AI, And No, AI Voice Synthesis Is Not Bad.
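If the "concatenative" part is hard to picture, here's a tiny Python sketch of the core idea: one recorded clip per labelled phoneme, spliced together in order. The phoneme labels and file paths are invented for the example, and a real engine like VOCALOID also pitch-shifts, crossfades, and smooths timbre rather than just gluing clips end to end.

```python
# Toy illustration of a concatenative voicebank: one recorded clip per
# labelled phoneme, spliced together in order. Labels and file paths
# are hypothetical; a real engine also pitch-shifts, crossfades, and
# smooths timbre between samples.
import wave

PHONEME_BANK = {            # hypothetical label -> sample mapping
    "k": "samples/k.wav",
    "a": "samples/a.wav",
}

def render(phonemes, out_path="out.wav"):
    """Concatenate the labelled samples for a phoneme sequence."""
    params, frames = None, []
    for p in phonemes:
        with wave.open(PHONEME_BANK[p], "rb") as clip:
            if params is None:
                params = clip.getparams()  # sample rate, channels, etc.
            frames.append(clip.readframes(clip.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

render(["k", "a", "k", "a"])  # "kaka" = k + a + k + a
```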
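And for the DNN side, here's an equally toy sketch of what "training a voice model" means mechanically: a small network learns to predict an acoustic frame from a phoneme and a pitch. The sizes, architecture, and random stand-in data are all made up for illustration; Synthesizer V and friends do something far more elaborate, but the shape of the training loop is the same idea.

```python
# Toy "DNN voicebank": learn to map (phoneme, pitch) -> acoustic frame.
# Random tensors stand in for a singer's labelled recordings; every
# dimension and layer size here is arbitrary, for illustration only.
import torch
import torch.nn as nn

N_PHONEMES, FRAME_DIM = 40, 80          # invented sizes

class ToyVoiceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(N_PHONEMES, 32)
        self.net = nn.Sequential(
            nn.Linear(32 + 1, 128), nn.ReLU(),
            nn.Linear(128, FRAME_DIM),  # predicted acoustic frame
        )

    def forward(self, phoneme_ids, pitch):
        x = torch.cat([self.emb(phoneme_ids), pitch.unsqueeze(-1)], dim=-1)
        return self.net(x)

model = ToyVoiceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for labelled singing data (phoneme, pitch, target frame).
phonemes = torch.randint(0, N_PHONEMES, (256,))
pitches = torch.rand(256)
targets = torch.randn(256, FRAME_DIM)

for _ in range(200):                     # "training the voice model"
    loss = nn.functional.mse_loss(model(phonemes, pitches), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```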

Angel

28 | she/they | artist

204 posts