why are ai music generators so far behind image generators? music is way easier to make than paintings.

Posted on April 23, 2023 by Anonymous

why are ai music generators so far behind image generators? music is way easier to make than paintings. and sound is less complicated than vision for both humans and technology.

why isn't there a website where i can type in a few prompts like genres, and band names and start making new songs?

12 months ago

Reply

Anonymous

Human perception of light isn't all that great. You can get away with a lot, as you know if you've seen how JPEG lossy compression throws away tons of data without you noticing.
It's harder to do this for music because humans create it in a proactive way, drawing from their own experiences; robots just imitate their training data.
- 12 months ago
  
  Reply
  
  Anonymous
  
  >there is less discernible data in an audio file than an image that an AI can learn from
  
  nonsense. a computer should be able to very easily analyze patterns in music on albums, by bands, and within genres.
  
  I'm a bit of a sperg so years ago, I took a few of my favorite songs and transposed them all to the key of C/A minor. and then i just made a flow chart to show every combination of chords used in those songs. C major is only ever followed by G major or D minor, A is only ever followed by B minor or C major. whatever it was, i don't remember. but it was a very simple chart with 7 chords and a bunch of arrows connecting them.
  
  I could just turn off my brain, start anywhere on the chart and follow it around any way i wanted randomly and no matter what it would sound good.
  
  If a sperg could do that a decade ago, then ai should be able to do it now.
  
  thats just chords but there is no reason it couldn't do that with every single part of every song on the internet.
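The flow chart described in this post is essentially a first-order Markov chain over chords. A minimal sketch, assuming Python: the chart entries below are made up for illustration (the poster says they don't remember the real ones), and `random_progression` and `learn_chart` are hypothetical helper names.

```python
import random

# Hypothetical chart: each chord maps to the chords allowed to follow it.
# These arrows are illustrative only, not the poster's actual chart.
CHORD_CHART = {
    "C": ["G", "Dm", "F", "Am"],
    "Dm": ["G", "Em"],
    "Em": ["Am", "F"],
    "F": ["C", "G", "Dm", "Bdim"],
    "G": ["C", "Am", "Em"],
    "Am": ["F", "Dm", "C"],
    "Bdim": ["C", "Em"],
}


def random_progression(start, length, chart=CHORD_CHART, rng=None):
    """Start anywhere on the chart and follow the arrows at random."""
    rng = rng or random.Random()
    progression = [start]
    while len(progression) < length:
        options = chart.get(progression[-1], [])
        if not options:  # dead end: no arrow leaves this chord
            break
        progression.append(rng.choice(options))
    return progression


def learn_chart(songs):
    """Build a chart from example progressions instead of drawing it by hand."""
    chart = {}
    for chords in songs:
        for current, following in zip(chords, chords[1:]):
            followers = chart.setdefault(current, [])
            chart.setdefault(following, [])
            if following not in followers:
                followers.append(following)
    return chart


if __name__ == "__main__":
    print(random_progression("C", 8, rng=random.Random(0)))
```

`learn_chart` records only which chord pairs appeared, not how often; weighting the arrows by frequency would be the obvious next step.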
  - 12 months ago
    
    Reply
    
    Anonymous
    
    to an ai music generator a band would just be a list of probabilities. 70% of this band's songs are in the key of G. 85% use a tempo of 140. whenever the song is minor they end up using a harmonic minor 7th to resolve during the last chorus.
    
    it's so incredibly simple. music is a joke compared to rendering light, and figuring out anatomy, perspective, etc.
    - 12 months ago
      
      Reply
      
      Anonymous
      
      >yes i listen to good music
      >acdc, aerosmith, halo 2 soundtrack, list goes on
      - 12 months ago
        
        Reply
        
        Anonymous
        
        If you want to know what I listen to you can just ask. Recently I've been enjoying Periphery, Opeth, Perturbator and Deftones. Back when I did that, I was big into the Mars Volta, Tool and this avant garde band called World's End Girlfriend. Porcupine Tree is pretty good, Mastodon, Love is Noise is pretty interesting, Yeasayer...
        
        12 months ago
        
        Reply
        
        Anonymous
        
        what the heck is happening to her?
        
        12 months ago
        
        Anonymous
        
        looks like an asthma attack
        
        12 months ago
        
        Anonymous
        
        >looks like an asthma attack
        
        >has never seen a female orgasm
        makes sense
        
        12 months ago
        
        Reply
        
        Anonymous
        
        Tell me you have a small dick and you're hideous without telling me you have a small dick and you're hideous
        
        12 months ago
        
        Reply
        
        Anonymous
        
        woa, World's End Girlfriend, I thought I was the only one.
        
        AI won't capture musical emotion of say pic related or underworld live at Coachella right now.
        
        12 months ago
        
        Anonymous
        
        >underworld live at Coachella right now.
        mmm, skyscraper I love you
        
        12 months ago
        
        Reply
        
        Anonymous
        
        >power chord music
      - 12 months ago
        
        Reply
        
        Anonymous
        
        >>halo 2 soundtrack.
        it's nine inch nails
    - 12 months ago
      
      Reply
      
      Anonymous
      
      AI doesn't actually figure out anatomy. It just mimics previous poses from the art it's trained on. It's an amalgamation of defuzzed images. Have you seen how it'll create objects that look like objects but, on closer inspection, are incredibly off? Prompts bring up an approximation of an image. The parts underneath the surface aren't considered by AI. There may as well be no skeleton, no muscles. It only approximates the surface. AI art does objective art. It doesn't create based on subjective understanding of limb placement or aesthetic. It just does what it's told and brings up properties based on prompt text. This is true for all art models.
      - 12 months ago
        
        Reply
        
        Anonymous
        
        True, which is why an interesting development is that GPT 4 seems to do a decent job of drawing really shitty images that are more """intelligent""" than the art models, which can then be used as input of art models to get something that looks good.
  - 12 months ago
    
    Reply
    
    Anonymous
    
    >western harmonic music theory
    ngmi
    - 12 months ago
      
      Reply
      
      Anonymous
      
      enjoy banging sticks onto rocks and eating mosquitos. western music is the bestern music.
      
      you see a lot of chinks learning the piano and violen. not a whole lot of westerners taking an interest in chink and nig shit music. why do you think that is? do you think maybe it's because it sucks?
      - 12 months ago
        
        Reply
        
        Anonymous
        
        lol white people do nothing BUT ape music invented by black people
        
        12 months ago
        
        Reply
        
        Anonymous
        
        They really don't. This is israeli education and historical revision.
      - 12 months ago
        
        Reply
        
        Anonymous
        
        >why do you think that is?
        pretty simple: white people on average have the most christ-awful taste in music
        
        12 months ago
        
        Reply
        
        Anonymous
        
        Please share your enlightened taste my minority friend
        
        12 months ago
        
        Anonymous
        
        no, im caucasian my friend, im just 9 standard deviations over the average
        
        12 months ago
        
        Anonymous
        
        Are you saying there's a race where the average person has good taste in music?
        
        12 months ago
        
        Anonymous
        
        am i?
  - 12 months ago
    
    Reply
    
    Anonymous
    
    Then make it if it's so easy
  - 12 months ago
    
    Reply
    
    Anonymous
    
    >i found out about the circle of fifths in C major
    Anon this is day one of music theory.
- 12 months ago
  
  Reply
  
  Anonymous
  
  > you can get away with a lot if you've seen how jpeg lossy compression throws away tons of data without you noticing.
  Literally this. You can even remove 50% of the data from an FFT version of an image, revert it back, and still get something coherent (people don't notice the quality degradation).
  Music cannot be FFT'd like that (and music compression algos are more complicated, like DFT), plus its linear nature makes it hard to actually do FT, since it's way more complicated than a still image (video is trash for AI even at this point in time).
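The image half of that claim is easy to try. A rough sketch, assuming NumPy and a synthetic test pattern standing in for a photo: `compress_with_fft` and `synthetic_image` are hypothetical names, and this is plain coefficient thresholding, not what JPEG actually does.

```python
import numpy as np


def compress_with_fft(image, keep_fraction=0.5):
    """Zero out the smallest-magnitude FFT coefficients, then invert the transform."""
    coeffs = np.fft.fft2(image)
    magnitudes = np.abs(coeffs)
    # Everything below this magnitude is thrown away.
    threshold = np.quantile(magnitudes, 1.0 - keep_fraction)
    kept = np.where(magnitudes >= threshold, coeffs, 0)
    return np.real(np.fft.ifft2(kept))


def synthetic_image(size=64):
    """Smooth test pattern used in place of a real photo (an assumption for the demo)."""
    y, x = np.mgrid[0:size, 0:size] / size
    return np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 5 * y) + x * y


if __name__ == "__main__":
    img = synthetic_image()
    rebuilt = compress_with_fft(img, keep_fraction=0.5)
    rel_error = np.linalg.norm(img - rebuilt) / np.linalg.norm(img)
    print(f"relative error after dropping half the coefficients: {rel_error:.4f}")
```

A smooth pattern like this one concentrates its energy in a few coefficients, so it is a friendly case; real photos with sharp edges lose more.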
  
  The training used for music AI is to put the audio into a spectrogram "image" version and run Fourier Transform "diffusion" over the image data, but that doesn't actually yield good results, since it only works for chucking several things together as image composition. Meanwhile music composition is a complicated task where ONE mistake will stand out; visual arts is more forgiving.
  
  I doubt it's even possible for computers to make an indistinguishable work imitating "Bach" when humans for the longest time haven't even come close to him. Just look up the "Modern Classical Music" we have these days, it has since regressed and is trash compared to the old medium. The fact is that sound is more complicated than a still image. Even a single second of sound data is enough to contain a lot of information (such as imaging out of echolocation/sonar).
  - 12 months ago
    
    Reply
    
    Anonymous
    
    >music composition is a complicated task that ONE mistake will stand out, visual arts is more forgiving.
    at best the ONE genre that AI will definitely kill is Jazz. it's the only type of genre that is forgiving with mistakes and can be covered up by reshaping the music keys etc. but still the procedure to make this is not "feed moar Jazz" but rather teaching it so a giant AI model is not enough for these or rather a giant model will make it less accurate.
    it needs to be a specific model.
    
    generic music are more "connected" to its initial seed which is not a noise, maybe AI music can help people with good "seed" but it cannot work the same way as image generation which can use any form of noise as "seed". rather a "seed" is actually even hard to do even for humans so how much seed data can you even feed a large AI model when you don't even know the seed of most music? then you have a problem but also it is good since it means it's harder to generate.
    
    >because the music industry is a thousand times more anal about copyright
    
    >that wont last too long though
    
    music industry is a hero. they don't even let bigtech scrape lyrics. it's one of the reason they don't put the lyrics on youtube "CC" since doing so would let youtube own the lyrics data which is rightfully not theirs.
12 months ago

Reply

Anonymous

idk but you sound like a lazy gibsmedat Black person to be honest
12 months ago

Reply

Anonymous

there is less discernible data in an audio file than an image that an AI can learn from
12 months ago

Reply

Anonymous

Actually, music generators are about on par with image generators. The medium itself is less forgiving of AI's smoke and mirrors. AI is all form, no intention; all outward edifice, no inner content. Makes for great waifus and generic fantasy landscapes, don't get me wrong...but the auditory equivalent of AI waifus is far less convincing/interesting. I'm sure however that AI music generation is already plenty good enough to spit out the kind of low-effort garbage most people enjoy listening to
- 12 months ago
  
  Reply
  
  Anonymous
  
  so where is the Dall e of music? where is the stable diffusion of music?
- 12 months ago
  
  Reply
  
  Anonymous
- 12 months ago
  
  Reply
  
  Anonymous
  
  It's not difficult to automate music into something that sounds "emotional". Hell, that's what DAWs do right now. You can easily automate chord progressions and certain parameters to create emotional ambient soundscapes or some shit. What current AI music is having issue with is sounding like complete ass because I'm assuming most models are fed with completed mastered tracks. To the AI, that just sounds like a blob of sounds, especially if no one has properly tagged what type of instruments or synths are being used in the track. Factor in shitty mastering trends for the past few decades and AI can only generate ear rapey noise with the occasional coherent DnB track or something.
  - 12 months ago
    
    Reply
    
    Anonymous
    
    It would probably be way easier and potentially more interesting to train AI on something like MIDI music or chiptune where the music file isn't the raw sound itself, but the data used to create the sound.
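To put a rough number on the difference between "the raw sound itself" and "the data used to create the sound", here is a small sketch. The `NoteEvent` structure is a simplified, MIDI-like stand-in invented for this example, not the actual MIDI file format, and the tempo is an assumption.

```python
from dataclasses import dataclass

SAMPLE_RATE = 44_100  # CD-quality samples per second, per channel
CHANNELS = 2


@dataclass
class NoteEvent:
    """Simplified, MIDI-like note: what to play, not how it sounds."""
    pitch: int           # MIDI note number, 60 = middle C
    start_beat: float
    length_beats: float
    velocity: int        # 0-127 loudness


def raw_audio_values(seconds):
    """Number of sample values needed to store this much raw stereo audio."""
    return int(seconds * SAMPLE_RATE * CHANNELS)


def symbolic_values(notes):
    """Number of values needed to store the same passage as note events."""
    return len(notes) * 4  # four fields per event


def c_major_scale():
    """One octave of C major, one note per beat."""
    pitches = [60, 62, 64, 65, 67, 69, 71, 72]
    return [NoteEvent(p, float(i), 1.0, 96) for i, p in enumerate(pitches)]


if __name__ == "__main__":
    notes = c_major_scale()
    seconds = 8 * 60 / 120  # 8 beats at an assumed 120 BPM
    print(symbolic_values(notes), "values as note events")
    print(raw_audio_values(seconds), "values as raw audio")
```

The event list drops timbre, mixing, and performance nuance entirely, which is the trade-off of the symbolic approach.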
12 months ago

Reply

Anonymous

All K-pop idols are MtFs.
12 months ago

Reply

Anonymous

I assume it's very easy to take images from random nobodies on the internet to use for training without asking for permission, but if you do that to the music industry AND also allow people to use the service for free - or even worse you attempt to make money off of it - they will legally eviscerate your anus so bad it'll extinguish your entire family line.

YouTube and Twitch and whatever other fricking websites were all buttfricked into automatically identifying music if a user uploads it, it's that bad. If you watch a video on YouTube of some dude attempting to review audio equipment, they all apologize because they can't actually show it playing music - they know the automated assrape device from the music industry is primed and ready to destroy them if they play even a mere 10s of popular music. There's no fricking way they would ever allow generating new shit from their music without money flowing into their coffers because of it.
- 12 months ago
  
  Reply
  
  Anonymous
  
  yea but there is a law in place already about what constitutes copyright fraud/theft and what is deemed a cover, homage, or different enough from the original. on youtube you have fair use. there is some amount of time for which you can play someone else's content and then you are required to interject something for some amount of time. a computer can have that built in. i think you guys don't know how music works. There are only 7 notes in a scale. Do you have any idea how many songs use the chord progression 1 6 4 5? I mean.... maybe a billion? You just have to change it slightly. And as long as you aren't making money off it, what is the problem? If I make music for myself and don't sell it to people how does the band that the music is based on lose anything?
  
  Doesn't sound like a reasonable excuse.
  - 12 months ago
    
    Reply
    
    Anonymous
    
    >There are only 7 notes in a scale.
    son, i am disappoint
    - 12 months ago
      
      Reply
      
      Anonymous
      
      C D E F G A B C
      C and C are the same note.
      - 12 months ago
        
        Reply
        
        Anonymous
        
        0 to 7 is 8 steps you fricking moron
        get off my board
        
        12 months ago
        
        Reply
        
        Anonymous
        
        its the same note, just 2 different octaves. dont be gay.
      - 12 months ago
        
        Reply
        
        Anonymous
        
        do you actually listen to this kind of music? i havent listened to this kind of dadrock shit in ages.
  - 12 months ago
    
    Reply
    
    Anonymous
    
    The law was put in place before this shit became possible and popular. If they don't like what you're doing they're going to sue your ass anyway and bury you in legal fees while they also lobby to change the law to suit them.
    >There are only 7 notes in a scale
    This is reductive moronation
    >And as long as you aren't making money off it, what is the problem?
    Current AI works by throwing moar GPU farm at the problem until you get something worthwhile, for the most part. Who is going to fund this with no interest in somehow using it for profit?
12 months ago

Reply

sage

disgusting feed
12 months ago

Reply

Anonymous

A combination of things. The temporal component required is tricky. Also, the habit of AI researchers is, since they're doing research, to do what's most interesting, not what's most promising. So AI would be built and trained to spit out raw audio, rather than use tools. The issue becomes that we are highly sensitive to noise, and small errors can be highly noticeable. If you tried to do it by training multiple networks to tackle different aspects of music, with the data they're fed, trained on, and producing being a more discrete representation, such as sheet music or something, they'd probably be a lot better. But what AI researchers want to study is having it spit out raw audio. A great example of what I'm talking about is the vocals. The new audio NNs I've heard sound pretty good generally, but the vocals, which sound oddly inoffensive, are clearly a fabrication and reveal the issues with the rest of it. However, it would be trivial to have an NN that just did vocals do a great job, and then combine it with the rest in post. Like how humans do it. But that's less interesting.
12 months ago

Reply

Anonymous

You can't generate porn with it so relatively speaking no one gives a shit about it. It's why at any given time there are two to three

[...]

threads here while the voice generation threads only have one thread at a time over the last several days. The sdg ones last like 4 hours max because it's still popular as hell even after several months and the significantly lower hype compared to December of last year.

/Thread
- 12 months ago
  
  Reply
  
  Anonymous
  
  >cause going all the way back to the fricking printing press, pornography is what moves tech
  >and pornography is primarily a visual medium
  >why the frick do you think we have ultra efficient light emitting diodes and nanometer sized transistors to deliver images but we can't do anything about delivering smell? it's not a priority
  
  these are actually good points. i didnt think of that but you're right.
- 12 months ago
  
  Reply
  
  Anonymous
  
  [...]
  
  >these are actually good points. i didnt think of that but you're right.
  
  Now with that being said, you COULD generate some kick-ass ASMR content with it if you knew what you were doing. Too bad getting AI voice synthesis to work (specifically the tortoise one:
  
  [...]
  
  ) is a fricking headache to get set up, which is yet another reason why it won't be as popular as SD. The shit is pretty fricking hard to get working consistently, training takes several hours (some people report it taking as long as 18 fricking hours), CUDA out-of-memory errors appear out the ass, and the people that actually know how to use it have a stick up their ass because they think they're special for knowing how to use SOMEONE ELSE'S software. The a1111 training and generation is a cakewalk compared to that shit
12 months ago

Reply

Anonymous

cause going all the way back to the fricking printing press, pornography is what moves tech
and pornography is primarily a visual medium
why the frick do you think we have ultra efficient light emitting diodes and nanometer sized transistors to deliver images but we can't do anything about delivering smell? it's not a priority
12 months ago

Reply

Anonymous

is skrillex brostep, and what is brostep?
12 months ago

Reply

Anonymous

the whole point of ai music would be to look for new combinations of sound that aren't shit
using it to make western bangrock is about as creative as asking ChatGPT to give you therapy
12 months ago

Reply

Anonymous

because the music industry is a thousand times more anal about copyright

that wont last too long though
12 months ago

Reply

Anonymous

For the actual sound texture, there's a monstrous amount of software for generating it already. Mostly handmade sampled libraries and software synthesizers, but NotePerformer is an AI musician and it's been around for years.

For composition... why? If you want to express yourself then you'll want to compose the music yourself. If you just want a ready-made high quality composition there are thousands of public domain classical pieces that blow virtually all modern music out of the fricking water. There isn't a use case for AI composition.
12 months ago

Reply

Anonymous

In general, sound processing is always solved before image processing.
But in the field of deep learning, audio has always been very complex. At the end of the day, it was preferred to treat it as an image of a spectrogram and do operations using CNNs. Audio needs coherence, like language processing, but its quantized values are like pixels, so there is no good model for processing audio with deep learning, and the current models are extremely inefficient and full of flaws.

In general audio processing needs thousands or hundreds of thousands of times less resources than image processing to be a solved problem.
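For anyone wondering what "an image of a spectrogram" means in practice, here is a minimal sketch, assuming NumPy and a synthetic 440 Hz tone as input. The function name, frame size, and hop length are illustrative choices, not taken from any particular model.

```python
import numpy as np


def spectrogram(signal, frame_size=1024, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop: i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so time runs left to right like an image.
    return np.abs(np.fft.rfft(frames, axis=1)).T


if __name__ == "__main__":
    sample_rate = 16_000
    t = np.arange(sample_rate) / sample_rate  # one second of audio
    tone = np.sin(2 * np.pi * 440 * t)
    spec = spectrogram(tone)
    peak_bin = spec.mean(axis=1).argmax()
    print("spectrogram shape (freq bins, frames):", spec.shape)
    print("strongest frequency (Hz):", peak_bin * sample_rate / 1024)
```

The result is a 2D array a CNN can consume like a grayscale picture. The phase information is discarded here, which is one reason turning such an image back into clean audio is not trivial.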
- 12 months ago
  
  Reply
  
  Anonymous
  
  >In general audio processing needs thousands or hundreds of thousands of times less resources than image processing to be a solved problem.
  I feel this to be true. Large AI are great for images and there is much proof for that but then a genre of music is more akin to language. So I think to truly train music is to treat each (genre or song) as its independent language though the training method is the mystery. It requires lots of math and syntax. Human nature just knows music by heart. Maybe it's because we're the only creatures that have developed language (while other animals have amusia) and thus it is far easier to generate or understand music.
12 months ago

Reply

Anonymous

Same reason why our reinforcement AI can't write entire books, only snippets. It can only output so much complexity and has no ability to add onto an existing work.
12 months ago

Reply

Anonymous

Because music is more than just a bunch of notes. It's extremely abstract. That said, you could probably train an ai to do accents and fills over established genres and generate decent elevator music or shitty pop backing tracks.
12 months ago

Reply

Anonymous

If you fed an AI all of classical music, it would never just invent jazz. If you can understand why, you will understand why you can't just teach an AI to write music.
12 months ago

Reply

Anonymous

i t
i s
o v e r
- 12 months ago
  
  Reply
  
  Anonymous
  
  thank you for posting this becuse this is exactly what i was asking about. so every one of you tech gays who tried to say music is way more complicated than visual art is completely moronic and out of touch with reality. you are boobs. you don't know anything about tech whatsoever and you don't know anything about music.
  
  this is exactly what i was asking about. someone trained an AI on oasis songs, it wrote a bunch more oasis style songs. It sounds fine.
  
  All this talk about compression and raw sound is just gibberish. It is 100% doable. So it raises the original question of the thread which is, why isn't this shit fricking ubiquitous like ai art generators? if it's possible with convincing results, why is it so hard to come by?
  
  Another example is https://www.youtube.com/watch?v=2EaJCt2GpVc
  
  Another one is this https://www.youtube.com/watch?v=qf6eOSJgN0Y
  
  And another https://www.youtube.com/watch?v=CqvmUnG25dA
  
  bunch of losers and idiots on this board.
  >durrrr no actually music is too complex for a machine to reproduce
  - 12 months ago
    
    Reply
    
    Anonymous
    
    >PLEASE NOTE: THE MUSIC, LYRICS, RECORDING IS ORIGINAL BY THE BAND BREEZER (@breezerfever) & NOT AI.. THE AI ASPECT IS REPLACING THE ORIGINAL SINGERS VOICE WITH AI LIAM
    
    Read the description, moron. Humans wrote, played, and recorded the tracks. The AI part was making the vocals sound like they were sung by Liam.
  - 12 months ago
    
    Reply
    
    Anonymous
    
    You don't know how much you don't know is the problem here. The generative djent is rule based, not AI, that sort of generative music has been a thing for decades. The Oasis one is an AI voice filter, not AI music.
  - 12 months ago
    
    Reply
    
    Anonymous
    
    >https://www.rollingstone.com/music/music-features/nirvana-kurt-cobain-ai-song-1146444/
    
    How the Nirvana one was made. Much human interaction and selection needed.
- 12 months ago
  
  Reply
  
  Anonymous
  
  >PLEASE NOTE: THE MUSIC, LYRICS, RECORDING IS ORIGINAL BY THE BAND BREEZER (@breezerfever) & NOT AI.. THE AI ASPECT IS REPLACING THE ORIGINAL SINGERS VOICE WITH AI LIAM
12 months ago

Reply

Anonymous

Music is more of a social experience rather than your average AI-waifu fap-pic, that's why people would listen to System of a Down singing about the armenian genocide or neocons fricking over both the average american guy and people from other countries rather than some prompted tune done by a nobody no one cares about.
- 12 months ago
  
  Reply
  
  Anonymous
  
  >lyrics
  >mattering
- 12 months ago
  
  Reply
  
  Anonymous
  
  Its only a purely social experience to npcs
12 months ago

Reply

Anonymous

>all this cope ITT
Absolute fricking morons and that is grossly understating your phenomenal moronation.

There are no music generators because all data useful for training them is copyrighted by the israeliteiest of israelites. With such hostile environment no researcher bothers, and that leaves out hobbyists with next no understanding on ML armed with the power of free music datasets to create and train the models.

Your best bet to get your hands on something functional is a meaty Sony Music leak in a few years. I suggest you learn piano or guitar instead of waiting for that.
- 12 months ago
  
  Reply
  
  Anonymous
  
  >There are no music generators because all data useful for training them is copyrighted by the israeliteiest of israelites. With such hostile environment no researcher bothers, and that leaves out hobbyists with next no understanding on ML armed with the power of free music datasets to create and train the models.
  You are a genuine moron. The overwhelming, vast majority of good music is so old it's PUBLIC DOMAIN aka nobody has copyright to it.
  
  99% of copyrighted music is modern pop slop. There's a tiny 1% of good music today that's copyrighted but copyright is not preventing anyone from training an AI on good music, only bad music.
12 months ago

Reply

Anonymous

Limited data sets. When was the last time a captcha asked you to crop the moment the beat drops?
- 12 months ago
  
  Reply
  
  Anonymous
  
  Furthering this point, we don't have musicboorus where autists have been meticulously tagging every chord progression and harmonic device for decades.
12 months ago

Reply

Anonymous

Audio is considerably more data heavy than images. Stable Diffusion was trained on 3-channel (RGB) 512x512 images; that's 786,432 values per image. That same number of values covers only about 8.9 seconds of CD-quality stereo audio (44.1 kHz, 2 channels).
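The arithmetic behind those numbers, as a quick sketch. It counts values only; the constant and function names are made up for this example, and bit depth (typically 8 bits per image channel versus 16 bits per CD sample) is ignored.

```python
IMAGE_WIDTH = 512
IMAGE_HEIGHT = 512
IMAGE_CHANNELS = 3       # RGB

CD_SAMPLE_RATE = 44_100  # samples per second, per channel
CD_CHANNELS = 2          # stereo


def image_values(width=IMAGE_WIDTH, height=IMAGE_HEIGHT, channels=IMAGE_CHANNELS):
    """Number of scalar values in one training image."""
    return width * height * channels


def audio_seconds_for(values, sample_rate=CD_SAMPLE_RATE, channels=CD_CHANNELS):
    """How many seconds of audio the same number of values would cover."""
    return values / (sample_rate * channels)


def audio_values(seconds, sample_rate=CD_SAMPLE_RATE, channels=CD_CHANNELS):
    """Number of scalar values in a clip of the given length."""
    return int(seconds * sample_rate * channels)


if __name__ == "__main__":
    per_image = image_values()
    print("values per 512x512 RGB image:", per_image)
    print("equivalent seconds of CD audio:", round(audio_seconds_for(per_image), 2))
    print("images' worth of values in a 3-minute song:",
          round(audio_values(180) / per_image, 1))
```

By this count a three-minute track holds roughly twenty images' worth of values, and all of them have to stay coherent with each other over time.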
- 12 months ago
  
  Reply
  
  Anonymous
  
  No one is saying that the entire audio file should be generated right off the bat. Just the notes would be fine (which could ideally be parsed and passed on to a synthesizer)
  - 12 months ago
    
    Reply
    
    Anonymous
    
    Thats already been done long before AI, generative music has a long history. For a more modern AI approach there's https://www.youtube.com/@aiva1828
    - 12 months ago
      
      Reply
      
      Anonymous
      
      >paid service
      We mean AI that can compose masterpieces based on human feedback and with artist, genre, and optionally lyrics described.
      - 12 months ago
        
        Reply
        
        Anonymous
        
        You want the moon on a stick
        
        12 months ago
        
        Reply
        
        Anonymous
        
        Most of it has already been done. Google released a paper called MusicLM (but not code, as always).
12 months ago

Reply

Anonymous

It probably isn't down to technical limitations, just that there's not much effort in this area. Image recognition and generation is much more valuable than generating music.

There's also the fact that music is more temporal than imagery.
12 months ago

Reply

Anonymous

We're close. Give it a few months to a year.
12 months ago

Reply

Anonymous

Generative music may appear to be lagging behind text or image generation for several reasons. Here are some factors that contribute to this perception:

- Complexity of musical information: Music consists of a complex blend of pitch, rhythm, harmony, timbre, and expression, making it more challenging to model compared to images or text, which have more well-defined structures.
- Temporal dimension: Music is inherently a time-based art form, with patterns and relationships unfolding over time. This adds another layer of complexity to the modeling and generation process.
- Training data and representation: Curating and representing high-quality, diverse, and large-scale music datasets for training generative models is more challenging. While text and images are more easily accessible and represented in standard formats, music data can be less consistent in terms of notation, encoding, and quality.
- Evaluation and subjectivity: Evaluating the quality and creativity of generated music is more subjective compared to text or images. While there are some objective metrics for text generation (e.g., BLEU, ROUGE), and images (e.g., FID, IS), assessing the quality of generated music often relies on human judgment, which can be influenced by personal taste and cultural background.
- Commercial demand and application: Text and image generation have a wider range of immediate commercial applications, such as content generation, advertising, and data augmentation, which may lead to more research interest and investment in these areas. Generative music, on the other hand, has more niche applications and may not receive the same level of attention and funding.

Despite these challenges, there has been significant progress in generative music models, such as OpenAI's MuseNet and Google's Magenta. As technology and research advance, we can expect generative music to continue to improve and catch up with text and image generation.
- 12 months ago
  
  Reply
  
  Anonymous
  
  Thank you GPT-sama
12 months ago

Reply

Anonymous

Because they just cannot feed it all the music available on the internet without getting in trouble like they did with images, the moment the music labels find out some pajeet did that, they're gonna murder him.
12 months ago

Reply

Anonymous

I would've believed this to be legit. The problem is usually that the original voice input makes the voice sound unnatural
