Generating music at the audio level is challenging since the sequences are very long.[19] A typical 4-minute song at CD quality (44 kHz, 16-bit) has over 10 million timesteps.[20][21][22][23] For comparison, GPT-2 had 1,000 timesteps and OpenAI Five took tens of thousands of timesteps per game.[24] Thus, to learn the high level semantics of music, a model would have to deal with extremely long-range dependencies.

One can also use a hybrid approach: first generate the symbolic music, then render it to raw audio using a wavenet conditioned on piano rolls,[13][14] an autoencoder,[15] or a GAN;[16] or do music style transfer, to transfer styles between classical and jazz music,[17] generate chiptune music,[18] or disentangle musical style and content. For a deeper dive into raw audio modelling, we recommend this excellent overview.
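The "over 10 million timesteps" figure follows directly from the CD sampling rate; a quick back-of-envelope calculation (assuming a single mono channel, since stereo would double the count):

```python
# Timesteps in a typical 4-minute song at CD quality.
# CD audio is sampled at 44,100 Hz with 16-bit samples, so one
# second of mono audio is 44,100 timesteps.
sample_rate_hz = 44_100
duration_s = 4 * 60  # a 4-minute song

timesteps = sample_rate_hz * duration_s
print(timesteps)  # 10,584,000 -> over 10 million timesteps
```

By contrast, a symbolic representation such as a piano roll or MIDI event stream needs only a few events per note, which is why symbolic generation sidesteps the long-sequence problem entirely.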