The thing is, there is no narrative or storytelling element to image style transfer. Think of Beethoven's 9th symphony: it develops a narrative, it's not entirely abstract; the theme of the Song of Brotherhood is subtly developed over the course of the composition. That is storytelling. This is also why it's much easier to use ML to generate credible poetry than a real novel: generating a novel with ML faces the same challenges as generating music, because even instrumental music has a narrative, a story it's telling. Storytelling is a far more difficult problem than style transfer.
>Why do we need Style Transfer for Music?
We don't. Most of the applications of ANNs to music that I've seen so far are solving problems that either don't exist or have been very successfully solved by much simpler methods with much better results.
If you want to apply AI to something useful in music, how about:
- Machine learning for reverb generation. I.e. being able to "record" reverb from real space and apply it to sound.
- AI/ML for high-quality sample transposition.
- AI for pure sound synthesis. Imagine something that does PCA on a set of sounds and then lets you generate new sounds by adjusting the components with a bunch of knobs.
- AI for real-time sample morphing. For example, being able to synthesize a choir that sings the words you need. Or, how about transforming your own voice into quality choir vocals? Hey, style transfer!
- AI for high-quality music transcription (sound to notes or MIDI). This is being done to some extent, but I don't think it's very good yet.
And so on.
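The PCA-synthesizer idea above can be sketched in a few lines. This is only an illustration of the component-mixing concept, not a real synthesizer: the random matrix stands in for analyzed magnitude spectra, a real system would work on spectrograms and need a resynthesis step, and all names here are made up.

```python
import numpy as np

# Stand-in for a corpus of analyzed sounds: one magnitude spectrum per row.
rng = np.random.default_rng(0)
n_sounds, n_bins = 50, 128
spectra = rng.random((n_sounds, n_bins))

# PCA via SVD on the mean-centered data (no scikit-learn needed).
mean = spectra.mean(axis=0)
_, _, components = np.linalg.svd(spectra - mean, full_matrices=False)

# The "knobs": user-controlled weights on the first few components.
k = 4
knobs = np.array([1.5, -0.5, 0.2, 0.0])

# Turning the knobs produces the spectrum of a new, in-between sound.
new_spectrum = mean + knobs @ components[:k]
print(new_spectrum.shape)
```

The appeal is that each knob moves along an axis of maximal variation in the corpus, so a handful of controls can cover a large, perceptually meaningful space of sounds.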
I can assure you, every single one of these would be something musicians would be willing to pay good money for.
We're quite lucky that neural nets are overkill for procedural music generation.
By that I mean: we have huge masses of music theory, applied across genres and focused on the differences between genres, in the form of heuristics for both analysis and composition. And thanks to a tradition of procedural music generation going back a couple of centuries, a lot of it is fairly easy to translate into computer programs. (For instance, end-to-end serialist composition is easier for computers than it is for people, while canons and other mechanisms for creating permutations of melodies are equally straightforward.)
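To make the "canons are straightforward" point concrete: a canon is just a melody overlaid with delayed (and optionally transposed) copies of itself, which is a few lines of code. The note representation and function names below are purely illustrative.

```python
def canon(melody, voices=2, delay=4, transpose=0):
    """Overlay delayed/transposed copies of a melody.

    melody is a list of (start_beat, midi_pitch) pairs.
    """
    out = []
    for v in range(voices):
        for start, pitch in melody:
            out.append((start + v * delay, pitch + v * transpose))
    return sorted(out)

# A "Frere Jacques"-style opening phrase: C D E C.
theme = [(0, 60), (1, 62), (2, 64), (3, 60)]
print(canon(theme, voices=2, delay=4))
```

The same pattern covers the other classic melodic permutations: retrograde is `melody[::-1]` with recomputed onsets, and inversion negates intervals around a pivot pitch.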
This doesn't translate into a straightforward method for putting in two wav files and producing a third with transferred style, but it does mean that a sufficiently motivated person can write something that translates notes between two known genres with greater ease than they would with images.
(Text is somewhere in the middle. I've worked on a couple of attempts at 'style transfer' for text -- mostly using word2vec.)
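One common word2vec-based approach to text "style transfer" is vector-offset substitution: shift each word's embedding along a style direction and snap it to the nearest vocabulary word. The commenter doesn't say this is their method, and the tiny hand-made 2-D embeddings below are purely illustrative; real systems would use trained embeddings over a full vocabulary.

```python
import numpy as np

# Toy embeddings (illustrative only).
vocab = {
    "purchase": np.array([1.0, 1.0]),
    "buy":      np.array([1.0, 0.0]),
    "obtain":   np.array([0.9, 1.1]),
    "grab":     np.array([0.88, 0.12]),
}

# A "formal -> informal" direction, estimated from one known word pair.
style_direction = vocab["buy"] - vocab["purchase"]

def restyle(word):
    """Shift a word along the style direction, snap to nearest neighbour."""
    target = vocab[word] + style_direction
    return min((w for w in vocab if w != word),
               key=lambda w: np.linalg.norm(vocab[w] - target))

print(restyle("obtain"))  # -> "grab"
```

In practice the style direction is averaged over many formal/informal pairs to reduce noise, which is the same trick as the famous king - man + woman analogy arithmetic.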
There are some fun examples of this sort of stuff on http://dadabots.com/ , which includes attempts to synthesize music in the style of The Beatles, Meshuggah and The Dillinger Escape Plan.
If you like the idea of taking a song and changing its genre, check out Postmodern Jukebox. They’re pretty brilliant.
I find this surprising, from the simplistic (and probably naive) view that images are 2D signals while music is 1D.
"Music is NOT well-understood by machines (yet !!)"
I wrote this blog post, with some data that might help improve that.
Finally someone gets it. At least a little. :)