
After dabbling in AI music, I get why the music industry isn’t happy

Could you tell the difference between an AI artist and a real one? I accidentally made Bowie and Beyoncé and Elton sing. After experimenting with AI music, it’s easy to see why the music industry is concerned.

“Thanks, I hate it,” a colleague said to me after hearing some of what I had been experimenting with. I don’t think he really hated it, and some of the results were absolute bangers, but the lack of skill and process that went into them sullied the whole act of music creation.

The fact that I was a musician and could make music the old-fashioned way probably didn’t help. And yet I was still somewhat proud of my efforts.

I hadn’t laboured over them the way I would any other track I’ve ever created. It had taken far less time, sure, but I had still worked on them.

This was music making, but different. It was modern, but also risky. You might even call it riskier than making music the old-fashioned way.

A different take on music making

The music creation process is an art form: you start with a song in your head, work out how you want to make it, mould it, change the shape and the sound, match words and structure and other instruments along the way, and then make your way to a recording, where it’s time to split, tweak, time, apply effects, play with volumes, and master the sound.

Music creation is a labour of love, and an involved process. It’s typically not quick and easy, and yet the music I had created with AI? That was. It laughed in the face of the regular music process.

The approach was very different.

Whereas most music starts as a song in your head and some notes you vibe with, the AI music process started with nothing more than a prompt and maybe some text. In my case, that meant a prompt describing the style of sound I was after, and then some lyrics.

I had experimented with creating songs entirely from scratch and letting the AI make lyrics before, but that wasn’t cutting it. I had to come up with a better process.

How to make good AI music

That better process was to think like a writer and start with the lyrics. Build the structure of the song in lyrical form, much like I would a story or article, and create it as a work of words that anyone can read.

AI music is a bit of a crapshoot: you never know what you’re going to get, and that’s part of what makes it so interesting. It can be a fully developed song, a small jingle, or a short sequence of sounds that resembles music but is actually the sort of garbage you’d never want to hear again. A lot of it arguably falls into that last category. But it’s also entirely possible to make something great.

So I started with the words, testing out two platforms, Suno and Udio, to see what they could deliver, before settling on the latter for this project. Udio works a little differently from Suno, which is largely a system for making a two-minute song from a single prompt.

Slightly different (at least when we were using it in beta), Udio builds songs in 30-second chunks, but allows you to choose your start and end points, so you can technically start at the chorus rather than the first verse.

From there, you can add sections, like an intro or a bridge, giving you a little more creative control. With that extra control, I would iterate on versions, improve on what was being generated, and use my lyrics and song structure to help define what was created.

In the lyrics, I would specifically mark whether something was a [Bridge] or a [Chorus], and listen to the AI’s response. Some of it was great, some of it less so, but after between 5 and 20 iterations, I typically had a starting point, almost always kicked off from the first verse or the chorus.
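To give a sense of how simple that markup is, here’s a hypothetical snippet (not lyrics from the album, just an illustration of the structure):

  [Verse]
  Some opening lines written the old-fashioned way,
  handed over for a machine to sing.

  [Chorus]
  This is the hook, tagged so the AI
  knows it’s the part worth repeating.

  [Bridge]
  A change of pace before the final chorus.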

From there, I could build the track in AI, and then spit out the final song. Just like that.

What I did from there was largely to see if AI music could be more than just artificially generated trash.

An experiment in AI music

Specifically, the project aimed to answer a few things. I wanted to:

  1. Explore whether it was easy, or even possible, to generate good AI music, partly to see whether our music services would eventually be flooded with it
  2. See whether AI services should scare the music industry, given the claims that the technology is a threat to real, working artists, and
  3. Showcase music covering topics no one typically writes about. Could I make decent music about a theme no one would ever genuinely tackle?

All three seemed possible in a story, so I got to work in the background of all the things that normally occupy my life: the day job, journalism, parenting, and all the other things that happen.

If music is a labour of love, but you don’t have the option to do it full time, it tends to take a backseat to everything else, and this was no different.

I split my journalism time with my nine-to-five already, having made the jump to do both years ago, and so adding another project on top of everything else was clearly going to be a fun juggle. Programming. Music making. Art and photography. The endless pursuit of trying to find an inner Gordon Ramsay.

Admittedly, AI can help with some of these. It can suggest recipes, but it can’t cook for me. It can provide complete code samples, but it can’t make a full app. I still need to do a lot of things myself, which is fine.

But AI can lend a big hand with music making. As I learned when I was scripting sound, artificial intelligence can do a lot of the heavy lifting in music creation, and help you create great-sounding music in practically no time.

AI isn’t a robot playing a piano for you, but it may as well be a digital equivalent in some ways.

Can you make good AI music easily?

Almost too easily, we found: the nascent AI platforms in existence right now can generate great tracks. It might take some trial and error, but you can build solid sounds in a pre-mixed state without much talent, if any.

Much like other forms of AI, the results can be as much about the prompt as about the model and the data it’s trained on, which, like the recipe for Coca-Cola, is likely very, very secret.

That being said, I found having an understanding of music helps immensely. That’s not to say musicians will necessarily work out how to prompt an AI system better than people who aren’t musicians. However, knowing the difference between a bridge and a chorus, understanding music structure, and even being able to describe the sounds you’re after will help.

Depending on the platform, you may be able to just say the name of the artist, and have the system approximate what you’re going for.

To avoid copyright issues (rightfully so), AI music platforms won’t necessarily let you copy an artist directly. Nor should they. However, an AI platform may interpret the request for you, working out the right words for genre, for instrument, for voice and tone.

Prompting for success

If you know what you’re trying to get out of it, the sounds you get back also give you an indication of what the models are trained on.

In the opening track of our experimental album “When Do We Eat”, the initial prompt was:

Soft 70s English piano rock, soaring cinematic ballad, Male Vocals

The result was a sound that, to the ear, comes across as a combination of David Bowie and Peter Gabriel. I tweaked the prompt and the construction of the song, using a bridge and later suggesting an “orchestral build-up”, but it came out much like the style we were going for, without naming a single artist.

It’s a similar story with other tracks, such as the last on the album: “I Wrote My Life”. The prompt used was:

70s piano pop rock, dramatic, ballad, English singer, adult contemporary,

The resulting sound was like a young Elton John, and yet distinct from another track on the release, “Sometimes It Works”, which comes off more as a combination of Sting and John Mayer. Its prompt:

upbeat, guitar strumming, adult contemporary, jazzy rock

Without naming names, the AI model had managed to work things out, and also blend some sounds.

It even managed a Beyoncé-sounding track without a name ever being uttered.

Treading on toes: why the music industry is concerned

No doubt, this is going to ruffle some feathers and tread on toes, largely because the generative models being used are very likely trained on real artists without permission from the artists, or even their publishers and distributors. How else are they able to get this close to the sounds of real artists?

That is understandably a problem, not just because it potentially takes work away from musicians, but because it also appears to put words in their mouths. It’s an issue, and one where you can see why artists are concerned.

As it is, AI image generation is running into similar issues, and at least one company is responding by training its technology only on licensed images. It’s a big deal, because current AI image makers are trained on a variety of sources, and as with AI music, those sources may not have given their permission, or may be completely unaware of what’s happening.

The problem is this goes deeper than songs that sound like they’re being sung by someone else.

Music publishing isn’t just about the artist playing the music today, but also the sounds being played. Some of those sounds can be an almost literal hook to bring the listener in, and they might have been used in other songs, too. Often, these are credited, particularly when a producer has intentionally reused a hook found in other tracks and pays royalties to the original. Other times, the reuse appears unintentional, largely because music can simply sound similar.

You only need to ask Ed Sheeran about this: his song “Shape of You” was accused of ripping off a hook from an earlier track, much like another lawsuit over a similar-sounding hook in a different Sheeran song.

In AI music, however, there’s a risk that a hook will be used without the creator realising it, as the model and engine customises the hook in the process. You might hear similar sounds to other hooks and other tracks, but because artificial intelligence is changing things, there’s almost no way to prove that a hook is being used.

Conversely, there’s also likely no way to disprove it, and that could end up being a big problem. I’m not a lawyer, but it doesn’t take much effort to see the potential issues.

Back in 2015, the estate of Marvin Gaye won a lawsuit against Robin Thicke over “Blurred Lines”, arguing that the feel of the two songs was similar, even if Thicke’s track wasn’t a direct copy. There were several rounds of that lawsuit, a back and forth over appeals and such, but the resulting precedent could be particularly troubling for both the AI music world and the wider music industry, particularly since AI music appears to be trained on the music of other artists.

Ultimately, it becomes pretty easy to see it from the point of view of the music industry once you dabble with it all. And at the same time, it becomes risky for those simply playing with AI music as a fun hobby.

It didn’t take much to make music with AI. You could skip a whole bunch of the steps I tried and just have generative AI make an album in a matter of minutes, releasing it without any effort at all.

Can you take AI music and turn it into something better?

But you can also conceivably make something with effort, and try to be consciously aware of what the system is doing.

Sending the right prompt to an AI platform is only one part of building a great song with AI. I found that writing the lyrics myself produced the best results, but even then, it’s about prompting each next part: the structure, the intro, the outro, and so on.

Once AI has done its job, however, you can turn to a little bit of machine learning and then some skill to split the tracks into something mixable.

A recent update to Logic Pro can do it, splitting a music file into several tracks for remixing: vocals, drums, bass, and so on. However, before Logic added this feature, we were turning to another AI platform called Moises AI, which also provides a way to translate your AI songs into tabs and chords.

The concept is clever, and Moises offers more granular stem splitting than the four tracks Logic provides.
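For the curious, you don’t strictly need either platform to see how a stem split works. The open-source Demucs project performs the same kind of four-stem separation locally; here’s a minimal sketch, assuming Demucs has been installed via pip, and using a hypothetical bounced file called song.wav:

  # A rough sketch: split a bounced AI track into four stems
  # (drums, bass, vocals, other) with the open-source Demucs separator.
  # This is a stand-in example, not the Moises or Logic workflow above.
  import subprocess

  subprocess.run(["demucs", "song.wav"], check=True)

  # By default, the stems land in ./separated/htdemucs/song/
  # as individual audio files, ready to drop into a DAW for remixing.

However you get there, the end result is the same: individual stems you can treat like a normal multitrack session.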

Depending on how much you pay, you can get a high-res track export, though keep in mind, it can’t create quality from nothing. The music exported from AI platforms may lack fidelity compared with recording the parts yourself, and almost nothing will match a real recording.

With the music split into stems, you can then work on the individual tracks. Piece them together, cut sections, add effects, mix and master, and build the song the way you want.

AI music is typically exported in a “wet” way, which is to say all the effects and sounds have been pre-mixed. That’s distinct from “dry” audio sessions, which are the exact opposite: when it’s dry, you do all the mixing work yourself.

Splitting the stems allows you to have a makeshift dry session in your AI music: it’s not completely dry, but you can mix some of the tracks as if you were building your own songs. Kinda sorta.

What does a mixed AI song sound like?

After spending a few weeks with AI music generation, I had an album.

The theme was in some ways ridiculous: music about search engine optimisation. I’m not sure anyone has ever made that before, but given that music is hardly ever about what you do at work, I figured making music about content and search was about as crazy an experiment as could be.

So I wrote lyrics and created music, and then downloaded the audio and remixed it in Logic Pro. The concept was written by a human, engineered in AI, and remixed by a human once again, all to see if it was possible.

And it totally is. You can hear the results for yourself on a gamut of services, including Amazon Music, Apple Music, Spotify, and YouTube Music, and pretty much any other music service in the world.

The results are better than they should be, and that’s what is largely so interesting about AI music.

These tracks shouldn’t sound as good as they do. The topics aren’t really suited to music; no one in their right mind would sing about the things Google does and the terms needed for search, and yet the songs now exist. They didn’t take nearly as long as a real music-writing process would have dictated, and yet they’re almost ready for the Top 40. It’s crazy.

The craziest thing isn’t even that a journalist made them. It’s that more of them will very likely populate music services down the track. Based on what I experienced, you’d likely be none the wiser. And depending on how much music you listen to, you might end up hearing more AI music soon, too.

Remixing AI tracks is fun in stereo, but I also made a Dolby Atmos variant. I haven’t released it yet, but it’s neat to learn you can re-engineer AI music to be spatial-audio capable, if need be.