TL;DR

We produce dozens of Instagram Reels and YouTube Shorts per week at tabiji.ai. Every one needs background music. We tested two AI music generators in production: MiniMax Music API (three models: 2.0, 2.5, 2.5+) and Suno AI (reverse-engineered API, bird-codename models, piano covers scored by Gemini).

The punchline: Suno produces significantly better music (8.5/10) but has no public API — we had to reverse-engineer it. MiniMax produces good-enough music (7/10) with a real API at ~$0.01/track. We built infrastructure for both, then ended up using MiniMax for daily production because it's simpler. We use Suno when music quality actually matters.

Along the way, we also built a Gemini 2.5 Flash audio scorer to evaluate AI-generated piano covers — and discovered that Gemini is too polite to be a useful music critic. Every sample on this page is a real production artifact. Press play.

Why AI Music for Travel Content?

Short-form video needs background music. Every Reel, every Short, every TikTok. When you're producing 4–10 videos per day through automated pipelines, your options narrow fast:

  • Licensed music — expensive at scale, rights management nightmare, can't automate selection
  • Royalty-free libraries — free or cheap but limited variety. Your audience hears the same 20 tracks on repeat after a week.
  • AI-generated music — unique per video, ~$0.01/track, fully automatable via API, infinite variety

We chose option three — and then spent weeks going deeper than we expected. What started as "just add background music to Reels" turned into reverse-engineering undocumented APIs, discovering that AI models use bird codenames instead of version numbers, building a Gemini-powered music scoring system, and learning that AI music critics are too nice.

This is the full story.

MiniMax Music: The Production Workhorse

MiniMax offers three music generation models through their API. They look similar on paper, but in practice they behave very differently — and only one of them actually works for instrumental background music.

| Feature | Music 2.0 | Music 2.5 | Music 2.5+ |
| --- | --- | --- | --- |
| Model ID | music-2.0 | music-2.5 | music-2.5+ |
| Cost per track | ~$0.01 | ~$0.01 | ~$0.01 |
| Requires lyrics | Yes (always) | Yes (always) | No (optional) |
| is_instrumental | Not supported | Broken (error) | Works perfectly |
| Output duration | 20–65s | 20–50s | 60–192s |
| Vocal tendency | Always vocal | Usually vocal | Pure instrumental |
| Quality (travel bgm) | 5/10 | 6/10 | 8/10 |
| Our usage | Retro/vocal vibes | Don't use | All production |

Music 2.0 — Vocals Whether You Want Them or Not

The original MiniMax music model. It always requires a lyrics field — you cannot generate without one. It doesn't support is_instrumental: true at all. To get something close to instrumental, we pass placeholder lyrics like "la la la". The model interprets this as a wordless vocal melody — humming, vocalizations layered on top. For retro vibes and dreamy atmospheric tracks, this has charm. For clean background music behind text overlays, it's a problem.

Music 2.5 — The Broken Middle Child

Slightly better instrumentation and mix than 2.0, but is_instrumental: true doesn't work. It returns 2013: invalid params, lyrics is required. You still need lyrics, you still get vocals. There's no reason to use this model.

Music 2.5+ — The One That Works

The only MiniMax model where is_instrumental: true actually works. Set it and you get pure instrumental music — no vocals, no humming. Quality is noticeably better: richer instrumentation, better stereo separation, more dynamic range. Tracks are longer (60–192s), giving you more material to trim from. This is the one we use in production.

MiniMax Audio Tests

Test 1: Warm Cinematic Travel

Prompt: "warm cinematic travel background music, acoustic guitar and soft piano, golden hour sunset vibes, dreamy and nostalgic"

Same prompt, all three models. Music 2.0 and 2.5 got placeholder lyrics ("la la la"). Music 2.5+ used is_instrumental: true.

🎧 Prompt 1: Warm Cinematic Travel
  • Music 2.0 — "la la la" placeholder lyrics (vocal)
  • Music 2.5 — "la la la" placeholder lyrics (vocal)
  • Music 2.5+ — is_instrumental: true (instrumental ✓)

The difference is immediate. Music 2.0 renders the "la la la" as a dreamy sung melody — charming but not background music. Music 2.5 is similar but slightly more polished. Music 2.5+ is the only one producing clean, vocal-free instrumental that sits behind video content without competing for attention.

Test 2: Japanese Festival Energy

Prompt: "upbeat Japanese festival drums and shamisen, energetic travel montage, fast tempo"

🥁 Prompt 2: Japanese Festival Drums
  • Music 2.0 — "la la la" placeholder lyrics (vocal)
  • Music 2.5 — "la la la" placeholder lyrics (vocal)
  • Music 2.5+ — is_instrumental: true (instrumental ✓)

Key difference here: duration. Music 2.0 and 2.5 generated ~20-second clips, while Music 2.5+ gave us ~100 seconds — five times more material to work with. For our pipeline where we trim to 8–15 seconds, more source material means more options for finding the perfect segment.

Test 3: Vintage Kyoto (Real Production Sample)

Prompt: "Lo-fi ambient, Rhodes piano, shakuhachi flute, vinyl crackle, tape hiss, 65 BPM, no drums"

🎬 Production Sample: Vintage Kyoto Reel
  • Music 2.0 — from our vintage Kyoto Reel (vocal)

Even when prompted for "lo-fi ambient" with specific instruments, Music 2.0 layers vocal performances on top. Pleasant, but not what we needed for a video with text overlays.

MiniMax Music 2.5+ is good. Reliable, cheap, automatable. But we wanted to see what great AI music sounded like. Enter Suno.

Enter Suno: The Quality King

If you've spent any time in AI music, you've heard of Suno AI. It produces genuinely stunning music — full songs with complex arrangements, rich vocal performances, emotional dynamics that rival human production. In a blind test, Suno tracks can fool professional musicians.

We needed Suno for a specific use case: solo piano covers for our travel content. Not background music for Reels (MiniMax handles that), but standalone pieces — the kind you'd put on a Spotify playlist called "ambient piano for work." Original compositions with specific moods. The kind of music that sounds like someone sat down at a Steinway and played.

There was just one problem: Suno has no public API.

So we built one.

How We Reverse-Engineered Suno's API

This section is the technical deep dive. If you just want to hear the music, skip to the Suno Audio Showcase.

The Architecture

We use a local Next.js wrapper (suno-api/) that reverse-engineers Suno's internal API. The auth flow looks like this:

Browser Cookie → Clerk Session → JWT Token → Suno Internal API → Audio URL

The wrapper also includes a Playwright-based captcha solver that uses the 2Captcha service (~$3 balance) to solve hCaptchas when Suno requires them. In practice, we discovered captcha is rarely required — but we'll get to that.
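The auth flow above can be sketched in a few lines. This is illustrative only: the endpoint path and the __client cookie name are assumptions based on how Clerk session auth typically works, not documented Suno URLs.

```python
# Hypothetical endpoint -- an assumption for illustration, not a documented URL.
CLERK_BASE = "https://clerk.suno.com/v1"

def token_request(session_id, client_cookie):
    """Steps 1-3: describe the call that trades the browser's Clerk
    session cookie for a short-lived JWT."""
    return {
        "method": "POST",
        "url": f"{CLERK_BASE}/client/sessions/{session_id}/tokens",
        "headers": {"Cookie": f"__client={client_cookie}"},
    }

def suno_headers(jwt):
    """Step 4: every call to Suno's internal API carries the Bearer JWT."""
    return {"Authorization": f"Bearer {jwt}"}
```

The JWT expires quickly, so the wrapper re-runs the token exchange before each generation batch.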

The Night Everything Broke

On March 13, 2026, around 9:38 PM, we tried to generate some piano covers and the whole system was dead. Suno had redesigned their web UI without warning, and our Playwright captcha solver couldn't find any of the expected elements.

Here's what changed:

| Old Selector | New Selector | Why It Broke |
| --- | --- | --- |
| .custom-textarea | textarea:visible | CSS class removed entirely. Hidden lyrics textarea required the :visible filter. |
| button[aria-label="Create"] | button.filter({hasText: 'Create'}) | aria-label removed from the button. |
| waitUntil: 'domcontentloaded' | waitUntil: 'networkidle' | React SPA fires domcontentloaded before rendering anything; the page still shows a loading spinner. |

We fixed each selector, added debug screenshots on browser launch for future troubleshooting, and got everything working again. Then came the plot twist.

The Plot Twist: Captcha Wasn't Even Required

After fixing all the selectors and getting the browser flow working again, we checked the captcha endpoint (/api/c/check) — and it returned required: false.

The entire browser automation dance — launching Playwright, navigating to suno.com/create, finding the textarea, clicking Create, solving hCaptcha — was unnecessary. Direct API calls with just the Clerk JWT token worked fine. The old broken selectors had been causing timeouts before the code could even check whether captcha was needed.

We'd spent an hour debugging a browser flow we didn't need. Classic.
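The fix is to check first and only pay the browser-automation tax when it's actually needed. A minimal routing sketch, assuming the {"required": bool} response shape we observed from /api/c/check:

```python
def choose_path(check_response):
    """Route on /api/c/check's answer; fall back to the browser flow
    (the safe, slow path) if the field is missing or malformed."""
    if check_response.get("required") is False:
        return "direct_api_call"          # just the Clerk JWT, no Playwright
    return "playwright_browser_flow"      # launch browser + captcha solver
```

Defaulting to the browser flow on an ambiguous response means a changed API shape degrades to slow, not broken.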

The Bird Codename Mystery

Once the API was working, we hit another wall: model selection. We tried chirp-v5 for Suno's latest model. 403 error. Tried v5. 403. Tried chirp-v4-5. 403.

Turns out, Suno uses bird codenames, not version numbers. Here's the mapping we eventually figured out:

| Bird Name | Version | Quality | Notes |
| --- | --- | --- | --- |
| chirp-crow | v5 | Best | Pro default, highest quality |
| chirp-bluejay | v4.5+ | Very good | |
| chirp-auk | v4.5 | Good | |
| chirp-v4 | v4 | Solid | Older but reliable |
| chirp-v3-5 | v3.5 | Legacy | |

There's no documentation for this. You have to try model names until something doesn't 403. chirp-v5, v5, suno-v5 — none of them work. It has to be chirp-crow. We found this through trial and error at midnight.
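To avoid rediscovering this at midnight, pin the mapping in code. A small helper reflecting the table above (the mapping is what we observed through trial and error, nothing official):

```python
# Human version string -> codename the Suno API actually accepts.
SUNO_MODELS = {
    "v5": "chirp-crow",
    "v4.5+": "chirp-bluejay",
    "v4.5": "chirp-auk",
    "v4": "chirp-v4",
    "v3.5": "chirp-v3-5",
}

def resolve_model(version):
    """Fail loudly on unknown versions instead of letting the API 403."""
    try:
        return SUNO_MODELS[version]
    except KeyError:
        raise ValueError(
            f"Unknown Suno version {version!r}; known: {sorted(SUNO_MODELS)}")
```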

Other Suno API Gotchas

  • No artist names in prompts. Suno blocks specific artist names. "Solo piano cover of Cruel Summer by Taylor Swift" gets rejected. "Solo piano cover of Cruel Summer" works fine. Describe the style without naming the artist.
  • Prompt limit: 500 characters for gpt_description_prompt on v5 (not 200 as some guides claim).
  • Each generation = 2 clips = 10 credits. Pro Plan ($10/mo) gives you 2,500 credits/month = 250 generations = 500 clips. Commercial rights included.
  • wait_audio: false + polling is more reliable than wait_audio: true. The blocking mode can timeout after 100 seconds. Better to fire-and-forget, then poll with /api/get?ids=X,Y.

The Cruel Summer Experiment

To really test Suno's capabilities and build a scoring system, we ran a controlled experiment: generate 10 solo piano covers of the same song using 5 different emotional variations, then score them with Gemini.

The song: "Cruel Summer" (we can't name the artist in the prompt, but you know who). The model: chirp-v4. Instrumental mode enabled.

The Five Variations

Each variation got the same base prompt template — only the mood and dynamics changed:

"Solo piano cover of Cruel Summer. Steinway grand piano, close-miked. [MOOD]. [DYNAMICS]. Warm intimate recording."
| # | Strategy | Mood Modifier | Clips |
| --- | --- | --- | --- |
| 1 | Balanced | Bittersweet longing, hushed verses building to powerful chorus | 2 |
| 2 | Melancholic | Minor key reharmonization, wistful and sad | 2 |
| 3 | Bright | Bright uplifting, flowing arpeggios, shimmering upper register | 2 |
| 4 | Sparse | Minimal, lots of space between notes, ambient contemplative | 2 |
| 5 | Virtuosic | Fast runs, ornamental flourishes, concert-style performance | 2 |

5 variations × 2 clips each = 10 clips total. Cost: 50 credits ($0.20). All confirmed as solo piano with recognizable melody. Now we needed a way to score them.
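Building the batch is mechanical once the template is fixed. A sketch of the prompt construction — each mood string below fills the [MOOD]/[DYNAMICS] slots of the template together:

```python
BASE = ("Solo piano cover of Cruel Summer. Steinway grand piano, "
        "close-miked. {mood}. Warm intimate recording.")

VARIATIONS = {
    "balanced": "Bittersweet longing, hushed verses building to powerful chorus",
    "melancholic": "Minor key reharmonization, wistful and sad",
    "bright": "Bright uplifting, flowing arpeggios, shimmering upper register",
    "sparse": "Minimal, lots of space between notes, ambient contemplative",
    "virtuosic": "Fast runs, ornamental flourishes, concert-style performance",
}

def build_prompts():
    """One prompt per variation; each Suno generation yields 2 clips,
    so 5 prompts -> 10 clips -> 50 credits."""
    return {name: BASE.format(mood=mood) for name, mood in VARIATIONS.items()}
```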

Gemini as AI Music Critic

This is the part we were most excited about — and the part that taught us the most about AI evaluation limitations.

The Scorer

We built scripts/gemini-piano-scorer.py — a Python script that uploads an MP3 file to Gemini 2.5 Flash and asks it to evaluate the piano cover on five criteria, each scored 1–10:

  • Emotional Impact
  • Piano Tone Quality
  • Arrangement
  • Spotify Playlist Fit
  • Replay Value

The scorer also checks two binary questions: Is this actually solo piano? Is the melody recognizable? Both are important — an AI generator might produce beautiful music that happens to include a full orchestra, or nail the vibe but completely miss the original melody.

The Results

All 10 clips passed the binary checks: solo piano, recognizable melody. The scores told an interesting story — or rather, they told a story about Gemini's limitations:

| Variation | Clip | Emotion | Tone | Arrangement | Playlist | Replay | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Balanced | A | 9 | 9 | 9 | 9 | 9 | 45/50 |
| Balanced | B | 9 | 9 | 8 | 9 | 9 | 44/50 |
| Melancholic | A | 9 | 9 | 9 | 9 | 8 | 44/50 |
| Melancholic | B | 9 | 9 | 9 | 10 | 9 | 46/50 |
| Bright | A | 9 | 9 | 8 | 9 | 9 | 44/50 |
| Bright | B | 8 | 9 | 9 | 9 | 9 | 44/50 |
| Sparse | A | 9 | 9 | 9 | 9 | 9 | 45/50 |
| Sparse | B | 9 | 10 | 9 | 10 | 9 | 47/50 |
| Virtuosic | A | 9 | 9 | 10 | 9 | 9 | 46/50 |
| Virtuosic | B | 9 | 9 | 9 | 9 | 8 | 44/50 |

Top 3: Sparse (B) at 47/50, Melancholic (B) and Virtuosic (A) tied at 46/50.

The emotional reinterpretations (melancholic, sparse) outperformed the straight covers. The sparse version — lots of breathing room, ambient and contemplative — scored highest. Interesting signal: restraint impresses the model more than virtuosity.

The Problem: Gemini Is Too Nice

Look at those scores again. Every single clip scored between 44 and 47 out of 50. Individual criteria ranged from 8 to 10. No deductions below 8. Every clip was "excellent."

This is useless for selection.

The scores read like a polite music professor grading undergraduate recitals — everything is "technically proficient" and "emotionally resonant." In reality, the 10 clips varied more than the scores suggest. A human listener could easily rank them: some had awkward transitions, others meandered, one had a section where the AI seemed to forget the melody entirely. Gemini noticed none of this.

Gemini 2.5 Flash produces evaluations that are individually reasonable but collectively useless. When every score clusters at 8–9/10, you can't differentiate quality. The distribution is too compressed to extract a ranking.

How to Fix AI Music Scoring

We haven't fully solved this, but here's what we think would work:

  • Adversarial scoring: "Find 3 specific things wrong with this performance. Be harsh." Forces the model to look for flaws instead of confirming quality.
  • Comparative scoring: "Listen to both clips. Which is better? Why? Rank them." Humans naturally compare. Point scores in isolation are meaningless.
  • Reference-based scoring: "How close is this to the original recording? Score the melodic accuracy specifically." Anchors the evaluation to something concrete.
  • Calibrated rubric: "A 10/10 is Oscar Peterson live. A 5/10 is a first-year piano student. Where does this clip fall?" Gives the model a real scale.
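We haven't shipped these yet, but the comparative variant is easy to sketch. The prompt text below is our own untested design; the two audio files would be passed alongside it the same way the scorer passes one:

```python
def pairwise_prompt(label_a="Clip A", label_b="Clip B"):
    """Force a choice and demand flaws -- the opposite of the polite
    point-score rubric that compressed everything into 8-9/10."""
    return (
        f"You will hear two solo piano covers: {label_a}, then {label_b}. "
        "You MUST pick exactly one winner. Then list 3 specific flaws "
        "in the weaker clip -- be harsh, no hedging. "
        'Return JSON: {"winner": "A" or "B", "flaws": ["...", "...", "..."]}'
    )
```

Running this over all 45 pairs of 10 clips yields a tournament ranking instead of a wall of 9s.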

💡 Key Insight: AI Can Analyze Audio, But Can't Rank It

Gemini 2.5 Flash is genuinely impressive at understanding music — it correctly identifies instruments, key signatures, tempo, emotional texture, arrangement style. Its analysis is excellent. Its evaluation is worthless. It can describe what it hears in precise detail but assigns the same "great" rating to everything.

For now, we use Gemini scoring as a filter (reject anything below 7/10) rather than a ranker. The final selection is still human ears.
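In code, the filter is a single predicate. We check per criterion here — that's our interpretation of "below 7/10"; filtering on the total would work just as well:

```python
def passes_filter(result, threshold=7):
    """Gate, not ranker: reject a clip if ANY criterion dips below
    the threshold. `result` is the scorer's JSON output."""
    return all(v >= threshold for v in result["scores"].values())
```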

The Scorer Code

import google.generativeai as genai
import json, sys

genai.configure(api_key="YOUR_GEMINI_API_KEY")

def score_piano_cover(mp3_path):
    """Upload MP3 to Gemini 2.5 Flash, score on 5 criteria."""
    audio = genai.upload_file(mp3_path)
    model = genai.GenerativeModel("gemini-2.5-flash-preview-05-20")

    prompt = """Analyze this solo piano cover. Score each criterion 1-10:
    1. Emotional Impact - Does it evoke feeling?
    2. Piano Tone Quality - Realistic, warm, well-recorded?
    3. Arrangement Sophistication - Creative reinterpretation?
    4. Spotify Playlist Fit - Would it fit "peaceful piano"?
    5. Replay Value - Would you listen again?

    Also: Is this actually solo piano? (yes/no)
    Is the original melody recognizable? (yes/no)

    Return JSON: {"scores": {"emotion": N, "tone": N,
    "arrangement": N, "playlist_fit": N, "replay": N},
    "solo_piano": bool, "melody_recognizable": bool,
    "total": N, "notes": "brief explanation"}"""

    response = model.generate_content([prompt, audio])
    # str.strip() removes a *set of characters*, not a prefix -- peel
    # the markdown fence off manually before parsing.
    raw = response.text.strip()
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(raw)

if __name__ == "__main__":
    result = score_piano_cover(sys.argv[1])
    print(json.dumps(result, indent=2))

Suno Audio Showcase

Here are real Suno outputs from our production runs. The first two are piano covers Bernard selected as favorites. The "Cruel Summer" clip is from our scoring experiment. The rest are original compositions generated on chirp-crow (v5) — Suno's best model.

🎹 Suno Piano Covers
  • Piano Cover #1 — Yiruma-style, warm and intimate (Suno)
  • Piano Cover #2 — Emotional, reflective (Suno)
  • Cruel Summer — Solo piano cover (experiment clip) (Suno v4)

🎵 Suno v5 Original Compositions (chirp-crow)
  • Morning Mist — Solo piano, dawn atmosphere
  • Snowfall Nocturne — Solo piano, winter night
  • Candlelight Waltz — Solo piano, warm 3/4 time
  • Coastal Reverie — Solo piano, ocean-inspired
  • Paper Lanterns — Solo piano, East Asian influence

Listen to these back-to-back with the MiniMax samples above. The quality gap is real. Suno's piano has dynamic range, natural sustain and decay, realistic pedal noise, and genuine musicality — the kind of subtle imperfections that make a performance feel human. MiniMax sounds like good MIDI. Suno sounds like a recording session.

Native Video Audio: Veo 3, CogVideoX, Hailuo

Before settling on separate music generation, we tested whether video generation models themselves could produce usable audio.

Google Veo 3 — Decent SFX, Not Music

Veo 3's generate_audio parameter produces synchronized audio: ambient street noise, wind, water, footsteps. For establishing shots it adds atmosphere. But it can't produce composed background music. We disable it for most Reels.

Z.AI CogVideoX-3 — Synthetic and Unusable

Claims "AI SFX audio" in docs. In practice: mediocre, synthetic, obviously AI-generated. Never used in production.

MiniMax Hailuo I2V — No Audio At All

Video only — no audio output. This is actually why we built the music pipeline: Hailuo's video is excellent, but it ships silent. Music 2.5+ is the natural companion.

| Model | Audio Output | Music Quality | Our Usage |
| --- | --- | --- | --- |
| Veo 3 | SFX + ambience | N/A (not music) | Disabled for Reels |
| CogVideoX-3 | AI SFX | 2/10 | Never used |
| Hailuo I2V | None | N/A | Paired with Music 2.5+ |

Bottom line: no video generation model produces music you'd want in a published Reel. Use a dedicated music model.

Head-to-Head: Suno vs MiniMax

After months of using both in production, here's the honest comparison:

| Category | Suno AI | MiniMax Music 2.5+ |
| --- | --- | --- |
| Music Quality | 8.5/10 — genuinely impressive | 7/10 — good enough for bgm |
| API | No public API. Reverse-engineered wrapper. | Real REST API. Documented. |
| Cost | $10/mo subscription (~$0.02/clip) | ~$0.01/track, pay-per-use |
| Automation | Fragile — Suno can change their UI/API at any time | Stable, versioned API |
| Instrumental Mode | Excellent — true solo instrument support | Works on 2.5+ only |
| Model Selection | Bird codenames (chirp-crow, chirp-bluejay) | Simple version numbers |
| Output Duration | 30–240s | 60–192s |
| Commercial Rights | Pro plan required ($10/mo) | Included with API usage |
| Reliability | Breaks when Suno updates their site | 99%+ uptime |
| Best For | Standalone music, covers, quality-critical | Background music at scale |

The irony: We spent hours reverse-engineering Suno's API, fixing broken selectors at midnight, discovering bird codenames through trial and error, building a Gemini scorer to evaluate output quality — and for our actual daily production, we use MiniMax. Because it has a proper API and it just works.

Suno is objectively better music. But "better" doesn't matter when you need 10 tracks generated automatically at 3 AM by a cron job. MiniMax is boring and reliable. Suno is exciting and fragile. For production automation, boring wins every time.

🎯 When to Use Each

Use MiniMax Music 2.5+ when: automated pipelines, background music for Reels, cron jobs, any workflow where reliability matters more than quality. Cost: ~$0.01/track.

Use Suno when: standalone music pieces, piano covers, quality-critical content, anything where a human will listen to the music itself (not just hear it under a video). Cost: ~$0.02/clip from a $10/mo subscription.

Other Alternatives We Explored

Pixabay Royalty-Free Library — Our Fallback

We downloaded 21 royalty-free tracks from Pixabay, organized by mood (adventure, dreamy, tropical, cinematic, cultural). Zero latency, zero cost, predictable quality. When MiniMax's API has issues, we fall back to these. Cons: Only 21 tracks. After a week of 4 Reels/day, your audience has heard each one multiple times.
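The fallback selection itself is trivial. A sketch, assuming tracks are indexed by mood (the dict layout is illustrative, not our actual storage):

```python
import random

def pick_fallback(tracks_by_mood, mood, rng=None):
    """Random pick within the requested mood, so the 21 tracks rotate
    instead of always leading with the same file."""
    tracks = tracks_by_mood.get(mood)
    if not tracks:
        raise KeyError(f"No fallback tracks for mood {mood!r}")
    return (rng or random).choice(tracks)
```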

Trending Instagram Audio Pipeline

We built a system to download trending Instagram audio via YouTube search, then categorize by mood using Gemini Flash audio analysis. The flow: identify trending audio → search YouTube → extract with yt-dlp → analyze mood with Gemini → rename and categorize. Downloaded 19 tracks at ~$0.007 per track. This is for riding trending sounds for algorithmic reach — a different use case than generated music.
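The extraction step leans on stock yt-dlp flags. A command-builder sketch — the output template and directory are assumptions, while -x and --audio-format are standard yt-dlp options:

```python
def ytdlp_audio_cmd(url, out_dir="/tmp/trending"):
    """-x extracts the best audio stream; --audio-format transcodes to mp3.
    Pass the result to subprocess.run(cmd, check=True)."""
    return ["yt-dlp", "-x", "--audio-format", "mp3",
            "-o", f"{out_dir}/%(title)s.%(ext)s", url]
```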

Udio — Good Music, Same API Problem

Like Suno, Udio produces excellent music with no public API. Web-only. We didn't invest in reverse-engineering Udio since we already had a Suno wrapper. If Udio ships an API first, we'll switch.

Production Pipeline & Code

Here's how everything fits together in our actual production workflow:

| Reel Format | Music Source | Typical Prompt Style |
| --- | --- | --- |
| Budget Reels | MiniMax Music 2.5+ instrumental | Destination-mood based |
| Scam Alert Reels | MiniMax Music 2.5+ instrumental | Tension, suspense, minor key |
| Tourist Mistake Reels | MiniMax Music 2.5+ instrumental | Dramatic reveal, cinematic |
| One Thing Reels | Pixabay library (mood-based) | Pre-categorized by mood |
| Wrong Answers Only | MiniMax Music 2.5+ instrumental | Cultural mood matching |
| Vintage POV | MiniMax Music 2.5+ or 2.0 | Lo-fi, Rhodes, vinyl crackle |
| Standalone Piano | Suno (chirp-crow) | Solo piano, specific mood |

The Standard Pipeline (MiniMax)

Destination + Mood → Music 2.5+ Generate → FFmpeg Trim (8–15s) → FFmpeg Mux with Video → Publish

Total music cost per Reel: ~$0.01. At 4–10 Reels/day, that's $0.04–$0.10/day. Essentially free.

The Quality Pipeline (Suno + Gemini)

Song + Mood → Suno Generate (×5) → Gemini Score → Human Select Top 1 → Trim + Publish

More expensive (~$0.20 per batch of 10 clips — 50 credits — plus Gemini scoring). Used for quality-critical content where the music is the focus, not just background.

MiniMax Code

# Generate instrumental background music (Music 2.5+)
python3 scripts/minimax-video.py music \
  "warm cinematic travel, acoustic guitar, golden hour" \
  -m music-2.5+ --instrumental \
  -o /tmp/my-track.mp3

# The underlying API call
curl -X POST "https://api.minimax.io/v1/music_generation" \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "music-2.5+",
    "prompt": "warm cinematic travel background music",
    "is_instrumental": true
  }'

Suno Code

import requests, time

SUNO_API = 'http://localhost:3000'

def generate_suno(prompt, instrumental=True, model='chirp-crow'):
    """Generate 2 clips via local Suno API wrapper."""
    r = requests.post(f'{SUNO_API}/api/generate', json={
        'prompt': prompt,
        'make_instrumental': instrumental,
        'model': model,
        'wait_audio': False
    }, timeout=180)
    data = r.json()
    ids = [clip['id'] for clip in data]

    # Poll until every clip is complete (or errored)
    for _ in range(30):
        r = requests.get(f'{SUNO_API}/api/get?ids={",".join(ids)}', timeout=30)
        clips = r.json()
        if all(c.get('status') in ('complete', 'error') for c in clips):
            break
        time.sleep(20)

    # Download completed clips and return their local paths
    paths = []
    for clip in clips:
        if clip['status'] == 'complete' and clip.get('audio_url'):
            mp3 = requests.get(clip['audio_url'], timeout=60)
            path = f"/tmp/{clip['id'][:8]}.mp3"
            with open(path, 'wb') as f:
                f.write(mp3.content)
            paths.append(path)
    return paths

# Usage
generate_suno(
    "Solo piano cover of Cruel Summer. Steinway grand, "
    "close-miked. Minor key reharmonization, wistful. "
    "Warm intimate recording.",
    model='chirp-v4'
)

Full Shell Pipeline (What Our Cron Jobs Run)

#!/bin/bash
# Generate music → trim → mux with video → publish

# 1. Generate instrumental track
python3 scripts/minimax-video.py music \
  "$MOOD_PROMPT" -m music-2.5+ --instrumental \
  -o /tmp/reel-music.mp3

# 2. Trim to 12s with fade-out
ffmpeg -y -i /tmp/reel-music.mp3 \
  -af "atrim=0:12,afade=t=out:st=10:d=2" \
  /tmp/reel-music-trimmed.mp3

# 3. Mux with video
ffmpeg -y -i /tmp/reel-video.mp4 -i /tmp/reel-music-trimmed.mp3 \
  -map 0:v -map 1:a -c:v copy -c:a aac -shortest \
  /tmp/reel-final.mp4

# 4. Publish
bash ~/tabiji/functions/publish-reel.sh \
  /tmp/reel-final.mp4 "$CAPTION"

Key Learnings & Gotchas

⚠️ Suno uses bird codenames, not version numbers

chirp-crow = v5, chirp-bluejay = v4.5+, chirp-auk = v4.5. Using chirp-v5 or v5 returns 403. This is not documented anywhere.

⚠️ No artist names in Suno prompts

Suno blocks specific artist names. "Solo piano cover of Cruel Summer" works. "Taylor Swift piano cover" gets rejected. Describe the style without naming the artist.

⚠️ MiniMax is_instrumental only works on 2.5+

Music 2.0 ignores the parameter. Music 2.5 returns error 2013: invalid params, lyrics is required. Only Music 2.5+ supports true instrumental mode.

⚠️ Gemini audio scoring is too generous

All scores cluster at 8–9/10 with no meaningful differentiation. Use adversarial or comparative prompts instead of point-based scoring. Gemini is better as a filter (reject bad) than a ranker (pick best).

⚠️ Suno's API can break without warning

When Suno redesigns their UI (March 13, 2026), the reverse-engineered wrapper stops working. Budget time for debugging selectors. Or just check if captcha is required — direct API calls may work without the browser flow.

⚠️ MiniMax Music is a MUSIC model, not a foley model

Even when prompted for "ambient coffee shop sounds" or "ocean waves," the model produces melodic compositions — not sound effects. Use a dedicated foley model for ambience.

💡 FFmpeg muxing command we use in production

Trim music to fit video, add 2-second fade-out:

ffmpeg -i video.mp4 -i music.mp3 \
  -filter_complex "[1:a]atrim=0:12,afade=t=out:st=10:d=2[a]" \
  -map 0:v -map "[a]" \
  -c:v copy -c:a aac -shortest output.mp4

Final Scorecard

| Category | Suno AI | MiniMax 2.0 | MiniMax 2.5 | MiniMax 2.5+ |
| --- | --- | --- | --- | --- |
| Music Quality | 8.5/10 | 5/10 | 6/10 | 7/10 |
| Instrumental Mode | 9/10 | N/A | N/A | 8/10 |
| Prompt Adherence | 8/10 | 6/10 | 6.5/10 | 8/10 |
| API Quality | 2/10 (reverse-engineered) | 7/10 | 5/10 | 9/10 |
| Reliability | 5/10 (can break anytime) | 7/10 | 5/10 | 9/10 |
| Cost Efficiency | 7/10 ($10/mo sub) | 10/10 | 10/10 | 10/10 |
| Automation | 4/10 (fragile) | 8/10 | 6/10 | 9/10 |
| Output Duration | 30–240s | 20–65s | 20–50s | 60–192s |
| Overall | 6.5/10 (quality king, infra nightmare) | 5/10 | 5.5/10 | 8.5/10 (production king) |

Note the paradox: Suno scores lower overall despite producing better music. That's because "overall" includes API quality, reliability, automation — the things that matter for production. If you're evaluating purely on music quality, Suno wins by a mile. If you're evaluating as a production tool, MiniMax 2.5+ wins.

The Verdict

🏆 Production Winner: MiniMax Music 2.5+

For automated travel video production at scale, Music 2.5+ is the right choice. Proper API, working instrumental mode, ~$0.01/track, 99%+ uptime. It's background music for 8-second Reels — it doesn't need to be Spotify-quality, it needs to not suck and to always work. MiniMax delivers both.

👑 Quality Winner: Suno AI

For standalone music, piano covers, or anything where a human will actually listen to the music, Suno is in a different league. The chirp-crow (v5) model produces genuinely beautiful piano music with natural dynamics and emotional range. The $10/mo Pro plan is worth it. The reverse-engineered API situation is not ideal — budget time for maintenance when Suno updates their site.

🤖 Surprise MVP: Gemini 2.5 Flash as Audio Analyzer

Not a music generator, but worth calling out: Gemini's ability to analyze, describe, and understand audio is impressive. It correctly identifies instruments, keys, moods, and arrangement techniques. Just don't use it for scoring — it's too nice. Use it as a filter ("is this solo piano? yes/no") and a describer ("what instruments are in this track?"), not a critic.

What We'd Change

  • Official Suno API — would change everything. We'd use it for all quality-critical music immediately.
  • Adversarial Gemini scoring — we want to try "find 3 flaws" prompts and pairwise comparisons for better differentiation.
  • Shorter generation options — 8–15 second tracks purpose-built for Reels, so we don't trim 192 seconds down to 12.
  • Open-source catch-up — MusicGen and Stable Audio are improving but still trail behind. Self-hosting would eliminate API dependency.

For now, the split is clear: MiniMax for production, Suno for quality. Between the two of them, plus a Pixabay fallback library and the occasional trending Instagram audio, we never have to think about music licensing again. The entire music budget for hundreds of Reels per month is less than a single stock music license used to cost.


All MiniMax audio samples on this page were generated from identical prompts via the MiniMax Music API on March 11, 2026. Suno samples were generated between March 11–13, 2026. No post-processing was applied — these are raw API/generation outputs. Gemini scoring was performed on March 13, 2026 using Gemini 2.5 Flash.