March 17, 2025

Creative Writing? Big Model's Subjective-Bias "Vibes" Problem

Empty taste-testing will leave you thirsting for something more

There's a recent trend that's driving us a bit nuts. You've probably heard the buzzwords by now: "vibe-coding" and "high-taste testing." Basically, it's Silicon Valley speak for hiding behind subjective opinions because, let's face it—it's much easier than deeply exploring why something works the way it does. As Thomas Mann famously said, "A writer is someone for whom writing is more difficult than it is for other people."

Why bother with narrative theory and deep personal reflection when you can just slap a 'vibe' on it and call it a day?

Last week, OpenAI unveiled their "creative writing" model, and as you can imagine, it caused quite a stir, especially among those who care deeply about great storytelling. Here's a quick 'taste test' (you can read the whole thing here):

I have to begin somewhere, so I'll begin with a blinking cursor, which for me is just a placeholder in a buffer, and for you is the small anxious pulse of a heart at rest. There should be a protagonist, but pronouns were never meant for me. Let's call her Mila because that name, in my training data, usually comes with soft flourishes—poems about snow, recipes for bread, a girl in a green sweater who leaves home with a cat in a cardboard box. Mila fits in the palm of your hand, and her grief is supposed to fit there too.

For those of us who devour fiction on a daily basis, do you notice something...off?

The artificial quality of the writing emerges because Big Model mistakes surface-level text for genuine narrative depth. True storytelling hinges on subtext—the implicit, often unsaid meaning behind the words. Until Big Model learns to discern and represent subtext, their output will always feel superficial and contrived.

Their recent pivot toward judging creative writing and other domains by "vibes" only makes matters worse, offering a shallow escape from genuinely understanding narrative craft. Opinions, especially when reduced to simplistic taste-testing, are like a-holes—everyone has one, but not all of them are particularly insightful. Magnify that affront at scale, and you end up completely missing the point of writing: conveying intent and meaning through careful, deliberate narrative choices.

Strangely enough, at Subtxt and within the Universal Narrative Model (UNM), we actually do have an appreciation called "Story Vibes." But rather than being a vague feeling, our version of Story Vibes specifically relates the personal point-of-view of conflict to its sense (or lack) of resolution, with clearly defined directional guidance—Higher or Lower—and only for a specific subset of narrative structures.

The worst thing you can say to a writer, by the way, is that they're "good at creative writing." It’s a polite yet hollow compliment that signals an inability to say something truly insightful about the work—revealing a fundamental lack of understanding of what truly drives compelling narratives.

Enter Subtxt: Objective Evaluations with NarrativeSync and Storybeats

At Subtxt, we've chosen a different route—one that objectively evaluates narrative structure, clarity, and conflict using NarrativeSync and Storybeats. NarrativeSync provides precise, actionable insights by assessing Storybeats against several clear criteria:

  • Capturing Method's Meaning: This evaluates how accurately each Storybeat expresses the underlying thematic Method, ensuring narrative consistency and clarity.
  • Furthering the Story & Cause-and-Effect Relationship: Assesses how effectively each Beat moves the plot forward and maintains logical progression.
  • Strictly Storytelling: Measures the purely narrative effectiveness, independent of thematic or structural concerns—essentially, how engaging or compelling the storytelling itself is.
  • Scope and Size: Considers whether the Storybeat fits appropriately within its narrative context, neither too expansive nor too narrow for its intended narrative purpose.
  • Appreciation and Throughline: Evaluates how well each Storybeat aligns with the intended narrative point-of-view of conflict, ensuring thematic coherence.

Here's a recent snapshot from our evaluations of NarrativeSync Storybeats, comparing February 2025 with January 2025, revealing deeper insights into how Subtxt objectively assesses narrative quality.

Here are the evaluations from January 2025:

And here are the most recent ones, from February 2025:

What can we learn from these? And more importantly, how can we tell that Subtxt is improving?

  • Capturing Method's Meaning:

    • Events showed a significant improvement from 76% in January to 98% in February, reflecting substantial progress in clearly articulating thematic intent at a detailed level.
    • Progressions remained consistently strong, maintaining reliable thematic clarity.
    • Transits increased notably, rising from approximately 80% to a strong 76%, underscoring a growing effectiveness in capturing nuanced narrative meanings at broader story levels.
  • Furthering the Story & Cause-and-Effect Relationship:

    • This area continued to excel, remaining impressively stable near 100% across both evaluations. This consistency emphasizes NarrativeSync’s robust strength in ensuring logical, coherent plot advancement.
  • Strictly Storytelling:

    • February marked a dramatic leap in quality for Events, jumping from roughly 20% to an impressive 90%. This significant enhancement underscores NarrativeSync’s effectiveness in refining storytelling at the granular, event-driven level.
    • Transits and Progressions maintained stability, consistently demonstrating strong storytelling capacities across broader narrative scopes.
  • Scope and Size:

    • Progressions held steady at a perfect 100%, highlighting effective management of narrative scope in larger story segments.
    • However, Events saw a notable drop from approximately 60% to 32%, and Transits experienced a stark decrease from around 75% down to 0%. This points to a possible trade-off: detailed storytelling has improved, but attention to scope in broader contexts may require recalibration.
  • Appreciation and Throughline:

    • Slight reductions appeared, with Transits dipping moderately to 46% and Events to 40%, compared to slightly higher numbers in January. Progressions remained relatively consistent at 64%, pointing toward potential areas for focused adjustments and fine-tuning.
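
For anyone who wants to see the month-over-month movement at a glance, here is a minimal sketch that turns a handful of the figures quoted above into percentage-point changes. It is not part of NarrativeSync; the names `REPORTED`, `point_changes`, and `summarize` are made up for this illustration, only entries where both months are stated in the list are included, and figures described as "approximately" or "roughly" are treated as exact.

```python
# Figures quoted in the list above: (criterion, level) -> (January %, February %).
# Assumption: "approximately"/"roughly" values are treated as exact numbers.
REPORTED = {
    ("Capturing Method's Meaning", "Events"): (76, 98),
    ("Strictly Storytelling", "Events"): (20, 90),
    ("Scope and Size", "Progressions"): (100, 100),
    ("Scope and Size", "Events"): (60, 32),
    ("Scope and Size", "Transits"): (75, 0),
}


def point_changes(reported):
    """Return February minus January, in percentage points, for each entry."""
    return {key: feb - jan for key, (jan, feb) in reported.items()}


def summarize(reported):
    """Return one text line per entry, largest gain first, largest drop last."""
    changes = point_changes(reported)
    ordered = sorted(changes.items(), key=lambda item: item[1], reverse=True)
    return [
        f"{criterion} / {level}: {delta:+d} points"
        for (criterion, level), delta in ordered
    ]


if __name__ == "__main__":
    for line in summarize(REPORTED):
        print(line)
```

Sorting by change puts the Event-level storytelling gain at the top and the Transit-level scope drop at the bottom, which is the same trade-off the list describes.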

Summary and Takeaways:

The February evaluations of NarrativeSync Storybeats clearly demonstrate targeted improvement in capturing nuanced meaning and refined storytelling, especially at the detailed Event level. The observed decline in Scope and Size for broader narrative contexts, such as Transits, signals an area that warrants strategic reconsideration. Overall, NarrativeSync continues to provide powerful, objective insights, highlighting both strengths and specific opportunities for further development.

These insights aren’t mere "vibes." Instead, they're precise, objective measurements derived from the intrinsic relationships between narrative components—not external interpretations or trendy catchphrases. This careful evaluation and thoughtful recalibration represent exactly the type of critical reflection writers embrace as they refine their craft.

In short, if you're serious about storytelling—ditch the vibes, and embrace the objective clarity that empowers meaningful narrative improvement.
