The Ever-Shifting World of Large Language Model Programming

Leaving room for dreaming and future growth

In the fast-paced world of 21st-century technology, few conversations evolve as quickly as the one around large language models (LLMs) like ChatGPT and OpenAI's extensive library of APIs. A particular point of interest is how these models are developed and refined over time, a process that directly impacts the functionality of applications like Subtxt and Subtxt Muse. In this article, we explore this dynamic world, debunk myths about AI "laziness," and highlight the innovative strides made by Subtxt Muse, which sets a new standard in narrative intelligence and creative AI assistance.

GPT's Alleged Laziness: A Myth Debunked by Subtxt Muse

There's been chatter in the tech community lately, particularly on X/Twitter, about GPT models seemingly becoming lazier. Some even suggest it's akin to a digital form of Seasonal Affective Disorder (SAD), with some going so far as to try to prove empirically that GPT in December is less eager to write than GPT in May.

However, our experiences with Subtxt Muse paint a different picture. Muse isn't just keeping pace; it's setting new benchmarks in narrative intelligence and storytelling.

Muse: Going Beyond the Brief

In a recent test, Muse not only outlined a thematic structure for a short scene but went a step further to begin writing it.

And the quality of writing? Significantly improved over our tests just earlier this year.

This recent test exemplifies Muse's capacity to exceed expectations (and even surprise us!), blending our 30+ years of story structure expertise with cutting-edge AI.

The Continuous Evolution of LLMs

OpenAI recently shared insights on Twitter about the evolution of these models. As they explain, building GPT-4 and future GPTs is an ongoing process, more art than science.

Each new model is crafted, tested, and evaluated meticulously. This artisanal approach ensures that each iteration is a step forward in making LLMs more capable and versatile.

(And it’s funny that they would use “artisanal” to describe their approach, as Subtxt was built with Laravel, which describes itself as “The PHP Framework for Web Artisans”) 😊

The big takeaways from their artisanal and dynamic approach:

  • Training chat models is complex: Each training run, even with identical datasets, can yield models with distinct personalities, writing styles, refusal behaviors, and biases.
  • Evaluation and Deployment: OpenAI conducts thorough offline and online tests before deployment, making data-driven decisions to enhance real-user experiences.
  • User Feedback is Key: OpenAI encourages community feedback, understanding the importance of user experiences in guiding the evolution of these models.

And I would say these are exactly the kind of strategies we employ in building out Subtxt and Subtxt Muse. With so many different writing styles and individual biases among writers, Muse works to balance them all against the backdrop of an objective and predictive narrative framework.

All the while, leaving a little bit of room for creativity.

Andrej Karpathy on Creativity and "Hallucinations"

Andrej Karpathy, one of the original founding members of OpenAI, offers a fascinating perspective on the so-called "hallucination problem" in LLMs. He views LLMs as "dream machines," with hallucination being their inherent nature, not a flaw.

I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.

We direct their dreams with prompts. The prompts start the dream, and based on the LLM's hazy recollection of its training documents, most of the time the result goes someplace useful.

It's only when the dreams go into territory deemed factually incorrect that we label it a "hallucination". It looks like a bug, but it's just the LLM doing what it always does.

I really like this idea of “directing dreams.” If anything, that perfectly describes the act of building a story with Subtxt and Subtxt Muse. You’re just leading the dream-machine to fulfill your imagination’s greatest intent.

Biggest takeaways from Karpathy’s post:

  • Prompt-Driven Dreams: LLMs use prompts to start a dream-like process, often leading to useful results.
  • Hallucinations as Feature, Not Bug: When LLMs venture into factually incorrect realms, we label it a "hallucination." But it's simply the LLM doing what it's designed to do.
  • The Creativity Spectrum: LLMs are 100% creative, dreaming up new content, unlike search engines which are 0% creative, merely echoing existing data.

Karpathy acknowledges that while LLMs thrive on creativity, LLM Assistants need to control hallucinations (which is where A LOT of our work on Subtxt happens!). Techniques like Retrieval Augmented Generation (RAG) anchor LLM responses in real data, and other methods like decoding uncertainty and tool use are areas of active research.

Subtxt used Retrieval Augmented Generation before it even had a name! When we started back in '22, the technique hadn't yet been labeled as such (at least, not in every YouTube video). But over the past year or so, we've found the sweet spot between complete dream and structured reality, between chaos and order, and this is why Muse is far more helpful than basic vanilla ChatGPT or a CustomGPT. The knowledgebase (what Subtxt calls its "memory matrix") is simply incomparable.
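To make the idea concrete, here's a minimal sketch of the RAG pattern described above: retrieve the most relevant passages from a knowledge base, then anchor the model's prompt in that retrieved context. Everything here is hypothetical for illustration only; the knowledge base contents, the word-overlap scoring, and the prompt template are stand-ins, not Subtxt's actual "memory matrix," which uses far more sophisticated retrieval.

```python
# Toy illustration of Retrieval Augmented Generation (RAG).
# The passages, scoring function, and prompt template below are
# hypothetical stand-ins, not Subtxt's real implementation.

KNOWLEDGE_BASE = [
    "The Main Character throughline explores the story's personal perspective.",
    "The Overall Story throughline covers the objective conflict all characters share.",
    "The Relationship Story throughline tracks the heart between two perspectives.",
]

def score(query: str, passage: str) -> int:
    """Crude relevance score: count words shared between query and passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(KNOWLEDGE_BASE, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the model's 'dream' in retrieved context before asking the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the Main Character throughline?"))
```

A production system would swap the word-overlap score for embedding similarity over a vector store, but the shape is the same: the retrieval step constrains the dream, which is exactly the chaos-and-order balance described above.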

The Takeaway

Subtxt and Subtxt Muse, like OpenAI's models, are ever-evolving. We continuously monitor performance, responses, and integrate our deep narrative understanding to enhance the user experience. In the realm of storytelling and creativity, what might be seen as a "hallucination" in other contexts becomes a feature, fueling the innovative potential of these dream machines.

Remember, the world of LLMs is always in flux, and this change is not just inevitable but essential, especially in fields where creativity and imagination are not just welcomed but required.

Happy dreaming and creating! 💫

Download the FREE e-book Never Trust a Hero

Don't miss out on the latest in narrative theory and storytelling with artificial intelligence. Subscribe to the Narrative First newsletter below and receive a link to download the 20-page e-book, Never Trust a Hero.