Robert Važan

Language models are the ultimate scripting engine

Some time ago, I blogged that scripting is a programming paradigm, a quick-and-dirty complement to software engineering rather than just a set of programming language features. As it turns out, instructing language models like ChatGPT has all the characteristics of scripting. Language models aren't just usable as a scripting environment. They outperform traditional programming languages by a wide margin. They are so good that I expect language models to eventually satisfy more than 90% of programmers' scripting use cases and to make scripting widely accessible to non-programmers.

Is it really scripting?

You can write down rules, paste in the input, and let the language model generate output. That's very much what you would do with a script. The rules given to the language model effectively serve as a program. Rules can be long and complicated, covering numerous special cases. I have not encountered any complexity limit yet. You can refine rules interactively, letting the language model regenerate output after every change. You can process another input with the same rules. You can even save the rules and reuse them later, like you would with a personal script library. So yes, giving complex instructions to language models resembles scripting.

As I explained in the mentioned blog post, scripting as a programming paradigm differs from software engineering by making certain assumptions, which I enumerated there.

All these criteria are satisfied in an interactive session with a language model, so yes, language models can be used as scripting engines. They do make mistakes, but that's what supervised execution is for. Scripts in traditional programming languages are error-prone too, especially on complex inputs.

Common sense programming

Language models also bring something new to the table: common sense. When writing a script in a traditional programming language, you spend a considerable amount of time dealing with input and output formats, formalizing decision logic, and handling special cases and other details. A language model's common sense lets you avoid most of that, because it gives you new, higher-level programming primitives.

In a traditional programming language, you work on top of abstractions provided by libraries. With a language model, you instead work on top of a probabilistic model of the world. A language model can be thought of as a huge library of common sense knowledge. There's no API, because all this common sense manifests as reasonable defaults for everything. You only have to specify what is unique or unusual about the task. You end up with a much shorter script that is much more robust in corner cases.

Workflows

I have developed three workflows that can all be considered a form of scripting.

Example-only mode

The simplest way to get a language model to do something is to just give it an example input and output and let it figure out what needs to be done.

  1. Paste example input introduced with "Given this input:".
  2. Paste example output introduced with "I have converted it like this:" (or "translated", "modified", ...).
  3. Paste actual input introduced with "Convert this similarly:".
  4. Let the language model generate the output.
  5. Optionally paste corrected output introduced with "This is the correct output:".
  6. Repeat with new input from step 3.

This is simple and easy to start with. Context window length is less limiting, because only the current input and the last output (or its correction) really need to be in the window. For simpler tasks, it will often work even if the current input consumes 90% of the window and only a fragment of the last output is visible to the model.

The downside is that it does not work for more complicated tasks, and the model has too much freedom in how it interprets the examples. The model also repeatedly forgets everything it has learned from previous examples.
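The example-only workflow above can be sketched as plain prompt assembly. This is a minimal illustration, not a definitive implementation: the example records are made up, and sending the prompt to a model is left to whatever chat UI or API you use.

```python
# Sketch of the example-only workflow: one worked example plus the
# actual input, assembled into a single few-shot prompt. The phrasing
# mirrors the steps above; the sample records are invented.

def example_only_prompt(example_input, example_output, actual_input):
    """Build a few-shot prompt from one worked input/output example."""
    return (
        "Given this input:\n"
        f"{example_input}\n\n"
        "I have converted it like this:\n"
        f"{example_output}\n\n"
        "Convert this similarly:\n"
        f"{actual_input}\n"
    )

prompt = example_only_prompt(
    "name: Jane Doe, born 1990",
    '{"name": "Jane Doe", "born": 1990}',
    "name: John Smith, born 1985",
)
# Paste `prompt` into the chat session and read off the model's output.
print(prompt)
```

To process the next input, you would rebuild the prompt with the new actual input, optionally swapping in a corrected output as the worked example.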

Crafted ruleset mode

When the quality of output is very important or when the task is more complicated, it is better to give the model some instructions above the input. There is no conversation. To prevent the model from forgetting instructions, a new chat session is started for every input, with the instructions always above the input. In ChatGPT, this can be done conveniently by replacing the input in the current session.

  1. Describe the task and provide an initial list of rules. It's best to start brief and refine later.
  2. Provide a short canonical input and output example. The model can assist you in crafting the example by abbreviating a longer real-world example in a separate chat session.
  3. Append actual input to the prompt.
  4. Let the model generate the output.
  5. If the output is wrong, adjust rules and canonical example and retry from step 4.
  6. Repeat with new input from step 3.

This gives you more control at the cost of some tinkering, and of having to fit the rules and the canonical example in the context window in addition to the full input and output.
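The crafted ruleset workflow can be sketched the same way. The key property is that every input gets a fresh prompt with the rules and canonical example on top, so nothing depends on chat history. The rules and records below are invented for illustration.

```python
# Sketch of the crafted ruleset workflow: a fixed preamble of rules and
# a canonical example, prepended to each input in a fresh session.

RULES = (
    "Convert contact records to JSON.\n"
    "Rules:\n"
    "1. Keys are lowercase.\n"
    "2. Years are integers.\n"
)
CANONICAL = (
    "Example input:\n"
    "name: Jane Doe, born 1990\n"
    "Example output:\n"
    '{"name": "Jane Doe", "born": 1990}\n'
)

def ruleset_prompt(rules, example, actual_input):
    """Assemble rules, canonical example, and input into one prompt."""
    return f"{rules}\n{example}\nConvert this input:\n{actual_input}\n"

for record in ["name: John Smith, born 1985", "name: Ada King, born 1815"]:
    prompt = ruleset_prompt(RULES, CANONICAL, record)
    # Send `prompt` as a brand-new chat session here.
```

When an output is wrong, you edit `RULES` or `CANONICAL` and rebuild; the per-input prompts pick up the change automatically.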

Self-programming mode

Crafting rules and especially the canonical example by hand is a bit laborious. We can instead ask the language model to incorporate feedback.

  1. Provide an initial list of rules and a canonical example. One of the rules is that all rules, as well as the canonical example, must be repeated after every response. Another standard rule is that after the rules change, the input must be repeated and the output regenerated.
  2. Paste actual input.
  3. Let the model generate the output and repeat rules and the example.
  4. If the output is wrong, provide feedback, corrections, or a small correct example. Ask the model to update the rules and the canonical example. Let it regenerate the output. Optionally iterate on rule changes.
  5. Repeat with new input from step 2.

This is more convenient and gives the model more opportunities to be smart and useful, but it also gives it plenty of opportunities to corrupt the rules, and it strains the context window even more than before.
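The bookkeeping behind the self-programming workflow can be sketched as well. Because the model restates its ruleset after every response, you can carry the (possibly updated) rules forward into the next input's prompt. The `RULES:` marker and reply format below are assumptions made for illustration, not anything a chat API defines.

```python
# Sketch of carrying rules forward in the self-programming workflow.
# The model is instructed to restate its ruleset after every response;
# this helper extracts the restated ruleset so it can seed the next
# prompt. The "RULES:" marker is an invented convention.

def extract_rules(reply, marker="RULES:"):
    """Return the restated ruleset from a model reply, or None."""
    i = reply.find(marker)
    return reply[i:] if i >= 0 else None

reply = (
    '{"name": "Ada King", "born": 1815}\n'
    "RULES:\n"
    "1. Keys are lowercase.\n"
    "2. Years are integers.\n"
)
rules = extract_rules(reply)
# `rules` now holds the model-maintained ruleset and becomes the
# preamble of the prompt for the next input.
```

This also makes the rule corruption mentioned above visible: you can diff the restated ruleset between turns and catch the model silently dropping or mangling a rule.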

Context windows and character limits

The biggest current limitation is that GPT-4 cuts off its answer after 8K output characters. That's less than 2K tokens, way below GPT-4's 8K-token context window. I can ask it to continue, but the remaining output is often messed up, and stitching the two parts together is a nuisance. Bard even limits queries to 10K characters. Sometimes I can split the task into smaller chunks. Other times the task is just too complicated to be solvable within the current limits. The OpenAI API may not have a character limit (I found no mention of one anywhere), but the API is currently too inconvenient for me, and it's likely to get expensive when used to apply a script to a lot of inputs.

Beyond the character limit, there's the context window limit. GPT-4 reportedly has an 8K-token context window, and my little experiment seems to confirm that: forgetting starts beyond 40K characters, which is 8K tokens at 5 characters per token. I estimate Bard's context window to be 4K tokens based on a similar experiment. OpenAI has a 32K model, but it's API-only, and as I said before, the API can be inconvenient and expensive. Anthropic's Claude has a 100K window, but it's not available where I live.
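The back-of-the-envelope sizing above (5 characters per token) is easy to automate when deciding whether a payload fits or needs splitting. This is a crude heuristic for English text, not an exact tokenizer, and the naive fixed-width chunking is just a sketch; a real split would respect record or paragraph boundaries.

```python
# Rough context-window sizing using the ~5 characters/token heuristic
# from the text. Real tokenizers vary; this only ballparks the count.

def estimate_tokens(text, chars_per_token=5):
    """Crude token estimate: character count divided by 5."""
    return len(text) // chars_per_token

def split_into_chunks(text, max_tokens, chars_per_token=5):
    """Naively split text into pieces that fit in max_tokens each."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

doc = "x" * 50_000
print(estimate_tokens(doc))  # prints 10000, over an 8K-token window
chunks = split_into_chunks(doc, 8000)
# Two chunks, each within the estimated 8K-token budget.
```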

I nevertheless believe the technology will improve and get cheaper over time, eventually becoming the default choice for nearly all scripting jobs. Personally, I already do most of my scripting with GPT-4. I am also scripting much more than I used to, because language models make it so easy.

Where does this leave traditional scripting?

Scripting in traditional programming languages will not go extinct. Some scripting tasks are unsuitable for language models: some scripts need reproducible behavior, some applications are too risky, and processing larger amounts of data requires efficiency that language models by their nature cannot provide. A language model can nevertheless help you develop a script in a traditional programming language that you then run separately.