2024-11-02 Using OpenAI structured JSON output from the command line

#OpenAI #AI #ChatGPT #JSON #LLM #structured output #command line


If you like to use AI for daily tasks on the command line, e.g. to extract data from unstructured text or even from images or audio, and want to feed the results into other programs, you have to make the output machine readable. Of course it's a well-proven technique to describe the output you want to the AI and give some examples, but it can still happen that the AI goes off track and deviates from the intended format, so that your process stops there. To prevent that, OpenAI provides the nifty structured output feature: you give it a JSON schema, and OpenAI makes sure that the model's response matches that schema. (Which is, by the way, a really interesting problem to solve: for each output token, it requires checking which of the tokens suggested by the LLM have a continuation that still matches the schema. That'd be a lot of fun to write, but alas would be way too much for a spare-time project.)

I integrated that into chatgpt, my Swiss-army-knife type command line tool for OpenAI's chat completion API, so that structured output can be used from the command line. But I didn't stop there: since writing JSON schemas is a bit of a hassle, I added some shortcuts for common use cases.
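To make the token filtering mentioned in the aside above a bit more concrete, here is a toy sketch in bash. It is not how OpenAI implements structured output: the "schema" is replaced by a short, made-up list of complete valid outputs (VALID_OUTPUTS), and allowed_tokens is a hypothetical helper that keeps only those candidate tokens after which some valid output is still reachable. A real implementation would track the schema as a grammar instead of enumerating outputs.

```shell
# Toy stand-in for a schema: the complete outputs we are willing to accept.
# (A real JSON schema describes far too many documents to list like this.)
VALID_OUTPUTS=('{"ok":true}' '{"ok":false}')

# Print the candidate tokens that keep the output extendable to a valid one.
# $1 = text generated so far, remaining arguments = candidate next tokens.
allowed_tokens() {
  local prefix=$1 tok valid
  shift
  for tok in "$@"; do
    for valid in "${VALID_OUTPUTS[@]}"; do
      # the quoted part is matched literally, the trailing * matches any rest
      if [[ $valid == "$prefix$tok"* ]]; then
        printf '%s\n' "$tok"
        break
      fi
    done
  done
}

# After {"ok": only "true" and "f" can still lead to a valid output.
allowed_tokens '{"ok":' 'true' 'yes' 'f' '}'
```

The model would then pick its next token only from the surviving candidates, and the check is repeated for every token.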

There are also many more command line tools in my ChatGPT toolsuite that you might like.

Illustration - creating JSON with an AI

An example

As an example, let's extract the links from the slides of my AdaptTo 2024 talk about the Composum AI in a machine readable format, while pretending the slides weren't properly linked. We'll use multimodal input, so the first step is turning the slides into images we can submit to OpenAI. We'll have suggestbash make a command line suggestion: running "suggestbash split talk.pdf into individual images" suggests e.g. "pdftoppm -png talk.pdf slides", which generates the files slides-01.png to slides-31.png. Now you can already get the links from those with chatgpt, using image input:

cmd=chatgpt
for fil in slides-*; do cmd="$cmd -i $fil"; done
$cmd 'print urls of links in the image; if there are no links print nothing'

That does work, but you might get a code block around the links, or comments and whatnot, and you would have to dissuade the AI from doing that in the prompt, or take more clever measures. So let's see how we can avoid that.
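A side note on the shell loop above: it builds the command as one string, which falls apart as soon as a file name contains a space. A minimal sketch of an alternative, assuming bash, collects the arguments in an array instead. build_image_args and IMAGE_ARGS are names made up for this example; only the -i option comes from the chatgpt tool.

```shell
# Fill the global array IMAGE_ARGS with "-i <file>" pairs for the given files.
# An array keeps each file name as one argument, even if it contains spaces.
build_image_args() {
  IMAGE_ARGS=()
  local f
  for f in "$@"; do
    [ -f "$f" ] || continue   # skip an unmatched glob or a missing file
    IMAGE_ARGS+=(-i "$f")
  done
}

build_image_args slides-*.png
echo "collected ${#IMAGE_ARGS[@]} arguments"

# The array can then be handed to chatgpt like this:
# chatgpt "${IMAGE_ARGS[@]}" 'print urls of links in the image; if there are no links print nothing'
```

Quoting the expansion as "${IMAGE_ARGS[@]}" is what keeps every file name intact as a single argument.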

Using a JSON schema

Now let's go the structured output way and provide a JSON schema. Since writing that by hand would be annoying: OpenAI's playground offers an assistant that can generate a schema for you when you choose the response type json_schema. I'll use the description "output a list of urls", and it creates one for me:

{
  "name": "url_list",
  "schema": {
    "type": "object",
    "properties": {
      "urls": {
        "type": "array",
        "description": "A list of URLs.",
        "items": {
          "type": "string",
          "description": "A single URL."
        }
      }
    },
    "required": [ "urls" ],
    "additionalProperties": false
  },
  "strict": true
}

Now let's call chatgpt again, this time using $(printf -- '-i slides-%02d.png ' {1..31}) (kudos to ChatGPT) to generate the -i slides-01.png -i slides-02.png ... arguments, and passing the schema, saved as urlsschema.json, with -rf:

chatgpt $(printf -- '-i slides-%02d.png ' {1..31}) -rf urlsschema.json 'print urls of links in the image as JSON'

That nicely prints:

{"urls":["https://ai.composum.com","https://github.com/ist-dresden/composum-AI","https://www.composum.com/","https://www.stoerr.net/ai.html","https://github.com/ist-dresden/composum-nodes","https://www.stoerr.net/ai"]}
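Since the response is now guaranteed to be JSON of a known shape, it can be fed straight into other tools. As a small sketch, assuming jq is installed (it is a separate tool, not part of the chatgpt toolsuite), the following turns the response into one URL per line. list_urls is a made-up helper name, and the sample response is shortened from the output above, with one entry repeated on purpose to show the deduplication.

```shell
# Print each entry of the "urls" array on its own line, without duplicates.
# Reads the JSON response on stdin; requires jq.
list_urls() {
  jq -r '.urls[]' | sort -u
}

# Shortened sample response with one deliberately repeated entry. In real use
# you would pipe the chatgpt call from above into list_urls instead.
response='{"urls":["https://www.stoerr.net/ai","https://ai.composum.com","https://ai.composum.com"]}'
printf '%s\n' "$response" | list_urls
```

From there the usual line-oriented tools (xargs, grep, while read loops) can take over.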

Shortcut arguments

I added shortcuts that save you from creating a schema file for two common use cases: creating a simple object with some string attributes, and creating a list of simple objects with some string attributes. Here is an excerpt from the chatgpt -h help:

Response Options:
  -rj  [R]esponse mode JSON: model outputs a JSON object
  -rf schemafile  Structured output: requests that the [r]esponse conforms to the given JSON schema read from a [f]ile.
  -ra attr1,...  Structured output [r]esponse - JSON with [a]ttributes: comma separated list of attributes to include in the JSON response. Alternative to -rf - creates a simple schema with these attributes as string properties.
  -rar attr1,...  Structured output for JSON [r]esponse [ar]ray of objects with the given attributes - e.g. for extracting a list of entities from an input. Alternative to -rf and -ra, all are string properties.

If I want, e.g., to extract the author and talk title from the first slide, I can use the -ra option:

chatgpt -i slides-01.png -ra author,title "Print talk author and talk title as JSON"

That prints

{ "author": "Dr. Hans-Peter Störr, IST GmbH Dresden", "title": "Composum AI - Supporting the Content Author with LLM" }

Under the hood it creates a JSON schema for that object for you, so you don't have to bother with that. Or let's create a list of objects:
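The schema that -ra generates isn't shown here, so the following is only a guess at its general shape, modelled on the playground schema from the previous section. make_object_schema and the schema name "response" are made up for this sketch, and it assumes bash and attribute names that need no JSON escaping.

```shell
# Hypothetical helper: turn a comma separated list like "author,title" into a
# schema where every attribute is a required string property.
make_object_schema() {
  local IFS=, attr props="" required=""
  for attr in $1; do            # unquoted on purpose: split at the commas
    props+="${props:+, }\"$attr\": { \"type\": \"string\" }"
    required+="${required:+, }\"$attr\""
  done
  printf '{ "name": "response", "schema": { "type": "object", "properties": { %s }, "required": [ %s ], "additionalProperties": false }, "strict": true }\n' \
    "$props" "$required"
}

make_object_schema author,title

# Written to a file, such a schema could be passed with -rf instead of -ra:
# make_object_schema author,title > authorschema.json
# chatgpt -i slides-01.png -rf authorschema.json "Print talk author and talk title as JSON"
```

Listing every property under "required" and setting "additionalProperties" to false mirrors what the playground-generated schema does in strict mode.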

chatgpt $(printf -- '-i slides-%02d.png ' {1..31}) -rar name,description 'print the slide name and a content description as JSON'

returns

[ { "name": "Slide 1", "description": "Introduction to the presentation and overview of the conference." }, { "name": "Slide 2", "description": "Details on the talk's content, focusing on the functionalities of Composum AI." }, { "name": "Slide 3", "description": "Introduction to Dr. Hans-Peter Störr and his background." }, ... ]

Update: I've also added the possibility to have numeric or boolean attributes and string arrays:

chatgpt -ra a:stringarray,n:numericattr,i:integerattr,b:booleanattr,a:i:integerarray ...

Response Options:
  -rj  [R]esponse mode JSON: model outputs a JSON object
  -rf schemafile  Structured output: requests that the [r]esponse conforms to the given JSON schema read from a [f]ile.
  -ra attr1,...  Structured output [r]esponse - JSON with [a]ttributes: comma separated list of attributes to include in the JSON response. Alternative to -rf - creates a simple schema with these attributes. By default they are strings; if you add a prefix "a:" it'll be an array of strings, "i:" an integer, "n:" a number, "b:" a boolean. Can be joined: "a:i:foo" is an array of integers.
  -rar attr1,...  Structured output for JSON [r]esponse [ar]ray of objects with the given attributes - e.g. for extracting a list of entities from an input. Alternative to -rf and -ra, all are string properties or other, see -ra.
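To illustrate the prefix rules from the help text, here is a sketch of how one attribute spec could be mapped to a JSON schema property. It follows only what the help text says and is not taken from the tool's source; attr_property is a hypothetical name, and bash is assumed.

```shell
# Hypothetical helper: print the JSON schema property for one attribute spec
# such as "title", "n:price" or "a:i:pages".
attr_property() {
  local spec=$1 is_array=0 type=string
  # A leading "a:" marks an array; what follows describes the item type.
  if [[ $spec == a:* ]]; then
    is_array=1
    spec=${spec#a:}
  fi
  case $spec in
    i:*) type=integer; spec=${spec#i:} ;;
    n:*) type=number;  spec=${spec#n:} ;;
    b:*) type=boolean; spec=${spec#b:} ;;
  esac
  if (( is_array )); then
    printf '"%s": { "type": "array", "items": { "type": "%s" } }\n' "$spec" "$type"
  else
    printf '"%s": { "type": "%s" }\n' "$spec" "$type"
  fi
}

attr_property a:i:pages
attr_property b:published
attr_property title
```

Each printed fragment would go into the "properties" object of the generated schema.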

BTW: if you get confused by all the chatgpt options, how about using the built-in help feature -ha:

chatgpt -ha how can I take the prompt from prompt.mp3

Conclusion

The chatgpt tool from my ChatGPT toolsuite makes it easy to use OpenAI's structured output to generate machine readable output for the tasks you want the AI to do, and opens many possibilities to join it with other tools, in the best Unix spirit of combining small tools to do great things. Give it a try! I've been using it for extracting URLs from screenshots, categorizing banking statements, extracting information from webpages, asking ChatGPT quick questions from the command line, and many more things.