Deterministic AI

Generative AI has a PR problem.

The vast majority of ChatGPT users have been misled into thinking AI is often stupid, especially with regard to numbers. Indeed, generative AI cannot perform math with precision because it is designed to do only one thing -

Predict the next word (more precisely, the next token) based on the words, or tokens, that came before it.

When math is involved, it goes off the rails quickly.

While the LLM knows about numbers like 22, 0, and 8, it has no concept of 8022 or 2208, because these four-character values resolve to three separate tokens in the LLM: 8, 0, and 22. OpenAI has been actively adjusting its models to handle math with greater precision, but they remain untrustworthy in some instances.
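To make the token point concrete, here is a toy greedy longest-match tokenizer in Python. The vocabulary is hypothetical and tiny ("22", "0", "8" plus single digits); real LLM tokenizers use far larger vocabularies and split numbers by their own rules, so treat this only as an illustration of how different numbers can be built from the same tokens.

```python
# Toy greedy longest-match tokenizer over a tiny, hypothetical vocabulary.
# Real LLM tokenizers (e.g. byte-pair encodings) use much larger vocabularies.
VOCAB = {"22", "0", "8"} | {str(d) for d in range(10)}
MAX_LEN = max(len(token) for token in VOCAB)

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i first.
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i]!r}")
    return tokens

for number in ("2208", "8022", "0822"):
    print(number, tokenize(number))
```

With this vocabulary, 2208 becomes `["22", "0", "8"]` and 8022 becomes `["8", "0", "22"]`: the same three tokens in a different order, with nothing in the tokens themselves encoding place value.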

A computation that works:

22080 * 3 + 1

A computation that fails:

22080 * 32 + 1

The first computation is correct (66,241). The second example fails - it responds with 704,641. The correct answer is 706,561. Bottom line: LLMs cannot perform simple math in every case.
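For comparison, a conventional program computes both expressions exactly. A short Python check (the variable names are illustrative) also shows how far off the quoted response was:

```python
# Ordinary integer arithmetic is exact, so it can serve as the reference answer
# when checking an LLM's response to the same expressions.
expressions = {
    "22080 * 3 + 1": 22080 * 3 + 1,
    "22080 * 32 + 1": 22080 * 32 + 1,
}
llm_answer_for_second = 704_641  # the incorrect response quoted above

for text, value in expressions.items():
    print(f"{text} = {value:,}")  # 66,241 and 706,561

# How far the LLM's answer was from the exact result.
print("LLM error on the second case:", expressions["22080 * 32 + 1"] - llm_answer_for_second)  # 1920
```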

The tokenization of long numbers is why generative AI cannot perform reliable computations. This behavior needs to be explained to uninformed users who have come to expect computers, large and small, to get basic math right every time. If they understood the purpose of generative AI, their deep disappointment would soften.

But there are many cases where generative AI is very accurate. Unfortunately, three things stand in the way of your becoming an AI-first developer of no-code and low-code solutions.

  1. You don’t trust it to begin with.
  2. You don’t put in the work to become a skilled prompt engineer.
  3. You won’t take the time to test your generative AI outputs.

If you did these things, you would accelerate your solution-building efforts. You would also craft features and capabilities that were heretofore considered impractical. You will need this competitive advantage as no/low-code merges with generative AI in the not-too-distant future.

Sidebar: No-code and AI will soon vanish. In their place will be #gen-apps. :wink: More on this another time.

In a recent thread, participants claimed generative AI wasn’t precise enough to format numbers into the EU equivalent, i.e., $2,192.45 → $2.192,45. This document demonstrates the prompt I used, applied to 1,080 random tests. The results are perfect.

So, why does this work reliably when we just established that generative AI cannot handle large numbers with any predictability?

Answer: Strings

If you ask generative AI to do something with a string, it is skilled at the task. You can even refer to it as a number in the prompt, and the instructions (i.e., the term “format”) will cause it to be treated as a string.
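The string framing is the key: converting $2,192.45 to $2.192,45 never requires arithmetic, only swapping two characters. A deterministic Python equivalent fits in a few lines and could double as a reference when testing AI outputs. This is a sketch that assumes the input is already a well-formed US-formatted string; `us_to_eu` is an illustrative name.

```python
# Swap the US grouping separator (",") and decimal mark (".") in one pass.
# str.maketrans maps each character independently, so no placeholder is needed.
US_TO_EU = str.maketrans({",": ".", ".": ","})

def us_to_eu(amount: str) -> str:
    """Reformat a US-style amount string such as "$2,192.45" as "$2.192,45"."""
    return amount.translate(US_TO_EU)

print(us_to_eu("$2,192.45"))  # $2.192,45
```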

Admittedly, in this case, there would be many benefits if we could perform math and CFL algorithms on native EU numbers. This approach requires all computations to be done in US format, with AI used only to render the EU format. However, this story is about more than EU formatting: it demonstrates that generative AI can be deterministic and can eliminate complexity while reducing time-to-production.
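The trade-off described here (keep all math in one numeric representation and render EU formatting only at output time) can be sketched deterministically in Python. This is not the author's Coda setup: `render_eu` and `parse_eu` are hypothetical helpers, and the seeded round-trip loop simply mirrors the idea of running 1,080 random tests against a formatter.

```python
import random
from decimal import Decimal

def render_eu(value: Decimal) -> str:
    """Render a Decimal as an EU-style string with two decimals, e.g. 2.192,45."""
    us = f"{value:,.2f}"  # US style first, e.g. 2,192.45
    return us.translate(str.maketrans({",": ".", ".": ","}))

def parse_eu(text: str) -> Decimal:
    """Inverse of render_eu: drop grouping dots, turn the decimal comma into a dot."""
    return Decimal(text.replace(".", "").replace(",", "."))

# All arithmetic stays in Decimal; formatting happens only at the very end.
subtotal = Decimal("1999.95") + Decimal("192.50")
print(render_eu(subtotal))  # 2.192,45

# Round-trip check over 1,080 random amounts (seeded so the run is repeatable).
rng = random.Random(42)
for _ in range(1080):
    amount = Decimal(rng.randint(0, 10**9)) / 100  # whole cents, at most two decimals
    assert parse_eu(render_eu(amount)) == amount, amount
print("1,080 round-trip checks passed")
```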

I advise spending the time to do the work to become an AI-first builder of solutions. It’s undoubtedly the future, and the people who want your job will be doing the work.

5 Likes

hi @Bill_French ,
interesting demonstration of AI capabilities.

would you be able to express durations between dates in years, months, weeks and days (hours if available)? So far I have failed to construct a prompt that does the job properly.

looking forward!
cheers, christiaan

2 Likes

Interesting question! I’ve got a table ready to go in playmode with a bunch of date examples

1 Like

You’ll have to be more specific with an example. Doesn’t the duration formula already do this? I don’t get the sense there is anything about Coda that is deficient when it comes to computing dates and durations between dates. What am I missing?

3.57 days == 3 days 13 hrs 55 mins 12 secs
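Splitting a fractional day count into days, hours, minutes and seconds is plain arithmetic. A minimal Python sketch (the function name is illustrative) that rounds to whole seconds to absorb floating-point noise:

```python
def breakdown(days: float) -> tuple[int, int, int, int]:
    """Split a duration in fractional days into (days, hours, minutes, seconds)."""
    total_seconds = round(days * 86_400)  # 86,400 seconds per day
    d, rem = divmod(total_seconds, 86_400)
    h, rem = divmod(rem, 3_600)
    m, s = divmod(rem, 60)
    return d, h, m, s

print(breakdown(3.58))  # (3, 13, 55, 12)
print(breakdown(3.57))  # (3, 13, 40, 48)
```

By this arithmetic, 3 days 13 hrs 55 mins 12 secs corresponds exactly to 3.58 days, while 3.57 days comes to 3 days 13 hrs 40 mins 48 secs.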

1 Like

Ah no the issue is when you cross over to months, that’s why the built-in duration formula doesn’t go higher than days, and why I made Delta Date :slight_smile:
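The month problem is easy to demonstrate: a month is not a fixed number of days, so a day count alone cannot be turned into months without knowing where it starts. A small Python illustration (the dates are arbitrary and `describe` is an illustrative name):

```python
import calendar
from datetime import date, timedelta

def describe(start: date, days: int = 30) -> str:
    """Add a fixed number of days and report how long the starting month is."""
    end = start + timedelta(days=days)
    month_length = calendar.monthrange(start.year, start.month)[1]
    return f"{start} + {days} days = {end} (start month has {month_length} days)"

# The same 30 days can fall short of a month, overshoot it, or match it exactly.
for start in (date(2023, 1, 1), date(2023, 2, 1), date(2023, 4, 1)):
    print(describe(start))
```

From 1 January, 30 days ends on 31 January (short of a full month); from 1 February 2023 it ends on 3 March (a month and two days); from 1 April it ends on 1 May (exactly one month).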

1 Like

I see, yeah - I remember your pack.

Today, with Coda AI, I might not go to the trouble of coding something like this. I haven’t applied a rigid test protocol to this data, but I don’t think AI struggles to do this type of work. The outputs seem reasonable - possibly perfect?

@Christiaan_Huizer is this what you’re looking for?

It took me 21 minutes start to finish to craft this for two inference output types. But almost 7 minutes was spent understanding your data and coming up with testable values for the learner prompts. How long did it take you to build the Pack?

Here are the two prompts I used.

3 Grammar Units

You are an expert in calculating duration. I will give you the beginning 
date/time and the ending date/time, and you will calculate and generate 
the duration in a human-friendly manner. You will express the output in 
a manner consistent with the example output and only the example output.

Beginning Date/Time: thisRow.[Beg Date]
Ending Date/Time: thisRow.[End Date]

Example Data:
Beginning Date/Time: 1953-06-03, 3:30 AM
Ending Date/Time: 2023-09-30, 6:30 PM

Example Output:
70 years, 3 months and 4 weeks ago

[output]

All Grammar Units

You are an expert in calculating duration. I will give you the beginning
date/time and the ending date/time, and you will calculate and generate
the duration in a human-friendly manner. You will express the output in
a manner consistent with the example output and only the example output.

Beginning Date/Time: thisRow.[Beg Date]
Ending Date/Time: thisRow.[End Date]

Example Data:
Beginning Date/Time: 1953-06-03, 3:30 AM
Ending Date/Time: 2023-09-30, 6:30 PM

Example Output:
70y 3mo 4w 2d 15h 0m 0s ago

[output]
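For cross-checking outputs like these, a deterministic calendar-aware difference is short to write. A Python sketch follows; `add_months` and `calendar_delta` are hypothetical helpers (not part of Coda or the prompts above), and the clamping rule (31 January plus one month is 28 or 29 February) is one common convention among several.

```python
import calendar
from datetime import datetime, timedelta

def add_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole calendar months, clamping the day to the target month's length."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)

def calendar_delta(start: datetime, end: datetime) -> tuple[int, int, timedelta]:
    """Return (years, months, remainder) with add_months(start, 12*years + months) + remainder == end."""
    if end < start:
        raise ValueError("expects start <= end")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:  # the last month is not complete yet
        months -= 1
    remainder = end - add_months(start, months)
    years, months = divmod(months, 12)
    return years, months, remainder

start = datetime(1953, 6, 3, 3, 30)
end = datetime(2023, 9, 30, 18, 30)
years, months, rest = calendar_delta(start, end)
weeks, days = divmod(rest.days, 7)
hours = rest.seconds // 3600
print(f"{years}y {months}mo {weeks}w {days}d {hours}h")  # 70y 3mo 3w 6d 15h
```

For the example pair in the prompt, this convention gives 70 years, 3 months, 27 days and 15 hours, i.e. 3 weeks and 6 days rather than the 4 weeks and 2 days shown in the example output.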
2 Likes

Thanks @Bill_French ,

Wonderful contribution; however, I am afraid that the prompts fail.

The check I executed is the following:

thisRow.[End Date] - thisRow.[Beg Date]

this gives the durations expressed in days, and the following converts them back to years to give an idea:

this result is compared with your AI (All Grammar Units)

It is a rather easy check to execute

Since I’d like to use Bard, I asked Bard to execute the calculation; Bard failed as well.

So for the moment the pack provided by @Rickard_Abraham is an option, but I would love to see Coda solving this.

Cheers, Christiaan

2 Likes

Hypothesis: your check methodology needs a closer examination. Is it possible that, under the covers, floating point error is manifesting as a slight deviation?

Here is my assessment, although I haven’t had a lot of time to dig into the math.

Let’s assume I was born 03-Jun-1953 @ 3:30a. Let’s also assume it is currently 03-Jun-2023 @ 3:30a. I would be exactly 70 yrs of age.

Coda determines (using its duration computation that you used) the delta between these two points in time is exactly 25567 days. Dividing that number by 365.25 gives 69.99863107460644, which appears to be incorrect if – AND ONLY IF – we can agree that there are exactly 365 and one-quarter days in each of the 70 years I have been breathing.

But even if we advance the end date to 2023-06-03, 3:59 AM, clearly beyond the 70-year mark, Coda returns 25567 days 29 mins, which, divided by 365.25, still yields less than 70 years (69.99868621187923). This basic calculation of a finite duration clearly shows this AI validation approach has a flaw of some unexplained cause. My money is on the floating point error present in all microprocessors.
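The 365.25 division can be reproduced with exact rational arithmetic, which rules floating point in or out as a cause. In the Python sketch below (dates taken from the example above), the 70-year span contains 17 leap days, so it is 70 × 365 + 17 = 25,567 days, whereas 70 average years of 365.25 days would be 25,567.5 days:

```python
import calendar
from datetime import date
from fractions import Fraction

start, end = date(1953, 6, 3), date(2023, 6, 3)
span_days = (end - start).days

# Both dates fall after February, so the Feb 29ths inside the span belong to the
# leap years start.year + 1 .. end.year (calendar.leapdays excludes its upper bound).
leap_days = calendar.leapdays(start.year + 1, end.year + 1)
print(span_days, leap_days, 70 * 365 + leap_days)  # 25567 17 25567

# Exact rational arithmetic: no floating point involved anywhere below.
avg_year = Fraction(1461, 4)  # 365.25 days
print(70 * avg_year)  # 51135/2, i.e. 25567.5 days
print(Fraction(span_days) / avg_year < 70)  # True
```

Because the result stays below 70 even with `Fraction`, the shortfall in this example comes from the actual leap-day count differing from the 365.25-day average, not from floating-point rounding.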

In contrast, my AI approach seems to result in an accurate assessment of my age.

This doesn’t mean the AI approach is perfect or that the prompts couldn’t be flawed, but it does underscore the challenges of automated test methodologies.

I’d like to see the deltas on the deviations you are computing, but most importantly, the prompt used in your examples.

1 Like

below you’ll find the code:

I checked at the level of years, and there the floating point issue does not seem to play a role.

The doc has been shared with you privately.

Enjoy your Sunday!
Cheers, Christiaan

1 Like

Okay, so the learner shot specifies ISO date formats, while you are feeding it a different format (i.e., dd/mm/yyyy), right?
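One way to remove that mismatch is to normalize dates before they reach the prompt. A Python sketch, assuming the source values are strings like `30/09/2023 18:30` (the exact input layout in the shared doc is not shown here, so the format string is an assumption, and `to_prompt_format` is an illustrative name):

```python
from datetime import datetime

def to_prompt_format(text: str) -> str:
    """Convert a dd/mm/yyyy HH:MM string into the ISO-style form used in the learner shots,
    e.g. "30/09/2023 18:30" -> "2023-09-30, 6:30 PM"."""
    dt = datetime.strptime(text, "%d/%m/%Y %H:%M")
    hour12 = dt.hour % 12 or 12  # 0 -> 12 AM, 13 -> 1 PM
    suffix = "AM" if dt.hour < 12 else "PM"
    return f"{dt:%Y-%m-%d}, {hour12}:{dt:%M} {suffix}"

print(to_prompt_format("30/09/2023 18:30"))  # 2023-09-30, 6:30 PM
print(to_prompt_format("03/06/1953 03:30"))  # 1953-06-03, 3:30 AM
```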

1 Like

Let’s assume I was born 03-Jun-1953 @ 3:30a. Let’s also assume it is currently 03-Jun-2023 @ 3:30a. I would be exactly 70 yrs of age.

Coda determines (using its duration computation that you used) the delta between these two points in time is exactly 25567 days. Dividing that number by 365.25 gives 69.99863107460644, which appears to be incorrect if – AND ONLY IF – we can agree that there are exactly 365 and one-quarter days in each of the 70 years I have been breathing.

But even if we advance the end date to 2023-06-03, 3:59 AM, clearly beyond the 70-year mark, Coda returns 25567 days 29 mins, which, divided by 365.25, still yields less than 70 years (69.99868621187923). This basic calculation of a finite duration clearly shows this AI validation approach has a flaw of some unexplained cause. My money is on the floating point error present in all microprocessors.

2 Likes

that is an interesting observation @Bill_French, what are the consequences of your line of thought? Can we not trust the computations when it comes to dates and time?

I wonder what @Agile_Dynamics has to say about this

my perspective is that when we look at a number like 69.99868621187923, we can round it to 70, because 69.998 becomes 70.00 when rounded to two decimals. Maybe this is too simple.

When we look at the larger deviations in the examples (see below), I have a hard time believing that the AI is correct and the computations are wrong. Take line 3, for example.

The calculatedDelta is based on
thisRow.endDate - thisRow.startDate

the checkYears is based on:
thisRow.calculatedDelta.ToDays() / 365.25

the deviation is the result of:
thisRow.checkYears.Split(".").First() != thisRow.Bill.Split("y").First()
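The effect of the `Split(".").First()` step can be reproduced outside Coda. This Python sketch mirrors the three formulas above with hypothetical function names; like taking the text before the decimal point, it truncates rather than rounds. The AI output string is a hypothetical example in the prompt's format.

```python
def check_years(delta_days: float) -> float:
    """Mirror of checkYears: the day count divided by 365.25."""
    return delta_days / 365.25

def years_from_ai(ai_output: str) -> str:
    """Mirror of Bill.Split("y").First(): "70y 0mo ..." -> "70"."""
    return ai_output.split("y")[0]

def flags_deviation(delta_days: float, ai_output: str) -> bool:
    """Mirror of the deviation column: truncated years vs the AI's year count."""
    truncated = str(check_years(delta_days)).split(".")[0]  # like Split(".").First()
    return truncated != years_from_ai(ai_output)

# Exactly 70 calendar years (1953-06-03 to 2023-06-03) is 25,567 days.
hypothetical_ai_output = "70y 0mo 0w 0d 0h 0m 0s ago"
print(check_years(25567))  # just under 70
print(flags_deviation(25567, hypothetical_ai_output))  # True: "69" vs "70"
print(round(check_years(25567)))  # 70 when rounded instead of truncated
```

At an exact anniversary, truncation alone produces a one-year deviation even when the AI's year count is correct.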

the Bill prompt is :slight_smile:

You are an expert in calculating duration. I will give you the beginning
date/time and the ending date/time, and you will calculate and generate
the duration in a human-friendly manner. You will express the output in
a manner consistent with the example output and only the example output.

Beginning Date/Time: [startDate]
Ending Date/Time: [endDate]

Example Data:
Beginning Date/Time: 1953-06-03, 3:30 AM
Ending Date/Time: 2023-09-30, 6:30 PM

Example Output:
70y 3mo 4w 2d 15h 0m 0s ago

[output]

thx for the wonderful discussion @Bill_French

2 Likes

As is typically the case, prompts are the single biggest source of AI output flaws, and I did discover a few thanks to @Christiaan_Huizer forcing a closer examination. While his validation test was clearly affected by a still-unknown floating point precision issue, the prompt I used included only a single learner shot. As a result, I was able to identify at least two edge cases where the year calculation overshot by one, typically when computing a date range that included a partial year.
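The overshoot-by-one pattern resembles the classic mistake of subtracting calendar years without checking whether the anniversary has passed. A Python sketch of both versions (dates only, time of day ignored; the function names are illustrative), using start and end dates from the learner shots in this post:

```python
from datetime import date

def naive_years(start: date, end: date) -> int:
    """Subtract calendar years, ignoring whether the anniversary has passed."""
    return end.year - start.year

def full_years(start: date, end: date) -> int:
    """Count completed years only."""
    years = end.year - start.year
    # Subtract one if the end falls before this year's anniversary of the start.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years

start = date(2022, 3, 10)
for end in (date(2036, 3, 10), date(2036, 2, 10), date(2035, 12, 31)):
    print(end, naive_years(start, end), full_years(start, end))
```

Only the 2036-02-10 case differs: the naive count says 14 years, while just 13 full years have elapsed.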

I think this is resolved by adding a few learner shots - we’ll see. :wink: My 21-minute solution has now expanded to 2h 40m, but 1h 10m of that was spent trying to understand the [presumed] floating point issue in @Christiaan_Huizer’s validation approach.

IMPORTANT: My prompts are based on ISO date formats (yyyy-mm-dd). If you need to use different formats, be sure to modify the prompts accordingly.

3 Grammar Units

You are an expert in calculating duration. I will give you the beginning 
date/time and the ending date/time, and you will calculate and generate
the duration in a human-friendly manner. You will express the output in
a manner consistent with the example outputs and only the example outputs.

Beginning Date/Time: thisRow.[Beg Date]
Ending Date/Time: thisRow.[End Date]

Examples:

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-03-10, 8:05 AM
Output: 14 years, 0 months, 0 days ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-03-11, 8:05 AM
Output: 14 years, 0 months, 1 day ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-04-11, 8:05 AM
Output: 14 years, 1 months, 1 day ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-02-10, 8:05 AM
Output: 13 years, 11 months, 0 days ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-02-09, 8:05 AM
Output: 13 years, 10 months, 29 days ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2035-12-31, 8:05 AM
Output: 13 years, 3 months, 10 days from now

[output]

All Grammar Units

You are an expert in calculating duration. I will give you the beginning 
date/time and the ending date/time, and you will calculate and generate
the duration in a human-friendly manner. You will express the output in
a manner consistent with the example outputs and only the example outputs.

Beginning Date/Time: thisRow.[Beg Date]
Ending Date/Time: thisRow.[End Date]

Examples:

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-03-10, 8:05 AM
Output: 14y 0mo, 0d 0h 0m ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-03-11, 8:05 AM
Output: 14y 0mo, 1d 0h 0m ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-04-11, 8:05 AM
Output: 14y 1mo, 1d 0h 0m ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-02-10, 8:05 AM
Output: 13y 11mo, 0d 0h 0m ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2036-02-09, 8:05 AM
Output: 13y 10mo 29d 0h 0m ago

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2035-12-31, 8:05 AM
Output: 13y 3mo, 10d 0h 0m from now

Beginning Date/Time: 2022-03-10, 8:05 AM
Ending Date/Time:      2035-12-31, 9:06 AM
Output: 13y 3mo, 10d 1h 1m ago

[output]
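Because learner shots teach the model by example, it may be worth checking them against a deterministic computation before use. Here is a Python sketch that applies hypothetical calendar-aware helpers (defined in the block so it stands alone; month-end clamping is an assumed convention) to the six examples in the 3 Grammar Units prompt. The years, months and days are transcribed from those shots.

```python
import calendar
from datetime import datetime, timedelta

def add_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole calendar months, clamping the day to the target month's length."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)

def calendar_delta(start: datetime, end: datetime) -> tuple[int, int, timedelta]:
    """Return (years, months, remainder); assumes start <= end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if add_months(start, months) > end:  # the last month is not complete yet
        months -= 1
    remainder = end - add_months(start, months)
    years, months = divmod(months, 12)
    return years, months, remainder

start = datetime(2022, 3, 10, 8, 5)
shots = [  # (end, years, months, days) as written in the learner shots
    (datetime(2036, 3, 10, 8, 5), 14, 0, 0),
    (datetime(2036, 3, 11, 8, 5), 14, 0, 1),
    (datetime(2036, 4, 11, 8, 5), 14, 1, 1),
    (datetime(2036, 2, 10, 8, 5), 13, 11, 0),
    (datetime(2036, 2, 9, 8, 5), 13, 10, 29),
    (datetime(2035, 12, 31, 8, 5), 13, 3, 10),
]
for end, *written in shots:
    years, months, rest = calendar_delta(start, end)
    computed = [years, months, rest.days]
    status = "ok" if computed == written else "differs"
    print(f"{end:%Y-%m-%d}: written {written}, computed {computed} -> {status}")
```

Under this calendar convention, the first four shots agree, while the 2036-02-09 case computes to 13 years, 10 months, 30 days and the 2035-12-31 case to 13 years, 9 months, 21 days.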
2 Likes

Yep - I saw these cases as well and agree - these are predictable flaws. But the cause isn’t certain. My additional learner prompts seem to address these cases, but I haven’t tested more than a handful.

2 Likes

I believe that is exactly what the AI inference is doing. It is rounding and your tests were not, so the mismatch occurred. However, this does not explain the bigger deviations in your examples, and for those I can only assume the LLM was getting confused by holding the years delta as an integer while adjusting the months in cases of partial years. This is why I added the additional learner shots, and they seem to make it better.

2 Likes

good morning @Bill_French ,

I applied this prompt and it got all the years right on 85 rows. I noticed that when there are mistakes, they are mainly in the first part and seldom in what follows. I also set up a calculation for months, and it was correct too; if the rest is correct as well, AI did something surprisingly powerful.

You are an expert in calculating with dates. 
I need you to give me the delta between  **[startDate] and [endDate]**  
Only use year(s), month(s), week(s) or day(s) and hour(s) when necessary. Do not show the instructions in the outcome, show the outcome as short as possible.

You will express the output in a manner consistent with the example outputs and only the example outputs.

Example outputs are:
2 minutes
1 hour and 30 minutes
2 days, 1 hour and 30 minutes
1 week
8 days and 30 minutes
1 month, 2 weeks, 4 days, 16.5 hours
4 years, 1 month, 2 weeks, 1 day, 18 hours
2 weeks
32 years and 3 months
33 years and 21 days
36 years, 1 month, 3 weeks and 6 days.
23 years, 5 months, 13 days, 6 hours
39 years, 7 months, 2 weeks, 1 day, and 30 minutes

Do not use instructions in the outcome.
When you doubt the outcome, leave it blank.

I am rather happy with the outcome! Cheers, Christiaan

4 Likes

That is awesome! I love this prompt approach.

As you have proven, in many cases, the sculpture you seek is in the LLM, and the prompt (specifically the learner shots) helps you carve the solution.

2 Likes

OPINION:

When will the Codans realize there’s a market for Coda AI prompts, a need to reuse tested and well-documented prompts, and a requirement for commenting inside prompts such that the comments are masked from the LLM? Lastly, learner shots: Coda uses these internally in conjunction with embedding vectors (I assume), so why not expose access to them in CFL?
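To make the idea concrete outside Coda, here is a hypothetical Python sketch of a reusable prompt template: lines starting with `//` are treated as maker comments and stripped before sending, placeholders are filled with `str.format`, and learner shots are appended as example pairs. None of this reflects an existing Coda feature; the `//` convention and the function names are assumptions.

```python
# Hypothetical convention: lines whose first non-blank characters are "//" are
# maker comments. They are removed before the prompt is sent to the LLM.
def strip_comments(template: str) -> str:
    kept = [line for line in template.splitlines() if not line.lstrip().startswith("//")]
    return "\n".join(kept)

def build_prompt(template: str, shots: list[tuple[str, str]], **values: str) -> str:
    """Drop comments, fill {placeholders}, and append learner shots as examples.
    Note: str.format treats any literal braces in the template as placeholders."""
    body = strip_comments(template).format(**values)
    examples = "\n\n".join(f"Input: {given}\nOutput: {expected}" for given, expected in shots)
    return f"{body}\n\nExamples:\n\n{examples}\n\n[output]"

TEMPLATE = """// v2: added partial-year learner shots after review
You are an expert in calculating duration.
Beginning Date/Time: {start}
Ending Date/Time: {end}"""

SHOTS = [
    ("2022-03-10, 8:05 AM to 2036-03-10, 8:05 AM", "14 years, 0 months, 0 days ago"),
]

prompt = build_prompt(TEMPLATE, SHOTS, start="2023-01-01, 9:00 AM", end="2023-02-01, 9:00 AM")
print(prompt)
```

Keeping the learner shots in a separate list is one way to version and test them independently of the instruction text.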

PREDICTION:

I’ll bet a fine sandwich that, in the not-too-distant future, two lines will cross: CFL dominance and AI dominance. My date prediction is Q2 2025; roughly 20 months from now, CFL will begin a downward trend to obscurity alongside COBOL.

2 Likes

Interesting visual. Most readers may not know about COBOL…

anyway, we have limited time to show the audience how to Coda correctly and help them build the checks and balances they tend to forget with AI, until the moment it all goes wrong. Instead of blaming AI, they should have been encouraged to put some checks in place.

Anyway I hope to see a different picture, one that shows an upward AI and Coda line!

1 Like

Ha ha! That’s the entire point of using COBOL to drive home this prediction. In the distant future, CFL will be something many users will never have heard of.

That’s an entirely different data visual. The point of the visual is to assess how 100% of the energy will be spent building answers. You can’t have 170% in a diagram like this. As one behavior increases, another decreases. This visual depicts what share of 100% will be performed by (x) or (y). There’s a line missing that @Eric_Koleda will want - Packs. A portion of this 100% is clearly occupied by makers who say - I need something neither can provide, and off they go.

1 Like