GPT can correctly evaluate infinite loops

Programmers without the right mathematical background can find ML papers hard to read, because the math and the code don't always line up. For example, you can write down an intractable integral on paper, but you can't directly code it.

Sometimes a mathematical derivation begins with an intractable or infinite sum. The author then manipulates or approximates the sum in complex ways until it is tractable. Finally, the tractable version can be coded, but by then a programmer might have gotten lost in the math. What if we could code the first intractable sum? What if we could evaluate that code?

Consider the famous infinite sum

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$$

Take every number from 1 to infinity, take one over that number squared, and add up all the results. Of course you can't actually do this because of the infinite sum, so to evaluate it you have to do lots of complex math. It turns out that, surprisingly, the sum equals $\frac{\pi^2}{6}$.

Let's express this impossible sum using an infinite loop in Python:

n=0
sum=0
while True:
    n = n+1
    sum = sum + 1/(n**2)
print(sum)

This Python code is just the math translated into code. Of course, that while True: line means the program will never complete, so the print is never reached, but a programmer reading this code would understand what it means.
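To get a feel for where the loop is heading, here is a minimal sketch that cuts it off after a fixed number of terms and compares the result with π²/6. The helper name partial_sum and the chosen term counts are my own, for illustration only; this only approximates the infinite sum, it does not evaluate it.

```python
import math

def partial_sum(terms: int) -> float:
    """Sum 1/n**2 for n = 1..terms: the infinite loop, truncated."""
    total = 0.0
    for n in range(1, terms + 1):
        total += 1 / (n ** 2)
    return total

target = math.pi ** 2 / 6  # the known closed-form value of the infinite sum

# The term counts below are arbitrary choices for illustration.
for terms in (10, 1_000, 100_000):
    approx = partial_sum(terms)
    print(f"{terms:>7} terms: {approx:.10f}  (gap to pi^2/6: {target - approx:.2e})")
```

The gap shrinks roughly like one over the number of terms, so the truncated loop creeps toward the closed form but never reaches it in finite time.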

I gave this to GPT:

Here is some python code. What will be the value of "sum"
after the code runs? Give a closed-form algebraic expression.

n=0
sum=0
while True:
    n = n+1
    sum = sum + 1/(n**2)
print(sum)

Amazingly, it responded with:

The value of "sum" after the program finishes will be π2/6,
or 1.6449340668482264.

GPT understood what the code meant and 'evaluated' it by returning what it was meant to do. Unfortunately, the answer is not quite correct. The decimal value is right, but the expression π2/6 is missing its exponent: it should be $\frac{\pi^2}{6}$. That is just wrong enough to be confusing.
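A quick check makes the mismatch concrete. This is a small sketch that compares the decimal GPT gave against two readings of its expression; treating "π2/6" as pi times 2 over 6 is my assumption about how a reader might take it literally.

```python
import math

# Decimal value quoted in GPT's reply.
reported = 1.6449340668482264

# Intended reading: pi squared over six.
pi_squared_over_6 = math.pi ** 2 / 6

# Assumed literal reading of "π2/6": pi times two, over six.
two_pi_over_6 = math.pi * 2 / 6

print("matches pi**2 / 6:", math.isclose(reported, pi_squared_over_6, rel_tol=1e-12))
print("matches pi * 2 / 6:", math.isclose(reported, two_pi_over_6, rel_tol=1e-12))
```

The decimal agrees with the squared form only, which suggests the number is right and the exponent was lost in how the expression was written.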

As this technology improves and we can really evaluate impossible code, the possibilities are incredible. If it's reliably correct, we could use it as an advisory Python interpreter that tries to skip over bugs, evaluate infinite loops, and even evaluate comments.