I wrote a blog post about this in the past. It's really fun going through the od...

kmill · on Aug 24, 2017

The Python interpreter compiles programs into bytecode first, and the bytecode includes instructions that load a constant from the constant pool. As an example,

    >>> def f():
    ...     x = 2222; y=2222
    ...     return x is y
    ...
    >>> f()
    True
    >>> f.func_code.co_consts
    (None, 2222)

This last line is showing the constant pool, which is just a standard Python tuple.
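On Python 3 the attribute is spelled `__code__` rather than `func_code`. A minimal sketch of the same inspection, assuming CPython (the function name `same_constant` is only illustrative, and the identity result is an implementation detail, not a language guarantee):

```python
import dis

def same_constant():
    x = 2222; y = 2222
    return x is y

# The constant pool of this function's code object (a plain tuple).
consts = same_constant.__code__.co_consts
print(consts)

# How many times 2222 appears in the pool; both assignments share one slot.
print(consts.count(2222))

# The disassembly shows which pool slot each assignment loads from.
dis.dis(same_constant)
```

Because both assignments load from the same slot, `x` and `y` end up bound to the very same object, which is why `x is y` holds inside the function.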

I believe the REPL compiles each input in a new toplevel module context, so each input gets its own constant pool.

Functions get their own constant pools, which explains the following behavior:

    >>> if True:
    ...     def f():return 2222
    ...     def g():return 2222
    ...
    >>> f() is g()
    False

katee · on Aug 25, 2017

I'm glad you and squeaky-clean wrote these comments. When I was experimenting in the Python REPL, I was confused by the last line here:

    >>> 100 is 100
    True
    >>> (10 ** 2) is (10 ** 2)
    True
    >>> (10 ** 3) is (10 ** 3)
    False
    >>> 1000 is 1000
    True

I used the disassembler, but I completely missed that although `1000 is 1000` and `(10 ** 3) is (10 ** 3)` both get optimized to nearly identical bytecode, they load different constants. I wrote it up in a new post and thanked you both. https://kate.io/blog/2017/08/24/python-constants-in-bytecode...

squeaky-clean · on Aug 24, 2017

Yep yep, and only when no operations are done to it. Otherwise it will end up storing the same constant multiple times and referring to each copy separately. For example:

    >>> def f():
    ...   x = 2221+1; y=2221+1
    ...   return x is y
    >>> f()
    False
    >>> f.__code__.co_consts
    (None, 2221, 1, 2222, 2222)
    >>> import dis
    >>> dis.dis(f)
      2           0 LOAD_CONST               3 (2222)
                  2 STORE_FAST               0 (x)
                  4 LOAD_CONST               4 (2222)
                  6 STORE_FAST               1 (y)
    
      3           8 LOAD_FAST                0 (x)
                 10 LOAD_FAST                1 (y)
                 12 COMPARE_OP               8 (is)
                 14 RETURN_VALUE

Not too sure when stuff like this would be useful (it goes way beyond the 'avoid using `is` unless checking references' advice), but it sure is fun.

audiometry · on Aug 24, 2017

So why is this happening? The blog post also shows it, but doesn't explain why (first example)

squeaky-clean · on Aug 25, 2017

std_throwaway got the reasoning right in a higher comment. With `is` you're actually checking whether the object references are identical, not whether the values are equal.

The tricky bit is that everything is an object in Python, even "primitives" like integers, which can be surprising coming from some other languages. Then the trickier bit is that CPython (the most popular and standard implementation) on startup creates an object for every int from -5 to 256 and will re-use those instead of creating a new PyLongObject. This means that `256 is 256` works, because under the hood both sides resolve to the same cached object. But `257 is 257` can create two different PyLongObjects, which have the same value, 257, but aren't technically the same object.
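A small sketch of the cache at work, assuming CPython. The helper `same_object` is hypothetical; it builds each value twice at runtime so that the compiler's constant pool cannot be what makes the two objects identical. The exact cache range is an implementation detail and could change between versions, so the output is not annotated here.

```python
def same_object(n):
    # Build the value twice at runtime (via str -> int) so the two objects
    # cannot simply be the same compile-time constant.
    a = int(str(n))
    b = int(str(n))
    return a is b

# Values around the edges of the range described above.
for n in (-6, -5, 256, 257):
    print(n, same_object(n))
```

Values inside the cached range come back as the one pre-built object; values outside it get a fresh object each time they are constructed.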

And in my above example, the reason `x = 257; x is 257` is True when on the same line, but False when on separate lines (in the REPL), is what kmill describes: the constant pool that the compiler creates. If the statements are compiled together, they share the same constant object.
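The same-line versus separate-lines difference can be imitated outside the REPL by compiling snippets yourself. This is only a sketch, assuming CPython; the helper `run` is hypothetical and just compiles and executes each snippet on its own in a shared namespace, roughly as the REPL does with each input.

```python
def run(*snippets):
    # Compile and execute each snippet separately in one shared namespace,
    # roughly as the REPL does with each input line.
    namespace = {}
    for snippet in snippets:
        exec(compile(snippet, "<input>", "exec"), namespace)
    return namespace

# One compilation: both assignments load from the same constant pool slot.
together = run("x = 257; y = 257")
print(together["x"] is together["y"])

# Two compilations: each one builds its own constant pool.
separate = run("x = 257", "y = 257")
print(separate["x"] is separate["y"])
```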

Here is my entry on my company's blog about this. I won't guarantee you it's well written, but it does show bytecode and has links to the relevant C source lines at the very end.

https://www.everymundo.com/literals-other-number-oddities-py...

audiometry · on Aug 25, 2017

thanks. that makes good sense.