2025 IOPCC

This year, orthoplex and I decided to try to submit something for the 3rd International Obfuscated Python Code Contest. You can see our submission here: quine.py. I'll talk about how we made it!

Our submission:
#coding:1026
#%Mók]~hì~MM]nM]]k%mkDkբkGkmkÒ~}ÃբÒ}%E~}l}ND%G~}l}NG%с~NMM]~~M]]%т~сNс%у~т\т%ф~у\т%х~ф\т%ц~х\т%ч~ц\т%ш~ч\т%щ~ш\т%ё~щ\т%ђ~ё\т%ѓ~ђ\т%є~ѓ\т%ѕ~є\т%і~ѕ\т%ї~і\т%ј~ї\т%љ~ј\т%ѡ~љ\т%Ѣ~ѡ\т%[~Y~NсNтNцkNсNтNцNчkNсNтNуNфNцNчkNуNцNчkNсNфNцNчkNтNуNфNцNчkNсNтNуNцNчkNтNфNхNцkNсNхNцkNхNцkNтNхNцkNтNуNхNцkNтNфkNсNтNцkNсNуNцkNсNуNцkNсNуNфNчkNсNтNхNцNчNшkNсNтNфNцNчkNсNуNфNхNчkNтNуNфNхNцNчkNфNцNчkNуNфNцNчNшkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNцkNсNуNцkNсNуNфNцNчkNсNтNфNцNчkNуNчkNсNтNфNцNчkNтNцNчNщNђkNсNтNфNцNчkNсNтNуNчkNсNтNфNцNчkNсNуNфNцNчkNсNтNфNцNчkNтNхNчNшkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNтNчNшkNтNцNчNщNђkNтNхNчNшkNсNуNфNхNцNчkNсNуNцkNсNуNцkNсNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNуNфNцNчkNсNуNфNхNцNчkNтNуNфNчkNуNчkNсNуNцkNсNуNцkNсNтNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNуNфNцNчkNсNуNфNхNцNчkNтNуNфNчkNсNтNуNчkNсNуNцkNсNуNцkNсNчNђkNтNуNфNхNцNчkNтNуNфNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNхNцNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNуNцkNсNуNцkNтNчNђkNтNуNфNхNцNчkNсNчNђkNтNуNфNчkNсNчNђkNсNуNцkNсNуNцkNсNтNчNђkNтNуNфNхNцNчkNтNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNуNчNђkNтNуNфNхNцNчkNсNтNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNуNчNђkNтNуNфNхNцNчkNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNуNчNђkNтNуNфNхNцNчkNсNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNуNчNђkNтNуNфNхNцNчkNтNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNфNчNђkNтNуNфNхNцNчkNсNтNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNфNчNђkNтNуNфNхNцNчkNфNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNхNчNђkNтNуNфNхNцNчkNсNфNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNхNчNђkNтNуNфNхNцNчkNсNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNхNчNђkNтNуNфNхNцNчkNтNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNуNхNчNђkNтNуNфNхNцNчkNсNтNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNуNхNчNђkNтNуNфNхNцNчkNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNуNхNчNђkNтNуNфNхNцNчkNсNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNуNхNчNђkNтNуNфNхNцNчkNтNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNфNхNчNђkNтNуNфNхNцNчkNсNтNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNфNхNчNђkNтNуNфNхNцNч
kNфNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNцNчNђkNтNуNфNхNцNчkNсNфNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNцNчNђkNтNуNфNхNцNчkNсNцNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNфNхNчkNтNуNфNхNцNчkNсNфNхNчkNтNуNфNхNцNчkNсNуNцkNсNтNхNцNчkNсNуNцkNсNуNцkNсNчNшNђkNтNуNфNхNцNчkNтNуNфNчkNтNуNфNчkNтNуNфNчkNтNуNфNчkNтNуNфNчkNсNтNчNђkNтNуNфNчkNсNхNчNђkNсNуNцkNсNуNцkNсNтNфNхNцNчkNтNуNфNхNцNчkNтNуNфNчkNсNчNђkNтNуNфNчkNсNтNуNчNђkNтNуNфNчkNтNхNчNђkNсNуNцkNсNуNцkNуNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNхNцNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNуNцkNсNуNцkNтNуNчkNтNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNцNчkNсNуNцkNсNуNцkNфNцNчkNуNфNцNчNёkNтNуNфNхNцNчkNсNчNшNђkNсNтNфNцNчkNсNуNцkNсNуNцkNуNфNхNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNтNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNфNцNчNшkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNфNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNчkNтNфNхNцNчkNуNфNцNчNёkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNуNчkNтNуNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNуNчkNтNуNчkNтNуNфNчkNсNуNчkNуNфNцNчkNтNчkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNфNхNшNщNђNѕNіNѡkNфNцNчkNсNуNфNцNчkNсNуNхNчNёNіNїNљNѡNѢkNсNуNфNхNцNчkNсNтNуNфNцNчkNсNуNфNхNцNчkNуNфNхNчkNсNчNшNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNцkNсNуNцkNсNтNфNхNчkNтNуNфNхNцNчkNсNфNхNчkNсNуNцkNсNуNцkNтNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNцNчkNсNуNцkNсNуNцkNуNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNхNцNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNуNцkNсNуNцkNуNфNхNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNтNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNфNцNчNшkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNфNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNчkNтNфNхNцNчkNуNфNцNчNёkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNчNђkNцNчkNтNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNуNфNхNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNсNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNхNчkNтNуNфNчkNсNуNфNчkNтNц
NчNщNђkNтNуNфNчkNсNуNчkNуNфNцNчkNсNуNфNчkNсNтNфNхNцNчkNтNуNфNчkNсNтNхNчkNтNуNфNчkNсNтNхNчkNсNцNчkNсNцNчkNсNуNфNчkNсNчNђkNтNуNфNчkNуNчNђkNсNуNфNхNчkNуNфNхNчkNсNуNфNчkNсNчNђkNтNуNфNчkNтNчNђkNтNуNфNчkNсNтNчNђkNсNуNфNхNчkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNуNфNчkNтNхNчkNхNчkNтNчkNтNуNфNцNчkNсNтNхNцNчNшkNсNуNфNхNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNхNчkNуNфNхNчkNтNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNхNчkNтNуNфNчkNсNчNђkNсNуNфNхNчkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNфNхNшNщNђNѕNіNѡkNфNцNчkNсNуNфNцNчkNсNуNхNчNёNіNїNљNѡNѢkNсNуNфNхNцNчkNсNтNуNфNцNчkNсNуNфNхNцNчkNуNфNхNчkNтNуNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNуNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNуNчkNтNуNфNчkNсNхNчkNтNуNфNчkNтNхNчNшkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNфNхNшNщNђNѕNіNѡkNфNцNчkNсNуNфNцNчkNсNуNхNчNёNіNїNљNѡNѢkNсNуNфNхNцNчkNсNтNуNфNцNчkNсNуNфNхNцNчkNуNфNхNчkNсNчNшNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNцkNсNуNцkNфNцNчkNуNфNцNчNшkNтNуNфNхNцNчkNтNуNчkNтNуNчkNуNфNцNчkNтNуNчkNсNтNфNцNчkNсNуNцkNсNуNцkNфNчkNсNуNфNцNчkNтNфNхNцNчkNуNфNчNшkNфNцNчkNуNфNцNчNшk%Ӂ~NNNNNуNё%{~NсNчNђ%D~MM]~~M]]%FF~}}%hɬ~Ӂk%\MMMBz~[hì]kM[z~[hDzɬ]kMFFz~FFNElB]]\񆖙hm󬉕}o}\Ӂ]k%[~Y%F~}}%D~MM]~~M]]%\MMMBz~[hì]kM[z~[hDzɬ]kMQz~}}]kMRz~с]kMSz~т`т]kM\MMMMQz~QNMբNElM{NSNSaaMсNф]\MсNтNу]]]\MRPBnó]]kMRz~R\т]kMSz~SNс]]]\񆖙hm󬉕}o}\ц]k]kMFz~FNQNÒ]]\񆖙hm󬉕}o}\Ӂ]k%hì~FFlFk%HmzÌhì

Note that the entire program is just two lines of code, each a comment. Despite this, running it with Python produces the following output:

Traceback (most recent call last):
  File "submission.py", line 38, in <module>
    {_:C}[C]
    ~~~~~^^^
KeyError: '#coding:1026\n#%Mók]~hì ... zÌhì'

It's a quine! The KeyError contains the entire contents of the file, albeit escaped. In this post I'll talk about how this little snippet of code works.
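
The "albeit escaped" part comes from how KeyError formats its message: it shows the repr() of the missing key. Here is a minimal sketch of that mechanism. The `payload` string is a hypothetical stand-in for the file contents, not the real submission.

```python
# Hypothetical stand-in for the file contents (the real program uses its own source).
payload = "#coding:1026\n#..."

try:
    {None: None}[payload]  # the key is missing, so the lookup raises KeyError
except KeyError as err:
    missing_key = err.args[0]  # the original, unescaped string
    message = str(err)         # the text a traceback shows after "KeyError: "

print(message)
```

The exception still carries the unescaped string in `err.args[0]`; only the printed form is escaped.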

Codecs

Some quick background on what the coding:1026 declaration actually means: Python supports declaring the encoding of a source file in a comment on the first or second line. You may have seen declarations like this:

# -*- coding: UTF-8 -*-

or like this:

# vim:fileencoding=UTF-8
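
One way to check which encoding a given first line declares is the standard library's tokenize.detect_encoding. The sketch below feeds it the two declarations above plus the short form used in the submission. The helper name `declared_encoding` is made up for illustration.

```python
import codecs
import io
import tokenize

def declared_encoding(first_line: bytes) -> str:
    """Return the encoding name detected for a source file that starts with this line."""
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(first_line).readline)
    return encoding

emacs_style = declared_encoding(b"# -*- coding: UTF-8 -*-\n")
vim_style = declared_encoding(b"# vim:fileencoding=UTF-8\n")
short_style = declared_encoding(b"#coding:1026\n")

# Resolve the short name to the codec Python actually uses for it.
resolved = codecs.lookup(short_style).name
print(emacs_style, vim_style, resolved)
```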

Python supports a multitude of codecs. Most of them coincide with ASCII, which makes them somewhat unhelpful for obfuscation. There is, however, a set of EBCDIC code page encodings that remap most ASCII characters, making them ideal for our purposes. Our final submission uses coding:1026, which is shorthand for the IBM1026 encoding.
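
To get a feel for the remapping, this small sketch encodes a few ASCII characters with the codec and prints the resulting bytes. The sample characters are an arbitrary choice for illustration.

```python
import codecs

# "1026" is an alias; look up the canonical codec name.
codec_name = codecs.lookup("1026").name

ascii_sample = "+,=()'0a"
remapped = ascii_sample.encode("cp1026")

for ch, byte in zip(ascii_sample, remapped):
    # Bytes below 0x80 are shown as the ASCII character they look like in a UTF-8 file.
    shown_as = chr(byte) if byte < 0x80 else "(non-ASCII byte)"
    print(f"{ch!r} -> 0x{byte:02X} {shown_as}")
```

In particular, + becomes the byte for N and a comma becomes the byte for k, which is why the submission is full of those two letters.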

Python

So how do we code in this new variant of Python? It's actually really easy: we just need to make sure our source code decodes to regular Python.

#coding:1026
{code.encode('1026').decode('latin1')}
            

The problem is that the vast majority of the time the result is not valid UTF-8. We decided to restrict ourselves to valid UTF-8 source code, both as a challenge and for much better portability. This is the main challenge of our program: writing valid UTF-8 that transforms into valid Python when decoded with a custom codec.
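
A quick way to test whether a piece of Python survives the restriction is to encode it with the codec and try to read the bytes back as UTF-8. This is only a sketch of that check. The helper name `as_utf8_text` is hypothetical and is not part of our tooling.

```python
from typing import Optional

def as_utf8_text(code: str) -> Optional[str]:
    """Encode Python source with cp1026 and return it as UTF-8 text, or None if invalid."""
    raw = code.encode("cp1026")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

ok = as_utf8_text("Ja=+(()==())")  # a line from our final program
bad = as_utf8_text("print('hi')")  # ordinary Python: 'p' becomes a stray continuation byte

print(ok, bad)
```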

With the UTF-8 restriction, the format of our programs is highly limited. While we still have access to most operators, there is no way to construct a valid UTF-8 string that decodes to more than 3 lowercase letters in a row, meaning we cannot access any builtins. Brute-forcing all Unicode characters, we get combinations like:

...
A}
0]]]
0]]a
0]]|
0]]}
0]a]
0]aa
0]a|
0]a}
0]|]
0]|a
0]|}
0]}]
0]}a
0]}|
0]}}
0a]]
0a]a
0a]|
0a]}
0aa]
0aaa
0aa|
0aa}
0a|]
0a|a
0a|}
0a}]
0a}a
0a}|
0a}}
...

Notably, there was no way to get digits by themselves: they had to be part of these multi-character combinations, because the bytes they map to are only valid inside multi-byte UTF-8 sequences. For example, the character 0 maps to \xf0, which unhelpfully can only appear as the first byte of a four-byte UTF-8 sequence like 𬌬. Likewise, all lowercase letters can only appear as part of a multi-byte character, which makes writing any useful code difficult.
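
The restriction can be seen by classifying the byte each character maps to according to its role in UTF-8. The sketch below does this for lowercase letters and digits. The function `utf8_role` and its labels are made up for illustration. Since a lead byte is followed by at most three continuation bytes, this is consistent with the three-letter limit mentioned earlier.

```python
import string

def utf8_role(byte: int) -> str:
    """Classify a single byte by the role it can play in well-formed UTF-8."""
    if byte <= 0x7F:
        return "single"        # a complete one-byte character
    if byte <= 0xBF:
        return "continuation"  # only valid after a lead byte
    if 0xC2 <= byte <= 0xDF:
        return "lead-2"
    if 0xE0 <= byte <= 0xEF:
        return "lead-3"
    if 0xF0 <= byte <= 0xF4:
        return "lead-4"
    return "invalid"           # 0xC0, 0xC1 and 0xF5..0xFF never occur in UTF-8

roles = {
    ch: utf8_role(ch.encode("cp1026")[0])
    for ch in string.ascii_lowercase + string.digits
}

for ch, role in roles.items():
    print(ch, role)
```

Under this classification every lowercase letter is a continuation byte, 0 through 4 are four-byte lead bytes, and 5 through 9 land on bytes that UTF-8 never uses.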

Quine

Writing quines is actually not too bad once you know the trick. One common approach has the following general form:


data = {encoded data}

print(decode(data).format(representation(data)))
            

where {encoded data} is some data structure containing the entire program text, with a slot where the representation of the data itself gets substituted in.

A regular Python quine written in this form looks like this:

data = [100, 97, 116, 97, 32, 61, 32, 37, 115, 10, 112, 114, 105, 110, 116, 40, 98, 121, 116, 101, 115, 40, 100, 97, 116, 97, 41, 46, 100, 101, 99, 111, 100, 101, 40, 41, 32, 37, 37, 32, 114, 101, 112, 114, 40, 100, 97, 116, 97, 41, 41]
print(bytes(data).decode() % repr(data))

In this case, the data variable stores the bytes of 'data = %s\nprint(bytes(data).decode() %% repr(data))'.
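
To check that this really is a quine without copying the list by hand, the sketch below rebuilds the program from the template string, runs it, and compares the output to the program text. The names `template`, `source` and `is_quine` are only for this illustration.

```python
import contextlib
import io

template = "data = %s\nprint(bytes(data).decode() %% repr(data))"

# The data array is just the bytes of the template.
data = list(template.encode())

# Fill the slot with the representation of the data to get the full program text.
source = template % repr(data)

# Run the program and capture what it prints.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(source, {})

# print() adds a trailing newline, just like the file on disk would have.
is_quine = captured.getvalue() == source + "\n"
print(is_quine)
```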

For our purposes, we need to come up with our own decode and representation functions, as well as a way to store the data array, since there is no way to write numbers directly.

data

Although we can't write numbers directly in the source code, getting them is easy enough. +(()==()) is equal to 1. Then we can make a series of variables for each power of two that we need.

Ja=+(()==())
Jb=Ja+Ja
Jc=Jb*Jb
Jd=Jc*Jb
Je=Jd*Jb
Jf=Je*Jb
Jg=Jf*Jb
Jh=Jg*Jb
Ji=Jh*Jb
Jj=Ji*Jb
Jk=Jj*Jb
...

The variable names are intentional. Importantly, each variable name is predictable, which greatly simplifies the representation function we'll have to write later. We then store each number as a sum of powers of two, resulting in code like:

İ=ß=+Ja+Jb+Jf,+Ja+Jb+Jf+Jg,+Ja+Jb+Jc+Jd+Jf+Jg,+Jc+Jf+Jg,+Ja+Jd+Jf+Jg,+Jb+Jc+Jd+Jf+Jg,+Ja+Jb+Jc+Jf+Jg,+Jb+Jd+Je+Jf,+Ja+Je+Jf,+Je+Jf,+Jb+Je+Jf,+Jb+Jc+Je+Jf,+Jb+Jd,+Ja+Jb+Jf,+Ja+Jc+Jf,+Ja+Jc+Jf, ...

The stored data makes up the majority of the submission and is visible as the chunk of Cyrillic letters and N's (which map to the variable names and the plus character +, respectively).
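
As a sanity check, the entries shown above can be evaluated in ordinary Python. This sketch defines the first few powers of two and decodes the sixteen entries from the excerpt. The names `head` and `text` are only for illustration.

```python
# Powers of two, built without writing a single digit.
Ja = +(() == ())
Jb = Ja + Ja
Jc = Jb * Jb
Jd = Jc * Jb
Je = Jd * Jb
Jf = Je * Jb
Jg = Jf * Jb

# The first sixteen entries of the data array, copied from the excerpt above.
head = (
    +Ja+Jb+Jf, +Ja+Jb+Jf+Jg, +Ja+Jb+Jc+Jd+Jf+Jg, +Jc+Jf+Jg,
    +Ja+Jd+Jf+Jg, +Jb+Jc+Jd+Jf+Jg, +Ja+Jb+Jc+Jf+Jg, +Jb+Jd+Je+Jf,
    +Ja+Je+Jf, +Je+Jf, +Jb+Je+Jf, +Jb+Jc+Je+Jf,
    +Jb+Jd, +Ja+Jb+Jf, +Ja+Jc+Jf, +Ja+Jc+Jf,
)

text = "".join("%c" % number for number in head)
print(repr(text))
```

The decoded text is the start of the file itself, with the percent sign doubled because the text is later used as a format string.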

decode

The only way to loop over the whole data array is through the use of list comprehensions. Miraculously, we can use both the character combinations 1for (corresponding to the currently unassigned U+46599) and 3]in (corresponding to the even more unassigned U+ec255). It's a stroke of luck that there is any way at all to write for and in under our restrictions.
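
The two code points can be reproduced by encoding the desired Python text with the codec and reading the four resulting bytes as a single UTF-8 character. This is a small sketch of that check, with illustrative variable names.

```python
# Each four-character Python fragment becomes exactly one Unicode character.
for_char = "1for".encode("cp1026").decode("utf-8")
in_char = "3]in".encode("cp1026").decode("utf-8")

print(len(for_char), f"U+{ord(for_char):X}")
print(len(in_char), f"U+{ord(in_char):X}")
```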

The actual code to decode is not too complicated. We use the %c formatting operator to convert each number to the corresponding character.

*(((â:=İ[C]),(İ:=İ[à:I]),(ãã:=ãã+á%â))*1for[_3]in'?'*La),

This is essentially equivalent to the following code:

result = ''
for i in range(len(data)):
    result += '%c' % data[0]
    data = data[1:]

We're forced to shrink data on each iteration, since there's no easy way to index into our array at any position other than zero.
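
The shape of the obfuscated line, a starred generator expression whose elements are tuples of assignment expressions, can be written in readable Python as well. The sketch below is a simplified rendering with ordinary names and spacing, not the exact line from the submission.

```python
def decode(data):
    """Decode a sequence of character codes using only expressions in the loop body."""
    remaining = data
    result = ""
    length = len(data)
    # A starred generator inside a tuple display: the tuple is built and thrown away,
    # so only the side effects of the assignment expressions matter.
    *((
        (head := remaining[0]),                # take the first number
        (remaining := remaining[1:length]),    # drop it from the array
        (result := result + "%c" % head),      # append the matching character
    ) for _ in "?" * length),
    return result

print(decode((72, 105, 33)))
```

Assignment expressions inside a generator expression bind in the enclosing scope, which is what lets the loop body update `remaining` and `result`.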

representation

representation is similar to decode: we still need to loop over the whole data array, one character at a time. This time, though, the loop body is more complex, as we need to write out each character using the powers of two we created earlier.

*(((â:=İ[C]),(İ:=İ[à:I]),(é:=''),(ê:=Ja),(ë:=Jb-Jb),(*((((é:=é+(Ns+á%(Ö+ë+ë//(Ja+Jd)*(Ja+Jb+Jc)))*(ê&â>C·)),(ê:=ê*Jb),(ë:=ë+Ja)))*1for[_3]in'?'*Jf),),(ã:=ã+é+Ck))*1for[_3]in'?'*La),

This code isn't too bad; we just need two nested loops now. The inner loop runs 32 times, with the ê variable keeping track of the current bit. We check whether a bit is set using the bitwise AND operation. Outputting the names of the variables is a little tricky: for example, Ja maps to 'с' (U+0441 CYRILLIC SMALL LETTER ES). We chose the variable names strategically to make the formula as simple as possible: the nth power of two maps to character code 1089 + n + n // 9 * 7 (that is, nearly linear, but with a jump of 7 code points after every ninth variable). Other than that, the algorithm is straightforward: we prepend a + to each set bit and follow each character's sum with a comma. The result serializes the list exactly the way it appears in the source code!
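
The formula can be checked against the codec directly. This sketch encodes each variable name from the final program and compares the resulting character to the one the formula predicts. The `names` list is reconstructed from the variable definitions in this post.

```python
# Variable names in order of the power of two they hold (Ja = 1, Jb = 2, ...).
names = ["J" + letter for letter in "abcdefghijklmnopqr"] + ["Jö", "Js"]

mismatches = []
for n, name in enumerate(names):
    from_codec = name.encode("cp1026").decode("utf-8")  # how the name looks in the file
    from_formula = chr(1089 + n + n // 9 * 7)           # what the formula predicts
    if from_codec != from_formula:
        mismatches.append((name, from_codec, from_formula))
    print(name, from_codec, f"U+{ord(from_codec):04X}")

print(mismatches)
```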

This code is roughly equivalent to the following:

result = ""

for i in range(len(data)):
    charcode = data[0]
    character = ""

    for bit in range(32):
        if charcode & (1 << bit):
            character += "N" + chr(1089 + bit + bit // 9 * 7)

    result += character + "k"
    data = data[1:]

Remember that since we need to output the source file, our representation function has to encode back into the 1026 codec. So where we want a + to add the bits, it must actually appear as N, and a comma must be output as k.
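
The correspondence between the two forms can be checked by round-tripping the first two entries of the data array through the codec. The string `in_file` below is copied from the start of the data in the submission. The variable names are only for illustration.

```python
# The first two entries as they appear in the submitted file.
in_file = "NсNтNцkNсNтNцNчk"

# What Python sees after decoding the file's bytes with the 1026 codec.
as_python = in_file.encode("utf-8").decode("cp1026")
print(as_python)

# Going the other way gives back the text that representation has to emit.
back_again = as_python.encode("cp1026").decode("utf-8")
print(back_again == in_file)
```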

Final

Putting it all together, we get the following deobfuscated source code:

(C·,)=[C]=(()>()),
_,à,Ns,å,_,Ck='CcNsCk'
á='%'+à
å='%'+å
Ja=+(()==())
Jb=Ja+Ja
Jc=Jb*Jb
Jd=Jc*Jb
Je=Jd*Jb
Jf=Je*Jb
Jg=Jf*Jb
Jh=Jg*Jb
Ji=Jh*Jb
Jj=Ji*Jb
Jk=Jj*Jb
Jl=Jk*Jb
Jm=Jl*Jb
Jn=Jm*Jb
Jo=Jn*Jb
Jp=Jo*Jb
Jq=Jp*Jb
Jr=Jq*Jb
Jö=Jr*Jb
Js=Jö*Jb
İ=ß=~~~
La=+++++Jc+Jj
Ö=+Ja+Jg+Jk
à=(()==())
ãã=''
[I]=La,
*(((â:=İ[C]),(İ:=İ[à:I]),(ãã:=ãã+á%â))*1for[_3]in'?'*La),
İ=ß
ã=''
à=(()==())
*(((â:=İ[C]),(İ:=İ[à:I]),(é:=''),(ê:=Ja),(ë:=Jb-Jb),(*((((é:=é+(Ns+á%(Ö+ë+ë//(Ja+Jd)*(Ja+Jb+Jc)))*(ê&â>C·)),(ê:=ê*Jb),(ë:=ë+Ja)))*1for[_3]in'?'*Jf),),(ã:=ã+é+Ck))*1for[_3]in'?'*La),
[C]=ãã%ã,
{_:C}[C]

Then replacing ~~~ with the data array, setting the La variable to the right length, and finally encoding the code in the correct way yields the final submission.

Further work

We experimented with making the original source code also valid Python, so that both python3 and python3 -x would work. An example is this brilliant snippet written by orthoplex:

#coding: 1026
{}['utf8']%[[{{kçkÅkçkÂkçkÃkçkÄkçkÉ@~M}@ÅÂÃÄÉ}@lM]]^hì@~ÅNÂNÃNÄNÉNÃk^HçzÌhì