2025 IOPCC
This year, orthoplex and I decided to try to submit something for the 3rd International Obfuscated Python Code Contest. You can see our submission here: quine.py. I'll talk about how we made it!
Here is our submission in full:
#coding:1026 #%Mók]~hì~MM]nM]]k%mkDkբkGkmkÒ~}ÃբÒ}%E~}l}ND%G~}l}NG%с~NMM]~~M]]%т~сNс%у~т\т%ф~у\т%х~ф\т%ц~х\т%ч~ц\т%ш~ч\т%щ~ш\т%ё~щ\т%ђ~ё\т%ѓ~ђ\т%є~ѓ\т%ѕ~є\т%і~ѕ\т%ї~і\т%ј~ї\т%љ~ј\т%ѡ~љ\т%Ѣ~ѡ\т%[~Y~NсNтNцkNсNтNцNчkNсNтNуNфNцNчkNуNцNчkNсNфNцNчkNтNуNфNцNчkNсNтNуNцNчkNтNфNхNцkNсNхNцkNхNцkNтNхNцkNтNуNхNцkNтNфkNсNтNцkNсNуNцkNсNуNцkNсNуNфNчkNсNтNхNцNчNшkNсNтNфNцNчkNсNуNфNхNчkNтNуNфNхNцNчkNфNцNчkNуNфNцNчNшkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNцkNсNуNцkNсNуNфNцNчkNсNтNфNцNчkNуNчkNсNтNфNцNчkNтNцNчNщNђkNсNтNфNцNчkNсNтNуNчkNсNтNфNцNчkNсNуNфNцNчkNсNтNфNцNчkNтNхNчNшkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNтNчNшkNтNцNчNщNђkNтNхNчNшkNсNуNфNхNцNчkNсNуNцkNсNуNцkNсNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNуNфNцNчkNсNуNфNхNцNчkNтNуNфNчkNуNчkNсNуNцkNсNуNцkNсNтNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNуNфNцNчkNсNуNфNхNцNчkNтNуNфNчkNсNтNуNчkNсNуNцkNсNуNцkNсNчNђkNтNуNфNхNцNчkNтNуNфNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNхNцNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNуNцkNсNуNцkNтNчNђkNтNуNфNхNцNчkNсNчNђkNтNуNфNчkNсNчNђkNсNуNцkNсNуNцkNсNтNчNђkNтNуNфNхNцNчkNтNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNуNчNђkNтNуNфNхNцNчkNсNтNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNуNчNђkNтNуNфNхNцNчkNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNуNчNђkNтNуNфNхNцNчkNсNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNуNчNђkNтNуNфNхNцNчkNтNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNфNчNђkNтNуNфNхNцNчkNсNтNуNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNфNчNђkNтNуNфNхNцNчkNфNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNхNчNђkNтNуNфNхNцNчkNсNфNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNхNчNђkNтNуNфNхNцNчkNсNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNхNчNђkNтNуNфNхNцNчkNтNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNуNхNчNђkNтNуNфNхNцNчkNсNтNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNуNхNчNђkNтNуNфNхNцNчkNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNуNхNчNђkNтNуNфNхNцNчkNсNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNуNхNчNђkNтNуNфNхNцNчkNтNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNфNхNчNђkNтNуNфNхNцNчkNсNтNуNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNфNхNчNђ
kNтNуNфNхNцNчkNфNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNцNчNђkNтNуNфNхNцNчkNсNфNхNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNтNцNчNђkNтNуNфNхNцNчkNсNцNчNђkNуNфNхNчkNтNчNђkNсNуNцkNсNуNцkNсNтNфNхNчkNтNуNфNхNцNчkNсNфNхNчkNтNуNфNхNцNчkNсNуNцkNсNтNхNцNчkNсNуNцkNсNуNцkNсNчNшNђkNтNуNфNхNцNчkNтNуNфNчkNтNуNфNчkNтNуNфNчkNтNуNфNчkNтNуNфNчkNсNтNчNђkNтNуNфNчkNсNхNчNђkNсNуNцkNсNуNцkNсNтNфNхNцNчkNтNуNфNхNцNчkNтNуNфNчkNсNчNђkNтNуNфNчkNсNтNуNчNђkNтNуNфNчkNтNхNчNђkNсNуNцkNсNуNцkNуNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNхNцNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNуNцkNсNуNцkNтNуNчkNтNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNцNчkNсNуNцkNсNуNцkNфNцNчkNуNфNцNчNёkNтNуNфNхNцNчkNсNчNшNђkNсNтNфNцNчkNсNуNцkNсNуNцkNуNфNхNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNтNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNфNцNчNшkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNфNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNчkNтNфNхNцNчkNуNфNцNчNёkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNуNчkNтNуNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNуNчkNтNуNчkNтNуNфNчkNсNуNчkNуNфNцNчkNтNчkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNфNхNшNщNђNѕNіNѡkNфNцNчkNсNуNфNцNчkNсNуNхNчNёNіNїNљNѡNѢkNсNуNфNхNцNчkNсNтNуNфNцNчkNсNуNфNхNцNчkNуNфNхNчkNсNчNшNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNцkNсNуNцkNсNтNфNхNчkNтNуNфNхNцNчkNсNфNхNчkNсNуNцkNсNуNцkNтNуNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNцNчkNсNуNцkNсNуNцkNуNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNчkNсNуNфNхNчkNтNуNфNхNцNчkNтNуNфNхNцNчkNсNуNфNчkNсNуNфNхNчkNсNуNфNхNчkNсNуNцkNсNуNцkNуNфNхNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNтNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNфNцNчNшkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNфNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNфNхNчkNфNцNчkNуNчkNтNфNхNцNчkNуNфNцNчNёkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNцNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNчNђkNцNчkNтNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNуNфNхNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNсNуNфNчkNсNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNхNчkNтNуNфNчk
NсNуNфNчkNтNцNчNщNђkNтNуNфNчkNсNуNчkNуNфNцNчkNсNуNфNчkNсNтNфNхNцNчkNтNуNфNчkNсNтNхNчkNтNуNфNчkNсNтNхNчkNсNцNчkNсNцNчkNсNуNфNчkNсNчNђkNтNуNфNчkNуNчNђkNсNуNфNхNчkNуNфNхNчkNсNуNфNчkNсNчNђkNтNуNфNчkNтNчNђkNтNуNфNчkNсNтNчNђkNсNуNфNхNчkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNуNфNчkNтNхNчkNхNчkNтNчkNтNуNфNцNчkNсNтNхNцNчNшkNсNуNфNхNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNхNчkNуNфNхNчkNтNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNсNтNхNчkNтNфNхNцNчkNтNуNфNхNцNчkNсNтNхNчkNтNуNфNчkNсNчNђkNсNуNфNхNчkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNфNхNшNщNђNѕNіNѡkNфNцNчkNсNуNфNцNчkNсNуNхNчNёNіNїNљNѡNѢkNсNуNфNхNцNчkNсNтNуNфNцNчkNсNуNфNхNцNчkNуNфNхNчkNтNуNчNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNхNчkNсNтNфNцNчkNсNуNфNчkNтNуNчkNтNфNхNцNчkNтNуNфNхNцNчkNтNуNчkNтNуNфNчkNсNхNчkNтNуNфNчkNтNхNчNшkNсNуNфNхNчkNсNуNфNхNчkNуNфNхNчkNсNфNхNшNщNђNѕNіNѡkNфNцNчkNсNуNфNцNчkNсNуNхNчNёNіNїNљNѡNѢkNсNуNфNхNцNчkNсNтNуNфNцNчkNсNуNфNхNцNчkNуNфNхNчkNсNчNшNђkNсNуNфNхNчkNсNтNфNцNчkNсNуNцkNсNуNцkNфNцNчkNуNфNцNчNшkNтNуNфNхNцNчkNтNуNчkNтNуNчkNуNфNцNчkNтNуNчkNсNтNфNцNчkNсNуNцkNсNуNцkNфNчkNсNуNфNцNчkNтNфNхNцNчkNуNфNчNшkNфNцNчkNуNфNцNчNшk%Ӂ~NNNNNуNё%{~NсNчNђ%D~MM]~~M]]%FF~}}%hɬ~Ӂk%\MMMBz~[hì]kM[z~[hDzɬ]kMFFz~FFNElB]]\hm}o}\Ӂ]k%[~Y%F~}}%D~MM]~~M]]%\MMMBz~[hì]kM[z~[hDzɬ]kMQz~}}]kMRz~с]kMSz~т`т]kM\MMMMQz~QNMբNElM{NSNSaaMсNф]\MсNтNу]]]\MRPBnó]]kMRz~R\т]kMSz~SNс]]]\hm}o}\ц]k]kMFz~FNQNÒ]]\hm}o}\Ӂ]k%hì~FFlFk%HmzÌhì
Note that the entire program is just two lines of code, each a comment. Despite this, running it with Python produces the following output:
Traceback (most recent call last):
  File "submission.py", line 38, in <module>
    {_:C}[C]
    ~~~~~^^^
KeyError: '#coding:1026\n#%Mók]~hì ... zÌhì'
It's a quine! The KeyError contains the entire contents of the file,
albeit escaped. In this post I'll talk about how this little snippet of code works.
Codecs
Some quick background on what the coding:1026 declaration actually means.
Briefly, Python supports declaring the encoding of source files.
You may have seen a declaration like this:
# -*- coding: UTF-8 -*-
or like this:
# vim:fileencoding=UTF-8
There are a multitude of supported codecs. Most of these coincide with ASCII, making
them somewhat unhelpful for obfuscation. There is, however, a set of code page
encodings that remap most ASCII characters, making them ideal for our purposes.
coding:1026 is what we used in our final submission; 1026 is shorthand for the
IBM1026 encoding.
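As a quick illustration (a minimal sketch using only the standard library; the sample characters are picked arbitrarily), the alias can be resolved with codecs.lookup, and encoding a few ASCII characters shows where they land in the code page:

```python
import codecs

# '1026' is an alias; the lookup reports the canonical codec name.
codec_name = codecs.lookup('1026').name
print(codec_name)

# Byte value that each sample character gets in code page 1026.
remapped = {ch: ch.encode('1026')[0] for ch in "+,=()'"}

for ch, byte in remapped.items():
    # chr(byte) is what the same byte looks like when read as ASCII.
    print(repr(ch), '->', hex(byte), repr(chr(byte)))
```

Read the other way around, this is why an ASCII N in the submission behaves like a + once Python has decoded the file.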
Python
So how do we code in this new variant of Python? It's actually really easy: we just need to make sure our source code decodes to regular Python.
#coding:1026
{code.encode('1026').decode('latin1')}
The problem is that the vast majority of the time this is not valid UTF-8. We decided to restrict ourselves to valid UTF-8 source code as a challenge and for much better portability. This is the main challenge of the program: writing valid UTF-8 that transforms into valid Python when decoded with a custom codec.
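A minimal sketch of that constraint, assuming we decode the code page bytes as UTF-8 instead of latin1 so that invalid programs are rejected outright (the helper names to_utf8_source and is_utf8_safe are made up for illustration):

```python
def to_utf8_source(code: str) -> str:
    """Return the text to save as UTF-8 so that Python sees `code` under cp1026.

    Raises UnicodeError when the cp1026 bytes do not form valid UTF-8.
    """
    return code.encode('cp1026').decode('utf-8')


def is_utf8_safe(code: str) -> bool:
    """True when `code` can be written under the valid-UTF-8 restriction."""
    try:
        to_utf8_source(code)
    except UnicodeError:
        return False
    return True


encoded = to_utf8_source('Ja=+(()==())')
print(encoded == 'с~NMM]~~M]]')   # the same run of characters appears in the submission
print(is_utf8_safe('print(1)'))   # lowercase letters cannot start a UTF-8 sequence
```

The second check fails because the code page puts lowercase letters on bytes that UTF-8 only allows after a lead byte.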
With the UTF-8 restriction, the format of our programs is highly limited. While we still have access to most operators, there is no way to construct a valid UTF-8 string that decodes to more than 3 lowercase letters in a row, meaning we cannot access any builtins. Brute-forcing all Unicode characters, we get something like:
...
A}
0]]]
0]]a
0]]|
0]]}
0]a]
0]aa
0]a|
0]a}
0]|]
0]|a
0]|}
0]}]
0]}a
0]}|
0]}}
0a]]
0a]a
0a]|
0a]}
0aa]
0aaa
0aa|
0aa}
0a|]
0a|a
0a|}
0a}]
0a}a
0a}|
0a}}
...
Notably, there was no way to get digits by themselves: they had to be part of these
multi-character combinations, because the bytes they map to are only valid inside
multi-byte UTF-8 sequences. For example, the character 0 maps to \xf0, which
unhelpfully can only appear as the first byte of a four-byte UTF-8 sequence like 𬌬.
Likewise, all lowercase letters can only appear as part of a multi-byte character,
which makes writing any useful code difficult.
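To make the byte-level reason concrete, here is a small sketch that classifies the byte each character maps to by the role that byte can play in well-formed UTF-8. The function names are hypothetical, and the ranges are the standard UTF-8 ones rather than anything specific to our tooling:

```python
def utf8_role(byte: int) -> str:
    """Classify a byte by the role it can play in well-formed UTF-8."""
    if byte <= 0x7F:
        return 'single'        # plain ASCII, can stand alone
    if byte <= 0xBF:
        return 'continuation'  # only valid after a lead byte
    if 0xC2 <= byte <= 0xDF:
        return 'lead-2'        # first byte of a two-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return 'lead-3'        # first byte of a three-byte sequence
    if 0xF0 <= byte <= 0xF4:
        return 'lead-4'        # first byte of a four-byte sequence
    return 'never'             # 0xC0, 0xC1 and 0xF5..0xFF never occur


def role_of(ch: str) -> str:
    """Role of the byte that `ch` becomes in code page 1026."""
    return utf8_role(ch.encode('cp1026')[0])


for digit in '0123456789':
    print(digit, role_of(digit))

print({role_of(ch) for ch in 'abcdefghijklmnopqrstuvwxyz'})
```

Under these rules the digits 0 to 4 are four-byte lead bytes, while 5 to 9 land on bytes that UTF-8 never uses, so they appear to be out of reach entirely.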
Quine
Writing quines is actually not too bad once you know the trick. One approach uses the following general form:
data = {encoded data}
print(decode(data).format(representation(data)))
where {encoded data} would be some data structure containing everything
but with a slot to put the final data array in.
A regular python quine written in this form looks like
data = [100, 97, 116, 97, 32, 61, 32, 37, 115, 10, 112, 114, 105, 110, 116, 40, 98, 121, 116, 101, 115, 40, 100, 97, 116, 97, 41, 46, 100, 101, 99, 111, 100, 101, 40, 41, 32, 37, 37, 32, 114, 101, 112, 114, 40, 100, 97, 116, 97, 41, 41]
print(bytes(data).decode() % repr(data))
In this case, the data variable stores the bytes of
'data = %s\nprint(bytes(data).decode() %% repr(data))'.
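One way to convince yourself that this really is a quine is to rebuild the program from its template, run it, and compare what it prints against its own source. The following is a sketch of such a check; run_and_capture is a helper invented here, not part of our submission:

```python
import contextlib
import io


def run_and_capture(source: str) -> str:
    """Execute `source` in a fresh namespace and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# The template stored in `data`; '%%' collapses to '%' when it is formatted.
template = 'data = %s\nprint(bytes(data).decode() %% repr(data))'
data = list(template.encode())

# The full two-line program, with the trailing newline that print() emits.
source = template % repr(data) + '\n'

is_quine = run_and_capture(source) == source
print(is_quine)
```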
For our purposes, we need to come up with our own decode and
representation functions, as well as a way to store the data array, as
there is no way to write numbers directly.
data
Although we can't write numbers directly in the source code, getting them is easy
enough. +(()==()) is equal to 1. Then we can make a series
of variables for each power of two that we need.
Ja=+(()==())
Jb=Ja+Ja
Jc=Jb*Jb
Jd=Jc*Jb
Je=Jd*Jb
Jf=Je*Jb
Jg=Jf*Jb
Jh=Jg*Jb
Ji=Jh*Jb
Jj=Ji*Jb
Jk=Jj*Jb
...
The variable names are intentional: each name is predictable, which greatly
simplifies the representation function we'll have to write later.
We then store each number as a sum of powers of two (one term per set bit),
resulting in code like
İ=ß=+Ja+Jb+Jf,+Ja+Jb+Jf+Jg,+Ja+Jb+Jc+Jd+Jf+Jg,+Jc+Jf+Jg,+Ja+Jd+Jf+Jg,+Jb+Jc+Jd+Jf+Jg,+Ja+Jb+Jc+Jf+Jg,+Jb+Jd+Je+Jf,+Ja+Je+Jf,+Je+Jf,+Jb+Je+Jf,+Jb+Jc+Je+Jf,+Jb+Jd,+Ja+Jb+Jf,+Ja+Jc+Jf,+Ja+Jc+Jf, ...
The stored data makes up the majority of the submission and is visible as the chunk
of Cyrillic letters and N's (mapping to the variable names and the plus character
+ respectively).
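As a rough sketch of how such a data literal could be generated in ordinary Python (the helper names are hypothetical; the suffix order follows the variable list in the deobfuscated source further down):

```python
# Suffixes of the power-of-two variables in order: Ja = 1, Jb = 2, Jc = 4, ...
SUFFIXES = 'abcdefghijklmnopqrös'


def as_power_sum(n: int) -> str:
    """Write n as a sum of the J* variables, lowest bit first."""
    return ''.join('+J' + suffix
                   for bit, suffix in enumerate(SUFFIXES)
                   if (n >> bit) & 1)


def data_literal(text: str) -> str:
    """Comma-separated power sums for the character codes of `text`."""
    return ','.join(as_power_sum(ord(ch)) for ch in text)


# The file starts with '#cod', so this reproduces the start of the literal.
print(data_literal('#cod'))
```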
decode
The only way to loop over the whole data array is through the use of list
comprehensions. Miraculously, we can use both the character combinations
1for (corresponding to the currently unassigned U+46599)
and 3]in (corresponding to the even more unassigned
U+EC255). It's a stroke of luck that there is any way at all to write
for and in under our restrictions.
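These two code points can be recomputed by going in the opposite direction: encode the desired Python text with the code page and see whether the bytes form a single UTF-8 character. This is a sketch with a made-up helper name (carrier). The category reported by unicodedata depends on the Unicode version bundled with your Python, so treat the "unassigned" result as an assumption about current versions:

```python
import unicodedata


def carrier(text: str):
    """Return the single character whose UTF-8 bytes decode to `text`
    under cp1026, or None when no such single character exists."""
    try:
        decoded = text.encode('cp1026').decode('utf-8')
    except UnicodeError:
        return None
    return decoded if len(decoded) == 1 else None


for text in ('for', '1for', 'in', '3]in'):
    ch = carrier(text)
    if ch is None:
        print(f'{text!r}: not expressible as one character')
    else:
        # Category 'Cn' means the code point is unassigned.
        print(f'{text!r}: U+{ord(ch):04X}, category {unicodedata.category(ch)}')
```

The bare keywords fail because they would have to start with a continuation byte; the leading 1 and 3 supply the four-byte lead byte.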
The actual code to decode is not too complicated. We use the
%c formatting operator to convert each number to the corresponding
character.
*(((â:=İ[C]),(İ:=İ[à:I]),(ãã:=ãã+á%â))*1for[_3]in'?'*La),
This is essentially equivalent to the following code:
result = ''
for i in range(len(data)):
    result += '%c' % data[0]
    data = data[1:]
We're forced to modify data on each iteration since there's no easy way
to index into our array at any index other than zero.
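The same trick can be tried out in readable Python. The sketch below keeps the structure of our one-liner (a starred generator expression whose element is a tuple of walrus assignments) but uses ordinary names and wraps it in a function, both of which are choices made for this illustration only:

```python
def decode(data):
    """Turn a list of character codes into a string without a for statement."""
    result = ''
    count = len(data)
    # Each pass builds a throwaway tuple; the walrus assignments inside it do
    # the real work. The leading * and the trailing comma make this a tuple
    # display, which forces the generator expression to run to completion.
    # (The submission also multiplies the tuple by 1, purely so that the
    # digit 1 sits directly in front of the for keyword.)
    *(((head := data[0]),
       (data := data[1:]),
       (result := result + '%c' % head))
      for [_] in '?' * count),
    return result


print(decode([104, 105, 33]))  # prints hi!
```

Iterating over '?' * count is just a way to loop a fixed number of times without range, which is not reachable under our restrictions.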
representation
representation is similar to decode: we still need to
loop over the whole data array, one character at a time. This time, though, the
loop body is more complex, as we need to write out each character using the powers of
two we created earlier.
*(((â:=İ[C]),(İ:=İ[à:I]),(é:=''),(ê:=Ja),(ë:=Jb-Jb),(*((((é:=é+(Ns+á%(Ö+ë+ë//(Ja+Jd)*(Ja+Jb+Jc)))*(ê&â>C·)),(ê:=ê*Jb),(ë:=ë+Ja)))*1for[_3]in'?'*Jf),),(ã:=ã+é+Ck))*1for[_3]in'?'*La),
This code isn't too bad; we just need two nested loops now. The inner loop runs 32
times, with the ê variable keeping track of the current bit. We check
whether a bit is set using the bitwise AND operation. Outputting the names of the
variables is a little tricky: for example, Ja maps to
'с' (U+0441 CYRILLIC SMALL LETTER ES). We chose the variables
strategically to make the formula as simple as possible: the nth power of two maps
to character code 1089 + n + n // 9 * 7 (that is, nearly linear, but the sequence
jumps ahead by 7 code points after every ninth variable). Other than that, the
algorithm is straightforward: we prepend a + to the variable for each set bit, and
follow each number with a comma. This code will then serialize the list the
same way as in the source code!
This code is roughly equivalent to the following
result = ""
for i in range(len(data)):
    charcode = data[0]
    character = ""
    for bit in range(32):
        if charcode & (1 << bit):
            character += "N" + chr(1089 + bit + bit // 9 * 7)
    result += character + "k"
    data = data[1:]
Remember that since we need to output the source file, our
representation function has to encode back into the 1026 codec. So a + that
adds the bits must actually be output as
N, and a comma must be output as a k.
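The name formula and the N/k substitution can be checked against the codec directly. This is a sketch with hypothetical helper names; it only covers the first 20 bits, since those are the only power-of-two variables our source defines:

```python
def variable_name(bit: int) -> str:
    """Name of the variable holding 2**bit, as Python sees it after decoding."""
    source_char = chr(1089 + bit + bit // 9 * 7)  # formula from the text above
    return source_char.encode('utf-8').decode('cp1026')


def represent(code: int) -> str:
    """Serialize one number the way it appears in the UTF-8 source file."""
    terms = ''.join('N' + chr(1089 + bit + bit // 9 * 7)
                    for bit in range(32)
                    if code & (1 << bit))
    return terms + 'k'


names = [variable_name(bit) for bit in range(20)]
print(names[:3])  # ['Ja', 'Jb', 'Jc']

entry = represent(35)  # 35 is '#', the first character of the file
print(entry.encode('utf-8').decode('cp1026'))  # +Ja+Jb+Jf,
```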
Final
Putting it all together, we get the following deobfuscated source code:
(C·,)=[C]=(()>()),
_,à,Ns,å,_,Ck='CcNsCk'
á='%'+à
å='%'+å
Ja=+(()==())
Jb=Ja+Ja
Jc=Jb*Jb
Jd=Jc*Jb
Je=Jd*Jb
Jf=Je*Jb
Jg=Jf*Jb
Jh=Jg*Jb
Ji=Jh*Jb
Jj=Ji*Jb
Jk=Jj*Jb
Jl=Jk*Jb
Jm=Jl*Jb
Jn=Jm*Jb
Jo=Jn*Jb
Jp=Jo*Jb
Jq=Jp*Jb
Jr=Jq*Jb
Jö=Jr*Jb
Js=Jö*Jb
İ=ß=~~~
La=+++++Jc+Jj
Ö=+Ja+Jg+Jk
à=(()==())
ãã=''
[I]=La,
*(((â:=İ[C]),(İ:=İ[à:I]),(ãã:=ãã+á%â))*1for[_3]in'?'*La),
İ=ß
ã=''
à=(()==())
*(((â:=İ[C]),(İ:=İ[à:I]),(é:=''),(ê:=Ja),(ë:=Jb-Jb),(*((((é:=é+(Ns+á%(Ö+ë+ë//(Ja+Jd)*(Ja+Jb+Jc)))*(ê&â>C·)),(ê:=ê*Jb),(ë:=ë+Ja)))*1for[_3]in'?'*Jf),),(ã:=ã+é+Ck))*1for[_3]in'?'*La),
[C]=ãã%ã,
{_:C}[C]
Then replacing ~~~ with the data array, setting the La variable to the
right length, and finally encoding the code in the correct way yields the final
submission.
Further work
We experimented with making the original source code also valid Python, so that both
python3 and python3 -x would work. An example is this
brilliant snippet written by orthoplex:
#coding: 1026
{}['utf8']%[[{{kçkÅkçkÂkçkÃkçkÄkçkÉ@~M}@ÅÂÃÄÉ}@lM]]^hì@~ÅNÂNÃNÄNÉNÃk^HçzÌhì