kukas's comments

kukas · on March 17, 2024

Hey, I am working on my own LLM-based decompiler for Python bytecode (https://github.com/kukas/deepcompyle). I feel there are not many people working in this research direction, but I think it could be quite interesting, especially now that longer attention contexts are becoming feasible. If anyone knows a team that is working on this, I would be quite interested in collaborating.

ok123456 · on March 17, 2024

Is there a benefit to using an LLM for Python bytecode? In my experience, Python bytecode is high-level enough that it can be translated directly back to source code.
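To make the "high-level enough" point concrete, here is a minimal sketch using the standard `dis` module. The function `area` is a made-up example, and the exact opcodes printed vary between Python versions, so no output is shown.

```python
import dis


def area(width, height):
    # Hypothetical function used only to have something to disassemble.
    return width * height


# Each instruction carries a symbolic opcode name and a readable argument,
# so local variable names and operators are still visible in the bytecode.
instructions = [(ins.opname, ins.argrepr) for ins in dis.get_instructions(area)]

for opname, argrepr in instructions:
    print(opname, argrepr)

# The code object itself also keeps the original local variable names.
print(area.__code__.co_varnames)
```

Because names and operators survive compilation like this, a rule-based translation back to source is plausible for a known bytecode version.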

kukas · on March 17, 2024

My motivation is that the existing decompilers work only for Python versions up to ~3.8. A model that could be fine-tuned with every new Python release might remove the need for a highly specialized programmer who can update the decompiler to be compatible with the new version.

It is also a toy example for me to set up a working pipeline and then try to decompile more interesting targets.
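As a rough illustration of why decompilers are tied to specific versions: compiled `.pyc` files start with a magic number that identifies the bytecode format of the interpreter that wrote them. The sketch below only inspects the running interpreter; the variable names are arbitrary, and the printed values depend on which Python you run it with.

```python
import importlib.util
import sys

# The first four bytes of a .pyc file: a 16-bit number identifying the
# bytecode format, followed by b"\r\n".
magic = importlib.util.MAGIC_NUMBER
magic_value = int.from_bytes(magic[:2], "little")

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"pyc magic: {magic!r} (value {magic_value})")
```

A decompiler has to recognize this value and know the matching opcode set, which is the part that needs updating for each release.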

a2code · on March 17, 2024

Why Python? First, Python is a language with a large body of open-source libraries. Second, I do not think it is used much for software that is distributed as binaries?

Retr0id · on March 18, 2024

Closed-source Python exists, and it is frequently distributed in compiled binaries (especially in mediocre malware).

As a (supposedly) non-malicious example, the "Nightshade" watermarking tool is distributed as closed-source pre-compiled Python https://nightshade.cs.uchicago.edu/downloads.html

maple3142 · on March 18, 2024

There is [PyLingual](https://pylingual.io/), but unfortunately it is not open source. I am not sure whether it is also LLM-based.

albertan017 · on March 18, 2024

I found that a lot of decompilation work is done on C. It seems that not many Python projects are compiled into binaries.