Hey, I am working on my own LLM-based decompiler for Python bytecode (https://github.com/kukas/deepcompyle). I feel there are not many people working on this research direction, but I think it could be quite interesting, especially now that longer attention contexts are becoming feasible. If anyone knows a team that is working on this, I would be very interested in collaborating.
Is there a benefit to using an LLM for Python bytecode? In my experience, Python bytecode is high-level enough that it can be translated directly back to source code.
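To illustrate what "high-level" means here, a minimal sketch using only the standard `dis` module. The function `add` is just a made-up example; the point is that local variable names and a fairly readable instruction stream survive compilation. The exact opcode names printed depend on the Python version you run it with.

```python
import dis


def add(a, b):
    # Hypothetical example function, only here to have something to disassemble.
    total = a + b
    return total


# Collect the instructions of the compiled function.
instructions = list(dis.get_instructions(add))
opnames = [ins.opname for ins in instructions]

# Argument and local variable names are stored in the code object.
local_names = add.__code__.co_varnames

for ins in instructions:
    # opname is the instruction mnemonic, argrepr a human-readable argument.
    print(ins.opname, ins.argrepr)
print(local_names)
```

The names `a`, `b` and `total` are all recoverable from the code object, which is a large part of why rule-based decompilation of Python is feasible at all.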
My motivation is that the existing decompilers only work for Python versions up to ~3.8. A model that could be fine-tuned with every new Python release might remove the need for a highly specialized programmer who can update the decompiler for each new version.
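A rough sketch of how fine-tuning data for a new release could be produced, assuming you simply compile known source with the target interpreter and keep the disassembly next to it. `make_pair` is a hypothetical helper written for this comment, not something taken from the deepcompyle repo, and a real pipeline would likely want a more compact bytecode encoding than the text listing from `dis`.

```python
import dis
import io
import sys


def make_pair(source: str) -> dict:
    """Build one (bytecode listing, source) example for the running interpreter."""
    # Compile the source with whatever Python version executes this script.
    code = compile(source, "<sample>", "exec")

    # Capture the human-readable disassembly into a string.
    buf = io.StringIO()
    dis.dis(code, file=buf)

    return {
        # Tag the example with the interpreter version that produced it.
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "bytecode": buf.getvalue(),
        "source": source,
    }


pair = make_pair("x = 1\nprint(x + 2)\n")
print(pair["python"])
print(pair["bytecode"])
```

Running the same script under each new Python release would give version-specific pairs from the same source corpus, without anyone having to write down the new bytecode rules by hand.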
It is also a toy example for me to set up a working pipeline and then try to decompile more interesting targets.
Why Python? First, Python is a language with a large open-source library. Second, I do not think it is used for software distributed as binaries?