The approach here is interesting in that it answers a question a lot of people have been asking: “what happens if we pipe a binary into a trained LLM and ask it to decompile it?” The answer is that it doesn’t really work at all right now! This is an unsurprising result, because the design of the paper kind of doesn’t allow for any other conclusion to be drawn. Notably, even if the LLM had done a really good job in the evaluation they designed, it would still be unclear whether it was actually useful, because “does it compile and pass a few test cases” is not a very good way to test a decompiler.
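To make the "pass a few test cases" point concrete, here is a toy sketch. It is in Python rather than C to keep it self-contained, and every name in it (`original`, `candidate`, `FEW_CASES`, `find_counterexample`) is made up for illustration; none of it comes from the paper. It only shows how a wrong "decompilation" can sail through a small test suite.

```python
def original(a: int, b: int) -> int:
    """Stand-in for the real source function: absolute difference."""
    return abs(a - b)


def candidate(a: int, b: int) -> int:
    """Stand-in for a plausible-looking but wrong decompilation."""
    return a - b  # drops the abs(); only correct when a >= b


def passes(fn, cases) -> bool:
    """True if fn returns the expected value for every (args, expected) pair."""
    return all(fn(*args) == expected for args, expected in cases)


# A hypothetical "few test cases" suite that happens to only use a >= b.
FEW_CASES = [((5, 3), 2), ((10, 10), 0), ((7, 1), 6)]


def find_counterexample(fn, ref, lo: int = -3, hi: int = 3):
    """Brute-force a small input grid; return the first input where fn != ref."""
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if fn(a, b) != ref(a, b):
                return (a, b)
    return None


if __name__ == "__main__":
    print("candidate passes the small suite:", passes(candidate, FEW_CASES))
    print("first disagreement with the original:", find_counterexample(candidate, original))
```

The candidate passes all three cases, yet a tiny brute-force search finds an input where it disagrees with the original. Real binaries have far larger input spaces, so a handful of passing tests says even less there.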

A couple of people here have suggested that the generated decompilation should match the source code exactly. That is challenging to achieve, and whether it is even a good metric is still hotly debated. What the results here show is that we’re barely starting to get past the “does it produce code” stage and moving towards “does it produce code that looks vaguely correct”, but we’re definitely not there yet. The later steps of “is this a useful tool to drive decompilation”, “does this do better than the state of the art”, and “is this perfect at decompiling things” are still a long way off. So it’s good to look at this as a negative result as the area continues to attract new interest.



Thanks! Our initial experiments indicate that for simple cases, such as short snippets (tens of lines) of code without external dependencies, the LLM can decompile very well. However, for more complicated examples, it tends to offer speculative solutions, and the utility of these results is challenging to assess. Judging whether the decompiled output is correct or useful is subjective, and there is no universal standard for it. One approach we're considering is using GPT-4 as a benchmark to evaluate other models' performance. We're open to further suggestions to refine our evaluation methods.



