Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The mapping from binary to source code.


Even ignoring all sources of irreproducibility, there does not exist a bijection between source and binary artifact irrespective of tool chain. Two different toolchains could compile the same source to different binaries or different sources to the same binary. And you absolutely shouldn't be ignoring sources of irreproducibility in this context, since they'll cause even the same toolchain to keep producing different binaries given the same source.


Exactly, but neither the source nor the binary is what's truly important here. The real question is: can the LLM generate the functionally valid source equivalent of the binary at hand? If I disassemble Microsoft Paint, can I get code that will result in a mostly functional version of Microsoft Paint, or will I just get 515 compile errors instead?


This is what I thought the question was really about.

I assume that an llm will simply see patterns that look similar to other patterns and make assosciations and assume ewuivalences on that level, meanwhile real code is full of things where the programmer, especially assembly programmers, modify something by a single instruction or offset value etc to get a very specific and functionally important result.

Often the result is code that not only isn't obvious, it's nominaly flatly wrong, violating standards, specs, intended function, datasheet docs, etc. If all you knew were the rules written in the docs, the code is broken and invalid.

Is the llm really going to see or understand the intent of that?

They find matching patterns in other existing stuff, and to the user who can not see the infinite body of that other stuff the llm pulled from, it looks like the llm understood the intent of a question, but I say it just found the prior work of some human who understood a similar intent somewhere else.

Maybe an llm or some other flavor of ai can operate some other way like actually playing out the binary like executing in a debugger and map out the results not just look at the code as fuzzy matching patterns. Can that take the place of understanding the intents the way a human would reading the decompiled assembly?

Guess we'll be finding out sooner of later since of course it will all be tried.


The question was about the reverse mapping.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: