
This chatbot has several C compilers in its training data. How is this possibly a useful benchmark for anything? LLMs routinely output code verbatim or modulo trivial changes as their own (very useful for license-laundering too).

