
Thanks for the correction. According to OpenRouter, they are currently serving inference in FP16. I had assumed that the pressure to use as little memory as possible, since they rely solely on SRAM, would push them toward FP8, so I wonder why they opted for FP16 instead.
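To make the memory-pressure point concrete, here is a rough sketch of the arithmetic: weight memory scales linearly with bytes per parameter, so FP8 halves the weight footprint relative to FP16. The model size below is a hypothetical example, not the actual model under discussion.

```python
def weight_footprint_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

n_params = 70e9  # hypothetical 70B-parameter model, for illustration

fp16_gb = weight_footprint_gb(n_params, 2)  # FP16: 2 bytes per parameter
fp8_gb = weight_footprint_gb(n_params, 1)   # FP8: 1 byte per parameter

print(f"FP16: {fp16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB")
```

Since SRAM capacity per chip is orders of magnitude smaller than DRAM, halving the weight footprint directly reduces how many chips a deployment needs, which is why FP16 seems like the more expensive choice here.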


