Hacker News

Exactly. Falcon-180b had a lot of hype at first, but the community soon realized it was nearly worthless: easily outperformed by smaller LLMs in the general case.

Now they are back and claiming their falcon-11b LLM outperforms Llama 3 8b. I already see a number of issues with this:

- falcon-11b is roughly 40% larger than Llama 3 8b, so how can you compare them when they aren't in the same size class?

- their claim seems to be based entirely on automated benchmarks, when it has long been clear that automated benchmarks alone are not enough to support that kind of claim

- some of their automated benchmark scores are wildly lower than Llama 3 8b's. It only beats Llama 3 8b on one benchmark, and just barely. I could make an LLM that does the best anyone has ever seen on one benchmark, but that wouldn't mean my LLM is good. Far from it

- clickbait headline with knowingly premature claims because there has been zero human evaluation testing

- they claim their LLM is better than Llama 3 but completely ignore Llama 3 70b
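On the size-class point, the gap is easy to sanity-check with quick arithmetic. The parameter counts below are approximate assumptions (~11.1B for falcon-11b, ~8.03B for Llama 3 8b), not figures from the announcement itself:

```python
# Rough size-class comparison between the two models.
# Both parameter counts are approximate assumptions, not official figures.
falcon_11b_params = 11.1e9  # falcon-11b, ~11.1B parameters (assumed)
llama3_8b_params = 8.03e9   # Llama 3 8B, ~8.03B parameters (assumed)

# How much larger falcon-11b is, as a fraction of Llama 3 8B's size.
ratio = falcon_11b_params / llama3_8b_params - 1
print(f"falcon-11b is about {ratio:.0%} larger than Llama 3 8b")  # ~38%
```

So even before looking at the benchmark numbers, the two models are not in the same size class, and a raw head-to-head comparison favors the larger one by construction.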

Honestly, it annoys me how much attention tiiuae gets when they haven't produced anything useful and keep putting out this misleading clickbait.


