What if the most interesting finding ends up buried under a vague title? Aside from the "self-generated skills" aspect, there isn't much there that meaningfully warrants deeper discussion.
I chose a title that directly reflects an interesting finding - something that offers substantial insight to the community. I think the rule should be applied with some nuance; in this case, being explicit is a net positive.
I have no interest in linkbait, and I hope that's evident from my previous submissions.
Thanks @dang for moderating! This is indeed not our main finding; it's a sub-conclusion from an ablation we ran to remove the confound of the LLM's internal domain knowledge. Thanks for submitting for us, @mustaphah. Here's a bit more detail on how we approached this:
> I would frame the 'post-trajectory generated skills' as feedback-generated skills, as does Letta: https://www.letta.com/blog/skill-learning. We haven't seen existing research or hypotheses debating whether the skill improvement might come from the skill prompts themselves activating latent knowledge in the LLM that it can use to help itself. That's why we added an ablation of 'pre-trajectory generated skills': we had that hypothesis, and this seemed a very clean way to test it. It's also quite logical that feedback-generated skills help, because they almost certainly capture the agent's failure modes on those specific tasks.
Yeah, I got your point when I read the paper. You're essentially controlling for "latent domain knowledge."
I might have been a bit blunt with the title - sorry about that, but I still think it was a good title. From what I've observed, a lot of Skills on GitHub are just AI-generated without any feedback or deliberate refinement. Many thought those would still be valuable, but you've shown evidence otherwise.
Yes, I appreciate that, and yes there is room for nuance. But I think you went too far in this case, meaning that the delta between the article title and the submission title was too large. For example, the word "useless" appears nowhere in the article abstract nor in the article body. That's a big delta.
I was starting to type out a longer explanation but I ran out of time - however, I probably would just be repeating things I've said many times before, for example here: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... - perhaps some of that would be helpful.
You're a fine HN contributor and obviously a genuine user, and I hope I didn't come across as critical! From our side it's just standard HN moderation practice. The way we deal with titles has been stable for many years. It isn't entirely mechanical; there are many subtleties (back to the nuance thing), but the core rules have served the site really well. The main thing we want to avoid is having the title field become a mini-genre where whoever makes the submission gets to put their spin on the article.
Hi! I'm a backend engineer (~8 YOE) with strong backend & DevOps experience and decent frontend skills. Looking for a backend or backend-leaning full-stack role.
I worked at Automattic (US), the company behind WordPress; fully remote, async teams across the globe.
I've been part of teams focused on speed and rapid iteration, and I've also worked on high-quality systems where long-term maintainability and reliability matter the most. I believe this mix of experience has helped me develop a good sense of where each fits best and has enabled me to adapt quickly based on context and requirements.
I've built and maintained time-sensitive, high-throughput, distributed services (millions of ops daily) and owned features and small- to mid-sized projects end-to-end from design to deployment. I do my best working autonomously, and I like to think of myself as a generalist.
Most of my career has been in large enterprises (7+ YOE), but I've also done a fair amount of freelance work (around 1 YOE) for clients.
I have a couple of small open-source projects on GitHub.
I've seen a couple of power users already switching to Pi [1], and I'm considering that too. The premise is very appealing:
- Minimal, configurable context - including system prompts [2]
- Minimal and extensible tools; for example, todo tasks extension [3]
- No built-in MCP support; extensions exist [4]. I'd rather use mcporter [5]
Full control over context is a high-leverage capability. If you're aware of the many ways context limits performance (in-context retrieval limits [6], context rot [7], contextual drift [8], etc.), you'll truly appreciate that Pi lets you fine-tune the WHOLE context for optimal performance.
It's clearly not for everyone, but I can see how powerful it can be.
Pi is the part of moltXYZ that should have gone viral. Armin is way ahead of the curve here.
The Claude sub is the only thing keeping me on Claude Code. It's not as janky as it used to be, but the hooks and context management support are still fairly superficial.
Traditional systems (git blame, review history, ticket links) tell you who committed or approved changes, but not whether the content originated from an AI agent, which model it used, or what prompt/context produced it.
Agent Trace is aiming to standardize that missing layer so different tools can read/write it consistently.
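To make the idea concrete, here's a minimal sketch of what such a provenance record could carry. All field names and values are hypothetical illustrations, not the actual Agent Trace schema:

```python
import json

# Hypothetical provenance record for one AI-assisted change.
# Field names are illustrative only, not the real Agent Trace format.
trace = {
    "file": "src/auth/session.py",   # file the change touched (made-up path)
    "lines": [42, 80],               # line range of the change
    "generator": "ai_agent",         # vs. "human" - the layer git blame lacks
    "model": "example-model-v1",     # which model produced the content
    "prompt_hash": "sha256:<hash>",  # hash of the prompt/context, not raw text
    "human_reviewed": True,          # whether a person approved it
}

# Serialized so any tool in the chain can read/write it consistently.
print(json.dumps(trace, indent=2))
```

The point of standardizing something like this is that code review tools, CI, and audit systems could all consume the same record instead of each inventing its own sidecar metadata.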
Once exercise becomes a habit, it's very easy to do even on days when your mood is terrible. A strict routine (initially) is the trick to making things easier forever.
You definitely want to build that habit when you're at your best.
Interesting, that's good to know. I've definitely experienced Codex fumbling really easy UI tasks, so it will be worth giving Claude a try for those.