What if the most interesting finding ends up buried under a vague title? Aside from the "self-generated skills" aspect, there isn't much there that meaningfully warrants deeper discussion.
I chose a title that directly reflects an interesting finding - something that offers substantial insight to the community. I think the rule should be applied with some nuance; in this case, being explicit is a net positive.
I have no interest in linkbait, and I hope that's evident from my previous submissions.
Thanks @dang for moderating! This is indeed not our main finding; it's a sub-conclusion from an ablation we ran to remove the confound of the LLM's internal domain knowledge. Thanks for submitting for us, @mustaphah. Here's a bit more detail on how we approached this:
> I would frame the 'post-trajectory generated skills' as feedback-generated skills, as does Letta: https://www.letta.com/blog/skill-learning. We haven't seen existing research or hypotheses debating whether the skill improvement might come from the skill prompts themselves activating latent knowledge in the LLM that it can use to help itself. That's why we added an ablation of 'pre-trajectory generated skills': we had that hypothesis, and this seemed a very clean way to test it. It's also quite logical that feedback-generated skills help, because they almost certainly capture the agent's failure modes on those specific tasks.
Yeah, I got your point when I read the paper. You're essentially controlling for "latent domain knowledge."
I might have been a bit blunt with the title - sorry about that, but I still think it was a good title. From what I've observed, a lot of Skills on GitHub are just AI-generated without any feedback or deliberate refinement. Many thought those would still be valuable, but you've shown evidence otherwise.
Yes, I appreciate that, and yes there is room for nuance. But I think you went too far in this case, meaning that the delta between the article title and the submission title was too large. For example, the word "useless" appears nowhere in the article abstract nor in the article body. That's a big delta.
I was starting to type out a longer explanation but I ran out of time - however, I probably would just be repeating things I've said many times before, for example here: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... - perhaps some of that would be helpful.
You're a fine HN contributor and obviously a genuine user, and I hope I didn't come across as critical! From our side it's just standard HN moderation practice. The way we deal with titles has been stable for many years. It isn't entirely mechanical; there are many subtleties (back to the nuance thing), but the core rules have served the site really well. The main thing we want to avoid is having the title field become a mini-genre where whoever makes the submission gets to put their spin on the article.
Hi! I'm a backend engineer (~8 YOE) with strong backend & DevOps experience and decent frontend skills. Looking for a backend or backend-leaning full-stack role.
I worked at Automattic (US), the company behind WordPress; fully remote, async teams across the globe.
I've been part of teams focused on speed and rapid iteration, and I've also worked on high-quality systems where long-term maintainability and reliability matter the most. I believe this mix of experience has helped me develop a good sense of where each fits best and has enabled me to adapt quickly based on context and requirements.
I've built and maintained time-sensitive, high-throughput, distributed services (millions of ops daily) and owned features and small- to mid-sized projects end-to-end from design to deployment. I do my best working autonomously, and I like to think of myself as a generalist.
Most of my career has been in large enterprises (7+ YOE), but I've also done a fair amount of freelance work (around 1 YOE) for clients.
I have a couple of small open-source projects on GitHub.
I've seen a couple of power users already switching to Pi [1], and I'm considering that too. The premise is very appealing:
- Minimal, configurable context - including system prompts [2]
- Minimal and extensible tools; for example, todo tasks extension [3]
- No built-in MCP support; extensions exist [4]. I'd rather use mcporter [5]
Full control over context is a high-leverage capability. If you're aware of the many ways context limits performance (in-context retrieval limits [6], context rot [7], contextual drift [8], etc.), you'll truly appreciate that Pi lets you fine-tune the WHOLE context for optimal performance.
It's clearly not for everyone, but I can see how powerful it can be.
Pi is the part of moltXYZ that should have gone viral. Armin is way ahead of the curve here.
The Claude sub is the only thing keeping me on Claude Code. It's not as janky as it used to be, but the hooks and context management support are still fairly superficial.
Traditional systems (git blame, review history, ticket links) tell you who committed or approved changes, but not whether the content originated from an AI agent, which model it used, or what prompt/context produced it.
Agent Trace is aiming to standardize that missing layer so different tools can read/write it consistently.
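To make the idea concrete, here's a minimal sketch of what such a provenance record could carry. All field names and values are hypothetical illustrations, not the actual Agent Trace schema:

```python
import json

# Hypothetical provenance record for one AI-assisted change.
# Field names are illustrative only, not the real Agent Trace format.
trace = {
    "file": "src/auth/session.py",   # file the change touched (made-up path)
    "lines": [42, 80],               # line range of the change
    "generator": "ai_agent",         # vs. "human" - the layer git blame lacks
    "model": "example-model-v1",     # which model produced the content
    "prompt_hash": "sha256:<hash>",  # hash of the prompt/context, not raw text
    "human_reviewed": True,          # whether a person approved it
}

# Serialized so any tool in the chain can read/write it consistently.
print(json.dumps(trace, indent=2))
```

The point of standardizing something like this is that code review tools, CI, and audit systems could all consume the same record instead of each inventing its own sidecar metadata.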
Once exercise becomes a habit, it's very easy to do even on days when your mood is terrible. A strict routine (initially) is the trick to making things easier forever.
You definitely want to build that habit when you're at your best.
Interesting, that's good to know. I've definitely experienced Codex fumbling really easy UI tasks, so it will be worth giving Claude a try for those.