
http://yt-project.org/

I just had an idea about using deep learning to train a bot to judge the quality of a merge. Does it have tests? Does it pass lint? Is the subset of the grammar used in the patch the same as the rest of the source? If a patch is "clean" it should get automatically merged and only require human intervention to reject it.
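Before any deep learning, the mechanical part of that idea could be sketched as a plain rule-based scorer (all field names here are hypothetical, just to make the checks concrete):

```python
# Hypothetical "clean patch" scorer: tallies the mechanical signals
# mentioned above (tests, lint, style). A real bot would learn weights;
# this sketch just counts.

def score_patch(patch):
    """patch: dict with 'touches_tests', 'lint_errors', 'style_matches'."""
    score = 0
    if patch["touches_tests"]:      # does the diff add or update tests?
        score += 1
    if patch["lint_errors"] == 0:   # does it pass the linter?
        score += 1
    if patch["style_matches"]:      # same grammar subset as the codebase?
        score += 1
    return score

def verdict(patch, threshold=3):
    # "clean" patches merge automatically; anything else waits for a human
    return "automerge" if score_patch(patch) >= threshold else "needs-review"

print(verdict({"touches_tests": True, "lint_errors": 0, "style_matches": True}))
# prints "automerge"
```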



So basically, anything that looks like a real patch should be merged? This is a terrifying idea. Human review is necessary to determine if a change is semantically sound and technically/politically desirable. No amount of automation could ever address these concerns.

Lint checking is handled by basic CI integration (e.g. Travis). Having tests is something a reviewer can trivially check (and many changes don't necessarily warrant test changes anyway).
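For what it's worth, that kind of lint-and-test gate is a few lines of CI config. A hypothetical `.travis.yml` (paths and tool choices are assumptions, not from the project):

```yaml
# Hypothetical Travis config: the build fails, and the merge is blocked,
# on any lint error or failing test.
language: python
python:
  - "3.6"
install:
  - pip install flake8 pytest
script:
  - flake8 yt/       # lint gate
  - pytest tests/    # test gate
```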


There's a great talk on this [1] by Raymond Hettinger where he argues that style guides (e.g. PEP8) are no alternative to thinking hard about the code. Blindly following rules (like a bot would do) just gives a false sense of security that it's good code.

[1] https://www.youtube.com/watch?v=wf-BqAjZb8M


> So basically, anything that looks like a real patch should be merged?

Yes.

> No amount of automation could ever address these concerns.

Watch out for when experts say no.

If tests and coverage can be trivially checked by a reviewer, why not let a bot do it? Conversely, if a patch doesn't meet basic standards, why not let a bot ask for changes?

What if I train a refactor bot to make trivial refactorings? Naming, extract method, extract interface, warn on over coupling? If the cost of backing out a patch in the future is near zero, the cost of applying a patch today is also near zero. If the person making the change has a good track record, why not automerge?
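The "good track record" rule is easy to state precisely. A minimal sketch, assuming we track whether each of an author's past patches survived or was reverted (names are made up for illustration):

```python
# Hedged sketch of a track-record automerge rule: trivial refactorings
# from authors whose patches almost never get reverted merge automatically.

def automerge_ok(author_history, patch_is_trivial):
    """author_history: list of bools, True = patch survived, False = reverted."""
    if not author_history:
        return False                  # no track record: always review
    survival_rate = sum(author_history) / len(author_history)
    return patch_is_trivial and survival_rate >= 0.95

print(automerge_ok([True] * 20, patch_is_trivial=True))   # True
print(automerge_ok([True, False], patch_is_trivial=True)) # False: 50% survival
```

The threshold and the "trivial" classification are exactly where the hard judgment calls hide, of course.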


Doing lots of automated analysis on a patch is definitely great. However, I'm talking about the kind of analysis that really requires humans who understand the larger ecosystem. A bot can't understand that some interface design is broken on Windows because some kernel API exists that subverts its assumptions.

Further, the cost of backing out or applying a patch isn't zero. A bad patch landing (even one that passed all the analysis at your disposal!) can cause big projects to have to basically shut down their tree as they scramble to figure out what's wrong. This is a huge cost. The classic example of this is creating a sporadic test failure, causing random patches to bounce.

I've definitely worked in projects where I'll basically automerge some PRs because I know the person is great and the risks are low, but this is generally conditional on the purpose of the PR. If it's cleanup and refactoring... great, ship it, I don't care. If it's adding some functionality, I need to review if that functionality should be added.

However a big value of PRs is to force us to uncut corners. We all cut corners (or miss obvious details) when we program, and I see human review constantly catching this kind of sloppiness and forcing great programmers (the kind I'd automerge on trivial changes) to do the job right. If you just automerge, you accumulate code debt much more rapidly. There's also the aspect that human review often helps ensure that at least two people understand the decision process behind some piece of code.


Don't disagree on much.

What if the cost of backing out a change actually was zero or close to zero? What if the prospective change actually was run on all platforms with all of the historic input the function has ever seen?

What happens when our undo buffer and our log of refactorings get checked into version control? Not a log of pure text, but a log of semantic transformations.

Of course bad patches make it into the tree. Right now all the bad patches require human intervention. I am saying that it might be economically advantageous to let in some of the bad patches automatically so it takes less work.

For most codebases I see in industry, simply rejecting a patch on lint failure, lack of tests or documentation would effectively freeze most trees.
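To make that concrete, here is what such a hard gate looks like as a sketch (field names are hypothetical). The point is how unforgiving it is: any one failure rejects the patch outright, which is why it would freeze most industry trees:

```python
# Hypothetical hard pre-merge gate: reject on lint failure, missing
# tests, or missing documentation; no scoring, no discretion.

def gate(patch):
    reasons = []
    if patch["lint_errors"]:
        reasons.append("lint failure")
    if not patch["has_tests"]:
        reasons.append("no tests")
    if not patch["has_docs"]:
        reasons.append("no documentation")
    return ("rejected", reasons) if reasons else ("accepted", [])

print(gate({"lint_errors": 3, "has_tests": False, "has_docs": True}))
```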


I didn't follow the last sentence. Does every pull request get reviewed by a human before merge? At what stage does the human rejection happen?

The smart deep learning bot can determine if a branch is "clean", but it surely can't determine if it's correct.


I find the quality measures you've described to be very superficial. I joined a startup a couple of months back, and its codebase may well have passed all of your quality measures. But the overall design was terrible. I could have rewritten it in around a tenth of the code. The database design was poor, with lots of redundant data; joins were being done at the application level, and it was making way too many calls to the database and just doing things in a really inefficient, roundabout way. 10x more code than is needed makes understanding it way harder than it should be.


I certainly won't trust a bot to validate code, given all the operational and trust issues that validation entangles. Not anytime soon.


What if validating code was your job, and you wanted to automate an unimportant part of it? What part would you automate and why?




