Debugging – Section 5.10 of Programming Pearls (bell-labs.com)
92 points by ColinWright on March 17, 2013 | 30 comments


The stories in the post all have repeatable bug conditions. If you read bugs submitted for something like Chrome or Firefox (or any big complex program) you see a lot of trying to figure out what state triggers a bug. There's a lot of, "I can't repeat it. Works fine for me." It's almost impossible to deduce what causes a bug because so many possible states can exist at any given moment.

The way a lot of bugs are eventually solved is not by asking what caused them, but when they started. It's often the regression testers who find the source. You keep going back through the nightly releases until you find one that doesn't exhibit the bug. At that point you go to the commit logs and there is your problem.
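
A rough sketch of that walk back through the nightlies, done as a binary search rather than one build at a time. The build names and the `is_bad` check are made up for illustration; it assumes the oldest build is good, the newest is bad, and the bug stays once it appears.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumes builds are ordered oldest to newest, the oldest is good,
    the newest is bad, and the bug never disappears once introduced.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi]


# Hypothetical nightlies; pretend the regression landed on the 11th.
nightlies = [f"2013-03-{day:02d}" for day in range(1, 18)]
culprit = first_bad_build(nightlies, lambda build: build >= "2013-03-11")
print(culprit)
```

With 17 nightlies this needs about four checks instead of up to sixteen; the commit log for the returned build is where you start reading.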


I've half a blog post written about this that I'll probably never finish.

Given how much memory, disk space, etc. we have these days it's getting to the point that we should really have the ability to dump a program's state and all actions from starting the program to finishing it and then fast forward/rewind the actions of the user. Inspect all the locals, see what's actually happening. Set a loop over 30 seconds. Bugs would be a lot easier to find and recreate.

A bit like lightbox, but for debuggers!

It's just that no-one's done it yet and no programming language is really written to do it.


Isn't this what omniscient debugging is? http://www.lambdacs.com/debugger/

Great quote from that page, "The ODB is as close to a silver bullet as you can get. Why don't people use it?"


Last I checked, OCaml had a replay debugger -- one which lets you step backwards.

And I know the Mozilla guys sometimes use chronicle-recorder to fully record a running application for debugging. (It was referenced in a bug I filed.)

  http://code.google.com/p/chronicle-recorder/


Does anyone have experience using Chronon?

http://chrononsystems.com/

It is billed as "DVR for Java."


> Given how much memory, disk space, etc. we have these days it's getting to the point that we should really have the ability to dump a program's state and all actions from starting the program to finishing it and then fast forward/rewind the actions of the user.

Given a program from 20 years ago this is definitely possible. Given a program from today, it becomes an intractable problem quite quickly.

> It's just that no-one's done it yet and no programming language is really written to do it.

Live programming might get you what you want, but it won't be through brute-force tracing of everything. Better to focus on deterministic replay; then it's quite easy to rebuild the contexts we need to fake it.
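
To make the replay idea concrete, here is a toy sketch (not any real debugger's design): if every source of nondeterminism is captured as a seed plus a list of external events, you only store those, and re-running rebuilds any state you want to look at. The `run` function and the recording format are invented for the example.

```python
import random


def run(inputs, seed):
    """Toy program: its final state depends only on the inputs and the seed."""
    rng = random.Random(seed)  # private RNG, so no hidden global state
    state = 0
    for event in inputs:
        state = state * 31 + event + rng.randrange(100)
    return state


# Recording: keep only the seed and the external events, not every step.
recording = {"seed": 1234, "inputs": [3, 1, 4, 1, 5]}
original = run(recording["inputs"], recording["seed"])

# Replay: re-executing from the recording rebuilds the same final state.
replayed = run(recording["inputs"], recording["seed"])

# "Rewind": replaying a prefix of the events rebuilds an earlier state.
halfway = run(recording["inputs"][:2], recording["seed"])

print(replayed == original)
```

The hard part in a real program is the assumption baked into `run`: threads, clocks, the network, and the OS all have to be funnelled through something recordable.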


Bisecting the code is a pain in the ass. Until back-in-time debugging is more of a standard, another alternative is simply to log a lot of stuff and include a bug reporting tool in whatever software you ship that will attach the log to the bug report.
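
Something like this is what I mean, as a minimal sketch: keep the last N log lines in memory and paste them into the report. `RingBufferHandler`, `build_bug_report`, and the logger name are made up; only the standard `logging` and `collections.deque` pieces are real.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keep only the most recent `capacity` formatted log lines in memory."""

    def __init__(self, capacity):
        super().__init__()
        self.records = deque(maxlen=capacity)  # old lines fall off the front

    def emit(self, record):
        self.records.append(self.format(record))


def build_bug_report(description, handler):
    """Attach the buffered log lines to a user's bug description."""
    return description + "\n--- recent log ---\n" + "\n".join(handler.records)


log = logging.getLogger("app")  # hypothetical application logger
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the example's output out of the root logger

handler = RingBufferHandler(capacity=3)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log.addHandler(handler)

for step in range(5):
    log.debug("step %d", step)

report = build_bug_report("Crash on save", handler)
print(report)
```

Because the buffer is bounded you can log at debug level all the time without the log growing forever.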


The problem with logs is that programmers log the data that they believed, at the time they wrote the code, would be useful to help diagnose problems. However, many bugs arise from conditions that programmers failed to anticipate, and thus may have failed to log.

Also, there are important things that you just can't log for legal or ethical reasons. For example, no user of a browser would be happy to know that the current URL they were viewing or their POST data was sent back to a browser developer (or even saved in a local file) without their explicit permission.


I was debugging some code I had written once. It was leaking 1M of memory per call (C/C++). Stepping through the code revealed nothing. Leak detectors told me that everything was fine. In desperation, I changed a function name to "potato." The leak stopped. I nearly fell out of my chair. It turned out that I was using exactly the same method signature as a function in a library I was linking against. This was over ten years ago so I do not remember the details, but I will never forget the potato function.


I was teaching a coworker about ActiveRecord, and he was trying to create some simple throw-away models to gain familiarity. He decided to name his model "Cow." Fine, so we create the db table cows and run migrate and try a little command line business -- and we can't successfully save, with pg errors about missing elements in some pg system table called "kine."

Cue fifteen minutes of frustrated googling and trying to figure out the problem.

Eventually, I write "cow".pluralize, and Rails helpfully informs me that the plural of cow is kine. Aaaargh.



Yeah, we saw that after we finally figured out what the problem was.


Just some "supplemental reading" for anyone who read your post. :-)


Just the other day I was using the github_api gem to inspect GitHub repos via a Rake task. The instant I ran it, my screen filled up with the same output over and over. I couldn't kill the process, and `ps -ef | grep rake` spilled hundreds of lines and never stopped. In a moment my machine was dead.

Turns out the problem was accessing `repo[:fork]`. That wasn't telling me whether the repo was original. (Oddly enough, repo.fork was safe.)


One day I found I couldn't log in typing one-handed, holding the keyboard in the other. I finally realised that my password of some years wasn't what I thought it was, I just typed it consistently incorrectly when touch-typing. Quite a secure password when even I didn't know what it was. :-)


I'm actually always afraid of this. Since I use Dvorak, should I ever need to log onto a computer with Qwerty, I had better know my actual password—but I'm not sure I do!


Just pick letters and symbols that are the same on both layouts.
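
If you want to check which keys those are, here is a quick sketch comparing the unshifted keys of the standard US QWERTY and US Dvorak layouts, row by row. The two strings are my transcription of the layouts (an assumption worth double-checking against your own keyboard settings), and other Dvorak variants will differ.

```python
# Unshifted keys, same physical positions, top row to bottom row.
# Transcribed by hand from the standard US layouts; treat as an assumption.
QWERTY = "1234567890-=" "qwertyuiop[]" "asdfghjkl;'" "zxcvbnm,./"
DVORAK = "1234567890[]" "',.pyfgcrl/=" "aoeuidhtns-" ";qjkxbmwvz"

# A key is "safe" if the same physical key produces the same character.
same_on_both = "".join(q for q, d in zip(QWERTY, DVORAK) if q == d)
print(same_on_both)
```

Under that transcription the list is short: the digits plus two letters, so such a password is easy to type but not great otherwise.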


Well? What is it?


I happened to watch some of the TV show Nikita last Friday, and the plot driver was "a character was the password", as he had a subliminal password. He would play some odd video game to "enter" the password, but he never actually knew it.

I thought that sounded made up, but apparently subliminal passwords do exist:

http://www.escapistmagazine.com/forums/read/7.382686-Remembe...


Reminded me of my favorite bug:

http://www.ibiblio.org/harris/500milemail.html


Absolutely. If anyone wants to read previous HN comments on that one, here's the search for you:

https://www.hnsearch.com/search#request/all&q=title%3A(5...


I have the book mentioned at the end, _The Medical Detectives_, by Berton Roueché, and it is full of fascinating stories - highly recommended.


Great read, particularly relevant to my weekend spent debugging a Core Audio granular synthesis engine. It was definitely of the 'my code is haunted, that's the only explanation' variety -- audio files that were discarded were still faintly audible in the background. I read that article, sat down at my computer, stepped through my code in the debugger again and realized that I was setting my audio stream format to 2 channels/interleaved whilst converting to a mono stream. So, whenever I filled the buffer with a new audio file, my file-length parameter was incorrect, and some bytes were never freed. Because the files are all close to the same length, I never noticed the issue before. It only surfaced when I parameterized the 'grain duration' in the engine. The irony is that just yesterday a friend was asking about getting started in Core Audio and the advice I offered was to spend a lot of time learning about Audio Stream Basic Descriptions, because they're usually the cause of most problems.


I've debugged a few crazy bugs in the past. Example: we had a listing of some items sorted by ascending distance from the user, and someone noticed the ordering wasn't always right. Luckily at some point I looked at the SQL server production logs and noticed that the SQL string was always the same and didn't include the coordinates of the newly arrived user. This bug turned out to have been there for years and nobody ever noticed it. The pitfall was that there was a constant reference to a template SQL string for doing the ordering, which was passed to an external plugin that filled the template with detailed data. Unfortunately, the plugin modified the string instead of making a copy, so the sorting criteria applied correctly only to the first user. More unfortunately, there were multiple application servers in a round robin, so when you did the first few tests you hit different servers and the behaviour was correct until you managed to hit the same server a second time with different coordinates.
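
For anyone who hasn't been bitten by this kind of thing, a stripped-down sketch of the shape of the bug. This is not the original code (which I can't reproduce here): the names are invented, and a list of parts stands in for the mutable string buffer, since Python strings can't be modified in place.

```python
# Shared "constant" template; a list stands in for a mutable string buffer.
ORDER_BY_TEMPLATE = ["ORDER BY distance(lat, lon, ", "{user_lat}", ", ", "{user_lon}", ")"]


def fill_in_place(template, lat, lon):
    """Hypothetical buggy plugin: overwrites placeholders in the caller's list."""
    for i, part in enumerate(template):
        if part == "{user_lat}":
            template[i] = str(lat)
        elif part == "{user_lon}":
            template[i] = str(lon)
    return "".join(template)


def fill_copy(template, lat, lon):
    """Fixed variant: work on a copy so the shared template keeps its placeholders."""
    return fill_in_place(list(template), lat, lon)


# One application server holds one long-lived template.
shared = list(ORDER_BY_TEMPLATE)
first = fill_in_place(shared, 51.5, -0.1)    # first user: correct
second = fill_in_place(shared, 40.7, -74.0)  # second user: placeholders are gone

safe_first = fill_copy(ORDER_BY_TEMPLATE, 51.5, -0.1)
safe_second = fill_copy(ORDER_BY_TEMPLATE, 40.7, -74.0)

print(first)
print(second)       # same as first: the second user gets the first user's ordering
print(safe_second)  # uses the second user's coordinates
```

Each server only misbehaves from its second request onward, which is why the first few round-robin tests all looked fine.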


Programming Pearls (which this story came from) is an excellent book, I recommend it!


I've had a good one of these, even went for help from StackOverflow (subsequently deleted) [1]

I couldn't figure out why my '%' key was exiting insert mode in vim. It turned out to be bad keying - I had remapped caps-lock to escape, and was grazing that key while hitting shift with my pinky. Drove me nuts until I figured it out, was especially hard to debug because it was an 'intermittent bug'.

[1] http://kevinlochner.com/physical-technical-problems


A quaint but very worthwhile read (we know, Comic Sans isn't cool). Lots of fun stories, and it helps control those impulses to just start hacking at the weeds.

http://www.debuggingrules.com/


I have always liked the comparison of debugging to figuring out a magic trick. In both cases you are mystified solely due to incorrect assumptions.


In today's throwaway culture, it makes sense to punt debugging (read: conserve debugging resources) whenever possible.

That flaky appliance? Junk it and buy a new one.

The explosion of systems complexity has made many debugging tasks harder than ever before.


Programming Pearls is my all-time favorite programming book. All his examples are similar to the one given: real-world stuff that programmers know happens all the time. It's like fishing stories, though: some bugs are so ridiculous, and kick your ass for so many hours, that you're ashamed to let others know, or they'll think you're making it up!



