Scraping also carries, in some contexts, negative associations. In a project for a non-profit I'm involved with (one that, coincidentally, began as a remix of Simon's code for one of these "Git scraping" projects, plus Datasette), I recently decided to refer to it strictly as what it is: a crawler.
I'm less warm at this point to the general idea behind the hack of dumping the resulting JSON crawl data to GitHub. It's a very roundabout way of approaching what something like TerminusDB was made for. It definitely feels like the main motivation was GitHub-specific stuff rather than Git itself—namely, free scheduled jobs with GitHub Actions—and everything else flowed from that. GitHub Actions turned out to be too unreliable at running on schedule anyway, so we ported the crawler to JS with an eye toward Cloudflare Workers and their cron triggers (which also come in a free flavor).
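For anyone curious what the Workers side of that looks like, cron triggers are declared in the project's `wrangler.toml`. This is an illustrative sketch, not our actual config; the name, entry point, and schedule are made up:

```toml
# wrangler.toml (illustrative) — run the crawler's scheduled() handler
# every 15 minutes via a Workers cron trigger
name = "crawler"          # hypothetical project name
main = "src/index.js"     # hypothetical entry point exporting scheduled()

[triggers]
crons = ["*/15 * * * *"]
```

The schedule uses standard five-field cron syntax, and cron triggers are available on the Workers free plan, which was the draw here.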
My first implementation of this pattern predated GitHub Actions and used CircleCI - though GitHub Actions made this massively more convenient to build.