
> The implementation of the scraper is entirely contained in a single GitHub Actions workflow.

It's interesting that you can run a scraper at fixed intervals on a free, hosted CI like that. If the scraped content grows beyond a single JSON file, will GitHub have a problem with it?
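
For context, the pattern in question is roughly a scheduled workflow that fetches data and commits it back to the repo. A minimal sketch (the endpoint URL, cron schedule, filename, and commit message here are all made up for illustration):

    name: scrape
    on:
      schedule:
        - cron: '0 * * * *'    # run hourly
      workflow_dispatch:       # allow manual runs too
    permissions:
      contents: write          # let the workflow push commits
    jobs:
      scrape:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Fetch the latest data
            run: curl -s https://example.com/api/data -o data.json
          - name: Commit only if the data changed
            run: |
              git config user.name "github-actions[bot]"
              git config user.email "github-actions[bot]@users.noreply.github.com"
              git add data.json
              git diff --cached --quiet || git commit -m "Update data"
              git push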



GitHub repos appear to have a "soft" size limit of about 1GB - I feel completely comfortable keeping up to that much content in a free repo.

Once you get above 5GB I believe GitHub Support may send you a quiet, polite email asking you to reconsider!

https://docs.github.com/en/repositories/working-with-files/m... has some more information on limits - they suggest keeping individual files below 50MB (and definitely below 100MB).


I occasionally scrape results from Brazilian lotteries. Their official websites have internal APIs which simply return JSON data, so I download the JSON and commit it to the repository. Right now I have 5504 files totalling 22 MB. GitHub hasn't complained yet.
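
The accumulation pattern is roughly a step like this, one new file per run rather than overwriting a single file (a sketch; the URL and directory name are hypothetical):

    - name: Save today's results
      run: |
        mkdir -p results
        curl -s "https://example.com/loteria/latest" \
          -o "results/$(date -u +%Y-%m-%d).json"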



