Offtopic:

What are some good frameworks for web scraping and PDF document processing? Some of the sources are public and some are behind a login, and some require multiple clicks before the site displays the relevant data.

We need to ingest a wide variety of data sources for one solution. Very few of those sources supply data through an API or as JSON.



I have built most of this and have it running on Google Cloud as a service. The framework I built is open source. Let me know if you want to discuss: https://mitta.ai


I like Crawlee: https://crawlee.dev/
