What are some good frameworks for web scraping and PDF document processing? Some of the sources are public, some are behind a login, and some require multiple clicks before the site displays the relevant data.
We need to ingest a wide variety of data sources for one solution. Very few of those sources supply data through an API or as JSON.
I have built most of this and have it running on Google Cloud as a service. The framework I built is open source. Let me know if you want to discuss: https://mitta.ai