There's also the https://github.com/karlicoss/HPI library, which you could build on. However, it mainly relies on data dumps from the different services instead of crawling them or fetching data through their APIs, which is why I didn't use it. Keeping up with API changes is bad enough; I don't want to deal with undocumented dump formats too...