Hacker News

Launching a headless browser just to generate some PDFs.

Turns out, if you want to turn HTML+CSS into PDFs quickly, doing it via a browser engine works really well.



I did the same. We had a tool that would let you export to PDF. That PDF would be sent to our customers. Initially we just used the print functionality in the user's browser, but that caused the output to vary based on the browser/OS used.

People complained that the generated PDFs were slightly different. So instead I had the client send over the entire HTML in a POST request, opened it in headless Chrome with --print-to-pdf, and then sent the result back to the client.
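
For anyone curious, the server side of that can be sketched roughly like this. It's a minimal sketch, not the code we actually ran: the binary name "chromium" and the helper names are assumptions, and you'd want to sanitize the incoming HTML and sandbox the browser in a real service.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_print_command(html_path: Path, pdf_path: Path, browser: str = "chromium") -> List[str]:
    """Build the headless print command (the binary name is an assumption)."""
    return [
        browser,
        "--headless",
        f"--print-to-pdf={pdf_path}",
        # Chrome needs a URL, so point it at the file through a file:// URI.
        html_path.resolve().as_uri(),
    ]


def html_to_pdf(html: str, browser: str = "chromium") -> bytes:
    """Write the posted HTML to a temp file, print it, and return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        html_path = Path(tmp) / "input.html"
        pdf_path = Path(tmp) / "output.pdf"
        html_path.write_text(html, encoding="utf-8")
        subprocess.run(
            build_print_command(html_path, pdf_path, browser),
            check=True,   # raise if the browser exits with an error
            timeout=60,   # don't let a stuck render hang the request
        )
        return pdf_path.read_bytes()
```

The request handler then just returns the bytes from html_to_pdf() with a Content-Type of application/pdf.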


I recently implemented the same thing, but for SVG -> PNG conversion. I found that SVG rendering support is crap in every conversion tool and library I've tried. Apparently even Chrome has some basic features missing, for example when doing text on a path. So far Selenium + headless Firefox performs the best ¯\_(ツ)_/¯
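
A rough sketch of the Selenium + headless Firefox approach, assuming the selenium package and geckodriver are installed. The helper names wrap_svg and svg_to_png are made up for illustration, and this is not the exact code I use:

```python
import tempfile
from pathlib import Path


def wrap_svg(svg_markup: str) -> str:
    """Embed the SVG in a bare HTML page with no margins (hypothetical helper)."""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<style>html, body { margin: 0; background: transparent; }</style>"
        "</head><body>" + svg_markup + "</body></html>"
    )


def svg_to_png(svg_markup: str, png_path: str) -> None:
    """Render the SVG in headless Firefox and screenshot just the <svg> element."""
    # Imported lazily so the pure helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            page = Path(tmp) / "page.html"
            page.write_text(wrap_svg(svg_markup), encoding="utf-8")
            driver.get(page.resolve().as_uri())
            # Element screenshots crop the PNG to the SVG's bounding box.
            driver.find_element(By.TAG_NAME, "svg").screenshot(png_path)
    finally:
        driver.quit()
```

Starting a browser per image is slow, so for batches you would probably keep one driver alive and reuse it.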


I had a Chromium component added to a project just to show users the help file, which was a giant PDF document. The PDF file was from a 3rd-party vendor who didn't know better/refused to change the system, so we had to show it "as is" to the users. Every PDF reader component we tried failed because the PDF file had some crappy features in it that none of those components knew how to parse. The Chromium engine, for all the hate it gets nowadays, had no problem with any of those PDF files.


I wrote a Python package [1] that does something similar! It allows the generation of images from HTML+CSS strings or files (or even other files like SVGs) and could probably handle PDF generation too. It uses the headless version of Chrome/Chromium or Edge behind the scenes.

Writing this package made me realize that even big projects (such as Chromium) sometimes have features that just don't work. Edge headless wouldn't let you take screenshots up until recently, and I still encountered issues with Firefox last time I tried to add support for it in the package. I also stumbled upon weird behaviors of Chrome CDP when trying to implement an alternative to using the headless mode, and these issues eventually fixed themselves after some Chrome updates.

[1] https://github.com/vgalin/html2image


Yeah, it's the same concept: instead of .screenshot you call .pdf in Puppeteer.

But with pdfs the money is on getting those headers and footers consistent and on every page, so you do need some handcrafted html and print styling for that (hint: the answer is tables).
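
To spell out the tables hint: print engines that honor table-header-group repeat a table's thead and tfoot on every page the table spans, so you wrap the whole document in one table. A minimal sketch of generating that wrapper (the function name and class name are made up, and real layouts need more print CSS than this):

```python
from html import escape


def paged_document(header_text: str, footer_text: str, body_html: str) -> str:
    """Wrap body_html in a single table so the header and footer repeat per page."""
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  table.page {{ width: 100%; border-collapse: collapse; }}
  /* These are the default display values; stated explicitly for clarity. */
  thead {{ display: table-header-group; }}
  tfoot {{ display: table-footer-group; }}
</style>
</head>
<body>
<table class="page">
  <thead><tr><td>{escape(header_text)}</td></tr></thead>
  <tfoot><tr><td>{escape(footer_text)}</td></tr></tfoot>
  <tbody><tr><td>{body_html}</td></tr></tbody>
</table>
</body>
</html>"""
```

The header and footer text is escaped, while body_html is inserted as-is since it is expected to be markup you already trust.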


This is how we exported designs at Canva. It works!


I've seen a few SaaS products and legacy websites-with-invoice-systems doing that, with e.g. wkhtmltopdf. It isn't a lightweight solution, but it's a good hammer for a strange nail; a lot of off-the-shelf report systems suck.


This also happens to be the easiest path. There are other options, but no good ones.


We did that at the previous place I worked!


I mean, browsers are built for, and are the best at, displaying HTML+CSS. Given that HTML and CSS are "living standards", very few other programs can hope to keep up.



