It seems to me that this can indeed be done with nexrender. However, you would most likely need a small team (an After Effects template designer and a junior or mid-level JavaScript developer) and about 20-40 hours to pull it off.
After that, in theory, you can simply take the existing template and infrastructure, feed it the needed data, including a voice track synthesized with a separate library (finding a suitable one would be the developer's responsibility), and generate as many videos as you need.
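To give a rough idea of what "feeding the template with data" means in practice, here is a minimal sketch that turns rows of data into nexrender job objects, one per video. It assumes the job layout from the nexrender README (`template`, `assets`, `actions.postrender`). The project path, the composition name `main`, the layer names `title` and `voiceover`, and the sample rows are all hypothetical. They would have to match whatever the After Effects designer actually builds. The voice files are assumed to have been produced beforehand by a separate text-to-speech library.

```javascript
// Hypothetical per-video data; in practice this might come from a CSV, a database, or an API.
const rows = [
  { id: "001", title: "Welcome to our store", voiceFile: "/data/voice/001.mp3" },
  { id: "002", title: "Summer sale starts now", voiceFile: "/data/voice/002.mp3" },
];

// Build one nexrender job per data row.
// ASSUMPTION: composition "main" and layers "title" / "voiceover" exist in the .aep project.
function buildJob(row, templatePath, outputDir) {
  return {
    template: {
      src: `file://${templatePath}`,
      composition: "main",
    },
    assets: [
      // Replace the text shown on the "title" text layer.
      {
        type: "data",
        layerName: "title",
        property: "Source Text",
        value: row.title,
      },
      // Swap the footage of the "voiceover" layer for the pre-synthesized voice track.
      {
        type: "audio",
        src: `file://${row.voiceFile}`,
        layerName: "voiceover",
      },
    ],
    actions: {
      postrender: [
        // Encode the render to MP4, then copy it to a per-row output file.
        { module: "@nexrender/action-encode", preset: "mp4", output: "encoded.mp4" },
        { module: "@nexrender/action-copy", input: "encoded.mp4", output: `${outputDir}/${row.id}.mp4` },
      ],
    },
  };
}

const jobs = rows.map((row) => buildJob(row, "/templates/promo.aep", "/output"));
console.log(JSON.stringify(jobs[0], null, 2));
```

Each job object could then be saved as JSON and rendered with nexrender (for example through its CLI or by submitting it to a nexrender server). That orchestration, plus choosing the TTS library, is where most of the developer's hours would likely go.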