I don't understand what is being optimized by minimizing the product rw. Reducing w reduces the number of digits, but how does reducing the radix r help with component count? Surely a memory cell that could store 1,000,000 different values in a base-1-million system is just one component, even if it is quite difficult to make. Are they talking about the components needed to perform calculations?
I'd say your base-1-million system is virtually impossible to build...
But you are right: what counts is how much information you can store or process per square inch or per watt, not really how many components you need to achieve this.
While w measures the number of cells you need, r is a coarse approximation of the complexity of each cell, so the product rw is a rough estimate of the total hardware cost.
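To make that concrete, here is a rough sketch of the rw cost for a few radices. It assumes the simple model above, where one cell costs r units, and takes w to be the smallest number of base-r digits that can distinguish a given number of values. The function names `digits_needed` and `radix_cost` are made up for illustration.

```python
def digits_needed(n_values: int, radix: int) -> int:
    """Smallest w such that radix**w >= n_values (integer arithmetic only)."""
    w = 0
    capacity = 1
    while capacity < n_values:
        capacity *= radix
        w += 1
    return w


def radix_cost(n_values: int, radix: int) -> int:
    """Cost r*w under the assumption that a base-r cell costs r units."""
    return radix * digits_needed(n_values, radix)


if __name__ == "__main__":
    n = 1_000_000  # number of distinct values to represent
    for r in (2, 3, 4, 10, 1_000_000):
        print(f"r={r:>7}  w={digits_needed(n, r):>2}  r*w={radix_cost(n, r)}")
```

For 1,000,000 values this gives r*w = 40 for binary, 39 for ternary, 40 for base 4, 60 for decimal, and 1,000,000 for the single base-1-million cell. Under this model the one big cell is by far the most expensive option, and ternary only barely beats binary.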
In practice, nobody ever built a ternary cell that cost only 3/2 as much as the best binary cell (in space, power, and money). That's why research dried up. But there's no reason to think this limitation is fundamental.