They used uncompressed textures on the cart, in ROM (not on-cart RAM). Normally a game would store compressed textures in ROM and decompress them into RAM. It was a solution with significant tradeoffs, though.
#1 It was still slower than the cache.
#2 You were still using the single shared bus, so you were still spending cycles that contribute to data stalls elsewhere in the system.
#3 ROM was expensive. N64 games were typically in the ballpark of $10 more expensive than PlayStation or Saturn games because of the manufacturing expense.
#4 I don't fully understand why, but it was all or nothing. You couldn't have uncompressed textures in ROM but also gain the benefit of the cache. Maybe the cache invalidation was poor or something. I wish I knew more.
Later games were more likely to go this route because ROM had gotten cheaper (Moore's Law and all that).
So the TMEM wasn't cache, but manually managed memory split into eight 512-byte banks that had to be loaded from the RDP's command list stream. That's half the problem.
Additionally, the TMEM could only be loaded from RDRAM, not directly from the cartridge. I think the RDP's DMA master is only connected to the RDRAM slave port and not the main system's bus matrix.
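To put the eight 512-byte banks in perspective, here is a rough sketch of the arithmetic in C. The function names are made up for illustration, and it assumes a texture just needs width * height * bits-per-texel of space. It ignores palettes, row padding, and any format-specific layout rules, so treat it as an upper bound rather than what the hardware actually accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Figures from the post above: TMEM is split into 8 banks of 512 bytes. */
enum { TMEM_BANKS = 8, TMEM_BANK_BYTES = 512 };

/* Total TMEM size in bytes: 8 * 512 = 4096. */
size_t tmem_total_bytes(void) {
    return (size_t)TMEM_BANKS * TMEM_BANK_BYTES;
}

/* Bytes needed for a width x height texture, rounded up to a whole byte.
 * Simplifying assumption: no padding, no palette, no layout constraints. */
size_t texture_bytes(size_t width, size_t height, size_t bits_per_texel) {
    assert(bits_per_texel > 0);
    return (width * height * bits_per_texel + 7) / 8;
}

/* Returns 1 if the texture fits in TMEM in one load under the
 * simplified model above, 0 otherwise. */
int fits_in_tmem(size_t width, size_t height, size_t bits_per_texel) {
    return texture_bytes(width, height, bits_per_texel) <= tmem_total_bytes();
}
```

Under this model a 32x64 texture at 16 bits per texel exactly fills TMEM, and a 64x64 one at the same depth needs twice the space, which is why every texture load has to be budgeted by hand.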
So going back to it, games would often store compressed data using a simple algorithm that could run out of the CPU's cache. Then the scheme looks like:
* Cart->RDRAM DMA of compressed texture
* CPU decompresses the texture into another RDRAM bank, which can be considered an RDRAM->RDRAM transfer. Sometimes the RSP handles this instead. I'm not sure if you could load straight out of RSP DMEM to avoid another bounce to RDRAM. I don't think XBUS works that way, but I could be wrong.
* RDRAM->TMEM DMA of uncompressed texture
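The three steps above can be sketched as a host-side simulation in C. None of this is real N64 code: the buffers stand in for cart ROM, RDRAM and TMEM, the "DMA" is just memcpy, and the run-length decoder is a toy stand-in for whatever simple algorithm a given game actually used. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TMEM_SIZE 4096u /* 8 banks * 512 bytes */

/* Step 1: stand-in for the Cart->RDRAM DMA. On the host it is a plain copy. */
void cart_to_rdram(uint8_t *rdram_dst, const uint8_t *cart_src, size_t n) {
    memcpy(rdram_dst, cart_src, n);
}

/* Step 2: toy run-length decoder over (count, value) byte pairs.
 * Returns the number of bytes written, or 0 if the input is malformed,
 * empty, or would overflow dst_cap. */
size_t rle_decompress(uint8_t *dst, size_t dst_cap,
                      const uint8_t *src, size_t src_len) {
    size_t out = 0;
    if (src_len % 2 != 0) return 0;
    for (size_t i = 0; i < src_len; i += 2) {
        size_t run = src[i];
        if (run > dst_cap - out) return 0; /* out <= dst_cap always holds */
        memset(dst + out, src[i + 1], run);
        out += run;
    }
    return out;
}

/* Step 3: stand-in for the RDRAM->TMEM load. Rejects anything over 4 KB. */
int rdram_to_tmem(uint8_t *tmem, const uint8_t *rdram_src, size_t n) {
    if (n > TMEM_SIZE) return -1;
    memcpy(tmem, rdram_src, n);
    return 0;
}

/* Runs all three steps. Returns the uncompressed size, or 0 on failure. */
size_t stage_compressed_texture(uint8_t *tmem,
                                const uint8_t *cart, size_t cart_len) {
    static uint8_t rdram_compressed[TMEM_SIZE]; /* first RDRAM buffer  */
    static uint8_t rdram_texture[TMEM_SIZE];    /* second RDRAM buffer */
    size_t n;

    if (cart_len > sizeof rdram_compressed) return 0;
    cart_to_rdram(rdram_compressed, cart, cart_len);
    n = rle_decompress(rdram_texture, sizeof rdram_texture,
                       rdram_compressed, cart_len);
    if (n == 0 || rdram_to_tmem(tmem, rdram_texture, n) != 0) return 0;
    return n;
}
```

The point of the sketch is the shape: the texture data lands in RDRAM twice (once compressed, once decompressed) before it ever reaches TMEM.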
Interestingly, games with more advanced texturing schemes like Indiana Jones tended to use uncompressed textures. They did this to avoid the decompression step and its bandwidth. At that point it's just staging the texture with the cart's DMA, and slurping that into TMEM without any other processors eating bandwidth in between.
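A back-of-the-envelope way to see the bandwidth argument, again in C. This is my own simplified accounting, not measured numbers: it only counts bytes read from or written to RDRAM per texture, and ignores burst sizes, CPU cache behavior, and everything else fighting for the bus. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Compressed path, counting RDRAM bytes touched:
 *   cart DMA writes C, CPU reads C, CPU writes U, TMEM load reads U. */
size_t rdram_traffic_compressed(size_t compressed_bytes, size_t texture_bytes) {
    return 2 * compressed_bytes + 2 * texture_bytes;
}

/* Uncompressed path: cart DMA writes U, TMEM load reads U. */
size_t rdram_traffic_uncompressed(size_t texture_bytes) {
    return 2 * texture_bytes;
}

/* The flip side: how many fewer bytes the compressed path pulls off the
 * cart. Assumes compression never makes the data larger. */
size_t cart_bytes_saved(size_t compressed_bytes, size_t texture_bytes) {
    assert(compressed_bytes <= texture_bytes);
    return texture_bytes - compressed_bytes;
}
```

For a 4096-byte texture that compresses to 2048 bytes, this model gives 12288 bytes of RDRAM traffic on the compressed path versus 8192 uncompressed, in exchange for reading 2048 extra bytes from the cart.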