The ARM720T user manual mentions small and large pages. Since the ARM720T requires a 64KB page table entry to be duplicated 16 times in the page table, why not place 16 small page (4KB) entries to mimic a 64KB page entry instead of using a large page in the first place?
From the ARM720 TRM,
Large Pages consist of 64KB blocks of memory. Large Pages are supported to allow mapping of a large region of memory while using only a single entry in the TLB. Additional access control mechanisms are extended to 16KB Sub-Pages.
The main benefit is that a 64k entry consumes only one TLB entry (the TLB is the MMU's page entry cache). The TLB has 64 entries, so 4k pages cover 64*4k = 256kB versus 64*64k = 4MB for 64k pages; a significant increase in the amount of memory that doesn't require a page table lookup to address.
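The arithmetic above can be written as a tiny helper. This is only a sketch: `TLB_ENTRIES` and `tlb_reach_bytes` are names I made up, and the 64-entry figure is the one used in this answer, so check it against the TRM for your part.

```c
#include <assert.h>
#include <stdint.h>

#define TLB_ENTRIES 64u /* assumed TLB size, taken from the text above */

/* Bytes of address space that can be translated without a page table walk
 * when every TLB entry maps a region of bytes_per_entry bytes. */
uint64_t tlb_reach_bytes(uint32_t entries, uint64_t bytes_per_entry)
{
    assert(entries > 0 && bytes_per_entry > 0);
    return (uint64_t)entries * bytes_per_entry;
}
```

Calling `tlb_reach_bytes(TLB_ENTRIES, 4096)` gives 262144 (256kB) and `tlb_reach_bytes(TLB_ENTRIES, 65536)` gives 4194304 (4MB).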
There are downsides. For instance, a portable OS (and its API) might require the smaller pages. If all entries are 64k, memory can be wasted through fragmentation. Section entries are even better for TLB coverage, with each representing a 1MB chunk, so 64MB fits in the TLB. Generally, sections will work better for a virtual==physical mapping.
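To make the question's comparison concrete, here is a sketch of filling the same 16 second-level table slots both ways. The bit layout is my recollection of the ARMv4 coarse page table format (type bits [1:0] = 01 for a large page and 10 for a small page, AP/C/B attributes in bits [11:2]); treat it as an assumption and verify against the TRM. The function names and the `attrs` parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2_TYPE_LARGE   0x1u        /* bits [1:0] = 01 (assumed ARMv4 layout) */
#define L2_TYPE_SMALL   0x2u        /* bits [1:0] = 10 (assumed ARMv4 layout) */
#define L2_ATTR_MASK    0x00000FFCu /* AP3..AP0, C and B bits */
#define SLOTS_PER_64K   16u
#define SMALL_PAGE_SIZE 4096u

/* One 64KB large page: the SAME descriptor is written to all 16 slots.
 * slot must point at a table index that is a multiple of 16. */
void map_64k_as_large(uint32_t *slot, uint32_t phys, uint32_t attrs)
{
    uint32_t i;
    assert((phys & 0xFFFFu) == 0); /* large pages are 64KB aligned */
    for (i = 0; i < SLOTS_PER_64K; i++)
        slot[i] = (phys & 0xFFFF0000u) | (attrs & L2_ATTR_MASK) | L2_TYPE_LARGE;
}

/* Sixteen 4KB small pages: each slot gets its OWN descriptor and base. */
void map_64k_as_small(uint32_t *slot, uint32_t phys, uint32_t attrs)
{
    uint32_t i;
    assert((phys & 0xFFFu) == 0); /* small pages are 4KB aligned */
    for (i = 0; i < SLOTS_PER_64K; i++)
        slot[i] = ((phys + i * SMALL_PAGE_SIZE) & 0xFFFFF000u)
                | (attrs & L2_ATTR_MASK) | L2_TYPE_SMALL;
}
```

Both variants occupy the same 16 slots and translate the same addresses. The difference is in the TLB: the large page variant can be cached as one entry covering all 64KB, while the small page variant needs up to 16 separate TLB entries for the same range.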
If you know your system only has 4MB of usable memory, then the 64k page entries can result in more reliable performance. Even with larger memory sizes, the interrupt code and data can use 64k entries with TLB lockdown (see the note below) to avoid page table walks. This can result in better IRQ latency. The TLB is a limited resource, so using 4k entries for the interrupt handler may waste TLB entries. Using section entries may waste memory, as most interrupt code is <1MB.
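The trade-off between TLB entries and wasted memory can be put into numbers. A rough sketch follows; `mapping_cost` and `cost_to_map` are hypothetical names, and "slack" here assumes the unused remainder of the last entry is not put to other use.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t entries;     /* TLB entries needed to cover the region */
    uint32_t slack_bytes; /* mapped but unused bytes in the last entry */
} mapping_cost;

/* Cost of covering region_bytes with entries of entry_bytes each. */
mapping_cost cost_to_map(uint32_t region_bytes, uint32_t entry_bytes)
{
    mapping_cost c;
    assert(entry_bytes > 0);
    c.entries = (region_bytes + entry_bytes - 1u) / entry_bytes; /* round up */
    c.slack_bytes = c.entries * entry_bytes - region_bytes;
    return c;
}
```

For a made-up 40kB interrupt handler (40960 bytes), 4k entries need 10 TLB entries with no slack, a 64k entry needs 1 TLB entry with 24kB of slack, and a 1MB section needs 1 TLB entry with 984kB of slack.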
Even without lockdown, a frequently used 64k entry is more likely to remain in the TLB. An OS with per task/process memory may need to change the MMU tables, which can result in TLB and cache flushing and invalidation. In order to simplify the context switch, everything may be invalidated and flushed. So a table walk on an interrupt may be more common than you would suspect. This is a motivation to use the MMU 'PID' functionality, to flush/invalidate only smaller regions of memory, and to allow kernel code/data to remain in system caches. Additional code like the scheduler will also benefit from being mapped by a 64k entry.
Note: The ARM720T may or may not have lockdown, but some ARM CPUs do, and the MMU entries are fairly similar between CPU families. This answer applies to many different families of ARM CPUs.