What is the difference between cache memory and tightly coupled memory (TCM)?

Because it is embedded inside the CPU, the TCM has a Harvard architecture, so there is an ITCM (instruction TCM) and a DTCM (data TCM). The DTCM cannot contain any instructions, but the ITCM can actually contain data. The minimum size of the DTCM or ITCM is 4 KiB, so the typical minimum configuration is 4 KiB ITCM and 4 KiB DTCM.

It looks like TCM has the same purpose as cache memory.

No. They didn't use the word cache in the explanation.

Solution

It looks like TCM has the same purpose as cache memory.

TL;DR: Yes, in some sense they do. However, the cache is dynamic and the TCM is static. If your design is static, the TCM has the advantage in cost and power.

A cache uses access patterns to populate itself with data. It has extra hardware (tags) to track the backing address of each line, and it may communicate with other system entities (for example, other cores in an SMP system) to track when a cache line is dirty or stale, meaning the cached copy and primary memory no longer match because one of them has been written.
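The tag bookkeeping described above can be sketched as a toy direct-mapped cache model. This is a minimal illustration, not a description of any particular ARM cache: the line size, the line count, and the names `cache_t` and `cache_access` are all assumptions made for the example, and coherency traffic is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry for illustration only: 128 lines of 32 bytes = 4 KiB. */
#define LINE_BYTES 32u
#define NUM_LINES  128u

typedef struct {
    bool     valid; /* line holds real data */
    uint32_t tag;   /* which backing address currently lives in this line */
} cache_line_t;

typedef struct {
    cache_line_t lines[NUM_LINES];
    unsigned     hits;
    unsigned     misses;
} cache_t;

/* Hypothetical helper: returns true on a hit. On a miss the line is
 * refilled, which evicts whatever address was there before. */
bool cache_access(cache_t *c, uint32_t addr)
{
    uint32_t block = addr / LINE_BYTES; /* which memory block */
    uint32_t index = block % NUM_LINES; /* which cache line it maps to */
    uint32_t tag   = block / NUM_LINES; /* what distinguishes aliases */

    assert(index < NUM_LINES);
    cache_line_t *line = &c->lines[index];

    if (line->valid && line->tag == tag) {
        c->hits++;
        return true;
    }

    /* Miss: the access pattern, not the programmer, decides the contents. */
    line->valid = true;
    line->tag   = tag;
    c->misses++;
    return false;
}
```

Two addresses that share an index evict each other, so whether a given access is fast depends on what ran before it. That history dependence is the "dynamic" part of the cache.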

The TCM (tightly coupled memory) is fast memory, probably multi-transistor SRAM, like the cache. Both have a fast dedicated connection to the CPU. However, the overhead to implement the TCM is far less than that of a cache. Typically TCM is found on lower-end, deeply embedded ARM devices (probably Cortex-M).

Many CPU caches have a lockdown feature which enables them to behave like the TCM. However, the TCM does not have on-the-fly capabilities to buffer high-use code and data. Because of this, the TCM (and a locked cache) is probably more deterministic, which may help hard real-time applications.
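To make the determinism point concrete, here is a minimal sketch of how a TCM access can be modelled: a plain address-window check with no tags and no history. The base address, the cycle counts, and the function names `in_dtcm` and `access_cycles` are hypothetical values chosen for illustration, not figures from any datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DTCM_BASE  0x20000000u /* hypothetical DTCM base address */
#define DTCM_SIZE  4096u       /* 4 KiB, the minimum size quoted in the question */
#define TCM_CYCLES 1u          /* hypothetical TCM access cost */
#define BUS_CYCLES 10u         /* hypothetical cost of going out on the bus */

static_assert(DTCM_SIZE >= 4096u, "question quotes 4 KiB as the minimum TCM size");

/* True when addr falls inside the fixed TCM window. Unsigned wraparound
 * makes addresses below DTCM_BASE fail the comparison as well. */
bool in_dtcm(uint32_t addr)
{
    return (uint32_t)(addr - DTCM_BASE) < DTCM_SIZE;
}

/* The cost depends only on the address, never on earlier accesses. */
unsigned access_cycles(uint32_t addr)
{
    return in_dtcm(addr) ? TCM_CYCLES : BUS_CYCLES;
}
```

Because there is no state, the same address always costs the same number of cycles, which is the property a hard real-time design wants for worst-case timing analysis.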

Part of the 'pragmatism' of lower-end versus higher-end is that for embedded (application-specific) CPUs, the workload is controlled and well defined. Here, the TCM is statically allocated to the workload, as the system is not dynamic and is special-purpose. Also, the BOM (bill of materials) cost of the CPU and battery life or power consumption are factors. 'Lower-end' is often lower power.
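In practice, static allocation usually means the programmer decides at build time what lives in TCM. A common way to express that with GCC or Clang is a section attribute, sketched below. The section names `.itcm_text` and `.dtcm_data`, the macros, and the `filter_step` routine are all assumptions for illustration; a real project would also need a linker script that maps those sections to the TCM address ranges and startup code that copies or zeroes them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section names; the linker script must place them in TCM. */
#define IN_ITCM __attribute__((section(".itcm_text")))
#define IN_DTCM __attribute__((section(".dtcm_data")))

#define STATE_LEN 64u

/* Hot data pinned to DTCM by the programmer, not by access patterns. */
IN_DTCM int16_t filter_state[STATE_LEN];

/* Hot routine pinned to ITCM: stores the samples and returns their sum. */
IN_ITCM int32_t filter_step(const int16_t *samples, size_t n)
{
    assert(n <= STATE_LEN);

    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        filter_state[i] = samples[i];
        acc += samples[i];
    }
    return acc;
}
```

Nothing moves at run time: if the hot code or data changes, the placement has to be revisited by hand, which is the flexibility a cache would have provided automatically.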

For dynamic loads where the goal is maximum performance, the cache is much more flexible. With an unlimited power and cost budget, the cache is always better, but most designs do not have unlimited power and cost budgets. A possible future advantage for TCM is that power creates heat, and in the smallest lithography processes, getting heat out of the package is a major constraint. Maybe we will return to transputers with TCM running on a micro-kernel.