Is there any special reason for STOSB to use the extra segment?

I have read that STOSB functions like this:

ES:[DI] <-- AL

If DF = 0 increment DI else decrement DI.

So why doesn't STOSB write to DS:[DI]?
Is there a special purpose for using the extra segment?
Most string instructions use the extra segment. Why?

Solution

So why doesn't STOSB write to DS:[DI]?

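Because then STOSB would be confined to the same segment that LODSB already reads from (DS:[SI]). Using a separate segment register for the destination lets the source and the destination live in different segments, which gives you more flexibility.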

Is there a special purpose for using the extra segment?

Yes. You can transfer bytes between segments easily while processing them. For example, you can use LODSB to load AL from DS:[SI], modify AL, and then store it to a different segment, the Extra Segment, with STOSB writing to ES:[DI]. On the 8086, with its 64 KB segments (16-bit offsets) and 20-bit address space, this is really useful.
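
The load, modify, store pattern described above could look roughly like the following sketch. It is written in NASM syntax for 16-bit real mode and is a fragment rather than a complete program. SRC_SEG, DST_SEG, SRC_OFF, DST_OFF and COUNT are hypothetical constants, and inverting the bits is just a stand-in for whatever processing you need.

```nasm
; Assumed: SRC_SEG, DST_SEG, SRC_OFF, DST_OFF, COUNT are defined elsewhere (hypothetical).
        cld                     ; DF = 0, so SI and DI count upward
        mov     ax, SRC_SEG
        mov     ds, ax          ; segment registers cannot be loaded from an immediate
        mov     ax, DST_SEG
        mov     es, ax
        mov     si, SRC_OFF
        mov     di, DST_OFF
        mov     cx, COUNT
        jcxz    done            ; LOOP entered with CX = 0 would run 65536 times
next_byte:
        lodsb                   ; AL = [DS:SI], then SI += 1
        not     al              ; example processing step: invert every bit
        stosb                   ; [ES:DI] = AL, then DI += 1
        loop    next_byte       ; CX -= 1, repeat while CX != 0
done:
```

Note that the fragment leaves DS pointing at the source segment, so real code would usually save and restore DS around it.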

Another instruction that illustrates this is REP MOVSB, which copies a sequence of bytes (with the length in CX) from DS:[SI] to ES:[DI].

(If you don't need to examine each byte as you copy it, you'd simply use rep movsb or rep movsw for better performance than lods/stos in a loop.)
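
As a sketch of such a plain copy, here is a small routine in NASM syntax that uses rep movsw for the bulk and rep movsb for a possible odd trailing byte. The routine name and its register interface are assumptions made for illustration, and it assumes the regions do not overlap in a way that matters.

```nasm
; Hypothetical helper.
; On entry: DS:SI = source, ES:DI = destination, CX = byte count.
copy_bytes:
        cld                     ; DF = 0, copy upward
        shr     cx, 1           ; CX = word count, CF = 1 if the byte count was odd
        rep movsw               ; copy CX words; MOVS does not modify flags, so CF survives
        adc     cx, cx          ; CX is 0 after REP, so this leaves CX = CF (0 or 1)
        rep movsb               ; copy the leftover byte, if there is one
        ret
```

If the count is known to be small, a single rep movsb is simpler and does the same job.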

Most string instructions use the extra segment. Why?

Every string instruction that uses DI addresses it through ES: on the 8086 that is STOS, MOVS, CMPS and SCAS, and only LODS works with DS:[SI] alone. Using another segment register gives you quick access to different segments: you are not limited to processing data in a single 64 KB segment, and you do not have to change the DS register before each access to a different segment.
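
For contrast, here is a sketch of a cross-segment byte copy that uses DS only, to show the reloading that ES saves you. It is NASM syntax; the routine name and the choice of BX and DX to hold the two segment values are assumptions for illustration.

```nasm
; Hypothetical contrast case: copy CX bytes across segments without using ES.
; On entry: BX = source segment,      SI = source offset,
;           DX = destination segment, DI = destination offset, CX = byte count.
; The caller's DS is clobbered.
copy_ds_only:
        jcxz    copy_done       ; nothing to do for CX = 0
copy_next:
        mov     ds, bx          ; point DS at the source segment
        mov     al, [si]        ; a plain [si] operand uses DS by default
        inc     si
        mov     ds, dx          ; repoint DS at the destination segment
        mov     [di], al        ; a plain [di] operand also uses DS by default
        inc     di
        loop    copy_next       ; CX -= 1, repeat while CX != 0
copy_done:
        ret
```

With DS and ES set up once, the whole loop body collapses to a single movsb.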

stos and movs write es:[di], which makes sense because DI is the "destination index" register.

cmps and scas read from es:[di], which is maybe surprising for scas: it has only one memory operand, so you might expect it to read from ds:[si] like lods. It is all the more surprising because SCASB sets flags from AL - [mem], not the other way around, so it behaves like a cmp where memory is the right operand (source), not the left (destination), i.e. like cmp al, es:[di].
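
A typical use of scas is searching a buffer for a byte, which might be sketched like this in NASM syntax. The routine name and the convention of reporting the result in the carry flag are assumptions for illustration.

```nasm
; Hypothetical helper.
; On entry: ES:DI = start of buffer, CX = buffer length in bytes, AL = byte to find.
; On exit:  CF = 0 and ES:DI points at the first match, or CF = 1 if there is none.
find_byte:
        cld                     ; DF = 0, scan upward
        jcxz    not_found       ; with CX = 0, REPNE SCASB runs zero times and leaves ZF unchanged
        repne scasb             ; flags from AL - [ES:DI], DI += 1, CX -= 1; stop on match or CX = 0
        jne     not_found       ; ZF = 0 means the last byte compared was not a match
        dec     di              ; DI was already advanced past the match, so step back
        clc
        ret
not_found:
        stc
        ret
```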

Perhaps the architect of the 8086's instruction set imagined a use-case of a loop that does lods and scas to implement strcmp between segments.
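
Such a lods plus scas loop might look like the sketch below (NASM syntax). The routine name and the convention of returning the result in ZF are assumptions for illustration; this only shows what the speculated use-case would look like, not a known historical design.

```nasm
; Hypothetical helper.
; On entry: DS:SI = first zero-terminated string, ES:DI = second zero-terminated string.
; On exit:  ZF = 1 if the strings are equal, ZF = 0 otherwise.
str_equal:
        cld                     ; DF = 0, walk both strings upward
compare_next:
        lodsb                   ; AL = [DS:SI], SI += 1
        scasb                   ; flags from AL - [ES:DI], DI += 1
        jne     compare_done    ; bytes differ: return with ZF = 0
        test    al, al          ; bytes match; was it the terminator?
        jnz     compare_next    ; no, keep going
                                ; yes: AL = 0, so TEST left ZF = 1
compare_done:
        ret
```

The lodsb and scasb pair does the same comparison as a single cmpsb, but leaves the byte in AL so the loop can test for the terminator.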