This is the code that I use with FASM:
format PE console
entry main
include '..\MACRO\'
section '.data' data readable writeable
msg db "привіт!",0dh,0ah,0 ;hi
lcl_set db ?
section '.code' code readable executable
main:
;fail without set locale
push msg
call [printf]
pop ecx
;succeed with set locale
push msg
call _liapnuty
pop ecx
push 0
call [ExitProcess]
_liapnuty:
push ebp
mov ebp, esp
;sub esp, 0
mov ebx,[ebp+8] ; 1st arg addr
mov al, [lcl_set]
or al, al
jnz _liapnuty_rest
call __set_locale
_liapnuty_rest:
push ebx
call [printf]
pop ebx
mov esp, ebp
pop ebp
ret 0
__set_locale:
mov al, [lcl_set]
or al, al
jnz __set_locale_rest
push 1251
call SetConsoleCP
call SetConsoleOutputCP
pop ecx
mov [lcl_set], 1
;push lcl
;call [system]
; pop ecx
; mov [lcl_set], 1
;push cls
;call [printf]
; pop ecx
__set_locale_rest:
ret 0
section '.idata' import data readable
library kernel,'kernel32.dll',\
import kernel,\
import msvcrt,\
It works almost perfectly, except that it waits about a second before exiting, for some reason. It prints its output almost instantly, yet it fails to shut down quickly. If the reason is using these libraries, or not cleaning up the stack after calling ExitProcess (which obviously can't be done, since ExitProcess never returns), then let me know and I will most gladly accept that answer, but I want to be 100% sure I'm doing everything correctly.
The cause turned out to be the calling convention: kernel32 functions use stdcall, so they pop their own parameters on return, whereas printf from msvcrt is cdecl and leaves that to the caller. Once I removed the unnecessary pops after the kernel32 calls, the program became fast again. With the extra pops the program still ran, but on a damaged stack, and a lot of damage control happened at the end. That's why it was slow but still worked. For everyone facing this issue: be careful with the calling convention of every function you call.
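To show what the fix looks like in practice, here is a minimal sketch of the two conventions side by side. It is not my original program: the include line ('win32a.inc', assumed to be on FASM's INCLUDE path), the ASCII test string, and the import lists are assumptions filled in for illustration.

```
format PE console
entry main
include 'win32a.inc'            ; assumption: standard FASM include, found via the INCLUDE path

section '.data' data readable writeable
msg db 'hello',0dh,0ah,0        ; assumption: plain ASCII text to keep the sketch encoding-neutral

section '.code' code readable executable
main:
; stdcall (kernel32): the callee removes its own argument,
; so there is NO pop / add esp after the call
push 1251
call [SetConsoleCP]
; each call needs its own argument, the first call already consumed the previous one
push 1251
call [SetConsoleOutputCP]

; cdecl (msvcrt): the caller removes the argument after the call
push msg
call [printf]
add esp, 4                      ; same effect on the stack as "pop ecx"

; stdcall again, and ExitProcess never returns, so nothing follows it
push 0
call [ExitProcess]

section '.idata' import data readable
library kernel,'kernel32.dll',\
        msvcrt,'msvcrt.dll'
import kernel,\
       SetConsoleCP,'SetConsoleCP',\
       SetConsoleOutputCP,'SetConsoleOutputCP',\
       ExitProcess,'ExitProcess'
import msvcrt,\
       printf,'printf'
```

A quick way to check yourself: ESP after a call sequence should equal ESP before the first push. If it does not, one of the calls was cleaned up twice or not at all.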
To debug the application and find the error I used OllyDbg. It's free and it works. It can debug both EXEs and DLLs and lets you step through the code one instruction at a time. It also shows the memory, the stack, and all of the registers and flags.
By watching the stack while stepping, I was able to see that it was getting corrupted.