python linux bash file-descriptor procfs

Is there a way to "own" or "get priority over" the stdin file descriptor of an external process in Linux?


tl;dr (although a tl;dr won't explain everything fully): I have an external program (let's say PID 1234) that is trying to read from another external process (let's say PID 1111). 1111 always reads from its own stdin, but 1234 wants to handle that stdin instead of the program. The problem is that 1111 sometimes prevents 1234 from reading a byte from /proc/1111/fd/0, which is not what I want. I want to know how to make 1234 always get each byte, and 1111 never (or only rarely) be able to read it.
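The symptom described above can be reproduced in isolation. A pipe (and, in the same way, a terminal's input queue) hands each byte to exactly one read() call; opening the same descriptor again through /proc/PID/fd/N gives a second way into the same queue, not a private copy of it. The sketch below assumes a pipe as a stand-in for the terminal and opens it a second time via /proc/self/fd in place of /proc/1111/fd/0; `split_between_readers` is a hypothetical helper name, and the code is Linux-only.

```python
import os


def split_between_readers(data, first_n):
    """Show that two readers of one pipe split its bytes instead of each
    getting a copy. Linux-only; data must fit in the pipe buffer (64 KiB
    by default) so the single write below cannot block."""
    r, w = os.pipe()
    # A second, independent open of the same pipe through procfs, standing
    # in for another process opening /proc/1111/fd/0. Opening a pipe for
    # reading this way blocks until a writer exists, so do it before closing w.
    r2 = os.open(f"/proc/self/fd/{r}", os.O_RDONLY)
    try:
        os.write(w, data)
    finally:
        os.close(w)  # no writers left, so reads hit EOF once data is drained
    try:
        got_a = os.read(r, first_n)  # reader A wins the first bytes...
        got_b = b""
        while chunk := os.read(r2, 4096):  # ...reader B only gets the rest
            got_b += chunk
        return got_a, got_b
    finally:
        os.close(r)
        os.close(r2)


print(split_between_readers(b"echo hi\n", 3))  # → (b'ech', b'o hi\n')
```

Whichever reader calls read() first takes the bytes and the other never sees them, no matter how fast or in which language the second reader is written.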

I have been trying to develop a concept for GNU Bash syntax highlighting, because the only other syntax highlighter for Bash I found is VERY slow: it implements the whole readline library in Bash.

I ran into some issues but came up with something that works, somewhat: https://gist.github.com/TruncatedDinosour/e2034cf470f268596235a5c88ffcd048

You can find a more in-depth explanation at https://blog.ari-web.xyz/b/bash-syntax-highlighting-part-one-concept/. I have already asked the general developer public for an answer ("maybe someone knows," I thought to myself), but I think I might get an answer faster here.

Basically, this concept has a problem: sometimes it misses a byte because, I think, bash steals the read(). So far I have tried:

  • using LD_PRELOAD to override the read() function ("oh, it surely should work"):
    • making it always return 0 ("okay, if it falsely reads nothing, surely it'll work")
    • making it always return 1 ("uh, maybe let's make it think we read a single byte even though it's empty?")
    • redirecting it to a FIFO ("maybe just redirecting it to a different type of file would work")
    • closing it (not sure what I was thinking here)
  • using os.write / os.read ("maybe directly using unbuffered syscalls would be faster, maybe it's just Python being slow")
  • using C++ ("okay, surely if it's a Python-being-slow problem, C++ can fix it")
  • using C ("okay, C++ didn't work; maybe C, being a simpler language, could give me more performance?")

So far I have never been able to fully get rid of the issue, so I thought I'd ask here and the general public; maybe you, fellow people, know how to achieve what I'm trying to achieve.

Thanks in advance for any help, ideas or clues :)


Solution

  • To better understand your options, we need to take a step back and consider how a process gets its I/O.

    • At the lowest level there is the system call sequence: open(2) to obtain the FD, read(2)/write(2) (and a few others) to actually do the I/O, and close(2). These are provided directly by the kernel. To issue them, you need a low-level assembly instruction (SYSCALL/SYSENTER on x86, SVC on ARM).

    • One level above that are the system call wrappers. The calls above (open(2), etc.) are usually made through libc wrappers, which are exported functions that "hide" the underlying system call instruction.

    • One level above that are the wrappers provided by Python, Java (e.g. InputStreams) or other higher-level languages.
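To make the layering concrete, here is a minimal Python sketch that reads from the same pipe at three levels: a Python file object, libc's read() wrapper called through ctypes, and the raw system call issued by number through libc's generic syscall() function. The function names are hypothetical, and the system call numbers cover only x86_64 and aarch64 (an assumption; other architectures use different numbers).

```python
import ctypes
import os
import platform

# CDLL(None) looks symbols up in what is already loaded into this process,
# so "read" resolves to libc's wrapper (or to an LD_PRELOAD replacement).
libc = ctypes.CDLL(None, use_errno=True)
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t
libc.syscall.restype = ctypes.c_long

# Assumption: only these two architectures are covered (read is 0 / 63).
SYS_READ = {"x86_64": 0, "aarch64": 63}.get(platform.machine())


def _checked(ret):
    """Turn a -1 return into OSError, the way os.read reports errors."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def via_file_object(fd, n):
    """Highest level: a Python file object. Unbuffered so it cannot read
    ahead; closefd=False leaves fd open afterwards."""
    with open(fd, "rb", buffering=0, closefd=False) as f:
        return f.read(n)


def via_libc_wrapper(fd, n):
    """Middle level: libc's exported read() wrapper."""
    buf = ctypes.create_string_buffer(n)
    got = _checked(libc.read(fd, buf, n))
    return buf.raw[:got]


def via_raw_syscall(fd, n):
    """Lowest level reachable from Python: the read system call by number,
    via libc's generic syscall() helper rather than the read symbol."""
    if SYS_READ is None:
        raise NotImplementedError(f"no SYS_read number for {platform.machine()}")
    buf = ctypes.create_string_buffer(n)
    got = _checked(libc.syscall(ctypes.c_long(SYS_READ), ctypes.c_long(fd),
                                buf, ctypes.c_long(n)))
    return buf.raw[:got]
```

All three consume bytes from the same kernel object, in order. Only the first two go through the `read` symbol that LD_PRELOAD can replace; the third reaches the kernel without touching it, which is the situation the answer's condition B below is about.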

    If you want to intercept, you thus have several options:

    • LD_PRELOAD, which you've already tried. This only works under two conditions:

      A) You do it BEFORE the process starts, since LD_* variables are parsed by the dynamic linker at startup.
      B) What you're hijacking is the system call wrapper (or another exported function). In other words, if the process issues the low-level assembly instruction inline, this won't work.

    • Hook the actual low-level system call instruction. This works every time, but it is cumbersome: it requires effectively debugging the target process, waiting for it to issue a system call, then trapping all system calls and filtering out read(2).

    • Hook at kernel level by redirecting the FD. This will always work, but it is probably overkill for your requirements and will also require kernel code execution (commonly via a kernel module). Mentioned here only for completeness.

    • Dynamic hooking of system calls using the ptrace(2) API, so you remain in user mode and don't go into kernel mode. The well-known strace(1) tool records system calls but cannot rewrite the data they transfer. jtrace (http://NewAndroidBook.com/tools/jtrace.html) can do so through a simple plugin API. This is probably your path of least resistance.

    • Redirect the FD from the target process to the injecting process: either via LD_PRELOAD or some other dynamic code injection, you can open a pipe, socket, or other IPC primitive in place of the original FD. dup2(2) is highly useful for that.
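Because of condition A, it is worth checking whether a preload actually reached the target before concluding that a hook doesn't work. /proc/PID/environ holds the environment the process was started with, and /proc/PID/maps lists the files mapped into it, so a preloaded library that really loaded shows up there. A sketch, assuming Linux procfs and permission to read the target's /proc entries (e.g. same user); the helper names are hypothetical, and since a process can overwrite its own environment block in place, treat environ as a strong hint rather than proof.

```python
import os


def initial_environ(pid):
    """Environment PID was exec'd with, parsed from /proc/PID/environ."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def mapped_files(pid):
    """Set of file paths currently mapped into PID's address space."""
    paths = set()
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            # address perms offset dev inode [pathname]; pathname may contain spaces
            fields = line.split(maxsplit=5)
            if len(fields) == 6 and fields[5].startswith("/"):
                paths.add(fields[5].rstrip("\n"))
    return paths


def preload_took_effect(pid, lib_path):
    """True only if lib_path was in LD_PRELOAD at exec time AND is mapped."""
    preload = initial_environ(pid).get("LD_PRELOAD", "")
    # ld.so accepts LD_PRELOAD entries separated by spaces or colons
    requested = lib_path in preload.replace(":", " ").split()
    return requested and os.path.realpath(lib_path) in mapped_files(pid)
```

For the setup in the question this would be called as `preload_took_effect(<bash PID>, "/path/to/your/hook.so")`; note that it compares paths literally, so pass the same path you put in LD_PRELOAD.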
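The dup2(2) redirection can be sketched without any injection if you control how the target is started: fork, dup2 a pipe's read end onto fd 0, then exec. The parent then holds the only write end, so it sees every byte before the child does, which inverts the race from the question. `spawn_with_pipe_stdin` and `forward` are hypothetical names, and this is a launch-time variant of the technique, not the dynamic injection described in the bullet.

```python
import os


def spawn_with_pipe_stdin(argv):
    """Start argv with a fresh pipe's read end installed as its fd 0.

    Returns (pid, write_fd). The caller owns the only write end, so every
    byte the child will ever read from stdin passes through the caller first.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.dup2(r, 0)  # the pipe now *is* stdin (dup2 makes fd 0 inheritable)
            os.close(r)
            os.close(w)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed
    os.close(r)
    return pid, w


def forward(src_fd, dst_fd, on_data):
    """Copy src_fd to dst_fd, showing each chunk to on_data before the
    child can read it. Closes dst_fd at EOF so the child sees EOF too."""
    try:
        while chunk := os.read(src_fd, 1024):
            on_data(chunk)  # e.g. feed the highlighter's copy of the line
            while chunk:
                chunk = chunk[os.write(dst_fd, chunk):]
    finally:
        os.close(dst_fd)
```

For an interactive bash this would need a pseudo-terminal instead of a pipe, since bash only runs interactively when its standard input and error are terminals (or when started with -i). Python's `pty.spawn(argv, master_read, stdin_read)` accepts a `stdin_read` callback that is given the stdin file descriptor and returns the bytes to pass on to the child, which is the same idea built on a pty.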

    As for your specific issue, where stdin is a terminal, there are other considerations. Namely, bash (like other shells) also manipulates the terminal directly using ioctl(2) requests (see stty(1) for examples). Another consideration is that syntax highlighting is done via ANSI escape sequences, which involves writing to the terminal (\e followed by further control codes). This might account for the "stolen byte" you are encountering.
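One of the terminal settings a shell changes this way is canonical mode (the ICANON flag, shown by stty as icanon/-icanon). In canonical mode the kernel releases input to readers a line at a time; readline switches it off so that every keystroke becomes readable on its own, which also makes every keystroke a separate prize for whichever process calls read() first. The sketch below shows the difference on a fresh pseudo-terminal. It assumes Linux's default pty settings and uses `tty.setraw` as a stand-in for readline (setraw clears more flags than readline does, ICANON among them); the helper names are hypothetical.

```python
import os
import pty
import select
import termios
import tty


def is_canonical(fd):
    """True if the terminal on fd is in canonical (line-at-a-time) mode."""
    lflag = termios.tcgetattr(fd)[3]  # index 3 is c_lflag
    return bool(lflag & termios.ICANON)


def partial_line_is_readable(raw, timeout=0.2):
    """Type b'ab' (no Enter) into a fresh pty and report whether the
    slave side becomes readable within timeout seconds."""
    master, slave = pty.openpty()
    try:
        if raw:
            tty.setraw(slave)  # clears ICANON (and ECHO, ISIG, ...)
        os.write(master, b"ab")  # simulated keystrokes, no newline
        ready, _, _ = select.select([slave], [], [], timeout)
        return bool(ready)
    finally:
        os.close(master)
        os.close(slave)
```

With the line discipline in canonical mode, `partial_line_is_readable(raw=False)` stays `False` until a newline arrives; in raw mode the two bytes are readable immediately, one read() at a time.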

    TL;DR: as you say, this answers the injection question as comprehensively as we can, but the particulars of syntax highlighting might need to be addressed differently. Be more specific, and we can try to be, as well :-)