
Windows - Does accessing data through "localhost" incur network stack overhead?


I have a large number of audio files that I run through a processing algorithm to extract certain bits of data from them (e.g., the average volume of the entire clip). A number of my build scripts previously pulled the input data from a Samba network share, to which I had mapped a network drive via net use (e.g., M: ==> \\server\share0).

Now that I have a new 1 TB SSD, I can store the files locally and process them very quickly. To avoid a massive rewrite of my processing scripts, I removed my network drive mapping and re-created it using the localhost host name, i.e., M: ==> \\localhost\mydata.

When I use such a mapping, do I risk incurring significant overhead, for example from the data having to travel through part of Windows' network stack? Or does the OS take shortcuts so that it is more or less equivalent to direct disk access (i.e., does the machine know it's just pulling files from its own drive)? Increased latency isn't much of a concern for me, but maximum sustained throughput is critical.

I ask because I'm deciding whether to modify all of my processing scripts to use a different style of path.

Extra question: does the same apply to Linux hosts? Are they smart enough to know they are pulling from a local disk?


Solution

  • When I make use of such a mapping, do I risk incurring significant overhead,

    Yes. By using a UNC path (\\hostname\sharename\filename), or a drive letter mapped to one like your M:, instead of a local path ([\\?\]driveletter:\directoryname\filename), you route all file access through the Server Message Block protocol (SMB, the protocol Samba implements). This adds significant overhead to every file operation, affecting both throughput and access times.
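    To tell the two path forms above apart in a script, a minimal sketch using Python's standard ntpath module could look like the following. The function name path_kind and its return labels are hypothetical. Note the important limit: a plain string check cannot tell a local disk from a mapped network drive, so M:\clip.wav is reported as "drive-letter" even though, with M: mapped to \\localhost\mydata, the I/O still goes through the SMB client.

```python
import ntpath

EXTENDED = "\\\\?\\"  # the \\?\ extended-length prefix


def path_kind(path: str) -> str:
    """Classify a Windows path by its textual form only (no disk or network access).

    Hypothetical helper. Device paths (\\.\...) are not handled.
    """
    if path.startswith(EXTENDED + "UNC\\"):
        return "unc"                   # extended-length UNC: \\?\UNC\host\share\...
    if path.startswith(EXTENDED):
        path = path[len(EXTENDED):]    # \\?\C:\dir\file -> C:\dir\file
    drive, _ = ntpath.splitdrive(path)
    if drive.startswith("\\\\"):
        return "unc"                   # \\host\share\... always goes through SMB
    if drive.endswith(":"):
        return "drive-letter"          # a local disk OR a mapped network drive
    return "relative"
```

    For example, path_kind(r"\\localhost\mydata\clip.wav") returns "unc", while path_kind(r"M:\clip.wav") returns "drive-letter"; checking what M: is actually mapped to still requires asking the OS (for example with net use).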

    The flow over a network is like this:

    Application -> SMB Client -> Network -> SMB Server -> Target file system
    

    Now that you have moved your files to your local machine but still access them via UNC, the flow is:

    Application -> SMB Client -> localhost -> SMB Server -> Target file system
    

    The only thing you have reduced is network traffic, and even that is not eliminated: SMB traffic to localhost still passes through the network stack (over loopback), along with all the associated protocol processing.

    Also, because SMB is tailored for network transfers, its reads may not make optimal use of your disk's and OS's caches. For example, it may read in blocks of one size while your disk performs better with blocks of another size.
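    Since sustained throughput is what matters here, it is worth measuring rather than guessing. A minimal sketch, assuming Python is available: read_throughput (a hypothetical helper) reads a file sequentially with a chosen block size through an unbuffered file object and reports MB/s, so the same file can be timed via M:\ and via its direct local path, at several block sizes.

```python
import time


def read_throughput(path: str, block_size: int = 1 << 20) -> tuple:
    """Sequentially read `path` in `block_size` chunks; return (bytes_read, MB/s)."""
    buf = bytearray(block_size)
    view = memoryview(buf)
    total = 0
    start = time.perf_counter()
    # buffering=0 returns a raw FileIO, so each readinto() is one OS read of
    # up to block_size bytes, with no extra Python-level buffering in between.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(view)
            if not n:          # 0 bytes means end of file
                break
            total += n
    elapsed = time.perf_counter() - start
    mb_per_s = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, mb_per_s
```

    A usage idea, with hypothetical paths: for bs in (64 * 1024, 1 << 20, 8 << 20), compare read_throughput(r"M:\clips\a.wav", bs) with read_throughput(r"D:\mydata\clips\a.wav", bs). Repeat reads of the same file may be served from the Windows file cache, so use different or large files, or run each case several times.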

    If you want optimal throughput and minimal access times, use as few layers in between as possible; in this case, access the file system directly:

    Application -> Target file system
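    Switching to direct paths does not have to mean rewriting every script by hand. If the scripts are in Python (an assumption) and build their input paths in one place, a small rewrite step can redirect the mapped drive and the localhost share to the local folder. The helper to_local and the target D:\mydata are hypothetical; set the target to wherever the files actually live on the SSD.

```python
import ntpath

# Hypothetical redirects: the mapping from the question -> an assumed local folder.
REDIRECTS = {
    "M:\\": "D:\\mydata\\",
    "\\\\localhost\\mydata\\": "D:\\mydata\\",
}


def to_local(path: str, redirects: dict = REDIRECTS) -> str:
    """Rewrite a mapped-drive or localhost-UNC path to a direct local path."""
    norm = ntpath.normpath(path)   # also turns "/" into "\"
    lowered = norm.lower()         # Windows paths are case-insensitive
    for prefix, target in redirects.items():
        if lowered.startswith(prefix.lower()):
            return target + norm[len(prefix):]
    return norm                    # unknown prefix: leave the path alone
```

    For example, to_local(r"M:\clips\a.wav") gives D:\mydata\clips\a.wav, which the scripts can then open without going through SMB.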