Trying to understand Java I/O and streams


Very new to coding and Java, and I'm trying to wrap my head around streams.

My textbook says that "a stream is linked to a physical device by the Java I/O system." What do they mean when they say "physical"? I've also seen the word used to describe code instead of an actual physical thing that one can see and touch.

When they say a stream is linked to a physical device, do they mean an actual thing you can hold, or something that exists in memory, like an object? Google isn't helping me much with this, saying things like "A Stream is linked to a physical layer", and I'm not sure what that means either.


Solution

  • Your book is probably outdated.

    Java has had the java.io.InputStream class for about 30 years. That is most likely what the book is referring to, but what it says about it is incorrect.

    Java also has the java.util.stream.Stream interface, which is completely unrelated even though it uses the same word.

    InputStream

    InputStream is an abstract concept representing a readable stream of bytes. By design this stream is not (necessarily) kept in memory. For example, you can turn a file into such a thing, and it doesn't matter if that file is absolutely gigantic. You can simply check the javadoc of InputStream for more.

    Given that it is an abstract class, you can write your own by subclassing it if you want. Java has many implementations baked into the core libraries, and there are hundreds more in commonly used third party libraries.
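
    As a minimal sketch of what writing your own looks like: the only method a subclass must supply is read(), which returns the next byte as an int from 0 to 255, or -1 once the stream has ended. The class name CountingInputStream and its behaviour are made up for illustration; nothing like it ships with Java.

    ```java
    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical example class: a stream that produces the bytes 0, 1, 2, ...
    // up to (but not including) 'limit', and then reports end of stream.
    class CountingInputStream extends InputStream {
        private final int limit;
        private int next = 0;

        CountingInputStream(int limit) {
            this.limit = limit;
        }

        @Override
        public int read() {
            if (next >= limit) {
                return -1;            // -1 signals "no more data"
            }
            return next++ & 0xFF;     // keep the value in the 0..255 byte range
        }

        public static void main(String[] args) throws IOException {
            try (InputStream in = new CountingInputStream(3)) {
                int b;
                while ((b = in.read()) != -1) {
                    System.out.println(b);   // prints 0, 1, 2 on separate lines
                }
            }
        }
    }
    ```

    No file, socket, or device is involved here either; the bytes are computed on the fly.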

    They are often used for e.g. files, network connections, and database blobs. Whether you feel that counts as 'physical' is just mincing words, and I'm not sure it's useful. Point is, the tutorial/document you are reading decided to clarify matters by using that term and I think that was a really bad idea. It clarifies very little. In fact, it's essentially hogwash - there is no need for an InputStream to be backed by a physical device at all. Here:

    byte[] data = new byte[] {1, 2, 3, 4};
    InputStream in = new ByteArrayInputStream(data);
    

    Voilà. I made an input stream that is most definitely not backed by anything physical. Or, if 'well, that array is in RAM so I guess that counts' is the line of thinking, then a String is also backed by 'something physical', and in fact so is everything in Java. Either way, that statement is either incorrect or highly misleading.

    Nevertheless, InputStream is designed to deal with 'physical-ish' concepts.

    For example, whether we are talking about spinning platters or cells in an SSD, both of them are fundamentally incapable of giving you a single byte. Instead they give you a whole block at a time.

    Hence, the API of InputStream is a bit weird - you can ask it for 1 byte, or you can provide it a byte array and ask it to fill that array 'as far as the input stream feels is efficient'; the return value tells you how many bytes it actually filled in. For example, given a file with 1GB of data left to read, if you hand it a 100MB byte array, the API is free to fill 20MB worth, tell you it gave you 20MB, and return, even though it could have filled 80 more MB.

    Because perhaps that was 'most efficient' - it depends on the hardware, and the entire point of InputStream is that it abstracts this stuff away.
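
    To make that concrete, here is a small sketch of a partial read and of the usual loop for coping with it. The names PartialReads, trickle and readFully are invented for this example; trickle wraps a stream so that it never hands out more than a few bytes per call, standing in for whatever the real hardware would consider 'efficient'.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    class PartialReads {
        // Hypothetical helper: wraps a stream so each bulk read returns at most
        // maxPerRead bytes, imitating a source that delivers data in small pieces.
        static InputStream trickle(InputStream source, int maxPerRead) {
            return new FilterInputStream(source) {
                @Override
                public int read(byte[] b, int off, int len) throws IOException {
                    return super.read(b, off, Math.min(len, maxPerRead));
                }
            };
        }

        // Keeps calling read until the array is full or the stream ends.
        // Returns the number of bytes actually placed in 'target'.
        static int readFully(InputStream in, byte[] target) throws IOException {
            int filled = 0;
            while (filled < target.length) {
                int n = in.read(target, filled, target.length - filled);
                if (n == -1) {
                    break;            // end of stream before the array was full
                }
                filled += n;
            }
            return filled;
        }

        public static void main(String[] args) throws IOException {
            byte[] data = new byte[10];

            InputStream a = trickle(new ByteArrayInputStream(data), 3);
            // One call: only 3 bytes arrive although 10 are available.
            System.out.println(a.read(new byte[10]));

            InputStream b = trickle(new ByteArrayInputStream(data), 3);
            // Looping collects all 10.
            System.out.println(readFully(b, new byte[10]));
        }
    }
    ```

    The takeaway is to always check the return value of read. On Java 9 and later, InputStream.readNBytes does this kind of looping for you.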

    The book / tutorial / that Google result is making a very common explanatory mistake: the author has a word, assumes everybody takes that word to mean what the author thinks it means, and then just uses it without explaining it.

    No such luck - many words are either straight up ambiguous (in that there are entire culture wars fought over their meaning), or only clear in the sense that 'jargon is clear' - if you know what a 'stream' is in Java jargon, then you... know what a stream is. Either way, presuming that the reader knows the jargon when you are trying to explain the basics is, didactically speaking, utterly daft. And yet it sounds like your tutorial or whatnot made that mistake. Probably best to toss it out, or at least keep in mind it's not doing a very good job.

    At any rate, what the author feels the word 'physical' means is:

    • A file on disk/ssd.
    • A blob in a database.
    • One side of a TCP/IP network connection (the data flowing from whomever you are talking to, to your application; for example, if requesting a web page, the HTTP response headers and, say, the HTML).

    Which isn't clear, but even if we take that as read, it's wrong in that you can also make an InputStream for something as simple as a byte array.

    The java.util.stream.Stream sense

    This is a different 'take' on the notion of a collection, such as a list. It has absolutely nothing to do with files and networks, and is a similarly highly abstract idea. InputStream is a stream of bytes, and the API exposes various ways of grabbing them in bulk for efficiency reasons. j.u.s.Stream is a stream of Ts - as in, pick a type, it can be a stream of strings, bytes, persons, students, horses, computers, videos, messages, songs, or whatever you want. It doesn't have bulk methods "for efficiency", it has bulk methods for readability. Other than the notion that the English dictionary word 'stream' feels like a decent fit for both of them, there is absolutely no relation between the two.
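
    For contrast, a short sketch of the j.u.s.Stream style, here over a list of strings. The class and method names (StreamOfStrings, shortNamesUpper) and the sample data are made up for the example. Note there are no bytes, buffers, or I/O anywhere in it:

    ```java
    import java.util.List;
    import java.util.stream.Collectors;

    class StreamOfStrings {
        // Hypothetical helper: keeps the names of at most 4 characters
        // and upper-cases them, preserving the original order.
        static List<String> shortNamesUpper(List<String> names) {
            return names.stream()
                    .filter(n -> n.length() <= 4)     // bulk operation: select
                    .map(String::toUpperCase)         // bulk operation: transform
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<String> names = List.of("anna", "bartholomew", "cy");
            System.out.println(shortNamesUpper(names));   // prints [ANNA, CY]
        }
    }
    ```

    The same result could be had with a plain for loop; the stream version is just a more declarative way to write it.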

    Further highlighting one clear conclusion: The tutorial / book you are reading is trying to clarify things. It missed the mark. By quite a few ballparks.