How to extract text from PDF and DOC files without downloading them

I searched a lot before asking this question. I have a Java program that crawls web pages looking for .doc and .pdf files, and it can download them. But a single .pdf or .doc file can be up to 3-4 MB, which is a problem because there are millions of files. So I decided to extract their text without downloading the whole file. Basically, I need to read the PDF or DOC file online and download only its text, but I could not figure out how to do that. I can provide my code if necessary.

Edit: This question can be closed now; I got the answer (that there is no solution). Thanks for the help.

And what's with the downvotes on this question?

Solution

That is not possible. You can only start extracting the document once you download the bytes.
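To make that concrete, here is a minimal Java sketch (Java 11+ for `java.net.http`) of the only client-side path: pull the complete response body into memory, then hand it to a parser. The class name `DownloadThenExtract` and the `TextExtractor` hook are hypothetical; a real extractor would wrap something like Apache PDFBox (PDF) or Apache POI (DOC), which is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DownloadThenExtract {

    /** Hypothetical hook: plug in a real parser (e.g. PDFBox for .pdf, POI for .doc). */
    @FunctionalInterface
    public interface TextExtractor {
        String extract(byte[] document) throws IOException;
    }

    /** Reads the whole stream first; the extractor only ever sees the complete bytes. */
    public static String extract(InputStream in, TextExtractor extractor) throws IOException {
        byte[] bytes = in.readAllBytes(); // Java 9+; buffers the entire file in memory
        return extractor.extract(bytes);
    }

    /** Fetches the document over HTTP, then extracts; the full file is transferred either way. */
    public static String fetchAndExtract(URI uri, TextExtractor extractor)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream body = response.body()) {
            if (response.statusCode() != 200) {
                throw new IOException("HTTP " + response.statusCode() + " for " + uri);
            }
            return extract(body, extractor);
        }
    }
}
```

Whatever extractor you plug in, the network cost is the full file size: the crawler can throw the bytes away after extraction, but it cannot avoid fetching them.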

(Unless you also control the server: then you could do the extraction server-side and provide a download link for the plain text.)
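If you do control the server, the extraction can run there so the crawler only fetches plain text. Below is a minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The `/text/<name>` URL scheme, the `TextEndpoint` class, and the `TextExtractor` hook are all hypothetical, and the actual PDF/DOC parsing (e.g. via PDFBox or POI) is left to whatever you plug in.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TextEndpoint {

    /** Hypothetical hook for the real parser; it runs on the server, next to the files. */
    @FunctionalInterface
    interface TextExtractor {
        String extract(byte[] document) throws IOException;
    }

    /**
     * Serves GET /text/<name> as text/plain, where <name> is a file inside docsDir.
     * Pass port 0 to let the OS pick a free port; read it back via server.getAddress().
     */
    static HttpServer start(int port, Path docsDir, TextExtractor extractor) throws IOException {
        Path root = docsDir.toAbsolutePath().normalize();
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/text/", exchange -> {
            String name = exchange.getRequestURI().getPath().substring("/text/".length());
            Path file = root.resolve(name).normalize();
            // Refuse names that escape the docs directory (e.g. "../x") or don't exist.
            if (!file.startsWith(root) || !Files.isRegularFile(file)) {
                exchange.sendResponseHeaders(404, -1); // -1: no response body
                exchange.close();
                return;
            }
            // Read the full document locally, extract, and send only the text.
            String text = extractor.extract(Files.readAllBytes(file));
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The crawler would then request something like `/text/report.pdf` instead of the .pdf itself, so only the extracted text crosses the network. A production version would also catch extractor failures and answer with a 500.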