Maven Package: .txt is not being included in the .jar file


I have a program that scrapes a webpage. I'm using JSoup and Selenium. To configure the user agent in the JSoup request, I have a userAgents.txt file containing a list of user agents. In each execution, I have a method that reads the .txt file, and returns a random user agent.

The program is working as expected when running in IntelliJ.

The problem happens when I try to build the .jar file, with mvn clean package. When running the .jar file, I get a FileNotFoundException, since the program can't find the userAgents.txt file.

If I remove this functionality, and hardcode the user agent, I have no problems.

The file is currently in src/main/resources. When executing the .jar, I get this exception:

java.io.FileNotFoundException: ./src/main/resources/userAgents.txt (No such file or directory)

I tried the maven-resources-plugin to copy the files into the target folder:

<plugin>
    <artifactId>maven-resources-plugin</artifactId>
    <version>3.3.0</version>
    <executions>
        <execution>
            <id>copy-resources</id>
            <phase>package</phase>
            <goals>
                <goal>copy-resources</goal>
            </goals>
            <configuration>
                <outputDirectory>${basedir}/target/extra-resources</outputDirectory>
                <includeEmptyDirs>true</includeEmptyDirs>
                <resources>
                    <resource>
                        <directory>${basedir}/src/main/resources</directory>
                        <filtering>false</filtering>
                    </resource>
                </resources>
            </configuration>
        </execution>
    </executions>
</plugin>

Even after changing the path inside the program to open the file from target/extra-resources, the error persists.

I also added this <resources> block to the pom.xml, but nothing changed:

<resources>
    <resource>
        <directory>src/main/resources</directory>
        <includes>
            <include>**/*.txt</include>
            <include>**/*.csv</include>
        </includes>
    </resource>
</resources>

Inside the program, I'm reading the file using:

String filePath = "./src/main/resources/userAgents.txt";
File extUserAgentLst = new File(filePath);
Scanner usrAgentReader = new Scanner(extUserAgentLst);

So, my question is:

  • How can I make sure the userAgents.txt file is included in the .jar file, so that when I run it, the program reads from this file without throwing an exception?

Solution

  • You can use getResourceAsStream instead, like so:

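    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.stream.Collectors;
    
    public class MyClass {
    
      public static void main(String[] args) throws IOException {
        // The name is resolved against the classpath root, which is where
        // Maven places the contents of src/main/resources.
        InputStream inStream = MyClass.class.getClassLoader().getResourceAsStream("userAgents.txt");
        if (inStream == null) {
          throw new IllegalStateException("userAgents.txt not found on the classpath");
        }
        // try-with-resources closes the reader and the underlying stream.
        // UTF-8 is an assumption about how the file is encoded.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inStream, StandardCharsets.UTF_8))) {
          // Keep one entry per line; Collectors.joining() with no separator
          // would fuse all user agents into a single string.
          List<String> userAgents = reader.lines().collect(Collectors.toList());
          System.out.println(userAgents);
        }
      }
    
    }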
    

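    It shouldn't be necessary to specify the <resources> tag in the pom.xml file. By default, Maven copies everything in src/main/resources to the root of the .jar, so you just need to place your file there before running the mvn package command to build the project. The original code fails because new File("./src/main/resources/userAgents.txt") is resolved against the working directory on disk, and that directory layout does not exist where the .jar runs. A file packed inside a .jar is a classpath resource, not a file-system path, so it has to be opened as a stream.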
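
Since the question is about returning a random user agent on each run, here is a minimal sketch of how that could look on top of getResourceAsStream. The class name UserAgentLoader and its method names are hypothetical, and the sketch assumes the file is UTF-8 with one user agent per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical helper; the class and method names are illustrative only.
final class UserAgentLoader {

  // Reads one user agent per line, trimming whitespace and skipping blank lines.
  // Assumes UTF-8 content. Closes the stream when done.
  static List<String> parse(InputStream in) {
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      return reader.lines()
          .map(String::trim)
          .filter(line -> !line.isEmpty())
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Loads a resource from the classpath root, which is where Maven
  // puts the contents of src/main/resources inside the .jar.
  static List<String> loadFromClasspath(String resourceName) {
    InputStream in =
        UserAgentLoader.class.getClassLoader().getResourceAsStream(resourceName);
    if (in == null) {
      throw new IllegalStateException("Resource not found on classpath: " + resourceName);
    }
    return parse(in);
  }

  // Picks one entry uniformly at random from a non-empty list.
  static String pickRandom(List<String> agents, Random random) {
    if (agents.isEmpty()) {
      throw new IllegalArgumentException("No user agents available");
    }
    return agents.get(random.nextInt(agents.size()));
  }
}
```

With this in place, a call such as UserAgentLoader.pickRandom(UserAgentLoader.loadFromClasspath("userAgents.txt"), new Random()) should return one line of the file, both from IntelliJ and from the packaged .jar, and the result can be passed to Jsoup.connect(url).userAgent(...). Splitting parsing from classpath lookup is a design choice that lets the parsing be tested with any InputStream.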