When I attempt to load 160,000 XML documents into MarkLogic 8.0-2 using mlcp on MacOS 10.10.4, the following error is thrown:
mlcp-Hadoop2-1.3-1/bin/mlcp.sh: line 16: /usr/bin/java: Argument list too long
The command I'm issuing:
mlcp import -database FO -username sss4r -password ******* -host localhost -port 8003 -mode local -input_file_pattern '*\.xml' -output_uri_replace "/Users/sss4r/Documents/FOPOC,''" -input_file_path .
I realize this is probably a Unix shell issue: mlcp seems to be using the filesystem facilities to return the list of file names, and there is a system-imposed limit on how many filenames can be passed to a single command.
What is the MarkLogician-recommended best practice for resolving this problem? Attempt to bulk-load in smaller chunks? Try to modify the system's limit?
Thanks.
MLCP does not depend on shell expansion to load the files. I'm afraid the shell expansion is happening inside mlcp.sh, but only unintentionally. If you drop the -input_file_pattern parameter, you will probably see that it loads all the files. A quick fix would be to put the files in a sub-directory, leave out the file pattern, and simply point -input_file_path at that sub-directory.
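As a rough sketch of that quick fix: the sub-directory name xml-input is a hypothetical choice, and the files are moved with find rather than a shell glob, since `mv *.xml` on 160,000 files would likely hit the same "Argument list too long" limit. Note that the sub-directory name will then show up in the document URIs unless the -output_uri_replace prefix is extended to cover it.

```shell
# Hypothetical sub-directory name; pick whatever suits your layout.
mkdir -p xml-input

# Move the XML files one by one via find, so the shell never has to
# expand a glob matching all 160,000 names.
find . -maxdepth 1 -type f -name '*.xml' -exec mv {} xml-input/ \;

# Then import without -input_file_pattern, using the same connection
# options as in the question (commented out: needs a running MarkLogic):
# mlcp import -database FO -username sss4r -password ******* -host localhost -port 8003 \
#   -mode local -output_uri_replace "/Users/sss4r/Documents/FOPOC,''" -input_file_path xml-input
```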
Rob S. gives another solution that avoids this: put your params in a file, each param on a separate line, and point to that file with the -options_file parameter on the command line. That also saves you from problems with quotes and other special characters being unintentionally interpreted by the shell environment.
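A minimal sketch of the options-file approach, using the options from the question: the file name mlcp-options.txt is a hypothetical choice, and the values are copied verbatim from the original command, so it is worth checking the guide linked below for whether the quoting needs adjusting inside an options file.

```shell
# Write each option and each value on its own line.
# The quoted 'EOF' delimiter stops the shell from expanding anything
# inside the here-document.
cat > mlcp-options.txt <<'EOF'
-database
FO
-username
sss4r
-password
*******
-host
localhost
-port
8003
-mode
local
-input_file_pattern
'*\.xml'
-output_uri_replace
"/Users/sss4r/Documents/FOPOC,''"
-input_file_path
.
EOF

# Then run the import, pointing at the file
# (commented out: needs a running MarkLogic):
# mlcp import -options_file mlcp-options.txt
```

Because the pattern now lives in a file, it never appears on the command line where mlcp.sh could expand it.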
More details here: https://docs.marklogic.com/guide/ingestion/content-pump#id_36150
HTH!
PS: I have filed a bug to improve MLCP (#33670)