Read .docx content from web server url - java


I have a WebDAV server where documents are stored. They are available by URL, e.g. https://my-url.net/document.docx. Now I'd like to fetch a document and read its content. What I have:

public void getDocumentContent() throws ExternalIntegrationException {
    var client = getHttpClient();
    var download = new HttpGet(doc);
    try {
        InputStream input = client.execute(download).getEntity().getContent();
        String str = IOUtils.toString(input, StandardCharsets.UTF_8);
        System.out.println(str);
    } catch (IOException e) {
        throw new ExternalIntegrationException("Failure download file from " + webDavPath + ". " +
                "Details:" + e.getMessage(), e);
    }
}

private HttpClient getHttpClient() {
    var credentialsProvider = new BasicCredentialsProvider();
    var credentials = new UsernamePasswordCredentials(userName, password);
    credentialsProvider.setCredentials(AuthScope.ANY, credentials);

    return HttpClientBuilder.create()
            .setDefaultCredentialsProvider(credentialsProvider)
            .build();
}

My System.out.println (for tests) prints this to the console:

X�K����nDUA*�)Y����ă�ښl  1i�J�/z,'��nV���K~ϲ��)a���m ����j0�Hu�T�9bx�<�9X�
�Q���
�Iʊ~���8��W�Z�"V0}����������>����uQwHo��   �� PK    ! ���   N  _rels/.rels �(�                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ���JA���a�}7�
ig�@��X6_�]7~
f��ˉ�ao�.b*lI�r�j)�,l0�%��b�
6�i���D�_���,   �   ���|u�Z^t٢yǯ;!Y,}{�C��/h>  �� PK    ! �d�Q�   1  word/_rels/document.xml.rels �(�                                                                                                                                                                                                                                                                 ���j�0���{-;���ȹ�@��� �����$���~�
�U�>�0̀�"S�+a_݃(���vuݕ���c���T�/<�!s��Xd3�� �����?'g![�?��4���%�9���R�k6��$C�,�`&g�!/=�  �� PK    ! �^��  "     word/document.xml�W]o�0}����y� ��"B���=T+�&�k�����wV���*�D�����s�mfW?
��k���0"�T3�6  yX��$p�*F�V��=8r5�n���Ns  ��\\���{��K� �j
��[��S���|��,�)Ԧ�m�<5�*bhA �ܖנ�ע��mR�$���ٷ3m�1KwX)�w�2cu
�/����k�ga���Իۺ�⪽cgh���� 2_-�WA���`ô�x=�L�7��6�J�� ^ɶ�u:O'�cJ���2O�f:[Z���`�!�=��L,�!w��/�;��-���ٰK���<j�,��r>������/V<�B�~T�q�A����:������ZU��O7ܥx������Ͽ^h�b�^h��`���N�d�U�:��������s�r�Y��1��~��]㓿UϽ��]<��woO �F�ڟ
R�T����ߊ�9��q�Z

How can I get a .docx file from a URL without downloading it, read the document content, and save it as a String (or maybe a List, if there are several documents)?


Solution

  • Why is it not working for you?

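    A .docx file is not plain text. It is a ZIP archive (note the PK signatures in your output) that contains XML parts such as word/document.xml, so decoding the raw bytes as a UTF-8 string only prints compressed binary data.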

    Solution:

    I recommend saving the file locally and opening it as a FileInputStream. Just delete the file when you are done.

    If you can't save the file locally, you can pass the InputStream from the HTTP response directly, because the XWPFDocument constructor accepts any InputStream, not only a FileInputStream.

    Once you have the variable "input" (a FileInputStream or any other InputStream), you can use the following code:

    import java.io.InputStream;
    import java.util.List;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;

    public void readDocxFile(InputStream input) {
        // try-with-resources closes the stream and the document even if parsing fails
        try (InputStream in = input; XWPFDocument document = new XWPFDocument(in)) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();

            // Print the plain text of each body paragraph
            for (XWPFParagraph para : paragraphs) {
                System.out.println(para.getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
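
To make the "ZIP archive with XML inside" point concrete, here is a minimal sketch that reads paragraph text using only the JDK, without Apache POI. It is not the approach recommended above, just an illustration of what POI does under the hood. The class name DocxTextSketch, the method readParagraphs and the sampleDocx fixture are hypothetical names introduced for this example, and the fixture is a stripped-down archive rather than a complete Word file. The sketch only concatenates w:t text runs per w:p paragraph, so tabs, line breaks, headers and footers are ignored, and paragraphs nested in text boxes may be reported twice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocxTextSketch {
    // WordprocessingML namespace of the w:p (paragraph) and w:t (text run) elements.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one string per w:p element found in word/document.xml. */
    static List<String> readParagraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current ZIP entry.
                    return parseParagraphs(zip.readAllBytes());
                }
            }
        }
        throw new IllegalArgumentException("word/document.xml not found; not a .docx stream?");
    }

    private static List<String> parseParagraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> result = new ArrayList<>();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Collect every text run below this paragraph, in document order.
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            result.add(text.toString());
        }
        return result;
    }

    /** Hypothetical fixture: a minimal in-memory archive holding only word/document.xml. */
    static byte[] sampleDocx() throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        List<String> paragraphs = readParagraphs(new ByteArrayInputStream(sampleDocx()));
        System.out.println(String.join("\n", paragraphs));
    }
}
```

In the question's code, the stream returned by client.execute(download).getEntity().getContent() could presumably be passed to readParagraphs in place of the fixture. To handle several documents, one option is to join each result with String.join("\n", ...) and collect the strings into a List<String>. For real-world files, POI remains the safer choice because it handles far more of the format.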