Tags: http, prolog, gzip, swi-prolog, logtalk

SWI-Prolog 8.0.2: gzipped HTTP


I am trying to get a piece of code working that opens an HTTP connection. However, the web page may be transferred either as plain text or gzipped. The code therefore takes a pragmatic approach: it first tries to read the page as plain text and, if that fails with an exception, retries as if the page were gzip-encoded.

URL is the only variable that needs to be bound. Try it with URL = 'http://releases.llvm.org/6.0.0/tools/clang/docs/ClangCommandLineReference.html', for instance.

                user::catch(
                (
                 user::http_open(URL, DataStream, []),                            
                 user::load_html(stream(DataStream), Terms, []),
                 user::close(DataStream)
                ),
                _
                ,
                (
                 user::open_any(URL, read, GZipDataStream, CloseIt, [encoding(gzip), string(atom)]),
                 /*user::http:encoding_filter(gzip, DataStream, GZipDataStream),*/
                 user::load_html(stream(GZipDataStream), Terms, []),
                 user::close_any(CloseIt)
                )
                )

Unfortunately, the recovery part of catch/3 doesn't work.

Any suggestions, please?


Solution

  • The user:: prefixes in the goals suggest that the code you posted is a fragment of Logtalk code. If so, it misuses Logtalk and creates a dependency on the SWI-Prolog autoloading mechanism. The code can be rewritten for clarity and resilience. Doing that and fixing the bug in it (library(zlib) must be loaded to make the http:encoding_filter/3 filter available) results in the following solution:

    :- use_module(library(http/http_open), []).
    :- use_module(library(sgml), []).
    :- use_module(library(iostream), []).
    :- use_module(library(zlib), []).
    
    
    :- object(html).
    
        :- public(get_url/2).
    
        % override ambiguous meta-predicate template
        :- meta_predicate(sgml:load_html(*,*,*)).
    
        get_url(URL, Terms) :-
            catch(
                    setup_call_cleanup(
                        http_open:http_open(URL, DataStream, []),
                        sgml:load_html(stream(DataStream), Terms, []),
                        close(DataStream)
                    ),
                    _,
                    setup_call_cleanup(
                        iostream:open_any(URL, read, DataStream, CloseIt, [string(atom)]),
                        sgml:load_html(stream(DataStream), Terms, []),
                        iostream:close_any(CloseIt)
                    )
                ).
    
    :- end_object.
    

    The setup_call_cleanup/3 calls ensure that the opened streams are closed in case of error.
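
    To see that cleanup behavior in isolation, here is a minimal plain SWI-Prolog sketch that needs no network access. The predicate name demo_cleanup/1 and the global key demo_stream are hypothetical, introduced only for this illustration, and the thrown term simulated_failure merely stands in for a failing sgml:load_html/3 call.

    ```prolog
    % Sketch only: shows that the cleanup goal of setup_call_cleanup/3
    % runs even when the main goal throws an exception.
    demo_cleanup(Outcome) :-
        catch(
            setup_call_cleanup(
                (   open_string("some text", Stream),
                    % Remember the handle: variable bindings are undone
                    % when the exception unwinds to catch/3.
                    nb_setval(demo_stream, Stream)
                ),
                throw(simulated_failure),   % stands in for a failing parse
                close(Stream)               % cleanup: always attempted
            ),
            simulated_failure,
            true
        ),
        nb_getval(demo_stream, Opened),
        (   is_stream(Opened)
        ->  Outcome = still_open
        ;   Outcome = closed
        ).
    ```

    Calling demo_cleanup(Outcome) is expected to bind Outcome to closed, since close/1 runs before the exception reaches catch/3. Without setup_call_cleanup/3, as in the original fragment, the close/1 goal after load_html/3 would simply be skipped on error.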

    Assuming the object above is saved in a html.lgt file, the following sample call shows it working for the URL you posted:

    ?- {html}.
    ...
    % (0 warnings)
    true.
    
    ?- html::get_url('http://releases.llvm.org/6.0.0/tools/clang/docs/ClangCommandLineReference.html', Terms).
    Terms = [element(html, [xmlns='http://www.w3.org/1999/xhtml'], [element(head, [], [element(meta, ['http-equiv'='Content-Type', content='text/html; charset=utf-8'], []), element(title, [], ['Clang command line argument reference — Clang 6 documentation']), element(link, [... = ...|...], []), element(link, [...|...], []), element(..., ..., ...)|...]), element(body, [role=document], ['      ', element(div, [... = ...|...], [element(..., ..., ...)|...]), '\n      ', element(..., ..., ...)|...])])].
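
    As a quick sanity check of the library(zlib) point made earlier, the following plain SWI-Prolog sketch tests whether a gzip filter is registered once the libraries are loaded. It assumes, as stated above, that the filter is installed as a clause of the http:encoding_filter/3 hook; the predicate name gzip_filter_available/0 is hypothetical. Depending on the SWI-Prolog version, the clause may be provided by a library other than library(zlib), so treat this as a diagnostic aid rather than a definitive test.

    ```prolog
    :- use_module(library(http/http_open), []).
    :- use_module(library(zlib), []).

    % The hook is declared here as well, so the check below fails
    % quietly instead of raising an error if nothing defines it.
    :- multifile http:encoding_filter/3.

    % Succeeds if some clause of the hook can handle the gzip encoding.
    gzip_filter_available :-
        once(clause(http:encoding_filter(gzip, _, _), _)).
    ```

    If the query ?- gzip_filter_available. fails, gzip-encoded responses are unlikely to be decoded transparently by http_open/3.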