
wget without HTML tags


Is there a way to get the body of an HTML page without the HTML tags?

curl and wget return the response, but the output still contains the HTML tags. The tags could be stripped with sed and awk, but I am looking for an existing tool that does this without them.

lynx is an option, but it does not come pre-installed.

Thanks!


Solution

  • Why the aversion to installing an appropriate tool?

    As an alternative to lynx, try w3m, for example:

    w3m -dump http://google.com
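
If you would rather keep curl or wget for the download itself, a rough sketch is to pipe the HTML through whichever text-mode browser happens to be installed. The helper name html_to_text is made up for illustration, and the sketch assumes your builds of w3m and lynx accept the flags shown (`-T text/html` for w3m, `-stdin` and `-nolist` for lynx), so check your man pages before relying on it.

```shell
#!/bin/sh
# Sketch only: render HTML read from stdin as plain text, using
# whichever text-mode browser is available.
# html_to_text is a hypothetical helper name, not part of any tool.
html_to_text() {
    if command -v w3m >/dev/null 2>&1; then
        # -T text/html: treat stdin as HTML; -dump: print the rendered text
        w3m -dump -T text/html
    elif command -v lynx >/dev/null 2>&1; then
        # -stdin: read the page from stdin; -nolist: omit the list of links
        lynx -dump -stdin -nolist
    else
        echo "html_to_text: neither w3m nor lynx is installed" >&2
        return 1
    fi
}

# Example usage (needs network access):
#   curl -s http://google.com | html_to_text
#   wget -qO- http://google.com | html_to_text
```

Either browser still has to be installed, so this does not avoid the extra package; it only lets the same command line work on machines that have one or the other.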