How to use the Common Lisp libraries dex, plump, and clss to extract the title of a web page?

I am using Emacs, Slime, and SBCL to develop in Common Lisp on a desktop PC running NixOS.

I am also using the libraries dex, plump, and clss to extract the title of a web page, so I evaluated:

CL-USER> (clss:select "title" (plump:parse  (dex:get "")))
#(#<PLUMP-DOM:ELEMENT title {1009C488E3}>)

I was expecting: "Pedro Delfino".

Instead, I got the object:

#(#<PLUMP-DOM:ELEMENT title {1009C488E3}>)

Describing the object does not help me find the value I want:

CL-USER> (clss:select "title" (plump:parse  (dex:get "")))
#(#<PLUMP-DOM:ELEMENT title {100A9888E3}>)
CL-USER> (describe *)
#(#<PLUMP-DOM:ELEMENT title {100A9888E3}>)

Element-type: T
Fill-pointer: 1
Size: 10
Adjustable: yes
Displaced: no
Storage vector: #<(SIMPLE-VECTOR 10) {100A9B65BF}>
; No value

Where is the value that I need?



  • You can ask plump for the text inside an HTML node with plump:text. It accepts a single node, not the vector that clss:select returns, so use aref to pick the first match.

    (plump:text (aref (clss:select "title" (plump:parse (dex:get "")))
                      0))

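    plump:serialize gives you the node's HTML instead (useful for inspecting the results). By default it prints to standard output; pass NIL as the stream argument to get a string back.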

    You can also use clss and plump together through lquery. Parse the HTML with initialize, then query it with $, as in (lquery:$ <document> "selector"). Adding (text) or (serialize) as the last argument maps that operation over the matches, so the result is still a vector and you need aref to get a single string.

    CIEL-USER> (defparameter *PDELFINO-PARSED* (lquery:$ (initialize (dex:get ""))))
    CIEL-USER> (lquery:$ *PDELFINO-PARSED* "title")
    #(#<PLUMP-DOM:ELEMENT title {1008645923}>)
    CIEL-USER> (lquery:$ *PDELFINO-PARSED* "title" (text))
    #("Pedro Delfino")
    CIEL-USER> (aref * 0)
    "Pedro Delfino"
    CIEL-USER> (lquery:$ *PDELFINO-PARSED* "title" (serialize))
    #("<title>Pedro Delfino</title>")
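
If you do this often, the plump and clss steps can be wrapped in a small helper. The following is only a sketch: page-title and *sample-html* are names I made up for illustration, the HTML is a literal string so that it runs without network access, and it assumes Quicklisp is available to load the libraries.

```lisp
;; Assumes Quicklisp is installed; loads the two libraries used below.
(ql:quickload '(:plump :clss) :silent t)

;; Hypothetical sample document, so the sketch needs no network access.
(defparameter *sample-html*
  "<html><head><title>Pedro Delfino</title></head><body><p>Hi</p></body></html>")

(defun page-title (html)
  "Return the text of the first <title> element in HTML, or NIL if there is none."
  (let ((matches (clss:select "title" (plump:parse html))))
    ;; clss:select returns a vector of nodes, so check it before indexing.
    (when (plusp (length matches))
      (plump:text (aref matches 0)))))

(page-title *sample-html*)
;; => "Pedro Delfino"
```

For a live page, you would pass the string returned by dex:get instead of the sample string, as in (page-title (dex:get url)). Checking the length first avoids an out-of-bounds error from aref when the page has no title element.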