I am working on a program that lists the elements of a web page together with their relative XPaths. Using Java and JSoup, I want to generate a relative XPath dynamically for every element in any given web page. A small, complete working utility would help a lot.
I want something like:
//*[@id="menu-item-13686"]/a
Sample output:
Element Or Node or component Name: xxxx AND Xpath = //*[@id="menu-item-13686"]/a
Thank you
I think you can start with this.
This issue was fixed in jOOX 1.6.1 (compiled with Java 10).
The code snippet below selects every element and, for each one, prints the node name, the tag name, a CSS selector, and an XPath that uniquely selects that element. The XPath is produced by passing JSoup's cssSelector() result to jOOX's CSS2XPath.
import java.io.IOException;

import org.joox.selector.CSS2XPath;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestParser {
    public static void main(String[] args) {
        try {
            // Fetch and parse the page
            Document doc = Jsoup.connect("https://theuserisdrunk.com/").get();
            // "*" matches every element in the document
            Elements elements = doc.select("*");
            for (Element element : elements) {
                // JSoup builds a unique CSS selector; jOOX translates it to XPath
                String css = element.cssSelector();
                String path = CSS2XPath.css2xpath(css, true);
                System.out.println("Node name : " + element.nodeName());
                System.out.println(" Tag : " + element.tagName());
                System.out.println(" CSS : " + css);
                System.out.println(" XPath : " + path + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
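A note on reading the generated XPath: a CSS `:nth-child(n)` shows up as the predicate `[count(preceding-sibling::*) = n - 1]`. The sketch below is only a sanity check of that predicate, using the JDK's own `javax.xml.xpath` rather than JSoup or jOOX. The markup, the class name `NthChildXPathCheck` and the helper `idOfFirstMatch` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class NthChildXPathCheck {

    // Hypothetical markup: <body> has four element children, the last two are <script>.
    static final String XML =
        "<html><body><p id='a'/><div id='b'/><script id='c'/><script id='d'/></body></html>";

    /** Evaluates expr against XML and returns the id of the first match ("" if none). */
    static String idOfFirstMatch(String expr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate("(" + expr + ")[1]/@id",
                                  new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Third child of <body>, counted over all element siblings: prints c
        System.out.println(idOfFirstMatch(
            "//html/body/script[count(preceding-sibling::*) = 3 - 1]"));
        // Fourth child of <body>: prints d
        System.out.println(idOfFirstMatch(
            "//html/body/script[count(preceding-sibling::*) = 4 - 1]"));
        // script[3] counts only <script> siblings, and there are just two: prints an empty line
        System.out.println(idOfFirstMatch("//html/body/script[3]"));
    }
}
```

A plain positional step such as `script[3]` counts only siblings with the same name, so it is not equivalent to `script:nth-child(3)`. That is why the jOOX program's output below uses the longer `count(preceding-sibling::*)` form.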
Sample Output:
Node name : div
Tag : div
CSS : #mc-embedded-subscribe-form > div.clear:nth-child(4)
XPath : //*[@id='mc-embedded-subscribe-form']/div[@class='clear' or starts-with(@class, 'clear ') or ' clear' = substring(@class, string-length(@class) - string-length(' clear') + 1) or contains(@class, ' clear ')][count(preceding-sibling::*) = 4 - 1]
Node name : input
Tag : input
CSS : #mc-embedded-subscribe
XPath : //*[@id='mc-embedded-subscribe']
Node name : p
Tag : p
CSS : #mc_embed_signup > p.intern:nth-child(2)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 2 - 1]
Node name : a
Tag : a
CSS : #mc_embed_signup > p.intern:nth-child(2) > a
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 2 - 1]/a
Node name : p
Tag : p
CSS : #mc_embed_signup > p.intern:nth-child(3)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 3 - 1]
Node name : i
Tag : i
CSS : #mc_embed_signup > p.intern:nth-child(3) > i
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 3 - 1]/i
Node name : p
Tag : p
CSS : #mc_embed_signup > p.intern:nth-child(4)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 4 - 1]
Node name : a
Tag : a
CSS : #mc_embed_signup > p.intern:nth-child(4) > a:nth-child(1)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 4 - 1]/a[count(preceding-sibling::*) = 1 - 1]
Node name : a
Tag : a
CSS : #mc_embed_signup > p.intern:nth-child(4) > a:nth-child(2)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 4 - 1]/a[count(preceding-sibling::*) = 2 - 1]
Node name : i
Tag : i
CSS : #mc_embed_signup > p.intern:nth-child(4) > i
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 4 - 1]/i
Node name : a
Tag : a
CSS : #mc_embed_signup > p.intern:nth-child(4) > a:nth-child(4)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 4 - 1]/a[count(preceding-sibling::*) = 4 - 1]
Node name : a
Tag : a
CSS : #mc_embed_signup > p.intern:nth-child(4) > a:nth-child(5)
XPath : //*[@id='mc_embed_signup']/p[@class='intern' or starts-with(@class, 'intern ') or ' intern' = substring(@class, string-length(@class) - string-length(' intern') + 1) or contains(@class, ' intern ')][count(preceding-sibling::*) = 4 - 1]/a[count(preceding-sibling::*) = 5 - 1]
Node name : script
Tag : script
CSS : html > body > script:nth-child(3)
XPath : //html/body/script[count(preceding-sibling::*) = 3 - 1]
Node name : script
Tag : script
CSS : html > body > script:nth-child(4)
XPath : //html/body/script[count(preceding-sibling::*) = 4 - 1]
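If the expressions above are more verbose than you need, a shorter form like the `//*[@id="menu-item-13686"]/a` from the question can be built by walking up from each element until an ancestor with an `id` is found. Below is a minimal sketch of that idea, not the jOOX approach. It assumes well-formed markup and uses the JDK's W3C DOM so that it runs without extra libraries; the class `RelativeXPath` and its methods are hypothetical names. With JSoup, the same walk should be possible with `Element.parent()`, `id()` and `tagName()`, or after converting the document with `org.jsoup.helper.W3CDom`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RelativeXPath {

    /** Builds an XPath for el, anchored at the nearest ancestor-or-self that has an id. */
    static String xpathOf(Element el) {
        StringBuilder path = new StringBuilder();
        Node node = el;
        while (node instanceof Element) {
            Element current = (Element) node;
            String id = current.getAttribute("id"); // "" when the attribute is absent
            if (!id.isEmpty()) {
                // Assumes the id contains no double quote
                return "//*[@id=\"" + id + "\"]" + path;
            }
            path.insert(0, "/" + current.getTagName() + indexSuffix(current));
            node = current.getParentNode();
        }
        return path.toString(); // no id found: absolute path from the root
    }

    /** Returns "[k]" when el has same-named siblings, otherwise "". */
    static String indexSuffix(Element el) {
        int position = 0;
        int total = 0;
        for (Node sib = el.getParentNode().getFirstChild(); sib != null; sib = sib.getNextSibling()) {
            if (sib instanceof Element && sib.getNodeName().equals(el.getNodeName())) {
                total++;
                if (sib.isSameNode(el)) {
                    position = total;
                }
            }
        }
        return total > 1 ? "[" + position + "]" : "";
    }

    /** Parses the markup and returns one "name AND Xpath" line per element, in document order. */
    static List<String> describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList all = doc.getElementsByTagName("*");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                lines.add("Element: " + el.getTagName() + " AND Xpath = " + xpathOf(el));
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample markup; only the id value is taken from the question
        String xml = "<html><body><ul><li id=\"menu-item-13686\"><a>Home</a></li>"
                   + "<li><a>Blog</a></li></ul></body></html>";
        describe(xml).forEach(System.out::println);
    }
}
```

For the sample markup, the link inside the first list item comes out as `//*[@id="menu-item-13686"]/a`, while the second list item, which has no `id` on itself or any ancestor, falls back to `/html/body/ul/li[2]`. These paths are only unique if the `id` values in the page are unique.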