I'm new to XML. I'm trying to extract information such as post content, post author and post date from forum thread pages like this one using an XSLT stylesheet. I plan to automatically download multiple HTML pages from that forum, convert them to XHTML using Tidy, and then apply a self-written XSLT stylesheet to them. The stylesheet looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xhtml="http://www.w3.org/1999/xhtml" version="1.0">
<xsl:output method="xml" indent="yes" encoding="UTF-8" omit-xml-declaration="no" />
<xsl:template match="/">
<!--identifying post-entry-->
<content>
<xsl:value-of select="//xhtml:blockquote[@class='postcontent restore']"/>
</content>
</xsl:template>
</xsl:stylesheet>
If I apply it to the XHTML version of the page mentioned above, only the first post's content (from 'Nachdem' to 'hochheilen') is tagged correctly.
Here is a snippet of the XHTML (find 'postcontent restore' at line 326 and 438):
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="de" id=
<meta name="generator" content=
"HTML Tidy for HTML5 for Apple macOS version 5.6.0" />
<base href="http://forum.pcgames.de/" />
<!--[if IE]></base><![endif]-->
<link rel="canonical" href=
"http://forum.pcgames.de/videospiele-allgemein/9326273-erfahrungsaustausch-spoileralarm-hilfe-ich-weiss-nicht-weiter.html" />
<meta http-equiv="Content-Type" content=
"text/html; charset=utf-8" />
<meta id="e_vb_meta_bburl" name="vb_meta_bburl" content=
"http://forum.pcgames.de" />
<meta name="generator" content="vBulletin 4.2.2" />
<meta name="theme-color" content="#333333" />
<meta name="msapplication-navbutton-color" content="#333333" />
<meta name="apple-mobile-web-app-status-bar-style" content=
"#333333" />
<meta http-equiv="X-UA-Compatible" content="IE=9" />
<meta name="viewport" content=
"width=device-width,initial-scale=1.0,maximum-scale=1.0" />
<link rel="Shortcut Icon" href=
"http://forum.pcgames.de/favicon.ico" type="image/x-icon" />
<script type="text/javascript" src=
<title>[Erfahrungsaustausch / Spoileralarm] Hilfe - Ich weiß nicht
<link rel="canonical" href=
"http://forum.pcgames.de/videospiele-allgemein/9326273-erfahrungsaustausch-spoileralarm-hilfe-ich-weiss-nicht-weiter.html" />
<script type="text/javascript" src=
<link rel="stylesheet" type="text/css" href=
"http://forum.pcgames.de/css.php?styleid=11&langid=2&d=1535117522&td=ltr&sheet=toolsmenu.css,postlist.css,showthread.css,postbit.css,options.css,attachment.css,poll.css,lightbox.css" />
<link href=
rel='stylesheet' type='text/css' />
<link rel="stylesheet" type="text/css" href=
"http://forum.pcgames.de/css.php?styleid=11&langid=2&d=1535117522&td=ltr&sheet=additional.css" />
<script type="text/javascript" src=
<div id="content-container">
<div id="main-content" class="clearfix">
<div class="menu">
<div class="wrapper"><a href="http://www.pcgames.de/" class="logo"
alt="PC Games" title="zur Startseite"></a> <a href=
"javascript:void(0)" class="menu_button"></a>
<ul id="navtabs" class="navtabs floatcontainer">
<li class="selected subMenu" id="vbtab_forum"><a class=
"mainMenu navtab dropdown" href=
<ul class="floatcontainer">
<li id="vbflink_newposts" class="subItemA"><a href=
Die letzten 100 Beiträge</a></li>
<li id="vbflink_faq" class="subItemA"><a href=
<li id="vbflink_calendar" class="subItemA"><a href=
<li class="popupmenu subMenu"><a href=
onclick="return false;">Community</a>
<li id="vbclink_groups" class="subItemA"><a href=
<li id="vbclink_albums" class="subItemA"><a href=
"http://forum.pcgames.de/members/albums.html">Bilder &
<li id="vbclink_members" class="subItemA"><a href=
<li class="popupmenu subMenu"><a href=
onclick="return false;">Aktionen</a>
<li id="vbalink_mfr" class="subItemA"><a rel="nofollow" href=
Alle Foren als gelesen markieren</a></li>
<li class="popupmenu subMenu"><a href=
onclick="return false;">Nützliche Links</a>
<li id="vbqlink_posts" class="subItemA"><a href=
der letzten 7 Tage</a></li>
<li id="link_mtg3_542" class="subItemA"><a rel="nofollow" href=
Meine Themen</a></li>
<li id="link_mtg3_639" class="subItemA"><a rel="nofollow" href=
Meine Beiträge</a></li>
<li id="link_mtg3_831" class="subItemA"><a rel="nofollow" href=
Themen mit eigenen Beiträgen</a></li>
<li id="vbqlink_leaders" class="subItemA"><a href=
<li id="vbqlink_online" class="subItemA"><a href=
"http://forum.pcgames.de/online.php">Wer ist online</a></li>
<li id="link_ndgx_744" class="subItemA"><a href="/chat/">Chat
<li class="subMenu" id="vbtab_blog"><a class=
"mainMenu navtab dropdown" href=
<ul class="floatcontainer">
<li id="vbblog_recent" class="subItemA"><a href=
<li id="vbblog_popular" class="subItemA"><a href=
"http://forum.pcgames.de/blogs/best-entries/">Top Einträge</a></li>
<li id="vbblog_member" class="subItemA"><a href=
<li class="subMenu" id="vbtab_whatsnew"><a rel="nofollow" class=
"mainMenu navtab dropdown" href=
Was ist neu?</a>
<ul class="floatcontainer">
<li id="vbnew_activitystream" class="subItemA"><a href=
<li id="vbnew_newposts" class="subItemA"><a rel="nofollow" href=
Neue Beiträge</a></li>
<li id="vbnew_groupm" class="subItemA"><a rel="nofollow" href=
Neue Diskussionen</a></li>
<li id="vbnew_events" class="subItemA"><a rel="nofollow" href=
Neue Termine</a></li>
<li id="vbnew_entries" class="subItemA"><a rel="nofollow" href=
Neue Blog-Einträge</a></li>
<li id="vbnew_mfr" class="subItemA"><a rel="nofollow" href=
Alle Foren als gelesen markieren</a></li>
<li class="subMenu" id="vbtab_activity"><a class="mainMenu navtab"
<a target="_blank" alt="www.gamesworld.de" title=
"zur Gamesworld-Startseite" href="http://www.gamesworld.de" class=
"logo partner" rel="nofollow"></a>
<ul class="usermenu guest">
<li><a class="loginbtn" name="login-dialog" href=
<li><a class="registerbtn" href=
<div class="clear"></div>
<div class="wrapper">
<div class="above_body">
<div id="header" class="floatcontainer doc_header">
<div class="bannerFrame">
<div class="adikett" id="6517819" data-type-id="banner"></div>
<div class="ad_global_header"></div>
<hr /></div>
<div class="body_wrapper loggedout">
<div class="skyFrame">
<div class="adikett" id="6517818" data-type-id="sky"></div>
<div id="breadcrumb" class="breadcrumb">
<ul class="floatcontainer">
<li class="navbithome"><a href="http://forum.pcgames.de/"
<li class="navbit"><a href=
<li class="navbit"><a href=
<li class="navbit"><a href=
<li class="navbit lastnavbit">
<h1><span><a href="javascript:location.reload();" title=
"Seite neu laden">[Erfahrungsaustausch / Spoileralarm] Hilfe - Ich
weiß nicht weiter!</a></span></h1>
<hr /></div>
<div id="above_postlist" class="above_postlist">
<div id="pagination_top" class="pagination_top">
<div id="pagetitle" class="pagetitle">
<div id="vbseo-likes"><span class="vbseo-likes-count" onclick=
"vbseoui.tree_dropdown()"><img src=
class="vbseo-likes-count-image" alt="" />52<em>Gefällt
<div id="liketree_1.9326273" class="vbseo-likes-container">
<ul class="vbseo-likes-tabs">
<li><a href=
onclick="return vbseoui.treetab_click(0)">Top</a></li>
<li><a href=
onclick="return vbseoui.treetab_click(1)">Alle</a></li>
<li><a href=
onclick="return vbseoui.treetab_click(2)">Aktuelle Seite</a></li>
<ul class="vbseo-likes-list"></ul>
<div id="thread_controls" class="thread_controls toolsmenu">
<ul id="postlist_popups" class="postlist_popups popupgroup">
<li class="popupmenu" id="threadtools">
<h6><a class="popupctrl" href=
<ul class="popupbody popuphover">
<li><a href=
accesskey="3" rel="nofollow">Druckbare Version zeigen</a></li>
<li><a href=
rel="nofollow">Thema weiterempfehlen…</a></li>
<li><a href=
rel="nofollow">Thema abonnieren…</a></li>
<li class="popupmenu" id="threadrating">
<h6><a class="popupctrl" href="javascript://">Thema
<div class="popupbody popuphover">
<form action="http://forum.pcgames.de/threadrate.php" method="post"
<input type="hidden" name="s" value="" /> <input type="hidden"
name="securitytoken" value="guest" /> <input type="hidden" name="t"
value="9326273" /> <input type="hidden" name="pp" value="20" />
<input type="hidden" name="page" value="1" /></form>
<div id="postlist" class="postlist restrain">
<ol id="posts" class="posts" start="1">
<li class="postbitlegacy postbitim postcontainer old" id=
<div class="posthead"><span class="postdate old"><span class=
"date">23.10.2013, <span class=
"time">15:06</span></span></span> <span class=
"nodecontrols"><a name="post9651357" href=
class="postcounter">#1</a><a id="postcount9651357" name=
<div class="postdetails">
<div class="userinfo">
<div class="userdetails hasavatar">
<div class="username_container">
<div class="popupmenu memberaction"><a rel="nofollow" class=
"username offline" href=
"http://forum.pcgames.de/members/2905424-monalye.html" title=
"Monalye ist offline"><strong>Monalye</strong></a></div>
<img class="inlineimg onlinestatus" src=
alt="Monalye ist offline" border="0" /></div>
<span class="usertitle">Erfahrener Benutzer</span></div>
<a rel="nofollow" class="postuseravatar" href=
"http://forum.pcgames.de/members/2905424-monalye.html" title=
"Monalye ist offline"><img src=
"http://forum.pcgames.de/customavatars/avatar2905424_9.gif" alt=
"Avatar von Monalye" title="Avatar von Monalye" /></a>
<hr />
<dl class="userinfo_extra">
<div class="post_field">
<dt>Registriert seit</dt>
<div class="post_field">
<div class="post_field">
<div class="imlinks"></div>
<div class="clear"></div>
<div class="postbody">
<div class="postrow has_after_content">
<h2 class="title icon">[Erfahrungsaustausch / Spoileralarm] Hilfe -
Ich weiß nicht weiter!</h2>
<div class="content">
<div id="post_message_9651357">
<blockquote class="postcontent restore">Nachdem es sich nun schon
ein paar mal ergeben hat, das in den verschiedensten Topics um
Walktrough's und Hilfe gebeten wurde (sehr oft von mir <img src=
"http://forum.pcgames.de/images/smilies/default/sm_;-).gif" border=
"0" alt="" title="; )" class="inlineimg" /> ) hab ich nun
beschlossen, den Tipp von LC anzunehmen und einen entsprechenden
Thread zu eröffnen.<br />
<br />
Wann immer man bei einem Spiel nicht mehr weiter kommt, irgendetwas
nicht findet oder Tipps zu schwierigen Erfolgen oder Trophäen
braucht, kann man hier nun um Hilfe bitten.<br />
<br />
Tja um auch gleich den Anfang zu machen ergab sich grade "zufällig"
ein Problem, bei dem ich nicht weiter weiß.<br />
Ich spiele ja gerade Darksiders II, nachdem ich die 3 Lebenssteine
für die goldene Arena gesammelt habe, stehe ich nun vor einem
Bossgegner, nämlich Gnashor. Ich bin nach dieser Komplettlösung
vorgegangen<br />
<a rel="nofollow" href=
target="_blank">Darksiders 2 Komplettlösung - Die goldene Arena
dritter Lebenstein - Bosskampf Arena Champion Gnashor &bull;
Eurogamer.de</a><br />
hab' aber bei meinem Kampf festgestellt, das sich das blöde Biest
wieder selbst hochheilt... und so bekomm' ich den nie tot <img src=
"http://forum.pcgames.de/images/smilies/default/sm_B-(.gif" border=
"0" alt="" title=":(" class="inlineimg" /><br />
Im Grunde dresche ich permanent auf ihn ein, da ich sehr gute
Verteidigungswerte und gute Ausrüstungsgegenstände habe, ertrage
ich das recht gut. Damit konnte ich ihm gleich mal ein Drittel
Leben runterklopfen, doch kaum brauch ich mal 2 - 3 Sekunden, bis
ich wieder an ihm dran bin, heilt er sich in der Zwischenzeit
wieder rauf... und davon steht einfach nirgends was<br />
Egal wie oft ich das Internet befragt habe und Lösungen zu dem Boss
gelesen habe, nirgends steht was, das der sich hoch heilt <img src=
"http://forum.pcgames.de/images/smilies/default/sm_B-(.gif" border=
"0" alt="" title=":(" class="inlineimg" /><br />
Wie habt ihr das gemacht und mache ich irgendwas falsch, das der
sich deshalb hochheilen kann?</blockquote>
<div class="after_content">
<blockquote class="postcontent lastedited">Geändert von Herbboy
(14.11.2013 um <span class="time">00:24</span> Uhr)</blockquote>
<div class="vbseo_buttons" id="lkbtn_1.9326273.9651357">
<div class="vbseo_liked"><a href=
hat "Gefällt mir" geklickt.</div>
<div class="cleardiv"></div>
<div class="postfoot">
<hr />
<li class="postbitlegacy postbitim postcontainer old" id=
<div class="posthead"><span class="postdate old"><span class=
"date">23.10.2013, <span class=
"time">15:37</span></span></span> <span class=
"nodecontrols"><a name="post9651373" href=
class="postcounter">#2</a><a id="postcount9651373" name=
<div class="postdetails">
<div class="userinfo">
<div class="userdetails hasavatar">
<div class="username_container">
<div class="popupmenu memberaction"><a rel="nofollow" class=
"username offline" href=
"http://forum.pcgames.de/members/1145245-hawkins.html" title=
"Hawkins ist offline"><strong>Hawkins</strong></a></div>
<img class="inlineimg onlinestatus" src=
alt="Hawkins ist offline" border="0" /></div>
<span class="usertitle">Erfahrener Benutzer</span></div>
<a rel="nofollow" class="postuseravatar" href=
"http://forum.pcgames.de/members/1145245-hawkins.html" title=
"Hawkins ist offline"><img src=
"http://forum.pcgames.de/customavatars/avatar1145245_1.gif" alt=
"Avatar von Hawkins" title="Avatar von Hawkins" /></a>
<hr />
<dl class="userinfo_extra">
<div class="post_field">
<dt>Registriert seit</dt>
<div class="post_field">
<div class="imlinks"></div>
<div class="clear"></div>
<div class="postbody">
<div class="postrow has_after_content">
<div class="content">
<div id="post_message_9651373">
<blockquote class="postcontent restore">Das Video sollte
helfen:<br />
<br />
<a rel="nofollow" href="http://www.youtube.com/watch?v=tW47BQFzJcw"
target="_blank">Darksiders 2 - Gnashor Boss Fight -
YouTube</a><br />
<br />
<br />
Du musst ihm am Kopf packen, damit wird er auf den Boden geworfen
und die "Wurmphase" startet wieder ohne das er sich
<div class="after_content">
<div class="vbseo_buttons" id="lkbtn_1.9326273.9651373">
<div class="vbseo_liked" style="display:none"></div>
<div class="cleardiv"></div>
<div class="postfoot">
<div class="textcontrols floatcontainer"><span class=
"postcontrols"><img style="display:none" id="progress_9651373" src=
alt="" /> <a id="qrwq_9651373" class="newreply" href=
rel="nofollow" title="Zitieren"><img id="quoteimg_9651373" src=
"http://forum.pcgames.de/clear.gif" alt="Zitieren" />
<hr /></li>
<div class="separator"></div>
<div class="postlistfoot"></div>
<div id="below_postlist" class="noinlinemod below_postlist">
<div id="pagination_bottom" class="pagination_bottom">
<div class="clear"></div>
<div class="clear"></div>
<div id="thread_info" class="thread_info block">
<div id="similar_threads">
<h4 class="threadinfohead blockhead">Ähnliche Themen</h4>
<div id="similar_threads_list" class=
"thread_info_block blockbody formcontrols">
<ol class="similar_threads">
<li class="floatcontainer">
<div class="titleblock">
<div class="starter_forum">Von Graho im Forum PC-Plattform
<div class="dateblock"><span class="shade">Antworten:</span> 2
<div class="starter_forum"><span class="shade">Letzter
Beitrag:</span> 27.07.2006, <span class="time">01:12</span></div>
<li class="floatcontainer">
<div class="titleblock">
<div class="starter_forum">Von Tammy83 im Forum Videospiele
<div class="dateblock"><span class="shade">Antworten:</span> 7
<div class="starter_forum"><span class="shade">Letzter
Beitrag:</span> 13.07.2006, <span class="time">09:59</span></div>
<li class="floatcontainer">
<div class="titleblock">
<div class="starter_forum">Von Killingthefly im Forum PC-Plattform
<div class="dateblock"><span class="shade">Antworten:</span> 5
<div class="starter_forum"><span class="shade">Letzter
Beitrag:</span> 30.10.2004, <span class="time">18:10</span></div>
<div class="options_block_container">
<div class="options_block">
<h4 class="collapse blockhead options_correct"><a class="collapse"
id="collapse_posting_rules" href=
<img src=
alt="" /></a> Berechtigungen</h4>
<div id="posting_rules" class=
"thread_info_block blockbody formcontrols floatcontainer options_correct">
<div id="forumrules" class="info_subblock">
<div class="bbcodeblock">
<p class="rules_link"><a rel="nofollow" href=
"http://forum.pcgames.de/misc.php?do=showrules" target=
<div style="clear: left"></div>
<div id="footer-container">
<div id="footer" class="floatcontainer footer">
<div class="wrapper">
<form action="http://forum.pcgames.de/" method="get" id=
"footer_select" class="footer_select"></form>
<div class="below_body">
<div class="wrapper">
<div class="left">
<div class="right">
<div class="socialicons"></div>
<div class="clear"></div>
The expected output should be:
<content> content1 </content>
<content> content2 </content>
and so on.
How can I modify the stylesheet to apply to multiple post contents?
With your current template, you are matching the document node (/) and outputting a single content element for it. More importantly, in XSLT 1.0, when xsl:value-of selects multiple nodes, it outputs the string value of only the first node in the set (in document order).
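To see the first-node behaviour in isolation, here is a minimal sketch. The input document and the element names items, item, result, first-only and each are made up for illustration and have nothing to do with the forum page.

```xml
<!-- Hypothetical input (not from the forum page):
     <items><item>a</item><item>b</item></items> -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <result>
      <!-- XSLT 1.0: string value of the first selected item only -->
      <first-only><xsl:value-of select="//item"/></first-only>
      <!-- one output element per selected node -->
      <xsl:for-each select="//item">
        <each><xsl:value-of select="."/></each>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

With an XSLT 1.0 processor, first-only contains just a, while the loop produces two each elements (a and b). A stylesheet declared with version="2.0" and run on an XSLT 2.0 processor would instead join all selected values with spaces.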
You are actually closer to the solution in a previous edit of your question. What you need to do is this...
<xsl:template match="/">
<xsl:for-each select="//xhtml:blockquote[@class='postcontent restore']">
<content><xsl:value-of select="."/></content>
</xsl:for-each>
</xsl:template>
If you had used <xsl:value-of select="//xhtml:blockquote[@class='postcontent restore']"/> inside the xsl:for-each, the absolute path would ignore the blockquote you are currently on and just return the first blockquote in the document again. <xsl:value-of select="."/> returns the value of the current node (the one selected by xsl:for-each), which is what you want.
It might be slightly better to use a template approach though, especially if there are lots more things you want to extract, as it should keep the stylesheet free from too much indentation:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xhtml="http://www.w3.org/1999/xhtml" version="1.0">
<xsl:output method="xml" indent="yes" encoding="UTF-8" omit-xml-declaration="no" />
<xsl:template match="/">
<xsl:apply-templates select="//xhtml:blockquote[@class='postcontent restore']" />
</xsl:template>
<xsl:template match="xhtml:blockquote">
<content><xsl:value-of select="."/></content>
</xsl:template>
</xsl:stylesheet>
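Since the question also asks about the post author and post date, the template approach could be extended roughly as follows. This is only a sketch based on the class names visible in the XHTML snippet (postcontainer, username, date); the output element names posts, post, author and date are my own choice, and other pages of the forum (guest posts, for example) may be marked up differently.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="xhtml" version="1.0">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Wrap everything in one root element so the result is well-formed XML -->
  <xsl:template match="/">
    <posts>
      <xsl:apply-templates select="//xhtml:li[contains(@class, 'postcontainer')]"/>
    </posts>
  </xsl:template>

  <!-- One <post> per list item; all paths are relative to the current li -->
  <xsl:template match="xhtml:li">
    <post>
      <author>
        <xsl:value-of select=".//xhtml:a[contains(@class, 'username')]/xhtml:strong"/>
      </author>
      <date>
        <xsl:value-of select="normalize-space(.//xhtml:span[@class='date'])"/>
      </date>
      <content>
        <xsl:value-of select="normalize-space(.//xhtml:blockquote[@class='postcontent restore'])"/>
      </content>
    </post>
  </xsl:template>
</xsl:stylesheet>
```

The paths inside the second template start with a dot, so each lookup stays within the post currently being processed. The time span is nested inside the date span in the snippet, so the date element ends up holding both date and time. exclude-result-prefixes keeps the xhtml namespace declaration out of the output.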