I have a hyperlink inside a Word ContentControl,
like the one below:
http://www.yahoo.com
and I'm storing its value as shown below so I can use it later:
var encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(cc.Range.WordOpenXML));
When I decode it again and read its text content as shown below:
var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
XDocument doc = XDocument.Parse(decoded);
string ccText = doc.Descendants(XName.Get("document", "http://schemas.openxmlformats.org/wordprocessingml/2006/main")).FirstOrDefault().Value;
I get HYPERLINK "http://www.yahoo.com/" \o "Follow link"
instead of the expected result, http://www.yahoo.com
The same happens for an email address, where I get HYPERLINK "mailto:abc@xyz.com" abc@xyz.com
instead of abc@xyz.com
If I pass cc.Range.WordOpenXML
directly to the code above, instead of the decoded string, I get the proper value, http://www.yahoo.com
When I compared the decoded XML with the XML before encoding, the hyperlink node appears to have been modified. I think this is the root cause of the issue.
Original XML before encoding, retrieved from doc.Descendants(XName.Get("document", "http://schemas.openxmlformats.org/wordprocessingml/2006/main")):
<w:hyperlink r:id="rId4" w:tooltip="Follow link" w:history="1">
<w:r w:rsidRPr="00E862A6">
<w:rPr>
<w:rStyle w:val="Hyperlink" />
</w:rPr>
<w:t>http://www.yahoo.com</w:t>
</w:r>
</w:hyperlink>
Changed XML structure after decoding:
<w:ins w:id="5" w:author="xxxxxx xxxxxx" w:date="2021-03-30T16:42:00Z">
<w:r>
<w:instrText xml:space="preserve"> HYPERLINK "http://www.yahoo.com/" \o "Follow link" </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate" />
</w:r>
<w:r w:rsidRPr="00E862A6">
<w:rPr>
<w:rStyle w:val="Hyperlink" />
</w:rPr>
<w:t>http://www.yahoo.com</w:t>
</w:r>
<w:r>
<w:rPr>
<w:rStyle w:val="Hyperlink" />
</w:rPr>
<w:fldChar w:fldCharType="end" />
</w:r>
</w:ins>
Is there any way to get the plain hyperlink text, rather than its field-code syntax, from a Word ContentControl's Range
stored as in the use case above? I'm not sure whether I'm doing something wrong here.
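One possible direction, offered as a sketch rather than a confirmed fix: XElement.Value concatenates every descendant text node, so when the hyperlink is serialized as a field, the content of w:instrText (the HYPERLINK field code) ends up in the result next to the visible text. Reading only the w:t elements avoids that. The class and method names below (ContentControlText, GetVisibleText) are hypothetical, and the sketch assumes the content is plain run text; paragraph breaks are not preserved.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ContentControlText
{
    // WordprocessingML main namespace, the same one used in the question.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns only the visible run text (<w:t>),
    // skipping <w:instrText>, which carries the HYPERLINK field code.
    public static string GetVisibleText(string wordOpenXml)
    {
        XDocument doc = XDocument.Parse(wordOpenXml);
        XElement document = doc.Descendants(W + "document").FirstOrDefault();
        if (document == null)
        {
            return string.Empty;
        }

        // Concatenate the text of every <w:t> element in document order.
        return string.Concat(document.Descendants(W + "t").Select(t => t.Value));
    }
}
```

Usage would be along the lines of ContentControlText.GetVisibleText(decoded). This does not explain why the two XML strings differ; it only makes the text extraction independent of whether the link is stored as w:hyperlink or as a field.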
I haven't found a solution for the root cause. Until I find a way to retrieve the required text from the range without the HYPERLINK syntax,
I'm using a workaround. It is not the best or a perfect solution: I remove HYPERLINK \"
and \\o \"Follow link\"
from the string, after finding their positions, to get only http://www.yahoo.com/
Looking forward to an actual solution.
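For reference, the string-based workaround described above could be sketched with a regular expression instead of manual position lookups. This is only an illustration: HyperlinkFieldParser and ExtractTarget are hypothetical names, and the pattern assumes the field code always has the shape HYPERLINK "target" followed by optional switches. For an email link it would return the mailto: address as written in the field code.

```csharp
using System;
using System.Text.RegularExpressions;

static class HyperlinkFieldParser
{
    // Captures the quoted target that follows the HYPERLINK keyword.
    static readonly Regex Target =
        new Regex(@"HYPERLINK\s+""(?<target>[^""]*)""");

    // Hypothetical helper: returns the link target if a HYPERLINK field code
    // is present; otherwise returns the input unchanged.
    public static string ExtractTarget(string text)
    {
        Match match = Target.Match(text);
        return match.Success ? match.Groups["target"].Value : text;
    }
}
```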