We have a cms system that write html content blocks into sql server database. I know the table name and field name where these html content blocks reside. Some html contains links () to pdf files. Here is a fragment:
<p>A deferred tuition payment plan,
or view the <a href="/uploadedFiles/Tuition-Reimbursement-Deferred.pdf"
target="_blank">list</a>.</p>
I need to extract pdf file names from all such html content blocks. At the end I need to get a list:
Tuition-Reimbursement-Deferred.pdf
Some-other-file.pdf
of all pdf file names from that field.
Any help is appreciated. Thanks.
UPDATE
I have received many replies, thank you so much, but I forgot to mention that we are still using SQL Server 2000 here. So, this had to be done using SQL 2000 SQL.
Well it's not pretty but this works using standard Transact-SQL:
SELECT CASE WHEN CHARINDEX('.pdf', html) > 0
THEN SUBSTRING(
html,
CHARINDEX('.pdf', html) -
PATINDEX(
'%["/]%',
REVERSE(SUBSTRING(html, 0, CHARINDEX('.pdf', html)))) + 1,
PATINDEX(
'%["/]%',
REVERSE(SUBSTRING(html, 0, CHARINDEX('.pdf', html)))) + 3)
ELSE NULL
END AS filename
FROM mytable
Could expand the list of delimiting characters before the filename from ["/]
(which matches either a quotation mark or slash) if you like.
See SQL Fiddle demo