I have the following function defined via PROC FCMP
. The point of the code should be pretty obvious and relatively straightforward. I'm returning the value of an attribute from a line of XHTML. Here's the code:
proc fcmp outlib=library.funcs.crawl;
function getAttr(htmline $, Attribute $) $;
/*-- Find the position of the match --*/
Pos = index( htmline , strip( Attribute )||"=" );
/*-- Now do something about it --*/
if pos > 0 then do;
Value = scan( substr( htmline, Pos + length( Attribute ) + 2), 1, '"');
end;
else Value = "";
return( Value);
endsub;
run;
No matter what I do with length or attrib
statement to try to explicitly declare the data type returned, it ALWAYS returns only a max of 33 bytes of the requested string, regardless of how long the actual return value is. This happens no matter which attribute I am searching for. The same code (hard-coded) into a data step returns the correct results so this is related to PROC FCMP
.
Here is the datastep I'm using to test it (where PageSource.html is any html file that has xhtml compliant attributes -- fully quoted):
data TEST;
length href $200;
infile "F:\PageSource.html";
input;
htmline = _INFILE_;
href = getAttr( htmline, "href");
x = length(href);
run;
UPDATE: This seems to work properly after upgrading to SAS9.2 - Release 2
I ended up backing out of using FCMP defined data step functions. I don't think they're ready for primetime. Not only could I not solve the 33 byte return issue, but it started regularly crashing SAS.
So back to the good old (decades old) technology of macros. This works:
/*********************************/
/*= Macro to extract Attribute =*/
/*= from XHTML string =*/
/*********************************/
%macro getAttr( htmline, Attribute, NewVar );
if index( &htmline , strip( &Attribute )||"=" ) > 0 then do;
&NewVar = scan( substr( &htmline, index( &htmline , strip( &Attribute )||"=" ) + length( &Attribute ) + 2), 1, '"' );
end;
%mend;