regex, bash, unix, sed, non-greedy

sed regex match first occurrence


I have the following string:

<div class="downloadlist" id="Mactopia_Office2011"><p><a depEvents="DynamicDownloadsLinkClick[url|downloads?pid=Mactopia_Office2011&amp;fid=78B06C3D-0158-4344-8A8B-5FB822CD44D8#viewer|prodID|Mactopia_Office2011]" id="78B06C3D-0158-4344-8A8B-5FB822CD44D8" class="download_link" href="&#xD;&#xA;                          ?pid=Mactopia_Office2011&amp;fid=78B06C3D-0158-4344-8A8B-5FB822CD44D8#viewer&#xD;&#xA;                        ">Microsoft Office für Mac 2011 14.4.1-Update <span class="link_arrow">&gt;</span></a></p><p><a depEvents="DynamicDownloadsLinkClick[url|downloads?pid=Mactopia_Office2011&amp;fid=F7B8C82F-71FF-4675-8924-DAB652BA6603#viewer|prodID|Mactopia_Office2011]" id="F7B8C82F-71FF-4675-8924-DAB652BA6603" class="download_link" href="&#xD;&#xA;                          ?pid=Mactopia_Office2011&amp;fid=F7B8C82F-71FF-4675-8924-DAB652BA6603#viewer&#xD;&#xA;                        ">Microsoft Office für Mac 2011 14.3.9-Update <span class="link_arrow">&gt;</span></a></p><p><a depEvents="DynamicDownloadsLinkClick[url|downloads?pid=Mactopia_Office2011&amp;fid=3BEDF6DC-1464-4D17-A5BB-C90F8FEF567C#viewer|prodID|Mactopia_Office2011]" id="3BEDF6DC-1464-4D17-A5BB-C90F8FEF567C" class="download_link" href="&#xD;&#xA;                          ?pid=Mactopia_Office2011&amp;fid=3BEDF6DC-1464-4D17-A5BB-C90F8FEF567C#viewer&#xD;&#xA;                        ">Microsoft Office für Mac 2011 14.3.8-Update <span class="link_arrow">&gt;</span></a></p><p><a depEvents="DynamicDownloadsLinkClick[url|downloads?pid=Mactopia_Office2011&amp;fid=3445FBDC-E092-4530-BF31-D60CECD53AB8#viewer|prodID|Mactopia_Office2011]" id="3445FBDC-E092-4530-BF31-D60CECD53AB8" class="download_link" href="&#xD;&#xA;                          ?pid=Mactopia_Office2011&amp;fid=3445FBDC-E092-4530-BF31-D60CECD53AB8#viewer&#xD;&#xA;                        ">Microsoft Office für Mac 2011 14.3.7-Update <span class="link_arrow">&gt;</span></a></p><p><a 
depEvents="DynamicDownloadsLinkClick[url|downloads?pid=Mactopia_Office2011&amp;fid=EF1E612F-D8E3-4628-9FE4-AD136F0DEBD3#viewer|prodID|Mactopia_Office2011]" id="EF1E612F-D8E3-4628-9FE4-AD136F0DEBD3" class="download_link" href="&#xD;&#xA;                          ?pid=Mactopia_Office2011&amp;fid=EF1E612F-D8E3-4628-9FE4-AD136F0DEBD3#viewer&#xD;&#xA;                        ">

I'm trying to match this part: "Microsoft Office für Mac 2011 14.4.1-Update" using the following sed command: s/^.*Microsoft Office f.r Mac 2011 \([^ ]*\)-Update.*$/\1/ Unfortunately, the output is 14.3.7 (the last occurrence). How can I make it stop after the first occurrence? Using *? for non-greedy matching didn't help.


Solution

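  • The leading ^.* is greedy: it consumes as much of the line as it can, so the capture lands on the last occurrence. sed's POSIX regular expressions have no non-greedy *? quantifier, which is why that attempt didn't help. Instead, you can use two substitution commands: the first removes all characters after the first occurrence, and the second removes all the leading characters: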

    sed 's/\(Microsoft Office f.r Mac 2011 \([^ ]*\)-Update\).*$/\1/; s/^.*>//' infile
    

    It yields:

    Microsoft Office für Mac 2011 14.4.1-Update
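
To see the difference side by side, here is a minimal sketch that runs both the greedy command from the question and a two-step variant on a shortened, hypothetical sample (two links on one line, with made-up `fid` values). It assumes only the version number is wanted, so it prints the capture group rather than the whole title. It also writes `ü` literally instead of `f.r`, because `.` may fail to match a multibyte character when sed runs in a non-UTF-8 locale.

```shell
# Hypothetical, shortened sample: two download links on a single line, newest first.
sample='<p><a class="download_link" href="?fid=1">Microsoft Office für Mac 2011 14.4.1-Update <span>&gt;</span></a></p><p><a class="download_link" href="?fid=2">Microsoft Office für Mac 2011 14.3.9-Update <span>&gt;</span></a></p>'

# Greedy: the leading ^.* swallows as much as possible, so the LAST occurrence is captured.
last=$(printf '%s\n' "$sample" |
  sed 's/^.*Microsoft Office für Mac 2011 \([^ ]*\)-Update.*$/\1/')

# Two steps: with no leading ^.*, sed takes the leftmost match, i.e. the FIRST occurrence.
# Step 1 replaces that match and everything after it with the captured version.
# Step 2 strips the remaining prefix up to the last ">".
first=$(printf '%s\n' "$sample" |
  sed 's/Microsoft Office für Mac 2011 \([^ ]*\)-Update.*$/\1/; s/^.*>//')

echo "greedy:   $last"
echo "two-step: $first"
```

On this sample the greedy command captures 14.3.9, while the two-step variant captures 14.4.1.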