Sitecore 7 pdf indexing


I am trying to index PDF files with Sitecore 7. I installed an IFilter, but the crawling log shows the following error:

ManagedPoolThread #17 09:24:20 WARN  LuceneIndexOperations : Update : Could not build document data 4433434-3443-3223-91c4-233232. Skipping.
Exception: System.Runtime.InteropServices.COMException
Message: Error HRESULT E_FAIL has been returned from a call to a COM component.
Source: mscorlib
   at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
   at Sitecore.ContentSearch.Extracters.IFilterTextExtraction.FilterReader..ctor(String fileName)
   at Sitecore.ContentSearch.ComputedFields.MediaItemIFilterTextExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor.ComputeFieldValue(IIndexable indexable)
   at Sitecore.ContentSearch.LuceneProvider.LuceneDocumentBuilder.AddComputedIndexFields()
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.GetIndexData(IIndexable indexable, IIndexable latestVersion, IProviderUpdateContext context)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.BuildDataToIndex(IProviderUpdateContext context, IIndexable version, IIndexable latestVersion)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndexOperations.<>c__DisplayClass7.<Update>b__0(Item version)

What do I have to do to make this work? The Sitecore documentation says it should work out of the box.


Solution

  • I had the same issue and received the following response from Sitecore support (indexing worked fine afterwards):

    1) Copy all the Adobe iFilter .dll files into the "\System32\Inetsrv" folder. This is the working directory for IIS on Windows Server. By default, the Adobe iFilter .dll files are stored in the "C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin" folder. You can also use the "IFilter Explorer" tool to find the folder where the .dll files are stored: http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx. For more details, see the screenshot: http://screencast.com/t/xmWukanM+

    2) Delete all the files under the "Website/App_Data/MediaCache" folder;

    3) Rebuild the Sitecore Search Indexes (Sitecore -> Control Panel -> Indexing -> Indexing Manager);

    4) Clear the Sitecore cache (the http://{hostname}/sitecore/admin/cache.aspx tool);

    5) Restart IIS.
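
If you have to repeat this on several servers, steps 1 and 2 can be scripted. The following is only a minimal sketch in Python, not something Sitecore support provided: the function names `copy_ifilter_dlls` and `clear_media_cache` are made up for illustration, and you pass in the folder paths yourself because they depend on where Windows, the iFilter, and the website are installed. Steps 3 to 5 (rebuilding the indexes, clearing the cache, restarting IIS) are still done by hand as described above.

```python
import shutil
from pathlib import Path


def copy_ifilter_dlls(source_dir, target_dir):
    """Step 1: copy every .dll from the iFilter bin folder into the IIS working directory.

    Both folders are supplied by the caller (hypothetical helper, not a Sitecore API).
    Returns the names of the files that were copied.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    copied = []
    for dll in sorted(source_dir.glob("*.dll")):
        shutil.copy2(dll, target_dir / dll.name)  # overwrites an existing copy
        copied.append(dll.name)
    return copied


def clear_media_cache(cache_dir):
    """Step 2: delete everything under the MediaCache folder, keeping the folder itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in Path(cache_dir).iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)  # cached media is usually grouped in subfolders
        else:
            entry.unlink()
        removed += 1
    return removed
```

To use it, call `copy_ifilter_dlls` with the iFilter "bin" folder from step 1 and the Inetsrv folder of your Windows installation, then call `clear_media_cache` with the full path to "Website/App_Data/MediaCache". Writing into System32 needs an elevated (administrator) prompt.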