Friends,
I'm new to Lucene...
I have successfully created an index, added fields, and run searches; it all works.
Now, my database has a view that tells which users can see which documents. This view is built from several complicated rules, so I want to reuse it. I therefore need to add a filter to the Lucene search that removes documents which match the query but which the user doesn't have access to.
What I have tried so far:
- Store the database document id in a field. It's a Guid; I store it as a string.
- Create a custom filter that fetches all the document ids the current user can access, then filters on that field in Lucene.
I have the feeling that this will not be efficient. A user can have access to hundreds of thousands of documents, so I may retrieve 200 000 document ids that I need to filter on.
I suppose I have to cache some of this...
Here is the code I've written, but it doesn't work: no documents are returned when the filter is used (it should return 3 docs).
public class LuceneAuthorisationFilter : Filter
{
    public override DocIdSet GetDocIdSet(Lucene.Net.Index.IndexReader reader)
    {
        List<Guid> ids = this.load(); // Load the list of ids from the database

        // One bit per Lucene document; a set bit means "visible to this user"
        OpenBitSet result = new OpenBitSet(reader.MaxDoc);
        int[] docs = new int[1];
        int[] freq = new int[1];

        for (int i = 0; i < ids.Count; i++)
        {
            // Look up the Lucene document whose EmId term equals this id
            Lucene.Net.Index.TermDocs termDocs = reader.TermDocs(new Lucene.Net.Index.Term("EmId", ids[i].ToString()));
            int count = termDocs.Read(docs, freq);
            if (count == 1)
            {
                result.FastSet(docs[0]);
            }
        }
        return result;
    }
}
Do you have any idea what's wrong? And how can I improve performance?
Thank you
EDIT:
The code above works; the only problem was that the EmId field was not indexed. I've changed that and it works now.
Now I would like any tips to improve performance.
2ND EDIT TO ADD FEEDBACK
Note: The test environment contains 25 000 documents, and the document access list contains 50 000 ids (because not all documents are indexed yet).
This is poor performance... So I searched again and found the 'FieldCacheTermsFilter' filter.
This is acceptable performance
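For reference, here is a minimal sketch of how the FieldCacheTermsFilter variant might be wired up. It assumes Lucene.Net 3.0.x and that EmId is indexed as a single un-analyzed term per document (the field cache needs that). The class and method names `AuthorisationFilters`, `ToTerms` and `ForDocumentIds` are made up for illustration; only `FieldCacheTermsFilter` itself comes from Lucene.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class AuthorisationFilters
{
    // Converts the ids loaded from the database view into the
    // string form that was stored in the "EmId" field at index time.
    public static string[] ToTerms(IEnumerable<Guid> ids)
    {
        return ids.Select(id => id.ToString()).ToArray();
    }

    // Builds a filter that only lets through documents whose
    // "EmId" term is one of the given ids.
    public static Filter ForDocumentIds(IEnumerable<Guid> ids)
    {
        return new FieldCacheTermsFilter("EmId", ToTerms(ids));
    }
}
```

The filter is then passed to the search call, e.g. `searcher.Search(query, AuthorisationFilters.ForDocumentIds(ids), 10)`. Building it still requires loading the id list, so caching the filter per user is probably still worthwhile.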
PS: I also found another similar question
Talking about performance is always tricky when no numbers or measurements are given.
That being said, what have you benchmarked? What are your bottlenecks (IO, CPU, etc.), and have you compared this against other methods?
Do you actually need to improve performance? Discussions about performance improvements are not about "feelings"; they are about hard facts based on evidence and a demonstrated need to improve.
Now for your Filter: unless there's something I didn't get from the question, I don't see why you cannot use what is already built into Lucene to do the hard work.
Here is how I usually handle permissions in Lucene; it has always worked well, even with indexes containing billions of documents. I usually use LRU-type caches with a minimum age before an item can be purged from the cache.
For example: cache 100 items, but cache more if the least recently used one is not more than 15 minutes old.
If you try something like this, it would be interesting to compare it to your method and come back to post some performance numbers.
Disclaimer: this code was written directly in the textarea of SO, so take it as pseudo-code rather than a working copy-paste solution:
// TODO: probably needs some thread safety
public class AccessFilterFactory
{
    private static readonly AccessFilterFactory _instance = new AccessFilterFactory();

    private AccessFilterFactory()
    {
    }

    public static AccessFilterFactory Instance
    {
        get { return _instance; }
    }

    // Cache<,> is a placeholder for whatever LRU cache you use
    private Cache<int, Filter> someKindaCache = new Cache<int, Filter>();

    // Gets a cached filter if already built; if not, creates one,
    // caches it and returns it
    public Filter GetFilterForUser(int userId)
    {
        // return from cache if you got it
        if (someKindaCache.Exists(userId))
            return someKindaCache.Get(userId);

        // if not, build and cache it
        // LoadDocumentIdsForUser is a placeholder for the lookup against your permissions view
        IEnumerable<string> ids = LoadDocumentIdsForUser(userId);

        // note: BooleanQuery throws TooManyClauses beyond 1024 clauses by default,
        // so the limit has to be raised for large id lists
        BooleanQuery filterQuery = new BooleanQuery();
        foreach (string id in ids)
        {
            filterQuery.Add(new TermQuery(new Term("EmId", id)), BooleanClause.Occur.SHOULD);
        }

        Filter cachingFilter = new CachingWrapperFilter(new QueryWrapperFilter(filterQuery));
        someKindaCache.Put(userId, cachingFilter);
        return cachingFilter;
    }

    // removes a now-invalid filter from the cache (permissions changed)
    public void InvalidateFilter(int userId)
    {
        someKindaCache.Remove(userId);
    }
}
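The `Cache<int, Filter>` type used above is not a Lucene class. A minimal sketch of an LRU cache with a minimum age, as described earlier, could look like the following. The constructor parameters (`capacity`, `minAge`, and an injectable `clock` for testing) are assumptions for illustration, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical LRU cache with a minimum entry age; not part of Lucene.Net.
// Not thread-safe: guard calls with a lock if it is shared between threads.
public class Cache<TKey, TValue>
{
    private sealed class Entry
    {
        public TKey Key;
        public TValue Value;
        public DateTime LastUsedUtc;
    }

    private readonly int _capacity;
    private readonly TimeSpan _minAge;
    private readonly Func<DateTime> _clock;
    private readonly Dictionary<TKey, LinkedListNode<Entry>> _map =
        new Dictionary<TKey, LinkedListNode<Entry>>();
    // Most recently used entry first, least recently used last.
    private readonly LinkedList<Entry> _order = new LinkedList<Entry>();

    public Cache(int capacity, TimeSpan minAge, Func<DateTime> clock = null)
    {
        _capacity = capacity;
        _minAge = minAge;
        _clock = clock ?? (() => DateTime.UtcNow);
    }

    public int Count { get { return _map.Count; } }

    public bool Exists(TKey key) { return _map.ContainsKey(key); }

    public TValue Get(TKey key)
    {
        LinkedListNode<Entry> node = _map[key]; // throws if the key is absent
        Touch(node);
        return node.Value.Value;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<Entry> node;
        if (_map.TryGetValue(key, out node))
        {
            node.Value.Value = value;
            Touch(node);
        }
        else
        {
            var entry = new Entry { Key = key, Value = value, LastUsedUtc = _clock() };
            _map[key] = _order.AddFirst(entry);
        }
        Trim();
    }

    public void Remove(TKey key)
    {
        LinkedListNode<Entry> node;
        if (_map.TryGetValue(key, out node))
        {
            _order.Remove(node);
            _map.Remove(key);
        }
    }

    // Marks an entry as just used and moves it to the front of the list.
    private void Touch(LinkedListNode<Entry> node)
    {
        node.Value.LastUsedUtc = _clock();
        _order.Remove(node);
        _order.AddFirst(node);
    }

    // Evicts least recently used entries while the cache is over capacity,
    // but only those that have been idle for at least the minimum age.
    private void Trim()
    {
        DateTime now = _clock();
        while (_map.Count > _capacity && now - _order.Last.Value.LastUsedUtc >= _minAge)
        {
            _map.Remove(_order.Last.Value.Key);
            _order.RemoveLast();
        }
    }
}
```

With this sketch the factory would create the cache as, say, `new Cache<int, Filter>(100, TimeSpan.FromMinutes(15))`. Eviction only runs on `Put`, so the cache can sit above capacity until the next insert; that keeps reads cheap at the cost of some memory.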