I need to search the contents of files uploaded to the database, such as images (BMP, TIFF, PNG) or PDFs.
I am using the latest release of MongoDB to store images (PNG, BMP, JPG) and documents with GridFS, which stores the data as binary. As I understand it, MongoDB can hold data in two ways: as binary file chunks (GridFS) or as JSON-like (BSON) documents.
MongoDB does not provide a way to search the contents of an image directly. The other option is OCR, but OCR returns its result as a string, so I would have to convert that string to valid JSON to store it in the database. If that is my only option, how do I convert the string to a valid JSON format?
I am trying to store the text file in MongoDB with the following code:
// result5.txt is a text file containing the OCR result.
string text = System.IO.File.ReadAllText("E:\\result5.txt");
var document = BsonSerializer.Deserialize<BsonDocument>(text);
var collection = Database.GetCollection("articles");
// Note: the raw string `text` is passed here, not a document.
collection.Insert(text);
but I get this error:
MongoCommandException: Command insert failed: Wrong type for documents[0]. Expected a object, got a string.
How can I search within the image files I have uploaded to the database? Any suggestions are appreciated.
Just create a new class to hold the OCR results:
public class OcrContainer
{
    public BsonObjectId Id { get; set; }
    public string OcrResult { get; set; }
}
and then store it in MongoDB:
var collection = Database.GetCollection<OcrContainer>("articles");
collection.InsertOne(new OcrContainer { OcrResult = text });
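This also answers the "convert the string to valid JSON" part of the question: once the OCR text is the value of a field, no manual conversion is needed. MongoDB itself stores BSON, and the driver serializes the whole object for you. As a minimal sketch outside MongoDB (assuming .NET Core 3.0 or later, where System.Text.Json is built in; `OcrResultDocument` and `OcrJsonDemo` are hypothetical names), the same wrapping turns arbitrary OCR text, including quotes, backslashes and line breaks, into valid JSON:

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for OcrContainer without the MongoDB-specific Id,
// so it runs with only the .NET base class library.
public class OcrResultDocument
{
    public string OcrResult { get; set; }
}

public static class OcrJsonDemo
{
    // Wraps raw OCR output in an object and serializes it. The serializer
    // escapes quotes, backslashes and newlines, so the result is valid JSON.
    public static string ToJson(string ocrText)
    {
        return JsonSerializer.Serialize(new OcrResultDocument { OcrResult = ocrText });
    }

    // Parses the JSON back and returns the original OCR text.
    public static string FromJson(string json)
    {
        return JsonSerializer.Deserialize<OcrResultDocument>(json).OcrResult;
    }
}
```

For example, `OcrJsonDemo.ToJson("Invoice \"42\"\nline two")` produces a string that `JsonDocument.Parse` accepts, and `FromJson` returns the original text unchanged. With the MongoDB driver you do not call a JSON serializer yourself; `InsertOne` performs the equivalent conversion to BSON.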
After that you can search the results:
// Returns every stored OCR result whose text contains "bla".
var matches = collection.Find(x => x.OcrResult.Contains("bla")).ToList();
But what are you going to do with the results? You will need more properties in OcrContainer to connect the OCR results with your other data (for example, the GridFS file the text was extracted from).