I am writing a Java program to load two CSV files into MongoDB as a single document containing subdocuments, but I do not know how to do it. I'll explain:
I have a SNP file containing the fields rsid, chr, has_sig and a LOCUS file containing the fields rsid, mrna_acc, gene, class, sap_id. In the LOCUS file, each rsid can correspond to several mrna_acc values, so there will be several rows with the same rsid.
I would like a Mongo document like this:
    { _id: ObjectId("7264958211f41a0c647c47b1"),
      rsid: rs530,
      chr: 21,
      has_sig: false,
      locus: [
        { mrna_acc: NM_00125,
          gene: ETS2,
          class: utr_variant
        },
        { mrna_acc: NM_00126,
          gene: ETS2,
          class: utr_variant
        },
        ... ]
    }
I tried to read the two CSV files with a BufferedReader and insert them into the document like this:
    Document d = new Document();
    Document d1 = new Document();
    FileSnp fs = new FileSnp("/Users/valentinafratini/Documents/Progetto Tesi/FactoryMethodDb/snp.csv");
    fs.readFile();
    long startTime = System.currentTimeMillis();
    while (fs.line != null) {
        fs.line = fs.reader.readLine();
        if (fs.line != null && fs.line.length() > 0) {
            fs.obj = fs.line.split("\\s+");
            fs.readSingleObj();
            d.append("rsid", fs.rsid);
            d.append("chr", fs.chr);
            d.append("has_sig", fs.has_sig);
        }
    }
    FileLocus fl = new FileLocus("/Users/valentinafratini/Documents/Progetto Tesi/FactoryMethodDb/locus.csv");
    fl.readFile();
    while (fl.line != null) {
        fl.line = fl.reader.readLine();
        if (fl.line != null && fl.line.length() > 0) {
            fl.obj = fl.line.split("\\s+");
            fl.readSingleObj();
            d1.append("mrna_acc", fl.mrna_acc);
            d1.append("gene", fl.gene);
            d1.append("class", fl.classe);
        }
    }
    d.put("locus", d1);
    list.add(d);
    coll.insertMany(list);
But the result is a single inserted document containing all the fields of both the SNP file and the LOCUS file.
Can you help me? I really do not know how to do it. Thank you very much.
In your target document structure, the locus attribute contains an array of sub documents:
    locus: [
      { mrna_acc: NM_00125,
        gene: ETS2,
        class: utr_variant
      },
      { mrna_acc: NM_00126,
        gene: ETS2,
        class: utr_variant
      }
    ]
This suggests that the FileLocus reader should produce a Document instance for each line in locus.csv, and that each of these documents should be added to a collection in the outer document d, which is created by the FileSnp reader.
If so, then you should replace the FileLocus block with the following:
    // this will contain the collection of documents, one for each line in `locus.csv`
    List<Document> locusDocuments = new ArrayList<>();
    FileLocus fl = new FileLocus("/Users/valentinafratini/Documents/Progetto Tesi/FactoryMethodDb/locus.csv");
    fl.readFile();
    while (fl.line != null) {
        fl.line = fl.reader.readLine();
        if (fl.line != null && fl.line.length() > 0) {
            fl.obj = fl.line.split("\\s+");
            fl.readSingleObj();
            // create and populate a sub document for the current line
            Document locusDocument = new Document();
            locusDocument.append("mrna_acc", fl.mrna_acc);
            locusDocument.append("gene", fl.gene);
            locusDocument.append("class", fl.classe);
            // assign the current sub document to the collection of locus documents
            locusDocuments.add(locusDocument);
        }
    }
    // add the collection of locus documents to the outer document
    d.append("locus", locusDocuments);
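If snp.csv has more than one row, the locus rows probably also need to be matched to their SNP by rsid, with a fresh outer document created per SNP row rather than one shared d. The following is a minimal sketch of that grouping step, not the original code: the class SnpLocusMerger and its merge method are hypothetical names, plain Map and List stand in for org.bson.Document so the example runs without the MongoDB driver, and the column order (rsid chr has_sig, and rsid mrna_acc gene class sap_id, whitespace-separated) is assumed from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the original code): joins SNP and LOCUS rows on rsid.
class SnpLocusMerger {

    // Returns one map per SNP row; each map carries a "locus" list with the matching LOCUS rows.
    static List<Map<String, Object>> merge(List<String> snpLines, List<String> locusLines) {
        // Step 1: group the locus sub documents by rsid.
        // Assumed column order: rsid mrna_acc gene class sap_id
        Map<String, List<Map<String, Object>>> locusByRsid = new LinkedHashMap<>();
        for (String line : locusLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, as the original readers do
            }
            String[] f = line.trim().split("\\s+");
            Map<String, Object> locus = new LinkedHashMap<>();
            locus.put("mrna_acc", f[1]);
            locus.put("gene", f[2]);
            locus.put("class", f[3]);
            locusByRsid.computeIfAbsent(f[0], k -> new ArrayList<>()).add(locus);
        }

        // Step 2: build a new outer document for every SNP row.
        // Assumed column order: rsid chr has_sig
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String line : snpLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            Map<String, Object> snp = new LinkedHashMap<>();
            snp.put("rsid", f[0]);
            snp.put("chr", f[1]); // kept as text here; convert if a numeric field is wanted
            snp.put("has_sig", Boolean.parseBoolean(f[2]));
            // SNPs without any locus row get an empty list
            snp.put("locus", locusByRsid.getOrDefault(f[0], new ArrayList<>()));
            docs.add(snp);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> snp = Arrays.asList("rs530 21 false", "rs531 21 true");
        List<String> locus = Arrays.asList(
                "rs530 NM_00125 ETS2 utr_variant 1",
                "rs530 NM_00126 ETS2 utr_variant 2");
        for (Map<String, Object> doc : merge(snp, locus)) {
            System.out.println(doc);
        }
    }
}
```

In the real program the same two steps would apply to the FileSnp and FileLocus readers, appending to Document instances instead of maps and adding each per-SNP Document to the list passed to coll.insertMany(list).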