I'm relatively new to C# and I'm trying to get my head around a problem that I believe should be pretty simple in concept, but I just can't get it.
I am currently trying to display a message on the console when the program is run from the command line with two arguments: a text file full of sequence IDs and DNA sequences, and a query text file full of sequence IDs. For each ID in the query file that does not exist in the sequence file, the program should print a message. For example, args[0] is a text file that contains 41534 lines of sequences, which means I cannot load the entire file into memory:
>NR_118889.1 Amycolatopsis azurea strain NRRL 11412 16S ribosomal RNA, partial sequence
GGTCTNATACCGGATATAACAACTCATGGCATGGTTGGTAGTGGAAAGCTCCGGCGT
>NR_118899.1 Actinomyces bovis strain DSM 43014 16S ribosomal RNA, partial sequence
GGGTGAGTAACACGTGAGTAACCTGCCCCNNACTTCTGGATAACCGCTTGAAAGGGTNGCTAATACGGGATATTTTGGCCTGCT
>NR_074334.1 Archaeoglobus fulgidus DSM 4304 16S ribosomal RNA, complete sequence >NR_118873.1 Archaeoglobus fulgidus DSM 4304 strain VC-16 16S ribosomal RNA, complete sequence >NR_119237.1 Archaeoglobus fulgidus DSM 4304 strain VC-16 16S ribosomal RNA, complete sequence
ATTCTGGTTGATCCTGCCAGAGGCCGCTGCTATCCGGCTGGGACTAAGCCATGCGAGTCAAGGGGCTT
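Because a single header can name several IDs, a plain Contains check against the whole line is fragile. Below is a minimal sketch of pulling the IDs out of one header line, assuming (as the StartsWith(">") check in the attempted code suggests) that header lines begin with '>', that extra IDs on the same line are each introduced by '>', and that an ID ends at the first space or tab. The class and method names (FastaHeader, ExtractIds) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class FastaHeader
{
    // Returns every sequence ID named on a header line.
    // Assumption: each ID is the first whitespace-delimited token after a '>'.
    public static List<string> ExtractIds(string line)
    {
        var ids = new List<string>();
        if (!line.StartsWith(">", StringComparison.Ordinal))
        {
            return ids; // a DNA sequence line, not a header
        }

        // Splitting on '>' leaves an empty first segment, then one segment per ID.
        foreach (string segment in line.Split('>'))
        {
            string[] tokens = segment.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (tokens.Length > 0)
            {
                ids.Add(tokens[0]); // keep only the ID, drop the description
            }
        }
        return ids;
    }
}
```

Applied to a header of the form ">NR_074334.1 ... >NR_118873.1 ... >NR_119237.1 ...", this returns all three IDs, so a query for any of them counts as found.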
args[1] is a query text file with some sequence IDs:
NR_118889.1
NR_999999.1
NR_118899.1
NR_888888.1
So when the program is run, all I want displayed are the sequence IDs from args[1] that were not found in args[0]:
NR_999999.1 could not be found
NR_888888.1 could not be found
I know this is probably super simple, but I have spent far too long trying to figure it out by myself, so I am asking for help.
Thank you in advance for any assistance.
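// Attempt 1: for every line of args[1], scan the whole of args[2] and
// record the query IDs that appear in that line (i.e. the IDs that were found).
var saved_ids = new List<String>();
foreach (String args1line in File.ReadLines(args[1]))
{
    foreach (String args2line in File.ReadLines(args[2]))
    {
        if (args1line.Contains(args2line))
        {
            saved_ids.Add(args2line);
        }
    }
}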
// Attempt 2: the same comparison with two StreamReaders, rewinding the
// reader over args[2] whenever it reaches the end of the file.
using (System.IO.StreamReader sr1 = new System.IO.StreamReader(args[1]))
{
    using (System.IO.StreamReader sr2 = new System.IO.StreamReader(args[2]))
    {
        string line1, line2;
        while ((line1 = sr1.ReadLine()) != null)
        {
            while ((line2 = sr2.ReadLine()) != null)
            {
                if (line1.Contains(line2))
                {
                    saved_ids.Add(line2);
                    break;
                }
                if (!line1.StartsWith(">"))
                {
                    break;
                }
                if (saved_ids.Contains(line1))
                {
                    break;
                }
                if (saved_ids.Contains(line2))
                {
                    break;
                }
                if (!line1.Contains(line2))
                {
                    saved_ids.Add(line2);
                    Console.WriteLine("The sequence ID {0} does not exist", line2);
                }
            }
            if (line2 == null)
            {
                sr2.DiscardBufferedData();
                sr2.BaseStream.Seek(0, System.IO.SeekOrigin.Begin);
                continue;
            }
        }
    }
}
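One way to get the desired output without nested loops or rewinding a StreamReader is to turn the problem around: read the small query file into a HashSet<string>, stream the large sequence file exactly once with File.ReadLines, and remove every ID seen on a header line; whatever is left was not found. This is a minimal sketch, assuming header lines start with '>', each ID is the first whitespace-delimited token after a '>', and the query file holds one ID per line. The names MissingIdFinder and FindMissing are hypothetical, and Main reads args[0] and args[1] as described in the question, whereas the attempts above read args[1] and args[2].

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MissingIdFinder
{
    // Returns the query IDs (in query order) that appear on no header line.
    // Only the query IDs are kept in memory; the sequence lines are streamed once.
    public static List<string> FindMissing(IEnumerable<string> sequenceLines, IEnumerable<string> queryIds)
    {
        var ordered = new List<string>();      // query order, for output
        var remaining = new HashSet<string>(); // IDs not seen yet
        foreach (string raw in queryIds)
        {
            string id = raw.Trim();
            if (id.Length > 0 && remaining.Add(id)) // skip blanks and duplicates
            {
                ordered.Add(id);
            }
        }

        foreach (string line in sequenceLines)
        {
            if (remaining.Count == 0)
            {
                break; // every query ID has been found; stop reading early
            }
            if (!line.StartsWith(">", StringComparison.Ordinal))
            {
                continue; // skip DNA sequence lines
            }
            // Assumption: a header may list several IDs, each introduced by '>'.
            foreach (string segment in line.Split('>'))
            {
                string[] tokens = segment.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
                if (tokens.Length > 0)
                {
                    remaining.Remove(tokens[0]);
                }
            }
        }

        return ordered.FindAll(remaining.Contains);
    }

    public static void Main(string[] args)
    {
        // args[0]: sequence file, args[1]: query file.
        if (args.Length < 2)
        {
            Console.Error.WriteLine("usage: <sequence file> <query file>");
            return;
        }
        foreach (string id in FindMissing(File.ReadLines(args[0]), File.ReadLines(args[1])))
        {
            Console.WriteLine("{0} could not be found", id);
        }
    }
}
```

Memory use grows with the number of query IDs rather than the size of the sequence file, and each file is read once, instead of the query file being re-read for every line of the sequence file as in the first attempt.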