
Can I keep objects in memory, or do I need to look at serializing to a DB?


I am working on a multithreaded app that has about four basic entities at its core,

e.g.

public class Album
{
    public ICPN ICPN { get; set; }
    public string Title { get; set; }
    public string Label { get; set; }
    public string PLine { get; set; }
    public string CLine { get; set; }

    public string Genre { get; set; }
    public string SubGenre { get; set; }

    public string Artist { get; set; }
    public int NumTracks { get; set; }
    public int NumVolumes { get; set; }

    public IList<ITerms> Terms { get; set; }
}

I have a four-step process where I am using the Producer/Consumer pattern with BlockingCollections to manage these processes. It is a production line: once a step completes, I modify the state on the object, make a copy of the stats, and enqueue the object onto the next process queue, where it waits for the next process/task to be performed.
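For readers unfamiliar with the pattern, a production line like the one described might be sketched roughly as follows. This is only an illustration and not the actual application code: `WorkItem`, `PipelineSketch`, the two-stage layout, and the `capacity` bound are all hypothetical stand-ins. The only library features relied on are `BlockingCollection<T>` with a bounded capacity, `CompleteAdding`, and `GetConsumingEnumerable`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, trimmed-down stand-in for an entity such as Album.
public class WorkItem
{
    public string Title { get; set; }
    public int StagesCompleted { get; set; }
}

public static class PipelineSketch
{
    // Pushes `count` items through two stages and returns the finished items.
    // `capacity` bounds each queue, so Add blocks when a stage falls behind.
    public static List<WorkItem> Run(int count, int capacity)
    {
        var done = new List<WorkItem>();

        using (var stage1 = new BlockingCollection<WorkItem>(capacity))
        using (var stage2 = new BlockingCollection<WorkItem>(capacity))
        {
            // Producer: creates the items and feeds the first queue.
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < count; i++)
                    stage1.Add(new WorkItem { Title = "Album " + i });
                stage1.CompleteAdding(); // tells the consumer no more items are coming
            });

            // Stage 1 worker: modifies state, then enqueues for the next stage.
            var worker = Task.Run(() =>
            {
                foreach (var item in stage1.GetConsumingEnumerable())
                {
                    item.StagesCompleted++;
                    stage2.Add(item);
                }
                stage2.CompleteAdding();
            });

            // Stage 2 worker: the end of the line in this sketch.
            var sink = Task.Run(() =>
            {
                foreach (var item in stage2.GetConsumingEnumerable())
                {
                    item.StagesCompleted++;
                    done.Add(item); // only this task touches the list
                }
            });

            Task.WaitAll(producer, worker, sink);
        }

        return done;
    }
}
```

With bounded queues the number of objects alive at any moment is limited by the capacities rather than by the size of the input, which is one way to keep memory use predictable without involving a database.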

I am at that critical juncture in the architecture design where I need to decide whether to serialize some of the BlockingCollections to a DB, or whether I can run with about 10 million objects of the type above in the pipe on a high-end server.

I have a queue dedicated to stats on each process (i.e. time taken and success) for the UI. Or should I look at storing this info with the objects?

Speed/efficiency is critical in this process.

Are there any ways to calculate memory requirements, or is this a case of suck it and see?
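One rough way to get a number is to allocate a sample of objects and compare `GC.GetTotalMemory` before and after. The sketch below assumes a hypothetical `AlbumSample` class with only some of the string and int properties and made-up short values; real titles, the `ICPN` object, and the `Terms` list would add to the figure, so treat the result as a lower bound rather than a precise measurement.

```csharp
using System;

// Hypothetical stand-in for Album, without ICPN and Terms.
public class AlbumSample
{
    public string Title { get; set; }
    public string Label { get; set; }
    public string Genre { get; set; }
    public string Artist { get; set; }
    public int NumTracks { get; set; }
    public int NumVolumes { get; set; }
}

public static class MemoryEstimate
{
    // Returns the approximate number of managed-heap bytes per sample object.
    public static long BytesPerItem(int sampleSize)
    {
        // Allocate the holder first so the array itself is not counted.
        var keepAlive = new AlbumSample[sampleSize];

        long before = GC.GetTotalMemory(true); // true = force a full collection first

        for (int i = 0; i < sampleSize; i++)
        {
            // Distinct strings per item, so no instances are shared between objects.
            keepAlive[i] = new AlbumSample
            {
                Title = "Title " + i,
                Label = "Label " + i,
                Genre = "Genre " + i,
                Artist = "Artist " + i,
                NumTracks = 12,
                NumVolumes = 1
            };
        }

        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keepAlive); // make sure the sample survives until after the measurement

        return (after - before) / sampleSize;
    }
}
```

Multiplying the per-item figure by the expected number of objects in flight (for example, `MemoryEstimate.BytesPerItem(100000) * 10000000L`) gives a first estimate to compare against the server's RAM. Running the real pipeline on a representative sample under a memory profiler would still be the more reliable check.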

Update: At this stage I don't need to persist the data for crashes etc., as the metadata objects are modified and written out to disk, i.e. unprocessed folder/processed folder.


Solution

  • You don't really describe the form factor of your application, so I'll start by saying that it depends.

    If there is ever a need for your application to restart or if there is a remote chance that your application could have bugs in it, then I suggest persisting your live data to some form of data store (database, file, etc) while it is in flight. Of course this only matters if you never want to lose data.

    EDIT: Considering your edit, I don't think it is really a necessity unless you want to save off results of any previous processing of the data.