
Skip first column and get distinct from other columns


I need to select distinct rows from the text file shown below, ignoring the first column.

Text file:

 123| one| two| three
124| one| two| four
 125| one |two| three

The output should look like this:

 123| one| two| three
124| one| two| four

OR

124| one| two| four
 125| one |two| three

I am using this code to try to solve the problem:

var readfile = File.ReadAllLines(" text file location ");
var spiltfile = (from f in readfile
                 let line = f.Split('|')
                 let y = line.Skip(1)
                 select (from str in y
                         select str).FirstOrDefault()).Distinct();

Thanks


Solution

  • The unclear spacing in the question doesn't help (especially around the |two|, whose spacing differs from the rest and implies the columns need trimming), but here are some custom LINQ methods that do the job. I've used an anonymous type purely as a simple way of normalizing the inconsistent spacing: anonymous types compare by value, so two rows whose trimmed columns match produce equal keys. (I could also have rebuilt a string, but it seemed unnecessary.)

    Note that without the odd spacing, this could simply be:

    var qry = ReadLines("foo.txt")
            .DistinctBy(line => line.Substring(line.IndexOf('|')));
    

    Full code:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    static class Program
    {
        static void Main()
        {
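            // Split each line on '|'; parts[0] is the ID column and is ignored.
            var qry = (from line in ReadLines("foo.txt")
                       let parts = line.Split('|')
                       select new
                       {
                           Line = line,
                           // The trimmed columns 1-3 form the distinct key.
                           Key = new
                           {
                               A = parts[1].Trim(),
                               B = parts[2].Trim(),
                               C = parts[3].Trim()
                           }
                       }).DistinctBy(row => row.Key)
                      .Select(row => row.Line); // output the original, untrimmed line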
    
            foreach (var line in qry)
            {
                Console.WriteLine(line);
            }
        }
        // Yields each item whose key has not been seen yet, so the first occurrence wins.
        static IEnumerable<TSource> DistinctBy<TSource, TValue>(
            this IEnumerable<TSource> source,
            Func<TSource, TValue> selector)
        {
            var found = new HashSet<TValue>();
            foreach (var item in source)
            {
                if (found.Add(selector(item))) yield return item;
            }
        }
        // Streams the file one line at a time instead of loading it all into memory.
        static IEnumerable<string> ReadLines(string path)
        {
            using (var reader = File.OpenText(path))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }
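On .NET 6 and later, System.Linq includes a built-in Enumerable.DistinctBy, and File.ReadLines (available since .NET Framework 4.0) already streams a file line by line, so both helpers above are optional there. The following is a minimal sketch assuming .NET 6+ and C# 9 top-level statements. It takes the rebuilt-string route instead of an anonymous type: the key is every column after the first, trimmed and re-joined with '|'. The sample lines are the ones from the question, held in memory so the example runs without foo.txt, and KeyWithoutId is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Sample lines from the question, including its irregular spacing.
// In a real program these would come from File.ReadLines("foo.txt").
string[] lines =
{
    " 123| one| two| three",
    "124| one| two| four",
    " 125| one |two| three",
};

// Assumption: .NET 6+, where Enumerable.DistinctBy keeps the first line seen for each key.
var distinct = lines.DistinctBy(KeyWithoutId).ToArray();

foreach (var line in distinct)
{
    Console.WriteLine(line);
}

// Hypothetical helper: drops the first (ID) column, trims the remaining columns
// and re-joins them, so " one |two" and " one| two" produce the same key.
static string KeyWithoutId(string line) =>
    string.Join("|", line.Split('|').Skip(1).Select(part => part.Trim()));
```

On frameworks older than .NET 6, lines.GroupBy(KeyWithoutId).Select(g => g.First()) gives the same first-occurrence result, because GroupBy yields groups in the order their keys first appear. The sketch assumes no column value itself contains a '|'.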