c# linq json.net

Unexpected behavior when comparing a string with an object in C# LINQ


I've been experiencing some peculiar behavior in my C# code that I'm struggling to understand. I'm hoping someone might be able to shed some light on it.

I've created a simple class and list of objects from that class where a string-object comparison is being performed. However, the output seems to suggest that a string-object comparison behaves differently depending on whether the object was directly assigned, cloned, or deserialized from a JSON string.

Here is a simplified example:

using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

namespace TestJsonDeser
{
    internal class Program
    {
        public class Test
        {
            public object Name { get; set; }
            public object Value { get; set; }
        }
        static void Main(string[] args)
        {
            Test test0 = new Test()
            {
                Name = "test",
                Value = 1
            };

            var ts = JsonConvert.SerializeObject(test0);
            var test1 = JsonConvert.DeserializeObject<Test>(ts);

            var test2 = Clone(test0);
            var test3 = Clone(test1);
            var test4 = CreateTest();
            var test5 = Clone(test4);

            List<Test> list = new List<Test>() { test0, test1, test2, test3, test4, test5 };
            var r3 = list.Where(a=>a.Name=="test").ToList();
            Console.WriteLine($"Count:{r3.Count}");
            foreach (var r in r3) Console.WriteLine($"Idx:{list.IndexOf(r)}");
            Console.ReadKey();
        }

        public static T Clone<T>(T source)
        where T : class, new()
        {
            if (source == null)
                return null;
            var tp = typeof(T);
            var ret = Activator.CreateInstance(typeof(T));
            foreach (var prop in tp.GetProperties(BindingFlags.Public | BindingFlags.Instance).Where(a => a.CanWrite && a.CanRead))
            {
                var value = prop.GetValue(source, null);
                if (value is string str)
                    prop.GetSetMethod().Invoke(ret, new object[] { str.Clone() });
                if (value is long ln)
                    prop.GetSetMethod().Invoke(ret, new object[] { ln });
            }
            return (T)ret;
        }
        static Test CreateTest()
        {
            var test = Activator.CreateInstance(typeof(Test));
            typeof(Test).GetProperty("Name").SetValue(test, "test", null);
            typeof(Test).GetProperty("Value").SetValue(test, 1, null);
            return (Test)test;
        }
    }
}

The output is:

Count:4
Idx:0
Idx:2
Idx:4
Idx:5

I would expect that all the Test objects would have the Name property equal to "test", but it seems that only the ones that were directly assigned, cloned from the directly assigned, or created with Activator.CreateInstance are considered as having Name equal to "test".

The Test objects that were deserialized from a JSON string or cloned from the deserialized do not match, even though printing the Name property clearly shows the value is "test".

Could anyone help explain why this is happening? Is this a bug in LINQ or Newtonsoft's JSON deserialization, or is it some language feature that I'm not aware of?

Any help would be appreciated.


Solution

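  • Because Name is declared as object, a.Name == "test" is a reference comparison, not a value comparison (the compiler emits a "possible unintended reference comparison" warning for it). The compiler also creates a single string object for all duplicate constant strings, so every "test" literal refers to the same object: in the construction of test0, in the CreateTest method and in the Where predicate. That explains why test0 and test4 match.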

    When you call your Clone method, you special-case strings by calling String.Clone(), which just returns the same string reference. That explains why test2 and test5, your clones of test0 and test4, also match.

    Fundamentally, the JSON deserializer copies the string from the JSON input character by character into a char[] and then builds a string from it with new string() when the value is requested. It is roughly as if your Clone method handled strings with new string(str.ToCharArray()).

    That creates a new string instance that happens to have the same value.

    So that explains why test1 doesn't match: its Name is a different object from the "test" literal. And test3 doesn't match because, again, your Clone method doesn't create a new string instance; it just returns the same non-literal instance that test1 holds.

    It is very important to note that, conceptually, strings in .NET are immutable (they can't be changed) and unique. The current .NET implementation does not deduplicate strings created at run time, so several instances with the same value can exist, but that may not always be true. You should never rely on reference equality for strings. See this Microsoft article for more on the complications of string comparisons.

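    I think it is an unfortunate wart in C#'s design that it works pretty hard to make String seem like a value type, and then falls down here. When both operands are typed as string, == compares values just like String.Equals. But operators are chosen at compile time from the static types of the operands, so once a string is held in an object variable or property, == silently becomes a reference comparison. The fact is, String is a class type, and through an object reference == treats it just like any other user-defined class.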
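
The reference behaviour described in this answer can be reproduced without Json.NET. The following is a minimal sketch, not the deserializer's actual code: the class name StringIdentityDemo and the helpers BuildAtRuntime and ReferenceCompare are made up for illustration, and BuildAtRuntime only stands in for what a parser does when it assembles a string from characters.

```csharp
using System;

public static class StringIdentityDemo
{
    // Hypothetical helper: builds a string at run time from individual characters,
    // roughly as a parser would. The result is a new instance, not the interned literal.
    public static string BuildAtRuntime(string source)
    {
        return new string(source.ToCharArray());
    }

    // Hypothetical helper mirroring a.Name == "test" when Name is declared as object:
    // with object operands, == compares references, not characters.
    public static bool ReferenceCompare(object left, string right)
    {
        return left == (object)right;
    }

    public static void Main()
    {
        object literal = "test";                    // compile-time literal, interned
        object cloned = ((string)literal).Clone();  // String.Clone returns the same instance
        object rebuilt = BuildAtRuntime("test");    // distinct instance, same characters

        Console.WriteLine(ReferenceCompare(literal, "test")); // True
        Console.WriteLine(ReferenceCompare(cloned, "test"));  // True
        Console.WriteLine(ReferenceCompare(rebuilt, "test")); // False
        Console.WriteLine(rebuilt.Equals("test"));            // True (value comparison)
    }
}
```

The third line is the case that corresponds to test1 and test3 in the question: the characters are identical, but the object is not the literal.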
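
As for what to write instead of a.Name == "test", one possible approach is to compare by value explicitly. This is a sketch under assumptions: Item is a stand-in for the question's Test class, and CountByValue and CountWithEquals are illustrative names, not part of LINQ or Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValueComparisonDemo
{
    // Stand-in for the question's Test class; Name stays typed as object.
    public class Item
    {
        public object Name { get; set; }
    }

    // Unwraps Name to a string first, so == binds to the string operator (value comparison).
    public static int CountByValue(IEnumerable<Item> items, string expected)
    {
        return items.Count(a => a.Name is string s && s == expected);
    }

    // The static object.Equals is null-safe and calls the virtual Equals,
    // which String overrides to compare values.
    public static int CountWithEquals(IEnumerable<Item> items, string expected)
    {
        return items.Count(a => object.Equals(a.Name, expected));
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Name = "test" },                                  // interned literal
            new Item { Name = new string(new[] { 't', 'e', 's', 't' }) }, // built at run time
            new Item { Name = 1 },                                       // not a string
            new Item { Name = null }
        };

        Console.WriteLine(CountByValue(items, "test"));                 // 2
        Console.WriteLine(CountWithEquals(items, "test"));              // 2
        Console.WriteLine(items.Count(a => a.Name == (object)"test"));  // 1 (reference comparison)
    }
}
```

Either form matches on the characters rather than on object identity, so it would find all six entries in the question's list. Declaring Name as string in the first place would make plain == do the same.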