I have the following method to clean up strings:
public static String UseStringBuilderWithHashSet(string strIn)
{
    var hashSet = new HashSet<char>("?&^$#@!()+-,:;<>’\'-_*");
    // specify capacity of StringBuilder to avoid resizing
    StringBuilder sb = new StringBuilder(strIn.Length);
    foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
    {
        sb.Append(x);
    }
    return sb.ToString();
}
However, strings such as [MV] REOL ちるちる ChiruChiru
or [MV] REOL ヒビカセ Hibikase
do not get cleaned up.
How can I modify my method so that it turns one of the above strings into, for example:
[MV] REOL ChiruChiru
You're trying to solve this by exhaustively filtering out everything you don't want. This is not optimal, as there are 100,000+ possible characters.
You may find better results if you only accept what you do want.
public static string CleanInput(string input)
{
    // a-zA-Z allows any English alphabet character, upper or lower case
    // \[ and \] allow [ and ]
    // \s allows whitespace
    var regex = new Regex(@"[a-zA-Z\[\]\s]");
    var stringBuilder = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (regex.IsMatch(c.ToString()))
        {
            stringBuilder.Append(c);
        }
    }
    string output = stringBuilder.ToString();
    // \s+ matches any run of whitespace, which is replaced with
    // a single space.
    return Regex.Replace(output, @"\s+", " ");
}
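The same whitelist idea can probably be written without testing each character individually. The following is only a sketch: it negates the character class so that `Regex.Replace` strips everything outside the allowed set in one call, then collapses whitespace as before. The class name `StringCleaner` and the method name `CleanInputWithReplace` are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringCleaner
{
    // Matches anything that is NOT an English letter, a square bracket, or whitespace.
    private static readonly Regex Disallowed = new Regex(@"[^a-zA-Z\[\]\s]");

    // Matches one or more consecutive whitespace characters.
    private static readonly Regex WhitespaceRun = new Regex(@"\s+");

    // Hypothetical name; intended to behave like CleanInput above.
    public static string CleanInputWithReplace(string input)
    {
        // Drop every disallowed character, then collapse the gaps left behind.
        string stripped = Disallowed.Replace(input, "");
        return WhitespaceRun.Replace(stripped, " ");
    }

    public static void Main()
    {
        // prints "[MV] REOL ChiruChiru"
        Console.WriteLine(CleanInputWithReplace("[MV] REOL ちるちる ChiruChiru"));
        // prints "[MV] REOL Hibikase"
        Console.WriteLine(CleanInputWithReplace("[MV] REOL ヒビカセ Hibikase"));
    }
}
```

Keeping the two `Regex` instances in static fields avoids rebuilding them on every call. If you also need digits or other punctuation, add them to the negated class.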