
Last character of a Tamil Unicode string


How do I get the last character of a Tamil Unicode string? For example, I have a list of strings like "சுதீப்", "செய்தியை", "கொள்ளாதது" and "வில்லன்".

If I call mystring.Last() on these strings, I get only the trailing vowel sign or virama:

"சுதீப்" = "்"
"செய்தியை" = "ை"
"கொள்ளாதது" = "ு"
"வில்லன்" = "்"

but I need the whole letter, i.e. the consonant together with its sign:

"சுதீப்" = "ப்"
"செய்தியை" = "யை"
"கொள்ளாதது" = "து"
"வில்லன்" = "ன்"


Solution
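To see why Last() returns only a sign, it helps to list the individual Chars with their Unicode categories. The following is a minimal VB.NET console sketch; the CharDump module and the DescribeChars helper are hypothetical names used only for illustration:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module CharDump
    ' Lists every UTF-16 code unit (Char) of s as "U+XXXX Category",
    ' which shows that the last Char is often a combining mark, not a letter.
    Function DescribeChars(s As String) As List(Of String)
        Dim result As New List(Of String)
        For Each c As Char In s
            result.Add(String.Format("U+{0:X4} {1}", Convert.ToInt32(c), Char.GetUnicodeCategory(c)))
        Next
        Return result
    End Function

    Sub Main()
        ' Print the breakdown of one of the words from the question.
        For Each line As String In DescribeChars("சுதீப்")
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

For "சுதீப்" this lists six Chars; the last one is U+0BCD (the Tamil virama, NonSpacingMark), which is what Last() returns, while the consonant ப (U+0BAA) before it is OtherLetter.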

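  • I suggest creating a helper function that loops through each Char and examines its UnicodeCategory. Tamil consonants (and independent vowels) are OtherLetter, while vowel signs and the virama (்) are combining marks (NonSpacingMark or SpacingCombiningMark), so starting a new chunk at every OtherLetter keeps each letter together with its signs.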

    Extension

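    Imports System.Globalization
    Imports System.Runtime.CompilerServices
    
    ' Extension methods in VB live in a Module; only the method needs <Extension()>.
    Public Module StringExtensions
    
        ' Splits str into chunks, starting a new chunk at every Char whose
        ' UnicodeCategory equals category. Chars of any other category (for
        ' Tamil: vowel signs and the virama) are appended to the current chunk.
        ' Named SplitByCategory rather than Split so that it cannot collide
        ' with the String.Split instance overloads.
        <Extension()>
        Public Function SplitByCategory(str As String, category As UnicodeCategory) As IList(Of String)
            Dim list As New List(Of String)
            If String.IsNullOrEmpty(str) Then Return list
    
            Dim item As String = Nothing
            For Each chr As Char In str
                If Char.GetUnicodeCategory(chr) = category Then
                    ' New base letter: flush the previous chunk and start a new one.
                    If Not String.IsNullOrEmpty(item) Then list.Add(item)
                    item = chr
                Else
                    ' Combining mark (or any other category): attach to the current chunk.
                    item &= chr
                End If
            Next
            If Not String.IsNullOrEmpty(item) Then list.Add(item)
    
            Return list
        End Function
    
    End Module
    

    Usage

    Imports System.Globalization
    Imports [your_namespace].StringExtensions
    
    Dim values As String() = {"சுதீப்", "செய்தியை", "கொள்ளாதது", "வில்லன்"}
    Dim builder As New System.Text.StringBuilder()
    
    For Each item As String In values
        ' The last OtherLetter-led chunk is the final letter together with its signs.
        builder.AppendLine(String.Concat(item, " : ", item.SplitByCategory(UnicodeCategory.OtherLetter).Last()))
    Next
    
    MessageBox.Show(builder.ToString())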
    

    Output

    சுதீப் : ப்
    செய்தியை : யை
    கொள்ளாதது : து
    வில்லன் : ன்
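As an alternative to splitting by category, .NET's System.Globalization.StringInfo class already groups a base character with the combining marks that follow it into a single text element, so the last text element is the last letter as a reader sees it. This is a minimal sketch, not part of the original answer; the TextElements module and the LastTextElement helper are hypothetical names. Note that .NET Framework defines a text element as a base character plus combining characters, while .NET 5 and later use extended grapheme clusters (Unicode UAX #29), so the two runtimes may segment other text differently; for the four words in the question both should give the endings listed in the output above.

```vb
Imports System
Imports System.Globalization
Imports System.Text

Module TextElements
    ' Returns the last text element (base character plus its combining marks)
    ' of s, or an empty string when s is Nothing or empty.
    Function LastTextElement(s As String) As String
        If String.IsNullOrEmpty(s) Then Return String.Empty
        Dim info As New StringInfo(s)
        ' SubstringByTextElements(start) returns everything from the start-th
        ' text element to the end, which here is just the final element.
        Return info.SubstringByTextElements(info.LengthInTextElements - 1)
    End Function

    Sub Main()
        ' UTF-8 output so Tamil text is not replaced by "?" in the console.
        Console.OutputEncoding = Encoding.UTF8
        For Each word As String In New String() {"சுதீப்", "செய்தியை", "கொள்ளாதது", "வில்லன்"}
            Console.WriteLine(word & " : " & LastTextElement(word))
        Next
    End Sub
End Module
```

Compared with the category-based helper, this avoids hard-coding which categories start a new letter, at the cost of depending on the runtime's text-segmentation rules.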