Tags: c#, string, algorithm, substring

Reimplement an algorithm to create a refine list


I'm trying to reimplement an algorithm that creates a refine keywords list. I don't have the original source code, only the tool's .exe file, so all I have is the input and the expected output.

The problem is that the output of my function doesn't match the output of the original one. Here's the code I'm using:

using System;
using System.Collections.Generic;
using System.IO;

string[] inputLines = File.ReadAllLines("Input.txt");
Dictionary<string, int> keywordsCount = new Dictionary<string, int>();
List<string> refineList = new List<string>();

//Get keyword counts: every '_'-separated token except "SFX"
foreach (string fileName in inputLines)
{
    string[] fileNameSplitted = fileName.Split('_');
    for (int i = 0; i < fileNameSplitted.Length; i++)
    {
        string currentKeyWord = fileNameSplitted[i];
        if (!string.Equals(currentKeyWord, "SFX", StringComparison.OrdinalIgnoreCase))
        {
            if (keywordsCount.ContainsKey(currentKeyWord))
            {
                keywordsCount[currentKeyWord] += 1;
            }
            else
            {
                keywordsCount.Add(currentKeyWord, 1);
            }
        }
    }
}

//Get final keywords
foreach (KeyValuePair<string, int> keyword in keywordsCount)
{
    if (keyword.Value > 2 && keyword.Key.Length > 2)
    {
        refineList.Add(keyword.Key);
    }
}

The input file:

SFX_AMB_BIRDSONG
SFX_AMB_BIRDSONG_MISC
SFX_AMB_BIRDSONG_SEAGULL
SFX_AMB_BIRDSONG_SEAGULL_BUSY
SFX_AMB_BIRDSONG_VULTURE
SFX_AMB_CAVES_DRIP
SFX_AMB_CAVES_DRIP_AUTO
SFX_AMB_CAVES_LOOP
SFX_AMB_DESERT_CICADAS
SFX_AMB_EARTHQUAKE
SFX_AMB_EARTHQUAKE_SHORT
SFX_AMB_EARTHQUAKE_STREAMED
SFX_AMB_FIRE_BURNING
SFX_AMB_FIRE_CAMP_FIRE
SFX_AMB_FIRE_JET
SFX_AMB_FIRE_LAVA
SFX_AMB_FIRE_LAVA_DEEP
SFX_AMB_FIRE_LAVA_JET1
SFX_AMB_FIRE_LAVA_JET2
SFX_AMB_FIRE_LAVA_JET3
SFX_AMB_FIRE_LAVA_JET_STOP
SFX_AMB_UNDW_BUBBLE_RELEASE
SFX_AMB_UNDW_BUBBLE_RELEASE_AUTO
SFX_AMB_WATER_BEACH1
SFX_AMB_WATER_BEACH2
SFX_AMB_WATER_BEACH3
SFX_AMB_WATER_CANALS
SFX_AMB_WATER_FALL_HUGE
SFX_AMB_WATER_FALL_NORMAL
SFX_AMB_WATER_FALL_NORMAL2
SFX_AMB_WATER_FALL_NORMAL3
SFX_AMB_WATER_FOUNTAIN
SFX_CS_LUX_PORTAL_LIGHTNING
SFX_CS_LUX_PORTAL_LIGHTNING1
SFX_CS_LUX_PORTAL_LIGHTNING2
SFX_CS_LUX_PRIEST_COWER
SFX_CS_LUX_PRIEST_MEDAL
SFX_CS_LUX_PRIEST_MEDITATE
SFX_CS_LUX_PRIEST_SCREAM
SFX_CS_LUX_PRIEST_SNIFF1
SFX_CS_LUX_PRIEST_SNIFF2
SFX_CS_LUX_PRIEST_SPIRITS
SFX_CS_LUX_PRIEST_SPIRITS2
SFX_CS_LUX_PRIEST_SPIRITS3
SFX_CS_LUX_PRIEST_SURPRISE
SFX_MON_BM05_TOO_WALK1
SFX_MON_BM05_TOO_WALK2
SFX_MON_BM06_SQU_WALK1
SFX_MON_BM06_SQU_WALK2
SFX_MON_BR06_HAL_ATTACK1
SFX_MON_BR06_HAL_ATTACK2
SFX_MON_BR06_HAL_DIE
SFX_MON_BR06_HAL_HIT
SFX_MON_BR06_HAL_IDLE
SFX_MON_BR06_HAL_IDLE_EATING
SFX_MON_BR06_HAL_LAND1
SFX_MON_BR06_HAL_LAND2
SFX_MON_BR06_HAL_SCRAPE
SFX_MON_BR06_HAL_SLAM
SFX_MON_BR06_HAL_SURPRISE
SFX_MON_BR06_HAL_WALK1
SFX_MON_BR06_HAL_WALK2
SFX_MON_BU01_MUM_ATTACK1
SFX_MON_BU01_MUM_ATTACK2
SFX_MON_BU01_MUM_DIE
SFX_MON_BU01_MUM_HIT
SFX_MON_BU01_MUM_IDLE_RETRIEVE
SFX_MON_BU01_MUM_IDLE_RETRIEVE_GROW
SFX_MON_BU01_MUM_SURPRISE
SFX_MON_BU01_MUM_WALK1
SFX_MON_BU01_MUM_WALK2
SFX_WATER_SPLASH_BIG
SFX_WATER_SPLASH_BIG1
SFX_WATER_SPLASH_BIG2
SFX_WATER_SPLASH_BIG3
SFX_WATER_SPLASH_MED1
SFX_WATER_SPLASH_MED2
SFX_WATER_SPLASH_MED3
SFX_WATER_SPLASH_MEDIUM
SFX_WATER_SPLASH_OUT
SFX_WATER_SPLASH_OUT1
SFX_WATER_SPLASH_OUT2
SFX_WATER_SPLASH_SMALL

And the expected output (from the original tool):

AMB
MON
WATER
LUX
BR06
HAL
SPLASH
PRIEST
FIRE
BU01
MUM
LAVA
BIRDSONG
WALK1
WALK2
JET
IDLE
EARTHQUAKE
FALL
SURPRISE
BIG
CAVES
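
A quick check on a single word hints at where the two approaches diverge (a sketch, assuming the tool matches words as substrings the way InStr does rather than as whole tokens). JET is in the expected output, yet splitting on '_' finds only two tokens that are exactly "JET", so the `keyword.Value > 2` filter drops it, while five input lines contain "JET" as a substring:

```csharp
using System;
using System.Linq;

// The five input lines that contain "JET" (copied from the input file).
string[] jetLines =
{
    "SFX_AMB_FIRE_JET",
    "SFX_AMB_FIRE_LAVA_JET1",
    "SFX_AMB_FIRE_LAVA_JET2",
    "SFX_AMB_FIRE_LAVA_JET3",
    "SFX_AMB_FIRE_LAVA_JET_STOP",
};

// Whole-token count, as in the code above: only two tokens are exactly "JET".
int tokenCount = jetLines.SelectMany(line => line.Split('_')).Count(token => token == "JET");

// Substring count, as InStr would see it: all five lines contain "JET".
int substringCount = jetLines.Count(line => line.Contains("JET"));

Console.WriteLine($"tokens: {tokenCount}, lines containing JET: {substringCount}"); // tokens: 2, lines containing JET: 5
```
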

What should I change so that my method's output matches the original one?

Thanks in advance!

------- EDIT: I've made some new discoveries:

->It is a method of roughly 100-130 lines.

->It uses the Visual Basic functions InStr, Len, Right and Left.

->It discards the word "SFX" and all words shorter than 3 characters.

->It uses a combobox as a temporary list holding every word that appears more than once; it then takes some of those words out, and they are the ones shown in the combobox visible to the user.

->For the test case I've published, this is the list of discarded words:

UNDW
BM05
BM06
SEAGULL
DRIP
BUBBLE
PORTAL
TOO
SQU
OUT
AUTO
RELEASE
NORMAL
LIGHTNING
SPIRITS
ATTACK1
ATTACK2
DIE
HIT
RETRIEVE
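
The VB runtime functions mentioned above behave a little differently from their .NET string counterparts: InStr returns a 1-based position and 0 when nothing is found, and Len, Left and Right work with lengths rather than indexes. The sketch below shows hypothetical C# stand-ins (my own names and signatures, covering only binary comparison and the cases used here) and the kind of word extraction they allow:

```csharp
using System;

// Hypothetical stand-ins for the VB functions, binary (ordinal) comparison only.
// InStr is 1-based and returns 0 when `find` does not occur at or after `start`.
static int InStr(int start, string text, string find) =>
    start > text.Length + 1 ? 0 : text.IndexOf(find, start - 1, StringComparison.Ordinal) + 1;

static int Len(string text) => text.Length;

// Left/Right take a character count, clamped to the string length.
static string Left(string text, int length) => text.Substring(0, Math.Min(length, text.Length));

static string Right(string text, int length) =>
    text.Substring(text.Length - Math.Min(length, text.Length));

// Drop the first word, then keep only the word that follows it.
string name = "SFX_AMB_CAVES_DRIP";
string rest = Right(name, Len(name) - InStr(1, name, "_")); // "AMB_CAVES_DRIP"
string word = Left(rest, InStr(1, rest, "_") - 1);          // "AMB"
Console.WriteLine(word);
```
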

Solution

  • I finally got it!

    I finally figured it out. It took OllyDbg, Numega SmartCheck and VB Decompiler, a lot of patience, and voilà.

    Here is the code. I wrote it in VB.NET because of its similarity to VB6:

    'Clear comboboxes
    Combo2.Items.Clear()
    Combo3.Items.Clear()
    
    'Start refining
    Dim listboxItemsCount As Integer = listbox_SfxItems.Items.Count - 1
    'Check the first six word positions (0 to 5) of every item
    For numberOfIterations As Integer = 0 To 5
        'Iterate listbox items
        For sfxItemIndex As Integer = 0 To listboxItemsCount
            'Iterate listbox items to find matches
            For sfxItemIndexSub As Integer = 0 To listboxItemsCount
                'Skip the item that the outer loop is checking
                If sfxItemIndex = sfxItemIndexSub Then
                    Continue For
                End If
                'Get item from listbox
                Dim currentSfx As String = listbox_SfxItems.Items(sfxItemIndex)
                Dim wordToCheck As String = currentSfx
                'Split words
                If numberOfIterations > 0 Then
                    For wordIndex = 1 To numberOfIterations
                        If InStr(1, wordToCheck, "_", CompareMethod.Binary) Then
                            Dim wordLength As Integer = Len(wordToCheck) - InStr(1, wordToCheck, "_", CompareMethod.Binary)
                            wordToCheck = Microsoft.VisualBasic.Right(wordToCheck, wordLength)
                        End If
                    Next
                End If
                If InStr(1, wordToCheck, "_", CompareMethod.Binary) Then
                    Dim wordLength As Integer = InStr(1, wordToCheck, "_", CompareMethod.Binary) - 1
                    wordToCheck = Microsoft.VisualBasic.Left(wordToCheck, wordLength)
                End If
                'Find matches
                If StrComp("SFX", wordToCheck) <> 0 Then
                    If Len(wordToCheck) > 2 Then
                        currentSfx = listbox_SfxItems.Items(sfxItemIndexSub)
                        If InStr(1, currentSfx, wordToCheck, CompareMethod.Binary) Then
                            'Look for the first combo item that contains the word
                            Dim addNewItem As Boolean = True
                            For comboboxIndex As Integer = 0 To Combo2.Items.Count - 1
                                Dim comboWordItem As String = CType(Combo2.Items(comboboxIndex), ComboItemData).Name
                                'Skip combo items that don't contain the word
                                If InStr(1, comboWordItem, wordToCheck, CompareMethod.Binary) = 0 Then
                                    Continue For
                                End If
                                'Increment the count only when the item is exactly the word
                                currentSfx = CType(Combo2.Items(comboboxIndex), ComboItemData).Name
                                If StrComp(currentSfx, wordToCheck) = 0 Then
                                    'Get current item data
                                    Dim currentItemData As Integer = CType(Combo2.Items(comboboxIndex), ComboItemData).ItemData
                                    'Update value
                                    currentItemData += 1
                                    CType(Combo2.Items(comboboxIndex), ComboItemData).ItemData = currentItemData
                                End If
                                'A containing item exists: don't add the word and stop searching
                                addNewItem = False
                                Exit For
                            Next
                            'Check if we have to add the new item
                            If addNewItem Then
                                Combo2.Items.Add(New ComboItemData(wordToCheck, 0))
                            End If
                        End If
                    End If
                End If
            Next
        Next
    Next
    
    'Check final words
    Combo3.Items.Add("All")
    Combo3.Items.Add("HighLighted")
    
    Dim quitLoop As Boolean = False
    Do
        If Combo2.Items.Count > 0 Then
            Dim itemToRemove As Integer = -1
            'Get max value from the remaining words
            Dim maxWordAppearances As Integer = 0
            For itemIndex As Integer = 0 To Combo2.Items.Count - 1
                Dim itemData As Integer = CType(Combo2.Items(itemIndex), ComboItemData).ItemData
                maxWordAppearances = Math.Max(maxWordAppearances, itemData)
            Next
            'Get the item with the max value
            For index As Integer = 0 To Combo2.Items.Count - 1
                Dim itemData As Integer = CType(Combo2.Items(index), ComboItemData).ItemData
                If itemData = maxWordAppearances And itemToRemove = -1 Then
                    itemToRemove = index
                End If
            Next
            'Move the item from Combo2 to Combo3
            Dim itemStringName As String = CType(Combo2.Items(itemToRemove), ComboItemData).Name
            Combo3.Items.Add(itemStringName)
            Combo2.Items.RemoveAt(itemToRemove)
            'Stop after the first word whose count is 5 or less
            If maxWordAppearances <= 5 Then
                quitLoop = True
            End If
        Else
            'Nothing left to move: stop instead of looping forever
            quitLoop = True
        End If
    Loop While quitLoop <> True
    'Select the first item
    Combo3.SelectedIndex = 0
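
Since the question was about C#, here is a sketch of the same logic as a C# port with no UI controls (my own translation, assuming C# 9 or later for top-level statements). A list of (Name, Count) tuples stands in for Combo2 and its ComboItemData entries, and the returned list stands in for Combo3 without the "All" and "HighLighted" entries; `WordAt` and `BuildRefineList` are names I made up. Two small departures from the VB code: the word for item i is computed once per pass instead of once per inner iteration (it only depends on i and the pass), and the final loop also stops when no candidates are left.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Word at 0-based `position`, mirroring the VB Right/Left/InStr steps: up to `position`
// leading words are stripped while a '_' remains, so positions past the end give the
// last word again (e.g. "SFX_AMB_BIRDSONG" gives "BIRDSONG" for positions 2 to 5).
static string WordAt(string name, int position)
{
    string word = name;
    for (int k = 0; k < position; k++)
    {
        int underscore = word.IndexOf('_');
        if (underscore >= 0) word = word.Substring(underscore + 1);
    }
    int end = word.IndexOf('_');
    return end >= 0 ? word.Substring(0, end) : word;
}

static List<string> BuildRefineList(IReadOnlyList<string> names)
{
    // Stands in for Combo2: (Name, ItemData) pairs in insertion order.
    var candidates = new List<(string Name, int Count)>();

    for (int position = 0; position <= 5; position++)
    {
        for (int i = 0; i < names.Count; i++)
        {
            string word = WordAt(names[i], position);
            if (word == "SFX" || word.Length <= 2) continue;

            for (int j = 0; j < names.Count; j++)
            {
                if (i == j || !names[j].Contains(word)) continue; // ordinal, like InStr Binary

                // Only the first candidate containing the word matters: add the word if there
                // is none, bump it on an exact match, otherwise ignore this hit.
                int hit = candidates.FindIndex(c => c.Name.Contains(word));
                if (hit < 0)
                    candidates.Add((word, 0));
                else if (candidates[hit].Name == word)
                    candidates[hit] = (word, candidates[hit].Count + 1);
            }
        }
    }

    // Stands in for Combo3: repeatedly take the first candidate with the highest count,
    // stopping right after taking one whose count is 5 or less.
    var refined = new List<string>();
    while (candidates.Count > 0)
    {
        int max = candidates.Max(c => c.Count);
        int index = candidates.FindIndex(c => c.Count == max);
        refined.Add(candidates[index].Name);
        candidates.RemoveAt(index);
        if (max <= 5) break;
    }
    return refined;
}

// AMB and CAVES both reach 5 here, so the loop stops right after the first of them.
Console.WriteLine(string.Join(", ", BuildRefineList(new[]
{
    "SFX_AMB_CAVES_DRIP", "SFX_AMB_CAVES_DRIP_AUTO", "SFX_AMB_CAVES_LOOP",
}))); // prints "AMB"
```

To run it on the question's data, call `BuildRefineList(File.ReadAllLines("Input.txt"))` (this needs `using System.IO;`). Tracing the counts by hand, that should return the 22 words of the expected output in the same order; this has not been compared against the original .exe itself.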
    

    I'm not sure whether it could be optimized, but it works like the original one and outputs the same words in the same order.
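
Tracing the loops above by hand (my own derivation from this code, not something observed in the original tool) suggests a simple formula for each word's final ItemData, which also explains the output order. For a word w that passes the "SFX" and length checks, let A(w) be the number of (item, pass) pairs, over passes 0 to 5, whose extracted word is w (an item with k words yields its last word on every pass from k-1 to 5), and let N(w) >= 2 be the number of items that contain w as a substring. Each such pair matches the other N(w) - 1 items, and the first match only creates the entry with 0, so, provided no earlier Combo2 entry contains w:

$$
\operatorname{count}(w) = A(w)\,\bigl(N(w) - 1\bigr) - 1
$$

For example (BIG gets A = 3 because SFX_WATER_SPLASH_BIG yields it on passes 3, 4 and 5, while N = 4 because BIG1, BIG2 and BIG3 also contain it):

$$
\begin{aligned}
\text{LUX, BR06, HAL}: &\quad 13 \cdot (13 - 1) - 1 = 155\\
\text{BIG}: &\quad 3 \cdot (4 - 1) - 1 = 8\\
\text{CAVES}: &\quad 3 \cdot (3 - 1) - 1 = 5
\end{aligned}
$$

Ties keep their insertion order, which is why LUX, BR06 and HAL come out in that order. CAVES is the first remaining entry with the highest remaining count, 5 (PORTAL and OUT also reach 5 but were added later), so it is the last word moved before the loop stops.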

    If you want to test it, you need the following controls: two comboboxes (Combo2 is the temporary one and Combo3 is the one the user sees) and a listbox, listbox_SfxItems, containing the items to check.

    The ComboItemData class was taken from this site: https://www.elguille.info/colabora/puntonet/alvaritus_itemdataennet.htm

    I renamed its class Cls_lista to ComboItemData.