I'm trying to reimplement an algorithm that builds a refined keyword list. I don't have the original source code, only the tool's .exe file, so all I have is the input and the expected output.
The problem is that the output of my function doesn't match the output of the original tool. Here's the code I'm using:
string[] inputLines = File.ReadAllLines("Input.txt");
Dictionary<string, int> keywordsCount = new Dictionary<string, int>();
List<string> refineList = new List<string>();
//Get Keywords Count
foreach (string fileName in inputLines)
{
    string[] fileNameSplitted = fileName.Split('_');
    for (int i = 0; i < fileNameSplitted.Length; i++)
    {
        string currentKeyWord = fileNameSplitted[i];
        if (!string.Equals(currentKeyWord, "SFX", StringComparison.OrdinalIgnoreCase))
        {
            if (keywordsCount.ContainsKey(fileNameSplitted[i]))
            {
                keywordsCount[fileNameSplitted[i]] += 1;
            }
            else
            {
                keywordsCount.Add(fileNameSplitted[i], 1);
            }
        }
    }
}
//Get final keywords
foreach (KeyValuePair<string, int> keyword in keywordsCount)
{
    if (keyword.Value > 2 && keyword.Key.Length > 2)
    {
        refineList.Add(keyword.Key);
    }
}
The input file:
SFX_AMB_BIRDSONG
SFX_AMB_BIRDSONG_MISC
SFX_AMB_BIRDSONG_SEAGULL
SFX_AMB_BIRDSONG_SEAGULL_BUSY
SFX_AMB_BIRDSONG_VULTURE
SFX_AMB_CAVES_DRIP
SFX_AMB_CAVES_DRIP_AUTO
SFX_AMB_CAVES_LOOP
SFX_AMB_DESERT_CICADAS
SFX_AMB_EARTHQUAKE
SFX_AMB_EARTHQUAKE_SHORT
SFX_AMB_EARTHQUAKE_STREAMED
SFX_AMB_FIRE_BURNING
SFX_AMB_FIRE_CAMP_FIRE
SFX_AMB_FIRE_JET
SFX_AMB_FIRE_LAVA
SFX_AMB_FIRE_LAVA_DEEP
SFX_AMB_FIRE_LAVA_JET1
SFX_AMB_FIRE_LAVA_JET2
SFX_AMB_FIRE_LAVA_JET3
SFX_AMB_FIRE_LAVA_JET_STOP
SFX_AMB_UNDW_BUBBLE_RELEASE
SFX_AMB_UNDW_BUBBLE_RELEASE_AUTO
SFX_AMB_WATER_BEACH1
SFX_AMB_WATER_BEACH2
SFX_AMB_WATER_BEACH3
SFX_AMB_WATER_CANALS
SFX_AMB_WATER_FALL_HUGE
SFX_AMB_WATER_FALL_NORMAL
SFX_AMB_WATER_FALL_NORMAL2
SFX_AMB_WATER_FALL_NORMAL3
SFX_AMB_WATER_FOUNTAIN
SFX_CS_LUX_PORTAL_LIGHTNING
SFX_CS_LUX_PORTAL_LIGHTNING1
SFX_CS_LUX_PORTAL_LIGHTNING2
SFX_CS_LUX_PRIEST_COWER
SFX_CS_LUX_PRIEST_MEDAL
SFX_CS_LUX_PRIEST_MEDITATE
SFX_CS_LUX_PRIEST_SCREAM
SFX_CS_LUX_PRIEST_SNIFF1
SFX_CS_LUX_PRIEST_SNIFF2
SFX_CS_LUX_PRIEST_SPIRITS
SFX_CS_LUX_PRIEST_SPIRITS2
SFX_CS_LUX_PRIEST_SPIRITS3
SFX_CS_LUX_PRIEST_SURPRISE
SFX_MON_BM05_TOO_WALK1
SFX_MON_BM05_TOO_WALK2
SFX_MON_BM06_SQU_WALK1
SFX_MON_BM06_SQU_WALK2
SFX_MON_BR06_HAL_ATTACK1
SFX_MON_BR06_HAL_ATTACK2
SFX_MON_BR06_HAL_DIE
SFX_MON_BR06_HAL_HIT
SFX_MON_BR06_HAL_IDLE
SFX_MON_BR06_HAL_IDLE_EATING
SFX_MON_BR06_HAL_LAND1
SFX_MON_BR06_HAL_LAND2
SFX_MON_BR06_HAL_SCRAPE
SFX_MON_BR06_HAL_SLAM
SFX_MON_BR06_HAL_SURPRISE
SFX_MON_BR06_HAL_WALK1
SFX_MON_BR06_HAL_WALK2
SFX_MON_BU01_MUM_ATTACK1
SFX_MON_BU01_MUM_ATTACK2
SFX_MON_BU01_MUM_DIE
SFX_MON_BU01_MUM_HIT
SFX_MON_BU01_MUM_IDLE_RETRIEVE
SFX_MON_BU01_MUM_IDLE_RETRIEVE_GROW
SFX_MON_BU01_MUM_SURPRISE
SFX_MON_BU01_MUM_WALK1
SFX_MON_BU01_MUM_WALK2
SFX_WATER_SPLASH_BIG
SFX_WATER_SPLASH_BIG1
SFX_WATER_SPLASH_BIG2
SFX_WATER_SPLASH_BIG3
SFX_WATER_SPLASH_MED1
SFX_WATER_SPLASH_MED2
SFX_WATER_SPLASH_MED3
SFX_WATER_SPLASH_MEDIUM
SFX_WATER_SPLASH_OUT
SFX_WATER_SPLASH_OUT1
SFX_WATER_SPLASH_OUT2
SFX_WATER_SPLASH_SMALL
And the expected output (from the original tool):
AMB
MON
WATER
LUX
BR06
HAL
SPLASH
PRIEST
FIRE
BU01
MUM
LAVA
BIRDSONG
WALK1
WALK2
JET
IDLE
EARTHQUAKE
FALL
SURPRISE
BIG
CAVES
What should I change so that my method's output matches the original output?
Thanks in advance!
-------EDIT I've made some new discoveries:
->It is a method of approximately 100-130 lines.
->It uses the Visual Basic functions InStr, Len, Right and Left.
->It discards the word "SFX" and all words less than 3 characters long.
->It uses a combobox as a temporary list, where it puts all the words that appear more than once. From that list it picks out some words, and those are the ones shown in the combobox that the user sees.
->For the first test case, the one I posted above, this is the list of discarded words:
UNDW
BM05
BM06
SEAGULL
DRIP
BUBBLE
PORTAL
TOO
SQU
OUT
AUTO
RELEASE
NORMAL
LIGHTNING
SPIRITS
ATTACK1
ATTACK2
DIE
HIT
RETRIEVE
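To make these observations concrete, here is a rough C# sketch of how the temporary list could be filled. It is a guess, not the tool's code: it assumes that "appears more than once" means the word is found as a substring of at least one other line (the way InStr would find it), and the names CandidatePool and Build are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CandidatePool
{
    // Hypothetical helper: collects every word that is not "SFX", has at
    // least 3 characters and is found (as a substring) in some other line.
    public static List<string> Build(IList<string> lines)
    {
        var pool = new List<string>();
        for (int i = 0; i < lines.Count; i++)
        {
            foreach (string word in lines[i].Split('_'))
            {
                // Skip "SFX", short words and words that are already stored.
                if (word == "SFX" || word.Length < 3 || pool.Contains(word))
                    continue;

                // Assumption: a substring match in any other line is enough.
                bool seenElsewhere = lines
                    .Where((other, j) => j != i)
                    .Any(other => other.Contains(word));

                if (seenElsewhere)
                    pool.Add(word);
            }
        }
        return pool;
    }
}
```

Under that assumption, the discarded words would be whatever remains in this pool after the words shown to the user have been taken out. The sketch does not try to reproduce the order in which the original tool stores the words.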
I finally got it!!
I finally figured it out. I had to use OllyDbg, Numega SmartCheck and VB Decompiler, plus a lot of patience, and voilà.
Here is the code. I've written it in VB.Net because of its similarity to VB6:
'Clear comboboxes
Combo2.Items.Clear()
Combo3.Items.Clear()
'Start refining
Dim listboxItemsCount As Integer = listbox_SfxItems.Items.Count - 1
'Split only six words
For numberOfIterations As Integer = 0 To 5
    'Iterate listbox items
    For sfxItemIndex As Integer = 0 To listboxItemsCount
        'Iterate listbox items to find matches
        For sfxItemIndexSub As Integer = 0 To listboxItemsCount
            'Skip the line that we are checking in the previous loop
            If sfxItemIndex = sfxItemIndexSub Then
                Continue For
            End If
            'Get item from listbox
            Dim currentSfx As String = CStr(listbox_SfxItems.Items(sfxItemIndex))
            Dim wordToCheck As String = currentSfx
            'Split words: drop one leading word per iteration
            If numberOfIterations > 0 Then
                For wordIndex As Integer = 1 To numberOfIterations
                    If InStr(1, wordToCheck, "_", CompareMethod.Binary) > 0 Then
                        Dim wordLength As Integer = Len(wordToCheck) - InStr(1, wordToCheck, "_", CompareMethod.Binary)
                        wordToCheck = Microsoft.VisualBasic.Right(wordToCheck, wordLength)
                    End If
                Next
            End If
            'Keep only the first remaining word
            If InStr(1, wordToCheck, "_", CompareMethod.Binary) > 0 Then
                Dim wordLength As Integer = InStr(1, wordToCheck, "_", CompareMethod.Binary) - 1
                wordToCheck = Microsoft.VisualBasic.Left(wordToCheck, wordLength)
            End If
            'Find matches
            If StrComp("SFX", wordToCheck) <> 0 Then
                If Len(wordToCheck) > 2 Then
                    currentSfx = CStr(listbox_SfxItems.Items(sfxItemIndexSub))
                    If InStr(1, currentSfx, wordToCheck, CompareMethod.Binary) > 0 Then
                        'Get combo items count
                        Dim addNewItem As Boolean = True
                        For comboboxIndex As Integer = 0 To Combo2.Items.Count - 1
                            Dim comboWordItem As String = CType(Combo2.Items(comboboxIndex), ComboItemData).Name
                            'Check for duplicates
                            If InStr(1, comboWordItem, wordToCheck, CompareMethod.Binary) = 0 Then
                                Continue For
                            End If
                            'Update combo item with the word appearances count
                            currentSfx = CType(Combo2.Items(comboboxIndex), ComboItemData).Name
                            If StrComp(currentSfx, wordToCheck) = 0 Then
                                'Get current item data
                                Dim currentItemData As Integer = CType(Combo2.Items(comboboxIndex), ComboItemData).ItemData
                                'Update value
                                currentItemData += 1
                                CType(Combo2.Items(comboboxIndex), ComboItemData).ItemData = currentItemData
                            End If
                            'Don't add items in the combobox and quit loop
                            addNewItem = False
                            Exit For
                        Next
                        'Check if we have to add the new item
                        If addNewItem Then
                            Combo2.Items.Add(New ComboItemData(wordToCheck, 0))
                        End If
                    End If
                End If
            End If
        Next
    Next
Next
'Check final words
Combo3.Items.Add("All")
Combo3.Items.Add("HighLighted")
Dim quitLoop As Boolean = False
Do
    If Combo2.Items.Count > 0 Then
        Dim itemToRemove As Integer = -1
        'Get max value from the remaining words
        Dim maxWordAppearances As Integer = 0
        For itemIndex As Integer = 0 To Combo2.Items.Count - 1
            Dim itemData As Integer = CType(Combo2.Items(itemIndex), ComboItemData).ItemData
            maxWordAppearances = Math.Max(maxWordAppearances, itemData)
        Next
        'Get the first item with the max value
        For index As Integer = 0 To Combo2.Items.Count - 1
            Dim itemData As Integer = CType(Combo2.Items(index), ComboItemData).ItemData
            If itemData = maxWordAppearances AndAlso itemToRemove = -1 Then
                itemToRemove = index
            End If
        Next
        'Remove and add items
        Dim itemStringName As String = CType(Combo2.Items(itemToRemove), ComboItemData).Name
        Combo3.Items.Add(itemStringName)
        Combo2.Items.RemoveAt(itemToRemove)
        'Check if we have to quit the loop
        If maxWordAppearances <= 5 Then
            quitLoop = True
        End If
    Else
        'Nothing left in the temporary combobox: quit instead of looping forever
        quitLoop = True
    End If
Loop While quitLoop <> True
'Select the first item
Combo3.SelectedIndex = 0
I'm not sure whether it could be optimized, but it works like the original one and outputs the same words in the same order.
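One possible simplification, shown here in C# since that is the language of the question: the InStr/Right/Left steps that isolate the word for a given iteration appear to be equivalent to splitting the line on "_" and taking the word at that index, falling back to the last word when the line has fewer words. The names WordPicker and NthWord are invented for this sketch.

```csharp
using System;

public static class WordPicker
{
    // Hypothetical helper: returns the word at the given zero-based position.
    // When the line has fewer words, the last word is returned, because the
    // original loop stops stripping once no underscore is left.
    public static string NthWord(string line, int position)
    {
        string[] parts = line.Split('_');
        return parts[Math.Min(position, parts.Length - 1)];
    }
}
```

For example, NthWord("SFX_AMB_BIRDSONG", 1) gives "AMB", and every position from 2 upwards gives "BIRDSONG". That repetition matters: a word that ends a short line is checked again in the later iterations, which raises its counter.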
If you want to test it, it requires the following controls: two comboboxes (Combo2 is the temporary one, Combo3 is the one the user sees) and a listbox with the items to check.
The ComboItemData class was taken from this site: https://www.elguille.info/colabora/puntonet/alvaritus_itemdataennet.htm
I renamed Cls_lista to ComboItemData.
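For anyone who wants the same logic in C# without any controls, the following is a rough port of the VB.Net code above, offered as a sketch rather than a verified replacement. Two plain lists stand in for Combo2, the "All" and "HighLighted" entries are left out, and the loop stops when the temporary list is empty. The names KeywordRefiner and Refine are made up. Tracing a few words of the question's input by hand agrees with the expected list (CAVES, for instance, ends with a counter of 5, which is exactly where the selection loop stops), but I have not run it on the full input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordRefiner
{
    public static List<string> Refine(IList<string> lines)
    {
        // Stand-in for Combo2: parallel lists holding each word and its counter.
        var names = new List<string>();
        var counts = new List<int>();

        // Only the first six word positions are examined.
        for (int position = 0; position <= 5; position++)
        {
            for (int i = 0; i < lines.Count; i++)
            {
                // Word at this position; shorter lines reuse their last word.
                string[] parts = lines[i].Split('_');
                string word = parts[Math.Min(position, parts.Length - 1)];
                if (word == "SFX" || word.Length <= 2) continue;

                for (int j = 0; j < lines.Count; j++)
                {
                    // Only other lines that contain the word count as a match.
                    if (i == j || !lines[j].Contains(word)) continue;

                    // First stored name that contains the word as a substring.
                    int hit = names.FindIndex(name => name.Contains(word));
                    if (hit < 0)
                    {
                        names.Add(word);   // first match: stored with counter 0
                        counts.Add(0);
                    }
                    else if (names[hit] == word)
                    {
                        counts[hit]++;     // exact match: one more appearance
                    }
                    // Otherwise a longer stored name shadows the word and
                    // nothing is added or counted, as in the VB.Net code.
                }
            }
        }

        // Move the first word with the highest counter to the result until
        // a word with a counter of 5 or less has been moved.
        var result = new List<string>();
        while (names.Count > 0)
        {
            int max = counts.Max();
            int index = counts.IndexOf(max);
            result.Add(names[index]);
            names.RemoveAt(index);
            counts.RemoveAt(index);
            if (max <= 5) break;
        }
        return result;
    }
}
```

A call such as KeywordRefiner.Refine(File.ReadAllLines("Input.txt")) would then return the refined list. Note that the counter is not a simple word frequency: it grows with the number of (line, other line) pairs that match, which is why a plain dictionary count with a fixed threshold does not reproduce the original order.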