
Append wildcard results to a list in python

Please forgive me; I am teaching myself and am pretty new to this. I have searched but found nothing that addresses my issue. There is a lot about the os module and file directories, but I couldn't figure out how to apply it here. I am also not very familiar with regex; I tried using it as well but kept getting errors.

So I have a large text file (9GB) that is actually a list of discussion board posts.

I have a list of words I want to add to the stoplist for topic modeling. (I can do that)

However, I also want to add any term that ends with any of the words in my list.

A sample of my data and lists is below.

txt = ['satoshiFounderSr MemberOfflineActivity Merit Welcome to the new Bitcoin forumNovember PMMeritedbyVlad Vlad Claymore krogothmanhattan negeroy Referee Vod suchmoon alani Lesbian Cow cryptohunter hv janggernaut matt Jeremycoin MaoChao Kda roslinpl gold MicroGuy elokk notaek BitcoinFX EcuaMobi Lutpin Lincoln Echo Nomad avatar kiyoshi saugwurm BALIK anggriani teeGUMES dooglus bitbollo klarki franckuestein legendster techman Provok mrcash paxmao jeks Cent MrCryptHodl DireWolfM BarbieCasino theunbeatable mindrust fillippone Mister k LFC Bitcoin nutildah Oceat digit Woshib ubay undeadbitcoiner pushups btcrocks realdantreccia Dq Atabey limtjoehua LoyceV anonymousminer MagicByt vizique coinlocket Altcoinsintel baeva OgNasty o solo miner Janation Kalemder sujonali MoparMiningLLC Eddyc jonemil Kryptowolf green slmn TyfrTR cr mprep Searing EFS adaseb notbatman Lucius boltz layer gfx seoincorporation AGD Phinnaeus Gage tabas pawel Lafu pangu Blind Legs Parker itod Potato Chips wonko Arriemoller Coin ruletheworld Halab coupable o e l e o TheBeardedBaby MoxnatyShmel monsanto amishmanish xtraelv Husna QA madnessteat Bthd taikuri dvd rw Toxic styca WorldCoiner bubbalex xyzzy V saya jets crypto trader xzEXrP xlcus solosequenosenada VB MishaSER dragonvslinux Zocadas jahepahit risatrakib chimk Porfirii YuT Coin adrianto famososMuertos angel Financisto RareFortune jakoylantern bere kin mdayonliner sncc squallw cryptjh jazmuzika wishxy markleal BlackHatCoiner an sha ldah DEMENTOR mustangy TaShoKi Adriane Poker Player StackItUp PIOUPIOU loreRex tasadar wego Gustavo Livecoins Palmholder CryptoPravda barjan Crypto Collection collapse jukeee Cuk ng bitc in LBTC Pyrojason M BTC vanobe shortcircuit Toqo Vxv BiT pOL songsunling bitcoinokulu AlexMay Kaonashi Neo Baudrillard RussaX morkaii Welcome to the new Bitcoin forumThe old forum can still be reached here http bitcoinsourceforgenet boards indexphpI ll repost some selected threads here and add updated answers to questions where I 
canFAQhttp bitcoinsourceforgenet wiki indexphppage FAQDownloadhttp sourceforgenet projects bitcoin files satoshiFounderSr MemberOfflineActivity Merit Welcome to the new Bitcoin forumNovember PMMeritedbyVlad Vlad Claymore krogothmanhattan negeroy Referee Vod suchmoon alani Lesbian Cow cryptohunter hv janggernaut matt Jeremycoin MaoChao Kda roslinpl gold MicroGuy elokk notaek BitcoinFX EcuaMobi Lutpin Lincoln Echo Nomad avatar kiyoshi saugwurm BALIK anggriani teeGUMES dooglus bitbollo klarki franckuestein legendster techman Provok mrcash paxmao jeks Cent MrCryptHodl DireWolfM BarbieCasino theunbeatable mindrust fillippone Mister k LFC Bitcoin nutildah Oceat digit Woshib ubay undeadbitcoiner pushups btcrocks realdantreccia Dq Atabey limtjoehua LoyceV anonymousminer MagicByt vizique coinlocket Altcoinsintel baeva OgNasty o solo miner Janation Kalemder sujonali MoparMiningLLC Eddyc jonemil Kryptowolf green slmn TyfrTR cr mprep Searing EFS adaseb notbatman Lucius boltz layer gfx seoincorporation AGD Phinnaeus Gage tabas pawel Lafu pangu Blind Legs Parker itod Potato Chips wonko Arriemoller Coin ruletheworld Halab coupable o e l e o TheBeardedBaby MoxnatyShmel monsanto amishmanish xtraelv Husna QA madnessteat Bthd taikuri dvd rw Toxic styca WorldCoiner bubbalex xyzzy V saya jets crypto trader xzEXrP xlcus solosequenosenada VB MishaSER dragonvslinux Zocadas jahepahit risatrakib chimk Porfirii YuT Coin adrianto famososMuertos angel Financisto RareFortune jakoylantern bere kin mdayonliner sncc squallw cryptjh jazmuzika wishxy markleal BlackHatCoiner an sha ldah DEMENTOR mustangy TaShoKi Adriane Poker Player StackItUp PIOUPIOU loreRex tasadar wego Gustavo Livecoins Palmholder CryptoPravda barjan Crypto Collection collapse jukeee Cuk ng bitc in LBTC Pyrojason M BTC vanobe shortcircuit Toqo Vxv BiT pOL songsunling bitcoinokulu AlexMay Kaonashi Neo Baudrillard RussaX morkaii Welcome to the new Bitcoin forumThe old forum can still be reached here http bitcoinsourceforgenet boards 
indexphpI ll repost some selected threads here and add updated answers to questions where I canFAQhttp bitcoinsourceforgenet wiki indexphppage FAQDownloadhttp sourceforgenet projects bitcoin files satoshiFounderSr MemberOfflineActivity Merit Welcome to the new Bitcoin forumNovember PMMeritedbyVlad Vlad Claymore krogothmanhattan negeroy Referee Vod suchmoon alani Lesbian Cow cryptohunter hv janggernaut matt Jeremycoin MaoChao Kda roslinpl gold MicroGuy elokk notaek BitcoinFX EcuaMobi Lutpin Lincoln Echo Nomad avatar kiyoshi saugwurm BALIK anggriani teeGUMES dooglus bitbollo klarki franckuestein legendster techman Provok mrcash paxmao jeks Cent MrCryptHodl DireWolfM BarbieCasino theunbeatable mindrust fillippone Mister k LFC Bitcoin nutildah Oceat digit Woshib ubay undeadbitcoiner pushups btcrocks realdantreccia Dq Atabey limtjoehua LoyceV anonymousminer MagicByt vizique coinlocket Altcoinsintel baeva OgNasty o solo miner Janation Kalemder sujonali MoparMiningLLC Eddyc jonemil Kryptowolf green slmn TyfrTR cr mprep Searing EFS adaseb notbatman Lucius boltz layer gfx seoincorporation AGD Phinnaeus Gage tabas pawel Lafu pangu Blind Legs Parker itod Potato Chips wonko Arriemoller Coin ruletheworld Halab coupable o e l e o TheBeardedBaby MoxnatyShmel monsanto amishmanish xtraelv Husna QA madnessteat Bthd taikuri dvd rw Toxic styca WorldCoiner bubbalex xyzzy V saya jets crypto trader xzEXrP xlcus solosequenosenada VB MishaSER dragonvslinux Zocadas jahepahit risatrakib chimk Porfirii YuT Coin adrianto famososMuertos angel Financisto RareFortune jakoylantern bere kin mdayonliner sncc squallw cryptjh jazmuzika wishxy markleal BlackHatCoiner an sha ldah DEMENTOR mustangy TaShoKi Adriane Poker Player StackItUp PIOUPIOU loreRex tasadar wego Gustavo Livecoins Palmholder CryptoPravda barjan Crypto Collection collapse jukeee Cuk ng bitc in LBTC Pyrojason M BTC vanobe shortcircuit Toqo Vxv BiT pOL songsunling bitcoinokulu AlexMay Kaonashi Neo Baudrillard RussaX morkaii Welcome to 
the new Bitcoin forumThe old forum can still be reached here http bitcoinsourceforgenet boards indexphpI ll repost some selected threads here and add updated answers to questions where I canFAQhttp bitcoinsourceforgenet wiki indexphppage FAQDownloadhttp sourceforgenet projects bitcoin files Welcome to the new Bitcoin forumNovember PMMeritedbyVlad Vlad Claymore krogothmanhattan negeroy Referee Vod suchmoon alani Lesbian Cow cryptohunter hv janggernaut matt Jeremycoin MaoChao Kda roslinpl gold MicroGuy elokk notaek BitcoinFX EcuaMobi Lutpin Lincoln Echo Nomad avatar kiyoshi saugwurm BALIK anggriani teeGUMES dooglus bitbollo klarki franckuestein legendster techman Provok mrcash paxmao jeks Cent MrCryptHodl DireWolfM BarbieCasino theunbeatable mindrust fillippone Mister k LFC Bitcoin nutildah Oceat digit Woshib ubay undeadbitcoiner pushups btcrocks realdantreccia Dq Atabey limtjoehua LoyceV anonymousminer MagicByt vizique coinlocket Altcoinsintel baeva OgNasty o solo miner Janation Kalemder sujonali MoparMiningLLC Eddyc jonemil Kryptowolf green slmn TyfrTR cr mprep Searing EFS adaseb notbatman Lucius boltz layer gfx seoincorporation AGD Phinnaeus Gage tabas pawel Lafu pangu Blind Legs Parker itod Potato Chips wonko Arriemoller Coin ruletheworld Halab coupable o e l e o TheBeardedBaby MoxnatyShmel monsanto amishmanish xtraelv Husna QA madnessteat Bthd taikuri dvd rw Toxic styca WorldCoiner bubbalex xyzzy V saya jets crypto trader xzEXrP xlcus solosequenosenada VB MishaSER dragonvslinux Zocadas jahepahit risatrakib chimk Porfirii YuT Coin adrianto famososMuertos angel Financisto RareFortune jakoylantern bere kin mdayonliner sncc squallw cryptjh jazmuzika wishxy markleal BlackHatCoiner an sha ldah DEMENTOR mustangy TaShoKi Adriane Poker Player StackItUp PIOUPIOU loreRex tasadar wego Gustavo Livecoins Palmholder CryptoPravda barjan Crypto Collection collapse jukeee Cuk ng bitc in LBTC Pyrojason M BTC vanobe shortcircuit Toqo Vxv BiT pOL songsunling bitcoinokulu AlexMay 
Kaonashi Neo Baudrillard RussaX morkaii ',
 'satoshiFounderSr MemberOfflineActivity Merit Repost Bitcoin MaturationNovember PMMeritedbyescrowms NeuroticFish finist x icopress jankeman bitcoinbitcoin Bitcoin MaturationPosted Thu of Oct UTC From the user s perspective the bitcoin maturation process can be broken down into stages The initial network transaction that occurs when you first click Generate Coins The time between that initial network transaction and when the bitcoin entry is ready to appear in the All Transactions list The change of the bitcoin entry from outside the All Transaction field to inside it The time between when the bitcoin appears in the All Transfers list and when the Description is ready to change to Generated matures in x more blocks The change of the Description to Generated matures in x more blocks The time between when the Description says Generated matures in x more blocks to when it is ready to change to Generated The change of the Description to Generated The time after the Description has changed to GeneratedWhich stages require network connectivity significant local CPU usage and or significant remote CPU usage Do any of these stages have names sirius m Re Bitcoin MaturationPosted Thu of Oct UTC As far as I know there s no network transaction when you click Generate Coins your computer just starts calculating the next proof of work The CPU usage is when you re generating coinsIn this example the network connection is used when you broadcast the information about the proof of work block you ve created that which entitles you to the new coin Generating coins successfully requires constant connectivity so that you can start working on the next block when someone gets the current block before yousatoshiFounderSr MemberOfflineActivity Merit Repost Bitcoin MaturationNovember PMMeritedbyescrowms NeuroticFish finist x icopress jankeman bitcoinbitcoin Bitcoin MaturationPosted Thu of Oct UTC From the user s perspective the bitcoin maturation process can be broken down into stages The 
initial network transaction that occurs when you first click Generate Coins The time between that initial network transaction and when the bitcoin entry is ready to appear in the All Transactions list The change of the bitcoin entry from outside the All Transaction field to inside it The time between when the bitcoin appears in the All Transfers list and when the Description is ready to change to Generated matures in x more blocks The change of the Description to Generated matures in x more blocks The time between when the Description says Generated matures in x more blocks to when it is ready to change to Generated The change of the Description to Generated The time after the Description has changed to GeneratedWhich stages require network connectivity significant local CPU usage and or significant remote CPU usage Do any of these stages have names sirius m Re Bitcoin MaturationPosted Thu of Oct UTC As far as I know there s no network transaction when you click Generate Coins your computer just starts calculating the next proof of work The CPU usage is when you re generating coinsIn this example the network connection is used when you broadcast the information about the proof of work block you ve created that which entitles you to the new coin Generating coins successfully requires constant connectivity so that you can start working on the next block when someone gets the current block before yousatoshiFounderSr MemberOfflineActivity Merit Repost Bitcoin MaturationNovember PMMeritedbyescrowms NeuroticFish finist x icopress jankeman bitcoinbitcoin Bitcoin MaturationPosted Thu of Oct UTC From the user s perspective the bitcoin maturation process can be broken down into stages The initial network transaction that occurs when you first click Generate Coins The time between that initial network transaction and when the bitcoin entry is ready to appear in the All Transactions list The change of the bitcoin entry from outside the All Transaction field to inside it The 
time between when the bitcoin appears in the All Transfers list and when the Description is ready to change to Generated matures in x more blocks The change of the Description to Generated matures in x more blocks The time between when the Description says Generated matures in x more blocks to when it is ready to change to Generated The change of the Description to Generated The time after the Description has changed to GeneratedWhich stages require network connectivity significant local CPU usage and or significant remote CPU usage Do any of these stages have names sirius m Re Bitcoin MaturationPosted Thu of Oct UTC As far as I know there s no network transaction when you click Generate Coins your computer just starts calculating the next proof of work The CPU usage is when you re generating coinsIn this example the network connection is used when you broadcast the information about the proof of work block you ve created that which entitles you to the new coin Generating coins successfully requires constant connectivity so that you can start working on the next block when someone gets the current block before youRepost Bitcoin MaturationNovember PMMeritedbyescrowms NeuroticFish finist x icopress jankeman ',
 'satoshiFounderSr MemberOfflineActivity Merit Repost Request Make this anonymousNovember PMMeritedbyxtraelv anonguy Request Make this anonymousPosted Thu of Oct UTC Are there any plans to make this service anonymouseg Being able to route BitCoin through TorsatoshiFounderSr MemberOfflineActivity Merit Repost Request Make this anonymousNovember PMMeritedbyxtraelv anonguy Request Make this anonymousPosted Thu of Oct UTC Are there any plans to make this service anonymouseg Being able to route BitCoin through TorsatoshiFounderSr MemberOfflineActivity Merit Repost Request Make this anonymousNovember PMMeritedbyxtraelv anonguy Request Make this anonymousPosted Thu of Oct UTC Are there any plans to make this service anonymouseg Being able to route BitCoin through TorRepost Request Make this anonymousNovember PMMeritedbyxtraelv ',
 'satoshiFounderSr MemberOfflineActivity Merit Re Repost Bitcoin MaturationNovember PMMeritedbyhold coins NeuroticFish It s important to have network connectivity while you re trying to generate a coin block and at the moment it is successfully generated During generation when the status bar says Generating and you re using CPU to find a proof of work you must constantly keep in contact with the network to receive the latest block If your block does not link to the latest block it may not be accepted When you successfully generate a block it is immediately broadcast to the network Other nodes must receive it and link to it for it to be accepted as the new latest blockThink of it as a cooperative effort to make a chain When you add a link you must first find the current end of the chain If you were to locate the last link then go off for an hour and forge your link come back and link it to the link that was the end an hour ago others may have added several links since then and they re not going to want to use your link that now branches off the middleAfter a block is created the maturation time of blocks is to make absolutely sure the block is part of the main chain before it can be spent Your node isn t doing anything with the block during that time just waiting for other blocks to be added after yours You don t have to be online during that timesatoshiFounderSr MemberOfflineActivity Merit Re Repost Bitcoin MaturationNovember PMMeritedbyhold coins NeuroticFish It s important to have network connectivity while you re trying to generate a coin block and at the moment it is successfully generated During generation when the status bar says Generating and you re using CPU to find a proof of work you must constantly keep in contact with the network to receive the latest block If your block does not link to the latest block it may not be accepted When you successfully generate a block it is immediately broadcast to the network Other nodes must receive it and link to it 
for it to be accepted as the new latest blockThink of it as a cooperative effort to make a chain When you add a link you must first find the current end of the chain If you were to locate the last link then go off for an hour and forge your link come back and link it to the link that was the end an hour ago others may have added several links since then and they re not going to want to use your link that now branches off the middleAfter a block is created the maturation time of blocks is to make absolutely sure the block is part of the main chain before it can be spent Your node isn t doing anything with the block during that time just waiting for other blocks to be added after yours You don t have to be online during that timesatoshiFounderSr MemberOfflineActivity Merit Re Repost Bitcoin MaturationNovember PMMeritedbyhold coins NeuroticFish It s important to have network connectivity while you re trying to generate a coin block and at the moment it is successfully generated During generation when the status bar says Generating and you re using CPU to find a proof of work you must constantly keep in contact with the network to receive the latest block If your block does not link to the latest block it may not be accepted When you successfully generate a block it is immediately broadcast to the network Other nodes must receive it and link to it for it to be accepted as the new latest blockThink of it as a cooperative effort to make a chain When you add a link you must first find the current end of the chain If you were to locate the last link then go off for an hour and forge your link come back and link it to the link that was the end an hour ago others may have added several links since then and they re not going to want to use your link that now branches off the middleAfter a block is created the maturation time of blocks is to make absolutely sure the block is part of the main chain before it can be spent Your node isn t doing anything with the block during that 
time just waiting for other blocks to be added after yours You don t have to be online during that timeRe Repost Bitcoin MaturationNovember PMMeritedbyhold coins NeuroticFish ',
 'satoshiFounderSr MemberOfflineActivity Merit Re Repost Request Make this anonymousNovember PM There will be a proxy setting in version so you can connect through TOR I ve done a careful scrub to make sure it doesn t use DNS or do anything that would leak your IP while in proxy modesatoshiFounderSr MemberOfflineActivity Merit Re Repost Request Make this anonymousNovember PM There will be a proxy setting in version so you can connect through TOR I ve done a careful scrub to make sure it doesn t use DNS or do anything that would leak your IP while in proxy modesatoshiFounderSr MemberOfflineActivity Merit Re Repost Request Make this anonymousNovember PM There will be a proxy setting in version so you can connect through TOR I ve done a careful scrub to make sure it doesn t use DNS or do anything that would leak your IP while in proxy modeRe Repost Request Make this anonymousNovember PM ',
 'satoshiFounderSr MemberOfflineActivity Merit Repost How anonymous are bitcoinsNovember PMMeritedbylivingfree xtraelv bitcoinbitcoin How anonymous are bitcoinsCan nodes on the network tell from which and or to which bitcoin address coins are being sent Do blocks contain a history of where bitcoins have been transfered to and from Can nodes tell which bitcoin addresses belong to which IP addresses Is there a command line option to enable the sock proxy the first time that bitcoin starts What happens if you send bitcoins to an IP address that has multiple clients connected through network address translation NAT satoshiFounderSr MemberOfflineActivity Merit Repost How anonymous are bitcoinsNovember PMMeritedbylivingfree xtraelv bitcoinbitcoin How anonymous are bitcoinsCan nodes on the network tell from which and or to which bitcoin address coins are being sent Do blocks contain a history of where bitcoins have been transfered to and from Can nodes tell which bitcoin addresses belong to which IP addresses Is there a command line option to enable the sock proxy the first time that bitcoin starts What happens if you send bitcoins to an IP address that has multiple clients connected through network address translation NAT satoshiFounderSr MemberOfflineActivity Merit Repost How anonymous are bitcoinsNovember PMMeritedbylivingfree xtraelv bitcoinbitcoin How anonymous are bitcoinsCan nodes on the network tell from which and or to which bitcoin address coins are being sent Do blocks contain a history of where bitcoins have been transfered to and from Can nodes tell which bitcoin addresses belong to which IP addresses Is there a command line option to enable the sock proxy the first time that bitcoin starts What happens if you send bitcoins to an IP address that has multiple clients connected through network address translation NAT Repost How anonymous are bitcoinsNovember PMMeritedbylivingfree xtraelv ']

from nltk.corpus import stopwords  # needs the corpus: nltk.download('stopwords')

stop = list(stopwords.words("english"))
stop.append("brand newofflineactivity")
stop.append("jr. memberofflineactivity")
stop.append("full memberofflineactivity")
stop.append("sr. memberofflineactivity")
stop.append("hero memberofflineactivity")
stop.append("global moderatorofflineactivity")

stpexp = ['*brand newofflineactivity', '*newbieofflineactivity', '*jr. memberofflineactivity',
          '*memberofflineactivity', '*full memberofflineactivity', '*sr. memberofflineactivity',
          '*hero memberofflineactivity', '*legendaryofflineactivity', '*vipofflineactivity',
          '*donaterofflineactivity', '*staffofflineactivity', '*moderatorofflineactivity',
          '*global moderatorofflineactivity']

So I want to search all of the items in 'txt' for all of the variations of the words in the list 'stpexp', and then append all of those variations to my list of stopwords, 'stop'.

Any assistance would be greatly appreciated.


  • You can use the built-in fnmatch module. For example, if you want to find all the words in your text that end with 'thepattern', you can do it like this:

    import fnmatch

    txt = ["longerthepattern is the word i want", "thisisthepattern and it works"]
    pattern = '*thepattern'
    to_add_to_stoplist = []
    for sentence in txt:
        # keep only the whitespace-separated words that match the shell-style pattern
        to_add_to_stoplist += fnmatch.filter(sentence.split(), pattern)
    print(to_add_to_stoplist)

    It outputs:

    ['longerthepattern', 'thisisthepattern']

    You can then add this list of words to your stopwords (for example, with stop.extend(to_add_to_stoplist)).
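To apply the same idea to every pattern in the question's `stpexp` list at once, one option is to loop over the patterns and collect the matches in a set, so each word is added only once. This is a minimal sketch under a few assumptions: the text is lowercased first (the patterns are lowercase), words are separated by whitespace, and `find_stopword_variants` is a hypothetical helper name. It uses `fnmatch.fnmatchcase` because `fnmatch.filter` normalizes case according to the operating system, while `fnmatchcase` behaves the same everywhere. Note that a pattern containing a space, such as `'*brand newofflineactivity'`, can never match a single whitespace-split word.

```python
import fnmatch


def find_stopword_variants(texts, patterns):
    """Return the set of lowercased words in `texts` that match any shell-style pattern."""
    found = set()
    for text in texts:
        # Lowercase so 'MemberOfflineActivity' can match '*memberofflineactivity'
        words = text.lower().split()
        for pattern in patterns:
            # fnmatchcase matches case-sensitively on every operating system
            found.update(w for w in words if fnmatch.fnmatchcase(w, pattern))
    return found


# Toy input modeled on the question's data (not taken from the real file)
txt = ["satoshiFounderSr MemberOfflineActivity Merit Welcome",
       "GlobalModeratorOfflineActivity Re Repost"]
stpexp = ['*memberofflineactivity', '*moderatorofflineactivity', '*brand newofflineactivity']

variants = find_stopword_variants(txt, stpexp)
print(sorted(variants))  # ['globalmoderatorofflineactivity', 'memberofflineactivity']
```

The resulting set can then be appended to the stoplist with `stop.extend(variants)`.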


    Here is a version that uses list comprehensions to check multiple patterns. It no longer uses fnmatch; it uses the str.endswith method instead.

    Note that the patterns must be a tuple, not a list: str.endswith accepts a single string or a tuple of strings, and raises a TypeError if given a list.

    txt = ["longerthepattern removeme is the word i want", "thisisthepattern and it works"]
    patterns = ("pattern", "veme")  # str.endswith accepts a tuple of suffixes

    def my_func(sentence):
        # words whose lowercase form ends with any of the suffixes in `patterns`
        return [x for x in sentence.split() if x.lower().endswith(patterns)]

    to_add_to_stop = [word for sentence in txt for word in my_func(sentence)]

    It outputs:

    ['longerthepattern', 'removeme', 'thisisthepattern']


    I added .lower() in the list comprehension so that the words are lowercased before being compared with the patterns, which are lowercase as well.
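Because the question's file is about 9 GB, loading it into a list like `txt` may not fit in memory. The sketch below assumes the file is plain text with line breaks, so it can be iterated one line at a time; the file name `posts.txt` is hypothetical, and `io.StringIO` stands in for the opened file so the example runs on its own. It turns the `stpexp` wildcard patterns into the tuple of suffixes that `str.endswith` expects by stripping the leading `*`. Suffixes that contain a space still cannot match a single whitespace-split word.

```python
import io


def collect_suffix_matches(lines, suffixes):
    """Return the set of lowercased words in `lines` that end with any suffix."""
    found = set()
    for line in lines:
        for word in line.lower().split():
            if word.endswith(suffixes):
                found.add(word)
    return found


stpexp = ['*memberofflineactivity', '*moderatorofflineactivity', '*global moderatorofflineactivity']
# str.endswith needs a tuple, so drop the leading '*' from each wildcard pattern
suffixes = tuple(p.lstrip('*') for p in stpexp)

# In practice: with open('posts.txt', encoding='utf-8') as f:  ('posts.txt' is hypothetical)
# Iterating over a file object yields one line at a time instead of reading it all.
fake_file = io.StringIO("satoshiFounderSr MemberOfflineActivity Merit\n"
                        "GlobalModeratorOfflineActivity Re\n")
variants = collect_suffix_matches(fake_file, suffixes)
print(sorted(variants))  # ['globalmoderatorofflineactivity', 'memberofflineactivity']
```

With a real file, memory use is governed by the longest line and the size of the result set rather than by the whole 9 GB; the set can then be added with `stop.extend(variants)`.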