Search code examples
pythondataframejupyterdata-extraction

Extract letters after $ symbol using Pandas


I am trying to extract just the data upto and including the $ symbol from a spreadsheet.

I have isolated the data to give me just the column containing the data but what I am trying to do is extract any and all symbols that follow a $ symbol.

For example: $AAPL $LOW $TSLA and so on from the entire dataset but I don't need or want $1000 $600 and so on - just letters only and either a period or a space follows but just the characters a-z is what I am trying to get.

I haven't been successful in full extraction and my code is starting to get messy so I'll provide the code that will bring back the data for you to see for yourself. I am using Jupyter Notebook.

import mysql.connector
import pandas

googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
        googleSheedID,
        worksheetName
)

df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']
df[df["TWEET"].str.contains("RT")==False]

print(df)

Solution

  • Not sure if I understand what you want correctly, but the following codes give all elements that comes after $ before (blank space).

    import mysql.connector
    import pandas
    
    googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
    worksheetName = 'Sheet1'
    URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
            googleSheedID,
            worksheetName
    )
    
    df = pandas.read_csv(URL)
    del df['DATE']
    del df['USERNAME']
    del df['LINK']
    del df['LINK2']
    
    unique_results = []
    for i in range(len(df['TWEET'])):
        if 'RT' in df["TWEET"][i]:
            continue
        else:
            for j in range(len(df['TWEET'][i])-1):
                if df['TWEET'][i][j] == '$':
                    if df['TWEET'][i][j+1] == '1' or df['TWEET'][i][j+1] == '2' or df['TWEET'][i][j+1] == '3' or\
                       df['TWEET'][i][j+1] == '4' or df['TWEET'][i][j+1] == '5' or df['TWEET'][i][j+1] == '6' or\
                        df['TWEET'][i][j+1] == '7' or df['TWEET'][i][j+1] == '8' or df['TWEET'][i][j+1] == '9' or df['TWEET'][i][j+1] == '0':
                            continue
                    else:
                        start = j
                        for k in range(start, len(df['TWEET'][i])):
                            if df['TWEET'][i][k] == ' ' or df['TWEET'][i][k:k+1] == '\n':
                                end = k
                                break
                        results = df['TWEET'][i][start:end]
                        if results not in unique_results:
                            unique_results.append(results)
    print(unique_results)
                            
    

    edit: fixed the code

    The outputs are:

    ['$GME', '$SNDL', '$FUBO', '$AMC', '$LOTZ', '$CLOV', '$USAS', '$AIHS', '$PLM', '$LODE', '$TTNP', '$IMTE', '', '$NAK.', '$NAK', '$CRBP', '$AREC', '$NTEC', '$NTN', '$CBAT', '$ZYNE', '$HOFV', '$GWPH', '$KERN', '$ZYNE,', '$AIM', '$WWR', '$CARV', '$VISL', '$SINO', '$NAKD', '$GRPS', '$RSHN', '$MARA', '$RIOT', '$NXTD', '$LAC', '$BTC', '$ITRM', '$CHCI', '$VERU', '$GMGI', '$WNBD', '$KALV', '$EGOC', '$Veru', '$MRNA', '$PVDG', '$DROP', '$EFOI', '$LLIT', '$AUVI', '$CGIX', '$RELI', '$TLRY', '$ACB', '$TRCH', '$TRCH.', '$TSLA', '$cciv', '$sndl', '$ANCN', '$TGC', '$tlry', '$KXIN', '$AMZN', '$INFI', '$LMND', '$COMS', '$VXX', '$LEDS', '$ACY', '$RHE', '$SINO.', '$GPL', '$SPCE', '$OXY', '$CLSN', '$FTFT', '$FTFT.....', '$BIEI', '$EDRY', '$CLEU', '$FSR', '$SPY', '$NIO', '$LI', '$XPEV,', '$UL', '$RGLG', '$SOS', '$QS', '$THCB', '$SUNW', '$MICT', '$BTC.X', '$T', '$ADOM', '$EBON', '$CLPS', '$HIHO', '$ONTX', '$WNRS', '$SOLO', '$Mara,', '$Riot,', '$SOS,', '$GRNQ,', '$RCON,', '$FTFT,', '$BTBT,', '$MOGO,', '$EQOS,', '$CCNC', '$CCIV', '$tsla', '$fsr', '$wkhs', '$ride', '$nio', '$NETE', '$DPW', '$MOSY', '$SSNT', '$PLTR', '$GSAH:', '$EQOS', '$MTSL', '$CMPS', '$CHIF', '$MU', '$HST', '$SNAP', '$CTXR', '$acy', '$FUBOTV', '$DPBE', '$HYLN', '$SPOT', '$NSAV', '$HYLN,', '$aabb', '$AAL', '$BBIG', '$ITNS', '$CTIB', '$AMPG', '$ZI', '$NUVI', '$INTC', '$TSM', '$AAPL', '$MRJT', '$RCMT', '$IZEA', '$BBIG,', '$ARKK', '$LIAUTO', '$MARA:', '$SOS:', '$XOM', '$ET', '$BRNW', '$SYPR', '$LCID', '$QCOM', '$FIZZ', '$TRVG', '$SLV', '$RAFA', '$TGCTengasco,', '$BYND', '$XTNT', '$NBY', '$sos', '$KMPH', '$', '$(0.60)', '$(0.64)', '$BIDU', '$rkt', '$GTT', '$CHUC', '$CLF', '$INUV', '$RKT', '$COST', '$MDCN', '$HCMC', '$UWMC', '$riot', '$OVID', '$HZON', '$SKT', '$FB', '$PLUG', '$BA', '$PYPL', '$PSTH.', '$NVDA', '$AMPG.', '$aese.', '$spy', '$pltr', '$MSFT', '$AMD', '$QQQ', '$LTNC', '$WKHS', '$EYES', '$RMO', '$GNUS', '$gme', '$mdmp', '$kern', '$AEI', '$BABA', '$YALA', '$TWTR', '$WISH', '$GE', '$ORCL', '$JUPW', '$TMBR', '$SSYS', '$NKE', '$AMPGAmpliTech', '$$$', '$$', '$RGLS', '$HOGE', '$GEGR', '$nclh', '$IGAC', '$FCEL', '$TKAT', '$OCG', '$YVR', '$IPDN.', '$IPDN', "$SINO's", '$WIMI', '$TKAT.', '$BAC', '$LZR', '$LGHL', '$F', '$GM', '$KODK', '$atvk', '$ATVK', '$AIKI', '$DS', '$AI', '$WTII', '$oxy', '$DYAI', '$DSS', '$ZKIN', '$MFH', '$WKEY', '$MKGI', '$DLPN', '$PSWW', '$SNOW', '$ALYA', '$AESE', '$CSCW', '$CIDM', '$HOFV.', '$LIVX', '$FNKO', '$HPR', '$BRQS', '$GIGM', '$APOP', '$EA', '$CUEN', '$TMBR?', '$FLNT,', '$APPS', '$METX', '$STG', '$WSRC', '$AMHC', '$VIAC', '$MO', '$UAVL', '$CS', '$MDT', '$GYST', '$CBBT', '$ASTC', '$AACG', '$WAFU.', '$WAFU', '$CASI', '$mmmw', '$MVIS', '$SNOA', '$C', '$KR', '$EWZ', '$VALE', '$EWZ.', '$CSCO', '$PINS', '$XSPA', '$VPRX', '$CEMI', '$M', '$BMRA', '$SPX', '$akt', '$SURG', '$NCLH', '$ARSN', '$ODT', '$SGBX', '$CRWD.', '$TGRR', '$PENN', '$BB', '$XOP', '$XL', '$FREQ', '$IDRA', '$DKNG', '$COHN', '$ADHC', '$ISWH', '$LEGO', '$OTRA', '$NAAC', '$HCAR', '$PPGH', '$SDAC', '$PNTM', '$OUST', '$IO', '$HQGE', '$HENC', '$KYNC', '$ATNF', '$BNSO', '$HDSN', '$AABB', '$SGH', '$BMY', '$VERY', '$EARS', '$ROKU', '$PIXY', '$APRE', '$SFET', '$SQ', '$EEIQ', '$REDU', '$CNWT', '$NFLX', '$RGBPP', '$RGBP', '$SHOP', '$VITL', '$RAAS', '$CPNG', '$JKS', '$COMP', '$NAFS']