Tags: string, python-2.7, clipboard, python-idle

How to clean up troublesome characters in clipboard data so I can paste into a Python script in IDLE?


I want to copy tables of data displayed on websites and paste them as text directly into scripts as string variables using IDLE. This sometimes fails because something in the copied material prevents IDLE from saving the file. There is no error message; IDLE simply ignores the save request and sits there until I close the window without saving.

That behavior is fine with me for now; of course I wouldn't want to save a Python script that contains troublesome characters.

Is there some way to get those pesky characters out of my computer's clipboard so I can get on with my script?

If I only needed to do this once, I could look at the site's HTML and possibly extract the data from there, or, in the case of the table of satellites on this page, maybe go into the Google app and get it.

But for the purposes of this question, I'd like a way to "fix" the data in my clipboard so I can paste it as a string into a script in IDLE and run it.
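If losing accents and typographic punctuation is acceptable, one way to "fix" the text is to reduce it to plain ASCII once it is available as a Python (unicode) string. This is a minimal sketch, not a tested IDLE fix: the function name `to_ascii` and the small replacement table are illustrative assumptions, and it relies only on the standard `unicodedata` module, which exists in both Python 2.7 and 3. In Python 2 the input must be a `unicode` object, not a byte string.

```python
import unicodedata

def to_ascii(text):
    """Return an ASCII-only copy of a unicode string (hypothetical helper).

    Assumption: the troublesome characters are accented letters,
    no-break spaces or typographic punctuation; anything without an
    ASCII equivalent is silently dropped.
    """
    # Curly quotes and dashes have no NFKD decomposition, so map them
    # to ASCII by hand first.
    replacements = {
        u'\u2018': u"'", u'\u2019': u"'",   # curly single quotes
        u'\u201c': u'"', u'\u201d': u'"',   # curly double quotes
        u'\u2013': u'-', u'\u2014': u'-',   # en dash, em dash
    }
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    # NFKD turns a no-break space into a plain space and splits accented
    # letters into base letter + combining mark (u-umlaut -> 'u' + U+0308);
    # encoding with 'ignore' then drops the combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')
```

For example, `to_ascii(u'University of W\u00fcrzburg')` gives `'University of Wurzburg'`, while tabs and newlines pass through unchanged, so the column layout of a pasted table survives.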

I've tried "Paste and Match Style" into a .txt file first to clean it up, with no luck. I have Sublime Text 2 but am not very familiar with it; if it has a relatively easy-to-use function for this, that would be OK.
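Since the text can already be pasted into a plain-text editor, another option is to keep it out of the `.py` file entirely: save the paste as a UTF-8 `.txt` file (Sublime Text normally saves UTF-8) and read it when the script runs, so IDLE never has to save those characters. A minimal sketch, assuming UTF-8 encoding; the function and file names are hypothetical, and `io.open` is available in both Python 2.7 and 3.

```python
import io

def load_pasted_table(path):
    """Read pasted text that was saved as a UTF-8 file (hypothetical helper).

    Loading the data at run time keeps any troublesome characters out of
    the script's source, so the script itself stays plain ASCII.
    """
    # io.open decodes the bytes, returning a unicode string in Python 2
    # and a str in Python 3.
    with io.open(path, encoding='utf-8') as f:
        return f.read()
```

Usage would look like `text = load_pasted_table('satellites.txt')`, where `satellites.txt` stands for whatever file the paste was saved to.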

Trying to paste inside triple quotes (thing = """ """) at the prompt gives the error message "Unsupported characters in input".


Note: I'm using Python and IDLE 2.7.11 with Tk 8.5.9 (I know, these are a year old) on OS X.

EDIT: Here is a chunk of data from my clipboard, as suggested in the comments. Copying it from here (as shown) results in unsuccessful save attempts in IDLE, so at least some of the pesky characters are in here. I'm pasting between a pair of triple quotes, e.g. thing = """ """.


1   2/6/2000    PICOSAT 1&2 (TETHERED)  Aerospace Corporation   mil Opal    Opal    T   5   N   Minotaur-1
2   2/10/2000   PICOSAT 3 (JAK) Santa Clara University  uni Opal    Opal    E   2   N   Minotaur-1
3   2/10/2000   PICOSAT 6 (StenSat) Stensat Group. LLC  civ Opal    Opal    C   2   N   Minotaur-1
4   2/12/2000   PICOSAT 4 (Thelma)  Santa Clara University  uni Opal    Opal    S   2   N   Minotaur-1
5   2/12/2000   PICOSAT 5 (Louise)  Santa Clara University  uni Opal    Opal    S   2   N   Minotaur-1
6   9/6/2001    PICOSAT 7&8 (TETHERED)  Aerospace Corporation   mil Opal    Opal    T   2   D   Minotaur-1
7   12/2/2002   MEPSI   Aerospace Corporation   mil 2U  SSPL    T   2   D   Shuttle
8   6/30/2003   DTUSAT 1    Technical University of Denmark uni 1U  PPOD    E   2   N   Rokot-KM
9   6/30/2003   CUTE-1 (CO-55)  Tokyo Institute of Technology   uni 1U  PPOD    E   3   N   Rokot-KM
10  6/30/2003   QUAKESAT 1  Stanford University uni 3U  PPOD    S   5   N   Rokot-KM
11  6/30/2003   AAU CUBESAT 1   Aalborg University  uni 1U  PPOD    E   2   N   Rokot-KM
12  6/30/2003   CANX-1  UTIAS (University of Toronto)   uni 1U  PPOD    E   2   N   Rokot-KM
13  6/30/2003   CUBESAT XI-IV (CO-57)   University of Tokyo uni 1U  PPOD    E   4   S   Rokot-KM
14  10/27/2005  UWE-1   University of Würzburg  uni 1U  TPOD    E   3   N   Kosmos-3M
15  10/27/2005  CUBESAT XI-V (CO-58)    University of Tokyo uni 1U  TPOD    E   5   N   Kosmos-3M
16  10/27/2005  Ncube 2 Norweigan Universities  uni 1U  TPOD    E   2   N   Kosmos-3M
17  2/21/2006   CUTE 1.7    Tokyo Institute of Technology   uni 2U  JPOD    C   2   D   M-5 (2)
18  7/26/2006   AeroCube 1  Aerospace Corporation   mil 1U  PPOD    T   1   D   Dnepr-1
19  7/26/2006   SEEDS   Nihon University    uni 1U  PPOD    E   1   D   Dnepr-1
20  7/26/2006   SACRED  University of Arizona   uni 1U  PPOD    E   1   D   Dnepr-1

Solution

  • I'd scan the string for characters outside the normal printable ASCII range and print their codes. Once you know the code, the strange character should be easier to identify.

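    # Paste between the quotes. The u prefix (Python 2) makes this a unicode
    # string, so ord() reports code points rather than encoded bytes.
    text = u""" <here comes your pasted text> """
    
    def normal(c):
        # Printable ASCII is 32-126; 127 (DEL) is a control character.
        return (32 <= ord(c) < 127) or (c in '\n\r\t')
    
    # Collect the code of every character outside that range.
    strange = set(ord(c) for c in text if not normal(c))
    
    print(sorted(strange))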
    

    I wonder what character codes may end up in strange.
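Going one step further than raw codes, the sketch below also reports where each unusual character sits and its Unicode name, using the standard `unicodedata` module. The function name `describe_strange` is hypothetical, and the sample is row 14 of the data in the question retyped with tabs between columns (an assumption about the original separators). As above, in Python 2 the pasted text should be a `u"""..."""` literal so that code points, not bytes, are examined.

```python
import unicodedata

def describe_strange(text):
    """Return (position, code point, Unicode name) for every character
    outside printable ASCII (32-126) other than newline, CR and tab."""
    found = []
    for pos, c in enumerate(text):
        if 32 <= ord(c) < 127 or c in u'\n\r\t':
            continue
        # unicodedata.name raises ValueError for unnamed characters such
        # as most control codes, so pass a fallback as the default.
        found.append((pos, ord(c), unicodedata.name(c, u'<no name>')))
    return found

# Row 14 of the sample data, assuming the columns were tab-separated.
row = u"14\t10/27/2005\tUWE-1\tUniversity of W\u00fcrzburg\tuni\t1U\tTPOD\tE\t3\tN\tKosmos-3M"
for pos, code, name in describe_strange(row):
    print(u"position %d: U+%04X %s" % (pos, code, name))
```

For that row the loop prints a single line, `position 35: U+00FC LATIN SMALL LETTER U WITH DIAERESIS`, i.e. the ü in Würzburg. Invisible characters such as no-break spaces would show up the same way, with their names making them easy to recognize.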