I want to copy tables of data displayed in websites and paste as text directly into scripts as string variables using IDLE. This sometimes doesn't work because of something in the copied material that IDLE won't accept as savable. The resulting behavior is not an error message, but IDLE simply ignoring the save request. It just sits there until I close without saving.
That behavior is fine with me at the moment; of course I wouldn't want to save a Python script that contains troublesome characters.
Is there some way I can get those pesky characters out of what's in my computer's clipboard so I can get on with my script?
If I just needed to do this once, I could go in and look at the HTML of the site and possibly extract it, or in the case of the table of satellites on this page maybe I could go into the Google app and get it.
But for the purposes of this question, I'd like a way to "fix" the data in my clipboard so I can paste it as a string into a script using IDLE and run it.
I've tried "Paste and Match Style" in a .txt file first to clean it up, with no luck. I have Sublime Text 2 but am not very familiar with it; if there is a relatively easy-to-use function in there, that would be OK.
Trying to paste inside triple quotes (thing = """ """) at the prompt gives the following error message: Unsupported characters in input
Note: I'm using Python and IDLE versions '2.7.11' and Tk version '8.5.9' (I know, these are a year old) on OS X.
EDIT: Here is a chunk of data from my clipboard, as suggested in the comments. Copying from here (as shown) results in unsuccessful save attempts in IDLE, so at least some of the pesky symbols are in here. I'm pasting between a pair of triple quotes, e.g. thing = """ """
1 2/6/2000 PICOSAT 1&2 (TETHERED) Aerospace Corporation mil Opal Opal T 5 N Minotaur-1
2 2/10/2000 PICOSAT 3 (JAK) Santa Clara University uni Opal Opal E 2 N Minotaur-1
3 2/10/2000 PICOSAT 6 (StenSat) Stensat Group. LLC civ Opal Opal C 2 N Minotaur-1
4 2/12/2000 PICOSAT 4 (Thelma) Santa Clara University uni Opal Opal S 2 N Minotaur-1
5 2/12/2000 PICOSAT 5 (Louise) Santa Clara University uni Opal Opal S 2 N Minotaur-1
6 9/6/2001 PICOSAT 7&8 (TETHERED) Aerospace Corporation mil Opal Opal T 2 D Minotaur-1
7 12/2/2002 MEPSI Aerospace Corporation mil 2U SSPL T 2 D Shuttle
8 6/30/2003 DTUSAT 1 Technical University of Denmark uni 1U PPOD E 2 N Rokot-KM
9 6/30/2003 CUTE-1 (CO-55) Tokyo Institute of Technology uni 1U PPOD E 3 N Rokot-KM
10 6/30/2003 QUAKESAT 1 Stanford University uni 3U PPOD S 5 N Rokot-KM
11 6/30/2003 AAU CUBESAT 1 Aalborg University uni 1U PPOD E 2 N Rokot-KM
12 6/30/2003 CANX-1 UTIAS (University of Toronto) uni 1U PPOD E 2 N Rokot-KM
13 6/30/2003 CUBESAT XI-IV (CO-57) University of Tokyo uni 1U PPOD E 4 S Rokot-KM
14 10/27/2005 UWE-1 University of Würzburg uni 1U TPOD E 3 N Kosmos-3M
15 10/27/2005 CUBESAT XI-V (CO-58) University of Tokyo uni 1U TPOD E 5 N Kosmos-3M
16 10/27/2005 Ncube 2 Norweigan Universities uni 1U TPOD E 2 N Kosmos-3M
17 2/21/2006 CUTE 1.7 Tokyo Institute of Technology uni 2U JPOD C 2 D M-5 (2)
18 7/26/2006 AeroCube 1 Aerospace Corporation mil 1U PPOD T 1 D Dnepr-1
19 7/26/2006 SEEDS Nihon University uni 1U PPOD E 1 D Dnepr-1
20 7/26/2006 SACRED University of Arizona uni 1U PPOD E 1 D Dnepr-1
I'd try to scan the string and find the characters outside the normal printable range. Maybe the strange character will be easier to identify.
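text = """ <here comes your pasted text> """

def normal(c):
    # Printable ASCII is 32..126 (127 is DEL); also allow common whitespace.
    return (32 <= ord(c) <= 126) or (c in '\n\r\t')

# Collect the code points of everything that is not plain printable ASCII.
strange = set(ord(c) for c in text if not normal(c))
print strange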
I wonder what character codes may end up in strange.
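Once the offenders are known, a possible next step is to report where they sit and to produce an ASCII-only copy of the text. The following is only a sketch under some assumptions: the helper names find_strange and to_ascii are made up for illustration, the text is assumed to be a unicode string (in Python 2, a u"..." literal or a byte string passed through .decode('utf-8')), and dropping anything that has no ASCII equivalent is assumed to be acceptable. The sample line is modeled on row 14 of the data above, with a non-breaking space added as a hypothetical second offender.

```python
from __future__ import print_function  # keeps print() working on Python 2.7
import unicodedata


def find_strange(text):
    """Return (line, column, repr) for each character outside printable ASCII."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for col, c in enumerate(line, 1):
            # Tabs are allowed; everything else must be in 32..126.
            if not (32 <= ord(c) <= 126 or c == '\t'):
                hits.append((lineno, col, repr(c)))
    return hits


def to_ascii(text):
    """Decompose accented letters, then drop whatever is still non-ASCII."""
    # NFKD splits e.g. u-umlaut into 'u' plus a combining mark and maps a
    # non-breaking space to a plain space; the encode step drops the rest.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# Hypothetical sample: \u00fc is the u-umlaut from the data, \u00a0 is an
# assumed non-breaking space standing in for invisible web-page debris.
sample = u"14 10/27/2005 UWE-1 University of W\u00fcrzburg\u00a0uni 1U"

print(find_strange(sample))   # where the non-ASCII characters are
print(to_ascii(sample))       # cleaned copy, safe to paste as plain ASCII
```

Note that this approach is lossy: "Würzburg" becomes "Wurzburg". If the original spelling matters, escaping the characters instead (for example with text.encode('unicode_escape')) may be a better fit than dropping them.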