I have a colleague at work who sends me phone numbers. They arrive in a messy format every time, so I want to copy her entire message from Skype and have a batch file parse the saved .txt file, searching only for 10 consecutive digits.
For example, she sends me:
Hello more numbers for settings please,
WYK-0123456789
CAMP-0123456789
0123456789
Include 0123456789
This is an urgent number: 0123456789
TIDO: 0123456789
Send to> 0123456789
It's quite a mess, and the only constant is the 10 digits. So I would like the .bat file to somehow scan this monstrosity and leave me with something like this:
0123456789
0123456789
0123456789
0123456789
0123456789
0123456789
0123456789
I tried the following:
@echo off
setlocal enableDelayedExpansion
(
for /f %%A in (
'findstr "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" yourFile.txt'
) do (
set "ln=%%A"
echo !ln:~0,10!
)
)>newFile.txt
Unfortunately, it only works if a line starts with the 10 digits; it doesn't help when the 10 digits are in the middle or at the end of a line.
Given that the 10-digit number is the first numeric part in every line of the file (let us call it numbers.txt), before any other numbers, you could use the following:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"
rem // The first delimiter is TAB, the last one is SPACE:
for /F "usebackq tokens=1 delims= ^!#$%%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^^_`abcdefghijklmnopqrstuvwxyz{|}~ " %%L in ("!_FILE!") do (
set "NUM=%%L#"
if "!NUM:~%_DIG%!"=="#" echo(%%L
)
endlocal
exit /B
This makes use of for /F and its delims option string, which includes most ASCII characters except numerals. You may extend the delims option string to also hold extended characters (those with a code greater than 0x7F); make sure the SPACE is the last character specified.
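The length test used in the script can be tried in isolation. The following is a minimal sketch, assuming the strings consist of digits only (which the delims option ensures for the tokens in the script); the three sample strings are made up for illustration:

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Hypothetical sample strings, assumed to consist of digits only:
for %%S in (0123 0123456789 012345678901) do (
    rem // Append a sentinel character that cannot be part of a numeric token:
    set "NUM=%%S#"
    rem // Skipping 10 characters leaves nothing but `#` only for exactly 10 digits:
    if "!NUM:~10!"=="#" (echo %%S: exactly 10 digits) else (echo %%S: rejected)
)
endlocal
```

Only 0123456789 should be reported as having exactly 10 digits: a shorter string leaves nothing after skipping 10 characters, and a longer one leaves extra digits in front of the sentinel.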
This approach can extract the 10-digit number from a line like this:
garbage text>0123456789_more text0123-end
But it fails when the first number in a line is not the 10-digit one, as in this line:
garbage text: 0123 tel. 0123456789; end
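The reason is that tokens=1 returns only the first numeric token of a line. A small sketch, using a made-up line and a delimiter set reduced to just the characters that occur in it, shows what for /F hands over when tokens=1* is used instead:

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Hypothetical sample line; the delimiter set only covers what this line needs:
set "LINE=tel: 0123 alt: 0123456789"
rem // Token 1 goes to %%A, the remainder of the line behind it goes to %%B:
for /F "tokens=1* delims=:aelt " %%A in ("%LINE%") do (
    echo first numeric token: %%A
    echo rest of the line: %%B
)
endlocal
```

Here %%A receives 0123, which fails the 10-digit test, while the 10-digit number is still contained in %%B. Parsing that remainder again in the same way is what allows further numbers in a line to be found.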
Here is a comprehensive solution based on the above approach. The character list for the delims option of for /F is created automatically here. This may take a few seconds, but it is done only once at the very beginning, so for large files you will probably not notice this overhead:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"
rem // Define global variables here:
set "$CHARS="
rem // Capture current code page and set Windows default one:
for /F "tokens=2 delims=:" %%P in ('chcp') do set /A "CP=%%P"
> nul chcp 437
rem /* Generate list of escaped characters other than numerals (escaped means every character
rem is preceded by `^`); there are some characters excluded:
rem - NUL (this cannot be stored in an environment variable and should not occur anyway),
rem - CR + LF, (they build up line-breaks, so they cannot occur within a line obviously),
rem - SPACE, (because this must be placed as the last character of the `delims` option),
rem - `"`, (because this impairs the quotation within the following code portion),
rem - `!` + `^` (they may lead to unexpected results when delayed expansion is enabled): */
setlocal EnableDelayedExpansion
for /L %%I in (0x01,1,0xFF) do (
rem // Exclude codes of aforementioned characters:
set "SKIP="
if %%I GEQ 0x30 if %%I LSS 0x3A set "SKIP=#"
if not defined SKIP if %%I NEQ 0x00 if %%I NEQ 0x0A if %%I NEQ 0x0D (
if %%I NEQ 0x20 if %%I NEQ 0x21 if %%I NEQ 0x22 if %%I NEQ 0x5E (
rem // Convert code to character and append to list separated by `^`:
cmd /C exit %%I
for /F delims^=^ eol^= %%J in ('
forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x220x!=ExitCode:~-2!0x22"
') do (
set "$CHARS=!$CHARS!^^%%~J"
)
)
)
)
endlocal & set "$CHARS=%$CHARS%"
rem /* Apply escaped list of characters as delimiters and apply some of the characters
rem excluded before, namely SPACE, `"`, `!` and `^`;
rem read file using `type` in order to convert from Unicode, if applicable: */
for /F tokens^=1*^ eol^=^ ^ delims^=^!^"^^%$CHARS%^ %%K in ('type "%_FILE%"') do (
set "NUM=%%K#" & set "REST=%%L"
rem // Test whether extracted numeric string holds the given number of digits:
setlocal EnableDelayedExpansion
if "!NUM:~%_DIG%!"=="#" echo(%%K
endlocal
rem /* Current line holds more than a single numeric portion, so process them in a
rem sub-routine; this is not called if the line contains a single number only: */
if defined REST call :SUB REST
)
rem // Restore previous code page:
> nul chcp %CP%
endlocal
exit /B
:SUB ref_string
setlocal DisableDelayedExpansion
setlocal EnableDelayedExpansion
set "STR=!%~1!"
rem // Parse line string using the same approach as in the main routine:
:LOOP
if defined STR (
for /F tokens^=1*^ eol^=^ ^ delims^=^^^!^"^^^^%$CHARS%^ %%E in ("!STR!") do (
endlocal
set "NUM=%%E#" & set "STR=%%F"
setlocal EnableDelayedExpansion
rem // Test whether extracted numeric string holds the given number of digits:
if "!NUM:~%_DIG%!"=="#" echo(%%E
)
rem // Loop back if there are still more numeric parts encountered:
goto :LOOP
)
endlocal
endlocal
exit /B
This approach detects 10-digit numbers everywhere in the file, even if there are multiple ones within a single line.
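The code-to-character conversion inside the for /L loop can also be tried on its own. The following sketch is an illustration only, using code 0x41 as an arbitrary example; as in the script, forfiles is pointed at the batch file itself so that its command runs exactly once. It has to be saved as a batch file and run from there, because it relies on %~dp0 and %~nx0:

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem // Running `exit` in a child cmd instance sets the exit code to 65, that is 0x41:
cmd /C exit 65
rem // The dynamic variable `=ExitCode` returns that code as eight hexadecimal digits:
echo Hexadecimal exit code: !=ExitCode!
rem // forfiles replaces `0x41` in its command line by the character with that code:
forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x!=ExitCode:~-2!"
endlocal
```

The forfiles command should output the letter A. It may print an empty line first, which for /F ignores when the script captures the output.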