I have used objdump to disassemble all the functions in a compiled library file and written the output to a text file. In that file, the output for a function called clear_bit is as follows:
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
4: 01 46 03 90 andls r4, r3, r1, lsl #12
8: 03 98 00 22 andhs r9, r0, #196608
c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
10: ff f7 fe ff <unknown>
14: 01 90 ff e7 ldrb r9, [pc, r1]!
18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
1c: 80 <unknown>
1d: bd <unknown>
The output for another function, set_bit, is as follows:
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
4: 01 46 03 90 andls r4, r3, r1, lsl #12
8: 03 98 01 22 andhs r9, r1, #196608
c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
10: ff f7 fe ff <unknown>
14: 01 90 ff e7 ldrb r9, [pc, r1]!
18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
1c: 80 <unknown>
1d: bd <unknown>
Like the two functions above, output.txt contains the disassembly of more than 100 such functions. What I need is to extract only the hex byte values [80,b5,84,b0,01,..,b0,80,bd] belonging to each function, without the assembly instructions, function names, offsets, etc. I want one byte sequence per function, rather than a single sequence for the whole file, in order to build a machine learning model. Below is the output I expect for just these two functions. (The comments are only there for explanation; I don't need them in the output.)
// byte sequence related to first function
80 b5 84 b0 01 46 03 90 03 98 00 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd
// byte sequence related to second function separated by a line
80 b5 84 b0 01 46 03 90 03 98 01 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd
I used the xxd -g 1 command, but it gives me a sequence of bytes with offsets on the left and extra characters on the right, and it seems to dump the bytes of the whole file (it starts with the !<arch> header) rather than only the code in the .text sections:
00000000: 21 3c 61 72 63 68 3e 0a 2f 20 20 20 20 20 20 20 !<arch>./
00000010: 20 20 20 20 20 20 20 20 30 20 20 20 20 20 20 20 0
00000020: 20 20 20 20 30 20 20 20 20 20 30 20 20 20 20 20 0 0
00000030: 30 20 20 20 20 20 20 20 34 37 33 32 34 30 20 20 0 473240
00000040: 20 20 60 0a 00 00 1c 8c 00 07 aa ea 00 07 aa ea `.............
00000050: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea ................
00000060: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea ................
00000070: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea ................
00000080: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea ................
00000090: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea ................
000000a0: 00 07 aa ea 00 07 aa ea 00 07 aa ea 00 07 aa ea ................
000000b0: 00 08 1a 1a 00 08 1a 1a 00 08 1a 1a 00 08 1a 1a ................
000000c0: 00 08 1a 1a 00 08 1a 1a 00 08 3a ee 00 08 3a ee ..........:...:.
I have tried different tools and gone through similar Stack Overflow questions, but have not succeeded so far. I don't know whether I am using xxd the wrong way or whether there are other tools better suited to this. Any help would be highly appreciated. Thank you!
Would you please try the following bash script:
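#!/bin/bash
# fold $str to 69 columns (23 bytes per line), print it plus a blank line, then clear it
flush() {
    if [[ -n $str ]]; then
        fold -w 69 <<< "$str"
        echo
        str=""
    fi
}
header='^Disassembly of section'
# offset, colon, then the run of two-digit hex bytes (each followed by a space)
body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'
while IFS= read -r line; do
    if [[ $line =~ $header ]]; then
        flush                   # print the previous function's bytes
        echo "// $line"
    elif [[ $line =~ $body ]]; then
        # concatenate the byte sequence on $str
        str+="${BASH_REMATCH[1]}"
    fi
done < output.txt
flush                           # print the last function's bytes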
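To see what the body regex hands to the loop, here is a small standalone check (a sketch; the sample line is copied from the listing in the question, and the class is written [0-9a-fA-F]). After a successful [[ $line =~ $body ]], bash stores the whole match in BASH_REMATCH[0] and each parenthesized group in the following elements; the script only uses group 1, the full run of bytes.

```shell
#!/usr/bin/env bash
# Standalone check of what the body regex captures from one listing line.
body='^[[:blank:]]*[0-9a-fA-F]+:[[:blank:]]+(([0-9a-fA-F]{2} )+)'
line='8: 03 98 00 22 andhs r9, r0, #196608'
if [[ $line =~ $body ]]; then
    # group 1: every byte before the mnemonic, trailing space included
    printf 'group 1: [%s]\n' "${BASH_REMATCH[1]}"
    # group 2: only the last repetition of the inner group
    printf 'group 2: [%s]\n' "${BASH_REMATCH[2]}"
fi
```

The mnemonic andhs is not captured: "an" consists of hex digits, but it is followed by "d" rather than a space, so the repetition stops there.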
output.txt (used as input to the script above):
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
4: 01 46 03 90 andls r4, r3, r1, lsl #12
8: 03 98 00 22 andhs r9, r0, #196608
c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
10: ff f7 fe ff <unknown>
14: 01 90 ff e7 ldrb r9, [pc, r1]!
18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
1c: 80 <unknown>
1d: bd <unknown>
Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
0: 80 b5 84 b0 addlt r11, r4, r0, lsl #11
4: 01 46 03 90 andls r4, r3, r1, lsl #12
8: 03 98 01 22 andhs r9, r1, #196608
c: 02 91 11 46 ldrmi r9, [r1], -r2, lsl #2
10: ff f7 fe ff <unknown>
14: 01 90 ff e7 ldrb r9, [pc, r1]!
18: 01 98 04 b0 andlt r9, r4, r1, lsl #16
1c: 80 <unknown>
1d: bd <unknown>
Result:
// Disassembly of section .text.stm32f30x::spi1::cr1::_CRCENW::clear_bit:
80 b5 84 b0 01 46 03 90 03 98 00 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd
// Disassembly of section .text.stm32f30x::spi1::cr1::_CRCNEXTW::set_bit:
80 b5 84 b0 01 46 03 90 03 98 01 22 02 91 11 46 ff f7 fe ff 01 90 ff
e7 01 98 04 b0 80 bd
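For each code line, the body regex captures the run of hex bytes after the offset in ${BASH_REMATCH[1]}, which is appended to $str. flush prints the accumulated sequence, wrapped at 69 columns (23 bytes per line), whenever a new section header appears and once more at the end of the file.
Hope this is what you want.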
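The question asks for the sequences without the // header lines, separated by a blank line. A minimal variant of the same loop, assuming the input looks exactly like the listing above (offset, colon, space-separated bytes, then the mnemonic), prints each function's bytes on a single line, which may be easier to load as one sample per function if the line breaks in the expected output are not significant. The function names flush_bytes and extract_bytes are illustrative, and [[:xdigit:]] is used in place of the hand-written hex class.

```shell
#!/usr/bin/env bash
# Sketch: one line of bytes per function, no "//" header,
# and a blank line between functions. Reads objdump text on stdin.
header='^Disassembly of section'
# [[:xdigit:]] matches 0-9, a-f and A-F; each byte is two hex digits plus a space
body='^[[:blank:]]*[[:xdigit:]]+:[[:blank:]]+(([[:xdigit:]]{2} )+)'

# print the bytes collected for one function, then reset the buffer
flush_bytes() {
    if [[ -n $str ]]; then
        if (( count > 0 )); then
            echo                  # blank line separating functions
        fi
        printf '%s\n' "${str% }"  # drop the trailing space
        count=$((count + 1))
        str=""
    fi
}

# read the disassembly on stdin and write the byte sequences to stdout
extract_bytes() {
    local line
    str=""
    count=0
    while IFS= read -r line; do
        if [[ $line =~ $header ]]; then
            flush_bytes
        elif [[ $line =~ $body ]]; then
            str+="${BASH_REMATCH[1]}"
        fi
    done
    flush_bytes
}

# usage: extract_bytes < output.txt > bytes.txt
```

If the 23-bytes-per-line wrapping is wanted after all, pipe the printf through fold -w 69 as the original script does.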