ARMv8-A NEON inline asm: how to convert a 16x8-bit vector to four 4x32-bit integer vectors?

I need to load an array of 8-bit values and then convert every element to a 32-bit integer using ARMv8-A NEON inline asm. I have done this on ARMv7 but have no idea how to do it on ARMv8-A.

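The code I used on ARMv7 is:

"pld        [%1, #128]           \n"  // prefetch data 128 bytes ahead
"vld1.u8    {d0,d1}, [%1]!       \n"  // load 16 bytes, post-increment %1
"vmovl.u8   q8, d0               \n"  // bytes 0-7  -> 8 x u16
"vmovl.u8   q9, d1               \n"  // bytes 8-15 -> 8 x u16
"vmovl.u16  q0, d16              \n"  // elements 0-3   -> 4 x u32
"vmovl.u16  q1, d17              \n"  // elements 4-7   -> 4 x u32
"vmovl.u16  q2, d18              \n"  // elements 8-11  -> 4 x u32
"vmovl.u16  q3, d19              \n"  // elements 12-15 -> 4 x u32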

How can I do this with ARMv8-A NEON code, i.e. how do I convert the code above to AArch64? PS: In my case I need inline asm, not intrinsics.
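For reference, here is a plain-C model of what the ARMv7 sequence computes, mainly to pin down the lane order: q0 receives bytes 0-3, q1 bytes 4-7, q2 bytes 8-11 and q3 bytes 12-15, each zero-extended. The function name widen_reference and the q[4][4] layout standing in for q0-q3 are illustrative assumptions, not part of the original code.

```c
#include <stdint.h>

/* Hypothetical plain-C model of the ARMv7 VMOVL sequence.
 * q[0]..q[3] stand in for the NEON registers q0..q3. */
static void widen_reference(const uint8_t src[16], uint32_t q[4][4])
{
    uint16_t h[16];

    /* vmovl.u8: q8 = bytes 0-7, q9 = bytes 8-15, zero-extended to 16 bits */
    for (int i = 0; i < 16; i++)
        h[i] = src[i];

    /* vmovl.u16: d16, d17, d18, d19 -> q0, q1, q2, q3 (4 lanes each) */
    for (int v = 0; v < 4; v++)
        for (int lane = 0; lane < 4; lane++)
            q[v][lane] = h[4 * v + lane];
}
```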

Thanks for the help.

Solution

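For unsigned elements, USHLL and USHLL2 with a shift amount of 0 do the job: USHLL widens the lower half of the source vector and USHLL2 the upper half. (UXTL and UXTL2 are assembler aliases for exactly this.)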

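prfm    pldl1keep, [%1, #128]        // optional: AArch64 counterpart of the ARMv7 pld
ld1     {v0.16b}, [%1], #16          // load 16 bytes, post-increment %1

ushll   v16.8h, v0.8b,  #0           // bytes 0-7  -> 8 x u16
ushll2  v17.8h, v0.16b, #0           // bytes 8-15 -> 8 x u16

ushll   v0.4s,  v16.4h, #0           // elements 0-3   -> 4 x u32
ushll2  v1.4s,  v16.8h, #0           // elements 4-7   -> 4 x u32
ushll   v2.4s,  v17.4h, #0           // elements 8-11  -> 4 x u32
ushll2  v3.4s,  v17.8h, #0           // elements 12-15 -> 4 x u32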
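A minimal sketch of the unsigned sequence as GCC/Clang extended inline asm, assuming the goal is to store the four result vectors into a uint32_t[16] buffer. The function name widen_u8x16_to_u32, the named operands and the final st1 are additions for illustration; on non-AArch64 targets the sketch falls back to a plain C loop so it still compiles.

```c
#include <stdint.h>

/* Hypothetical helper: widen 16 unsigned bytes at src to 16 uint32_t at dst.
 * On AArch64 it runs the LD1 + USHLL/USHLL2 sequence; elsewhere a plain C
 * loop produces the same result. */
static void widen_u8x16_to_u32(const uint8_t *src, uint32_t *dst)
{
#if defined(__aarch64__)
    __asm__ volatile(
        "ld1    {v0.16b}, [%[src]], #16  \n" /* load 16 x u8              */
        "ushll  v16.8h, v0.8b,  #0       \n" /* bytes 0-7   -> 8 x u16    */
        "ushll2 v17.8h, v0.16b, #0       \n" /* bytes 8-15  -> 8 x u16    */
        "ushll  v0.4s,  v16.4h, #0       \n" /* elems 0-3   -> 4 x u32    */
        "ushll2 v1.4s,  v16.8h, #0       \n" /* elems 4-7   -> 4 x u32    */
        "ushll  v2.4s,  v17.4h, #0       \n" /* elems 8-11  -> 4 x u32    */
        "ushll2 v3.4s,  v17.8h, #0       \n" /* elems 12-15 -> 4 x u32    */
        "st1    {v0.4s, v1.4s, v2.4s, v3.4s}, [%[dst]], #64 \n"
        : [src] "+r"(src), [dst] "+r"(dst) /* both pointers post-increment */
        :
        : "memory", "v0", "v1", "v2", "v3", "v16", "v17");
#else
    for (int i = 0; i < 16; i++)
        dst[i] = src[i]; /* zero-extension, same as USHLL #0 */
#endif
}
```

Listing v0-v3, v16 and v17 as clobbers tells the compiler those registers are overwritten, and volatile keeps the statement from being discarded, since its only outputs (the advanced pointers) are otherwise unused. If the results should stay in registers for further NEON work, extend the same asm block instead of storing them.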

For signed elements, as you might guess, use SSHLL and SSHLL2 instead (or their aliases SXTL and SXTL2).
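The signed variant, sketched as GCC/Clang inline asm with a plain C fallback off AArch64. This version spells the instructions with the aliases SXTL/SXTL2, which assemble to SSHLL/SSHLL2 with a shift of 0; the function name widen_s8x16_to_s32 and the int32_t[16] output buffer are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend 16 int8_t values at src to 16 int32_t at dst. */
static void widen_s8x16_to_s32(const int8_t *src, int32_t *dst)
{
#if defined(__aarch64__)
    __asm__ volatile(
        "ld1    {v0.16b}, [%[src]], #16  \n" /* load 16 x s8                */
        "sxtl   v16.8h, v0.8b            \n" /* = sshll  #0, bytes 0-7      */
        "sxtl2  v17.8h, v0.16b           \n" /* = sshll2 #0, bytes 8-15     */
        "sxtl   v0.4s,  v16.4h           \n" /* elems 0-3   -> 4 x s32      */
        "sxtl2  v1.4s,  v16.8h           \n" /* elems 4-7   -> 4 x s32      */
        "sxtl   v2.4s,  v17.4h           \n" /* elems 8-11  -> 4 x s32      */
        "sxtl2  v3.4s,  v17.8h           \n" /* elems 12-15 -> 4 x s32      */
        "st1    {v0.4s, v1.4s, v2.4s, v3.4s}, [%[dst]], #64 \n"
        : [src] "+r"(src), [dst] "+r"(dst)
        :
        : "memory", "v0", "v1", "v2", "v3", "v16", "v17");
#else
    for (int i = 0; i < 16; i++)
        dst[i] = src[i]; /* int8_t -> int32_t conversion sign-extends */
#endif
}
```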

Similarly, VMOVN has no counterpart under that name on AArch64.

--EDIT

On the other hand, the XTN and XTN2 instructions work exactly like VMOVN: XTN narrows into the lower half of the destination vector and XTN2 into the upper half.
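For the reverse direction (four 4x32-bit vectors back to one 16x8-bit vector, truncating each element like VMOVN), a sketch assuming GCC/Clang inline asm, a uint32_t[16] input buffer and a plain C fallback off AArch64. XTN writes the narrowed elements to the lower half of the destination and XTN2 fills the upper half while keeping the lower half; the function name narrow_u32x16_to_u8 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 8 bits of each of 16 uint32_t values. */
static void narrow_u32x16_to_u8(const uint32_t *src, uint8_t *dst)
{
#if defined(__aarch64__)
    __asm__ volatile(
        "ld1    {v0.4s, v1.4s, v2.4s, v3.4s}, [%[src]], #64 \n"
        "xtn    v16.4h, v0.4s            \n" /* elems 0-3   -> low half of v16  */
        "xtn2   v16.8h, v1.4s            \n" /* elems 4-7   -> high half of v16 */
        "xtn    v17.4h, v2.4s            \n" /* elems 8-11  -> low half of v17  */
        "xtn2   v17.8h, v3.4s            \n" /* elems 12-15 -> high half of v17 */
        "xtn    v0.8b,  v16.8h           \n" /* elems 0-7   -> low half of v0   */
        "xtn2   v0.16b, v17.8h           \n" /* elems 8-15  -> high half of v0  */
        "st1    {v0.16b}, [%[dst]], #16  \n"
        : [src] "+r"(src), [dst] "+r"(dst)
        :
        : "memory", "v0", "v1", "v2", "v3", "v16", "v17");
#else
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)src[i]; /* truncation: keeps the low 8 bits */
#endif
}
```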