I want to use the Google Speech API to transcribe audio files to text, but it seems that it only accepts .raw files.
You don't have to convert them. See the "Introduction to audio encoding" documentation, which discusses file formats versus encodings and lists the supported audio encodings.
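To illustrate the difference between a file format and an encoding, here is a minimal sketch using only Python's standard `wave` module. It reads the header of a WAV file and reports the values you would typically put in a recognition config. The helper names `describe_wav` and `make_silent_wav` are made up for this example, and the dictionary keys follow the REST-style `RecognitionConfig` field names as I understand them, so check them against the current documentation before relying on them.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read a WAV header and return config-style fields (hypothetical helper).

    The WAV container is the file format; the 16-bit PCM samples inside it
    are what the encoding name LINEAR16 refers to.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
            raise ValueError("expected 16-bit PCM samples")
        return {
            "encoding": "LINEAR16",  # assumed enum name, see the docs
            "sampleRateHertz": wav.getframerate(),
            "audioChannelCount": wav.getnchannels(),
        }


def make_silent_wav(sample_rate: int = 16000, seconds: int = 1) -> bytes:
    """Build a small mono 16-bit WAV in memory, only for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buf.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```

For a real file you would pass `open("your_file.wav", "rb").read()` instead of the generated bytes, then send the audio with a matching config through whichever client you use.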