I have many text files containing data like this:
{'photo': {'people': {'haspeople': 0}, 'dateuploaded': '1264588417', 'originalformat': 'jpg', 'tags': {'tag': [{'machine_tag': 0, 'author': '14988396@N00', 'text': 'bokehlicious', 'raw': 'Bokehlicious', 'authorname': 'chachahavana', 'id': '1921934-4308203423-4944107'}],[{'machine_tag': 0, 'author': '14988396@N00', 'text': 'bokehlicious2', 'raw': 'Bokehlicious2', 'authorname': 'chachahavana', 'id': '1921934-4308203423-4944107'}], 'stat': 'ok'}
This was supposed to be in JSON format, but something went wrong when saving and the files ended up like this.
Now I want to extract specific strings from these files. For example, for this file I want the 'text' values (bokehlicious, bokehlicious2, and so on) as a cell array.
I tried using textscan, but the data has no regular format, so I'd like to know how to extract the strings that follow every occurrence of 'text' in the text file.
Could you give any input on how to do this? Thanks.
Try extracting it with regexp:
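fid = fopen('...yourpath\textFile.txt','r');
str = fread(fid,inf,'uint8=>char')';                % read the whole file as one row of chars
fclose(fid);                                        % release the file handle
str = strrep(str,'''','');                          % strip the single quotes
textStr = regexp(str,'(?<=text:\s*)\w*','match');   % cell array of the words following "text:"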
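If you want the 'id' values instead, for example, use
regexp(str,'(?<=id:\s*)[\w-]*','match');
The character class [\w-] is used here because the ids contain hyphens, which \w alone does not match.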
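A variation on the approach above, offered as a minimal sketch and not a definitive solution: instead of stripping the quotes first, the pattern can anchor on the quoted key and capture the quoted value as a token. The sample string below is a shortened, hypothetical stand-in for the file contents; in practice str would come from fread as above.

```matlab
% Hypothetical sample in the same shape as the question's data (shortened).
str = '{''tag'': [{''text'': ''bokehlicious'', ''raw'': ''Bokehlicious''}, {''text'': ''bokehlicious2'', ''raw'': ''Bokehlicious2''}]}';

% Match the quoted key 'text', then capture everything between the next pair of quotes.
tokens = regexp(str, '''text'':\s*''([^'']*)''', 'tokens');

% Each token is itself a 1x1 cell; flatten into a plain cell array of strings.
textStr = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);
```

Because the pattern requires the opening quote before text, it should not pick up keys that merely end in "text", and values containing spaces or hyphens are captured whole. It does assume that the values themselves contain no single quotes.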