I have several text files with the following data structure:
{
huge
json
block that spans across multiple lines
}
--#newjson#--
{
huge
json
block that spans across multiple lines
}
--#newjson#--
{
huge
json
block that spans across multiple lines
} etc....
So it is actually json blocks that are row delimited by "--##newjson##--"
string .
I am trying to write a customer extractor to parse this. The problem is that I can't use string
data type to feed json deserializer because it has a maximum size of 128 KB and the json blocks do not fit in this. What is the best approach to parse this file using a custom extractor?
I have tried using the code below, but it doesn't work. Even the row delimiter "--#newjson#--"
doesn't seem to work right.
public SampleExtractor(Encoding encoding, string row_delim = "--#newjson#--", char col_delim = ';')
{
this._encoding = ((encoding == null) ? Encoding.UTF8 : encoding);
this._row_delim = this._encoding.GetBytes(row_delim);
this._col_delim = col_delim;
}
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
//Read the input by json
foreach (Stream current in input.Split(_encoding.GetBytes("--#newjson#--")))
{
var serializer = new JsonSerializer();
using (var sr = new StreamReader(current))
using (var jsonTextReader = new JsonTextReader(sr))
{
var jsonrow = serializer.Deserialize<JsonRow>(jsonTextReader);
output.Set(0, jsonrow.status.timestamp);
}
yield return output.AsReadOnly();
}
}
Here is how you can achieve the solution:
1) Create a c# equivalent of your JSON object Note:- Assuming all your json object are same in your text file. E.g:
Json Code
{
"id": 1,
"value": "hello",
"another_value": "world",
"value_obj": {
"name": "obj1"
},
"value_list": [
1,
2,
3
]
}
C# Equivalent
public class ValueObj
{
public string name { get; set; }
}
public class RootObject
{
public int id { get; set; }
public string value { get; set; }
public string another_value { get; set; }
public ValueObj value_obj { get; set; }
public List<int> value_list { get; set; }
}
2) Change your de-serializing code like below after you have done the split based on the delimiter
using (JsonReader reader = new JsonTextReader(sr))
{
while (!sr.EndOfStream)
{
o = serializer.Deserialize<List<MyObject>>(reader);
}
}
This would deserialize the json data in c# class object which would solve your purpose. Later which you can serialize again or print it in text or ...any file.
Hope it helps.