I'm stripping comments from semantic .n3 and RDF files myself, using Regex in C#, and then compacting and pretty-printing them.
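However, # is also a very common character in these files, where it appears in the IRIs that describe resources (e.g. <http://www.w3.org/2000/01/rdf-schema#>), so a # does not always start a comment.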
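To show why a blanket "delete from # to end of line" approach is unsafe here, this minimal sketch applies the naive pattern #.*$ to one of the prefix lines from the example below. The class and method names are hypothetical. Everything from the # inside the IRI onward is deleted, including the closing > and the terminating dot, so the statement is broken.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo: shows how a naive "strip from # to end of line"
// pattern damages IRIs that contain a fragment separator.
public static class NaiveCommentDemo
{
    // Removes everything from the first '#' to the end of each line.
    public static string StripNaive(string input) =>
        Regex.Replace(input, @"#.*$", "", RegexOptions.Multiline);

    public static void Main()
    {
        string line = "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .";
        // Prints "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema":
        // the closing '>' and the statement-ending '.' are gone.
        Console.WriteLine(StripNaive(line));
    }
}
```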
Example file:
#Processed by Id: cwm.py,v 1.197 2007/12/13 15:38:39 syosi Exp
# using base http://www.prodigi.eu/instances
# Notation3 generation by
# notation3.py,v 1.200 2007/12/11 21:18:08 syosi Exp
# Base was: http://www.prodigi.eu/instances
@prefix : </ac-schema#> .
@prefix ins: </instances#> .
@prefix olanet: <http://www.ibermaticaindustria.com/soluciones/planta-mes-olanet#> .
@prefix plm: <http://hms.ifw.uni-hannover.de/#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ins:everyone a <http://xmlns.com/foaf/0.1/Group>;
:canSee ins:public;
rdfs:member <http://127.0.0.1/OslcOlanetProvider/api/producer/01>,
[...]
You can try this:
^\s*#.*$
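and replace each match with an empty string.
This assumes that a comment starts with #, preceded on its line only by whitespace (space, \t, \f, \v, \r or \n). A # that appears in the middle of a line, such as one inside an IRI, is therefore left alone.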
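The following sketch illustrates what that assumption means in practice, using lines taken from or modelled on the example file (the trailing-comment line is hypothetical). It also shows a compacting variant, which is an assumption of mine rather than part of this answer: it consumes the comment's line break as well, so no empty lines are left behind. Note two limitations of the line-anchored approach: a comment placed after a statement on the same line is not removed, and the original pattern leaves an empty line where each comment was.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo comparing the answer's pattern with a compacting variant.
public static class LineCommentDemo
{
    // The answer's pattern: empties whole-line comments but keeps their line breaks.
    public static string StripLineComments(string input) =>
        Regex.Replace(input, @"^\s*#.*$", "", RegexOptions.Multiline);

    // Variant (assumption, not from the answer): also consumes the line break
    // (LF or CRLF). Indentation is limited to spaces and tabs, and the comment
    // text to [^\r\n]*, so each match stays on a single line.
    public static string StripLineCommentsCompact(string input) =>
        Regex.Replace(input, @"^[ \t]*#[^\r\n]*(?:\r?\n)?", "", RegexOptions.Multiline);

    public static void Main()
    {
        string input =
            "# Base was: http://www.prodigi.eu/instances\n" +
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n" +
            "ins:everyone :canSee ins:public . # trailing note\n";

        // Line 1 becomes empty; the '#' in the rdfs IRI survives; the trailing
        // comment on the last line survives because that line does not start with '#'.
        Console.Write(StripLineComments(input));

        // Same result, except that line 1 disappears entirely.
        Console.Write(StripLineCommentsCompact(input));
    }
}
```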
Read each file into a string, pass it to the following method, and write the result back to the file.
Sample code:
using System;
using System.Text.RegularExpressions;
// ... (enclosing namespace and class omitted)
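public string removeHash(string input)
{
    // With Multiline, ^ and $ match at the start and end of each line, so only
    // lines whose first non-whitespace character is '#' are matched.
    string pattern = @"^\s*#.*$";
    string substitution = "";
    RegexOptions options = RegexOptions.Multiline;
    Regex regex = new Regex(pattern, options);
    // Each matched comment line becomes empty; its line break is kept.
    string result = regex.Replace(input, substitution);
    return result;
}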
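The read, clean and write-back step described above could be driven by a small batch program like the one below. This is a minimal sketch under assumptions: the class name N3CommentCleaner, the CleanDirectory method, the default directory and the restriction to *.n3 files are all hypothetical. Files are rewritten in place, so keep a backup. If your .rdf files are RDF/XML, their comments use XML <!-- ... --> syntax, which this pattern does not remove.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical batch driver: strips whole-line '#' comments from every .n3
// file under a directory, rewriting each file in place.
public static class N3CommentCleaner
{
    // Same pattern as the answer, compiled once and reused for every file.
    private static readonly Regex LineComment =
        new Regex(@"^\s*#.*$", RegexOptions.Multiline);

    public static string RemoveHash(string input) => LineComment.Replace(input, "");

    public static void CleanDirectory(string root)
    {
        foreach (string path in Directory.EnumerateFiles(root, "*.n3", SearchOption.AllDirectories))
        {
            string text = File.ReadAllText(path);      // read the whole file as a string
            File.WriteAllText(path, RemoveHash(text)); // overwrite it with the cleaned text
        }
    }

    public static void Main(string[] args)
    {
        // Directory to process; defaults to the current directory (assumption).
        CleanDirectory(args.Length > 0 ? args[0] : ".");
    }
}
```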