Using spreadsheetgear, is there any way to get the "probable" data type for a column, exclusive of the header row (if one exists) and reasonably tolerant of sparse population without having to do a sample myself...is there already a way to do this?
so for example if I had an excel row like
| Customers | Sales Item | Sale Date | Contact | Quantity |
| IBM | Keyboard | 28-10-2011 | | 2 |
| MS | Mouse | 27-09-2011 | joe | 5 |
I would expect to see
String, String, DateTime, String, Numeric
EDIT
So I ended up having to sample like @Tim Anderson suggested, but I needed to handle the case of sparse data, and default to string when conflicting types in the col. (this is called in a loop that walks the cols, I can't post that as it contains some IP) DataValueType is just a local enum and rowcount is the number of rows to sample and because I am already sampling I simply ignore row 0 in case it's a header row.
private DataType GetDataTypeFromColRange(IRange range, int rowcount, int col)
{
var dtlist = GetValueTypes(range, rowcount, col).Distinct();
// If conflicting types for the col default to string.
if (dtlist.Count() != 1)
{
return new DataType(DataTypeValue.String);
}
else
{
return new DataType(dtlist.First());
}
}
private IEnumerable<DataTypeValue> GetValueTypes(IRange range, int rowcount, int col)
{
for (int i = 1; i < rowcount; i++)
{
switch (range[i, col].ValueType)
{
case SpreadsheetGear.ValueType.Text:
yield return DataTypeValue.String;
break;
case SpreadsheetGear.ValueType.Number:
if (range[i, col].NumberFormatType == NumberFormatType.Date || range[i, col].NumberFormatType == NumberFormatType.DateTime)
{
yield return DataTypeValue.Date;
}
else
{
yield return DataTypeValue.Numeric;
}
break;
case SpreadsheetGear.ValueType.Logical:
yield return DataTypeValue.Bool;
break;
default: // ignore empty or errored cells.
continue;
}
}
}
I am sure this can be further improved so please feel free to post improvements, but this does what I need for now.
No helper method or other API exists in SpreadsheetGear to automatically return the “probable data type” for a column of values. It would not be very difficult to implement something like this to meet your own particular requirements, although there’s no way to do this without “sampling” the data. Below is a very simple method that accepts the range to check and a Boolean that indicates whether or not the range includes a header row. All it does is check the first row of data to determine the type; you might want to build something a little more robust:
private SpreadsheetGear.ValueType[] GetColumnTypes(IRange range, bool hasHeader)
{
SpreadsheetGear.ValueType[] columnTypes = new SpreadsheetGear.ValueType[range.ColumnCount];
for (int i = 0; i < range.ColumnCount; i++)
{
columnTypes[i] = range[hasHeader ? 1 : 0, i].ValueType;
}
return columnTypes;
}
One thing you should be aware of, however, is that SpreadsheetGear uses the same basic internal data types as Excel and will return these types when checking IRange.ValueType (these include Empty, Error, Logical, Number, Text). Note there is no DateTime. In your example this would impact the value type returned on the “Sale Date” column because dates/times are actually stored in Excel and SpreadsheetGear as a doubles representing a date/time serial number. So this type of value would return Number, not something like DateTime. The fact that they show up as a “date” in the cell is simply a function of the NumberFormat of the cell.