To put it simple, how to increase the cap of nvarchar(MAX) to actually hold 280MB of text and not just 8000MB (correct me if I'm wrong)?
So, for my finals project I'm making a web-crawler for a client that wants its own customized search engine for their library website, but my problem arises when i try to store the infomation that the crawlers retrieve.
Specifically the problem I have is that even tho I set the column "HTML" to nvarchar(MAX), which should be able to hold 2GB of data, it wont save any infomation to it, in this case 280MB, cause it's too long.
I did try shortening the length of the text to be saved and when I made it sufficiently short enough it finally agreed to save the data, so from what I can understand it's capped.
EDIT: Code examples as requested
page container class:
public class Page
{
public int ID = -1;
public String URL;
public String HeadLine;
public List<String> Tags;
public String Description;
public String HTML;
public DateTime lastUpdate;
}
Code snippet when crawler saves the page that it has retrieved:
//Save Page content to Database
Page page = new Page();
page.URL = url;
page.HeadLine = headline;
page.Tags = tags.Split(',').Where(s => !string.IsNullOrWhiteSpace(s)).ToList();
page.Description = description;
page.HTML = HTML;
page.lastUpdate = DateTime.Today;
new DBpage(Settings.instance.DBaddress,
Settings.instance.DBname).SavePage(page);
Method used for storing the data:
public void SavePage(Page page) {
String SqlString = "";
//Check is a page by the given URL already exists in the database and assign the SQL string acordingly
Page foundPage = GetPage(page.URL);
if(foundPage == null) {
SqlString = "INSERT INTO WebContent " +
"VALUES (@URL, @HeadLine, @Tags, @Description, @HTML, @LastUpdate)";
}
else {
SqlString = "UPDATE WebContent " +
"SET URL = @URL, HeadLine = @HeadLine, Tags = @Tags, Description = @Description, HTML = @HTML, LastUpdate = @LastUpdate " +
//"SET URL = '" + page.URL + "', HeadLine = '" + page.HeadLine + "', Tags = '" + String.Join(",", page.Tags) + "', Description = '" + page.Description + "', HTML = '" + page.HTML.Replace("'", "''") + "', LastUpdate = " + page.lastUpdate + " " +
"WHERE ID = " + foundPage.ID;
}
//Assign all variables and execute the SQL
try {
using(DBaccess db = new DBaccess(dblocation, dbname)) {
String html = page.HTML.Replace("'", "''"); //Replace all single quotes with double "single quotes" to escape the first single quote.
SqlCommand sqlCmd = db.GetSqlCommand(SqlString);
sqlCmd.Parameters.AddWithValue("@URL", page.URL);
sqlCmd.Parameters.AddWithValue("@HeadLine", page.HeadLine);
sqlCmd.Parameters.AddWithValue("@Tags", String.Join(",", page.Tags));
sqlCmd.Parameters.AddWithValue("@Description", page.Description);
sqlCmd.Parameters.AddWithValue("@HTML", html);
sqlCmd.Parameters.AddWithValue("@LastUpdate", page.lastUpdate);
sqlCmd.ExecuteNonQuery();
}
}
catch(SqlException e) {
Console.WriteLine(e.Message);
}
}
nvarchar(max)
type does allow to store up to 2GB of data. For nvarchar it means about 1 billion characters, because N
types store text in 2-bytes per character unicode.
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^30-1 characters. The maximum storage size in bytes is 2 GB. The actual storage size, in bytes, is two times the number of characters entered + 2 bytes.
Most likely your problem is somewhere in the procedure that tries to INSERT such large text. The first thing that comes to mind is some timeout. It will take a while to upload 280MB of data to the server, so examine the details of failure (look through the error messages and exceptions) to gather clues of what is going wrong.
Few things to check:
Double check the type of the HTML
column in the database.
Maybe SSMS doesn't display the long value correctly. Try to run
SELECT LEN(HTML) FROM YourTable
to verify the length of the stored string.
Overall, just step through the code in the debugger and verify that all variables have expected values.