I have a PHP script which builds a sitemap (an XML file following the standard sitemap structure).
My question is about improving it. As you know, a website gets new posts daily, and a post may be edited several times per hour/day/month. I have two strategies to handle that:
Writing a new PHP script that parses the XML file, finds the matching node and modifies it when a post is edited, and appends a new node when a post is added (it would also need to count the existing nodes before inserting a new one, since a sitemap file can hold at most 50,000 URLs).
Executing my current PHP script on a fixed daily schedule (e.g. every night at midnight) using a cron job. That means rebuilding the sitemap from scratch every time (effectively producing a new sitemap every night).
Which strategy is more efficient? Which one is the standard approach?
Modifying an XML file in place has its dangers. For one, you need to compare states and work out the right actions (replace, insert, delete); this is complex and error-prone. Another problem is that sitemaps can be large, so loading one into memory for modification may not be feasible.
I suggest you generate the XML sitemap in a cron job. Do not overwrite the current sitemap directly; write to a temporary file and copy/link it into place once it is complete. That way, if the job fails partway through, you are never left without a sitemap.
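A minimal sketch of that pattern, assuming your cron job can produce the list of URLs (the example URLs and file path below are placeholders; in the real script they would come from your database query):

```php
<?php
// Hypothetical URL list; in practice this comes from your database.
$urls = [
    ['loc' => 'https://example.com/post/1', 'lastmod' => '2024-01-01'],
    ['loc' => 'https://example.com/post/2', 'lastmod' => '2024-01-02'],
];

// Build the sitemap in a temporary file first.
$tmp = tempnam(sys_get_temp_dir(), 'sitemap');

$xml = new XMLWriter();
$xml->openUri($tmp);
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('urlset');
$xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

foreach ($urls as $url) {
    $xml->startElement('url');
    $xml->writeElement('loc', $url['loc']);
    $xml->writeElement('lastmod', $url['lastmod']);
    $xml->endElement(); // </url>
}

$xml->endElement(); // </urlset>
$xml->endDocument();
$xml->flush();

// rename() is atomic when source and target are on the same filesystem,
// so readers never see a partially written sitemap.
rename($tmp, __DIR__ . '/sitemap.xml');
```

Because `XMLWriter` streams directly to the file, memory use stays flat even for tens of thousands of URLs. Note that `rename()` is only atomic on the same filesystem, so the temporary file should live on the same volume as the final sitemap.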
If you want to manage the URLs incrementally, do so in an SQL table and treat the XML sitemap as an export of that table.
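A sketch of that idea, using an in-memory SQLite database for illustration; the table and column names (`sitemap_urls`, `loc`, `lastmod`) are assumptions, not part of any standard:

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE sitemap_urls (loc TEXT PRIMARY KEY, lastmod TEXT)');

// Called whenever a post is created or edited: insert or refresh its row.
$upsert = $db->prepare(
    'INSERT OR REPLACE INTO sitemap_urls (loc, lastmod) VALUES (:loc, :lastmod)'
);
$upsert->execute([':loc' => 'https://example.com/post/1', ':lastmod' => '2024-01-01']);
$upsert->execute([':loc' => 'https://example.com/post/1', ':lastmod' => '2024-02-01']); // edit

// The nightly cron job exports this table. Counting rows first tells you
// whether the 50,000-URL limit forces you to split the export into
// multiple sitemap files tied together by a sitemap index.
$count = (int) $db->query('SELECT COUNT(*) FROM sitemap_urls')->fetchColumn();
$files = max(1, (int) ceil($count / 50000));
echo "$count URL(s), $files sitemap file(s)\n";
```

This way the hard part (tracking which posts changed) is handled by the database, and the XML generation stays a simple, stateless dump that you can rerun at any time.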