I am trying to fill in the page count of a book on its Index page on Wikisource. The following code writes to the page-count parameter correctly.
If the parameter is empty, the result is fine. But if I run the code a second time, the value is concatenated and 67 becomes 6767. How can I tell whether the page-count parameter ('|Number of pages=') is empty? Or, if the parameter is already filled, how can I make the code skip it?
The code that writes the value:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pywikibot
indexTitle = 'அட்டவணை:தமிழ் நாடகத் தலைமை ஆசிரியர்-2.pdf'
indexPages = '67'
site1 = pywikibot.Site('ta', 'wikisource')
page = pywikibot.Page(site1, indexTitle)
page.text = page.text.replace('|Number of pages=', '|Number of pages=' + indexPages)
page.save(summary='67')
You can use re, the regular expression library, to search for a pattern:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pywikibot
import re
indexTitle = 'அட்டவணை:தமிழ் நாடகத் தலைமை ஆசிரியர்-2.pdf'
indexPages = '67'
site1 = pywikibot.Site('ta', 'wikisource')
page = pywikibot.Page(site1, indexTitle)
print(page.text)
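# Look for a value that is already present after the parameter name.
res = re.search(r'\|Number of pages= *(\d+)', page.text)
if res:
    # Already filled: report it and skip the edit.
    print("number of pages is already assigned to %s" % res.group(1))
else:
    # Still empty: fill it in and save.
    page.text = page.text.replace('|Number of pages=', '|Number of pages=' + indexPages)
    page.save(summary='67')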
Also, if you are processing UTF-8 text, it is better to move to Python 3, which has much better Unicode support.
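If you want the check and the replacement in one step, the same idea can be sketched as a small pure function that you can test without touching the wiki. This is only a sketch: the helper name fill_empty_parameter and the sample wikitext are made up for illustration, and it assumes the parameter counts as empty when nothing but spaces follows the '=' up to the end of the line, the next '|' or the closing '}}'.

```python
import re


def fill_empty_parameter(text, name, value):
    """Fill `|name=` with `value` only if the parameter has no value yet.

    Hypothetical helper for illustration. Returns (new_text, changed).
    """
    # Match "|name=" followed only by spaces/tabs, then a newline,
    # the next "|", the closing "}}", or the end of the text.
    pattern = re.compile(
        r'(\|\s*' + re.escape(name) + r'\s*=)[ \t]*(?=\n|\||\}\}|$)'
    )
    # A function is used as the replacement so that backslashes in
    # `value` are never interpreted as regex escapes.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text, count=1)
    return new_text, count > 0


# Made-up sample wikitext, not the real Index page content.
sample = '{{Index\n|Number of pages=\n|Year=1950\n}}'

first, changed_first = fill_empty_parameter(sample, 'Number of pages', '67')
second, changed_second = fill_empty_parameter(first, 'Number of pages', '67')

print(changed_first, changed_second)
```

With pywikibot you would pass page.text to the helper, assign the returned text back to page.text, and call page.save() only when the second return value is True, so running the script twice leaves 67 as it is.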