How do I convert percent encoded slug fields to unicode when migrating from WordPress to Django?


I am migrating WordPress data into Django. The site's titles, content, and slugs are in Hindi.

I am using wordpress_xmlrpc to import the data from WordPress via XML-RPC.

The titles and content, which are also in Hindi, are read and saved correctly:

    instance.name = post.title
    instance.content = post.content

The slugs, however, are a problem.

I have tried the following, but neither works:

    instance.slug = unicode(post.slug)
    instance.slug = post.slug

In one case, the slug is saved percent-encoded as

     %e0%a4%9c%e0%a4%b2%e0%a5%8d%e0%a4%a6-%e0%a4%b8%e0%a4%bf%e0%a4%b2%e0%a5%8d%e0%a4%b5%e0%a4%b0-%e0%a4%b8%e0%a5%8d%e0%a4%95%e0%a5%8d%e0%a4%b0%e0%a5%80%e0%a4%a8-%e0%a4%aa%e0%a4%b0-%e0%a4%a6%e0%a4%bf

It is not accessible either; I get a 404 (Page not found) for

 http://localhost:8010/%E0%A4%9C%E0%A4%B2%E0%A5%8D%E0%A4%A6-%E0%A4%B8%E0%A4%BF%E0%A4%B2%E0%A5%8D%E0%A4%B5%E0%A4%B0-%E0%A4%B8%E0%A5%8D%E0%A4%95%E0%A5%8D%E0%A4%B0%E0%A5%80%E0%A4%A8-%E0%A4%AA%E0%A4%B0-%E0%A4%A6%E0%A4%BF/

In WordPress, the slug looks like /तापसी-पन्नू-ने-अक्षय-कुमा/
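To see why the lookup fails, here is a minimal sketch using the standard library's `urllib.parse.unquote` (a stdlib stand-in chosen for illustration, not something from the original post) on a shortened prefix of the stored slug and of the 404 URL. The stored value and the URL are just two percent-encodings of the same Hindi text. Django matches URL patterns against the already-decoded request path, so a view that looks the post up by the captured slug would presumably be comparing Hindi text against the percent-encoded string saved in the database.

```python
from urllib.parse import unquote

# Prefix of the slug as it was saved (lowercase hex escapes)
stored_slug = "%e0%a4%9c%e0%a4%b2%e0%a5%8d%e0%a4%a6-%e0%a4%b8%e0%a4%bf"
# Same prefix as it appears in the requested URL (uppercase hex escapes)
requested = "%E0%A4%9C%E0%A4%B2%E0%A5%8D%E0%A4%A6-%E0%A4%B8%E0%A4%BF"

# The raw strings are not equal, so an exact-match comparison fails
print(stored_slug == requested)  # False

# Decoding the UTF-8 percent escapes gives the same Hindi text for both
print(unquote(stored_slug))  # जल्द-सि
print(unquote(stored_slug) == unquote(requested))  # True
```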

Does anybody know how to resolve this issue?


Solution

  • To decode the percent-encoded slugs during the migration, you can use Django's built-in uri_to_iri function from django.utils.encoding.

    >>> from django.utils.encoding import uri_to_iri
    >>> old_slug = '%e0%a4%9c%e0%a4%b2%e0%a5%8d%e0%a4%a6-%e0%a4%b8%e0%a4%bf'
    >>> new_slug = uri_to_iri(old_slug)
    >>> print(old_slug, '->', new_slug)
    %e0%a4%9c%e0%a4%b2%e0%a5%8d%e0%a4%a6-%e0%a4%b8%e0%a4%bf -> जल्द-सि

    In the import code, this converts each WordPress slug to Unicode before it is saved:

    instance.slug = uri_to_iri(post.slug)
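If you also need to repair rows that were already imported with encoded slugs, a stdlib-only sketch could look like the following. `decode_wp_slug` and `fix_imported_slugs` are hypothetical helper names, and `urllib.parse.unquote` is swapped in for `uri_to_iri`; unlike `uri_to_iri`, it also decodes escaped reserved ASCII characters such as `%2F`, which WordPress slugs are not expected to contain. Decoding is idempotent for slugs that contain no literal `%`, so re-running it on already-fixed rows changes nothing.

```python
from urllib.parse import unquote


def decode_wp_slug(slug):
    """Turn a percent-encoded WordPress slug into readable Unicode text.

    Stand-in for django.utils.encoding.uri_to_iri; a slug without
    any '%' escapes is returned unchanged.
    """
    if slug is None:
        return slug
    return unquote(slug)


def fix_imported_slugs(objects):
    """Re-decode slugs of rows that were saved percent-encoded.

    `objects` is any iterable of model instances with a `slug`
    attribute and a `save()` method, e.g. a queryset of a
    hypothetical Post model. Returns the number of rows changed.
    """
    changed = 0
    for obj in objects:
        new_slug = decode_wp_slug(obj.slug)
        if new_slug != obj.slug:
            obj.slug = new_slug
            # Only write the slug column back to the database
            obj.save(update_fields=["slug"])
            changed += 1
    return changed
```

During the import itself, `instance.slug = decode_wp_slug(post.slug)` would play the same role as the `uri_to_iri` line above; for existing data, something like `fix_imported_slugs(Post.objects.all())` could be run from a Django shell, with `Post` replaced by your actual model.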