import pytest
import nlu

@pytest.mark.parametrize("text", ["我喜欢美妆里面的碧唇果酸面膜!"])
def test_tokenizer_sentence(text):
    tokens = nlu.nlp(text)
    print(tokens)
    assert len(tokens['tokens']['words']) == 9
When I run the pytest unit test above, I get the following output:
test_tokenizer.py::test_tokenizer_sentence[\u6211\u559c\u6b22\u7f8e\u5986\u91cc\u9762\u7684\u78a7\u5507\u679c\u9178\u9762\u819c!] PASSED [100%]
{'text': '我喜欢美妆里面的碧唇果酸面膜!', 'tokens': {'words': ['我', '喜欢', '美妆', '里面', '的', '碧唇', '果酸', '面膜', '!']}}
Is there a parameter of pytest.mark.parametrize that can make the escaped Unicode display as actual Chinese characters, instead of:
\u6211\u559c\u6b22\u7f8e\u5986\u91cc\u9762\u7684\u78a7\u5507\u679c\u9178\u9762\u819c!
I am using pytest in PyCharm.
Displaying non-ASCII test IDs has been a particularly tricky part of the pytest codebase. At the time of writing, unescaped IDs are off by default, but you can enable them via an experimental flag in pytest.ini:
[pytest]
disable_test_id_escaping_and_forfeit_all_rights_to_community_support = true
$ pytest t.py -v
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /tmp/y/venv/bin/python
cachedir: .pytest_cache
rootdir: /tmp/y, configfile: pytest.ini
collected 1 item
t.py::test_tokenizer_sentence[我喜欢美妆里面的碧唇果酸面膜!] PASSED [100%]
============================== 1 passed in 0.00s ===============================
Disclaimer: I am one of the pytest core devs.
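As an aside (not part of the original answer): the escaped form shown in the question looks like the output of Python's own unicode_escape codec, so you can reproduce the mangled ID outside of pytest. A minimal sketch, assuming the escaping pytest applies is equivalent to unicode_escape for BMP characters like these:

```python
# Reproduce the ASCII-escaped form pytest shows for non-ASCII test IDs.
# unicode_escape turns each non-ASCII BMP character into a \uXXXX sequence
# while leaving ASCII characters such as '!' untouched.
text = "我喜欢美妆里面的碧唇果酸面膜!"
escaped = text.encode("unicode_escape").decode("ascii")
print(escaped)
# → \u6211\u559c\u6b22\u7f8e\u5986\u91cc\u9762\u7684\u78a7\u5507\u679c\u9178\u9762\u819c!
```

This matches the escaped test ID from the question character for character, which is why flipping the ini option above is the only way to get the readable form back in the report.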