I'm relatively new to Python and really new to pytest. I'm trying to write some tests for parsing tweets that are stored as line-delimited JSON. Here's a simplified example, `test_cases.jsonl`:
{"contributors":null,"coordinates":null,"created_at":"Sat Aug 20 01:00:12 +0000 2016","entities":{"hashtags":[{"indices":[97,116],"text":"StandWithLouisiana"}]}}
{"contributors":null,"coordinates":null,"created_at":"Sat Aug 20 01:01:35 +0000 2016","entities":{"hashtags":[]}}
What I would like to do is test a function like the following:
def hashtags(t):
    return ' '.join([h['text'] for h in t['entities']['hashtags']])
I can test a single line of the JSON as follows:
import json

import pytest

@pytest.fixture
def tweet(file='test_cases.jsonl'):
    with open(file, encoding='utf-8') as lines:
        for line in lines:
            # Returns on the first iteration, so only the first line is parsed.
            return json.loads(line)

def test_hashtag(tweet):
    assert hashtags(tweet) == 'StandWithLouisiana'
(For this example, I'm just passing the file name as a default argument to the fixture.)
This works in the sense that the test passes, but only because the first line satisfies the assertion. What I'm really trying to do is something like the following, which I don't expect to work as written:
def test_hashtag(tweet):
    assert hashtags(tweet) == 'StandWithLouisiana'  # first tweet
    assert hashtags(tweet) == ''  # second tweet
This fails because the second assertion checks whether the first tweet (the first line of the JSON) has no hashtags, not the second. I assume that's because of the `return` in the fixture, but if I try to `yield` instead of `return`, I get a `yield_fixture function has more than one 'yield'` error (and the second assertion still fails).
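To illustrate the behavior described above outside of pytest, here is a minimal sketch. The names `SAMPLE`, `first_tweet`, and `all_tweets` are made up for this illustration, and the sample data is trimmed to the fields that matter. A `return` inside the loop exits on the first iteration, and a pytest fixture is expected to produce exactly one value per test, which is why a `yield` inside the loop is rejected.

```python
import io
import json

# Two-line JSONL sample, trimmed to the fields used here (illustrative data).
SAMPLE = io.StringIO(
    '{"entities": {"hashtags": [{"text": "StandWithLouisiana"}]}}\n'
    '{"entities": {"hashtags": []}}\n'
)


def first_tweet(stream):
    """Mirror the fixture above: `return` exits on the first iteration."""
    for line in stream:
        return json.loads(line)


def all_tweets(stream):
    """Parse every non-blank line instead of stopping at the first one."""
    return [json.loads(line) for line in stream if line.strip()]


SAMPLE.seek(0)
only_first = first_tweet(SAMPLE)  # the second line is never parsed
SAMPLE.seek(0)
every_tweet = all_tweets(SAMPLE)  # one dict per line
```

A list like `every_tweet` is the kind of value that can be handed to pytest as a set of parameters, so that each entry becomes its own test case.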
What I'm doing now to get around this issue is to make each line a separate JSON file and then create a separate fixture for each of them. (For shorter examples, I'm using `StringIO` to write the JSON inline.)
This does work but feels inelegant. I have a feeling that I should use `@pytest.mark.parametrize` for this, but I can't get my head around it. I think I also tried `pytest_generate_tests`, but it wound up testing every key. Is it possible to do what I'm thinking of, or is it better to create separate fixtures when I have different values for the assertions?
I think the most fitting approach for you would be parametrizing the fixture:
import json
import pathlib

import pytest

# splitlines() avoids a trailing empty entry when the file ends with a newline
lines = pathlib.Path('data.json').read_text().splitlines()

@pytest.fixture(params=lines)
def tweet(request):
    line = request.param
    return json.loads(line)

def hashtags(t):
    return ' '.join([h['text'] for h in t['entities']['hashtags']])

def test_hashtag(tweet):
    assert hashtags(tweet) == 'StandWithLouisiana'
This will invoke `test_hashtag` once for each value of `tweet` (the second invocation fails here, because the assertion is still hard-coded for the first tweet):
$ pytest -v
...
test_spam.py::test_hashtag[{"contributors":null,"coordinates":null,"created_at":"Sat Aug 20 01:00:12 +0000 2016","entities":{"hashtags":[{"indices":[97,116],"text":"StandWithLouisiana"}]}}]
test_spam.py::test_hashtag[{"contributors":null,"coordinates":null,"created_at":"Sat Aug 20 01:01:35 +0000 2016","entities":{"hashtags":[]}}]
...
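The generated test IDs above contain the whole JSON line, which gets hard to read. As a sketch of one way to shorten them, `pytest.fixture` also accepts a callable for `ids`. The helper `tweet_id` and the inline `LINES` list are assumptions made for this illustration, not part of the original code.

```python
import json

import pytest

# Inline stand-in for the lines read from the file (trimmed sample data).
LINES = [
    '{"created_at": "Sat Aug 20 01:00:12 +0000 2016",'
    ' "entities": {"hashtags": [{"text": "StandWithLouisiana"}]}}',
    '{"created_at": "Sat Aug 20 01:01:35 +0000 2016",'
    ' "entities": {"hashtags": []}}',
]


def tweet_id(line):
    """Hypothetical helper: label a test case by the tweet's timestamp."""
    return json.loads(line)['created_at']


# pytest calls tweet_id once per parameter to build that case's test ID.
@pytest.fixture(params=LINES, ids=tweet_id)
def tweet(request):
    return json.loads(request.param)
```

Any field that identifies a tweet uniquely would work as an ID; `created_at` is used only because it is present in the sample data.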
You can include the expected value in the `tweet` fixture parameters, which are then passed through to the test unchanged. In the example below, the expected tags are zipped with the file lines to build pairs of the form `(line, tag)`. The `tweet` fixture loads the line into a dictionary and passes the tag through, so the `tweet` argument in the test becomes a pair of values.
import json
import pathlib

import pytest

# splitlines() avoids a trailing empty entry when the file ends with a newline
lines = pathlib.Path('data.json').read_text().splitlines()
expected_tags = ['StandWithLouisiana', '']

@pytest.fixture(params=list(zip(lines, expected_tags)),
                ids=tuple(repr(tag) for tag in expected_tags))
def tweet(request):
    line, tag = request.param
    return (json.loads(line), tag)

def hashtags(t):
    return ' '.join([h['text'] for h in t['entities']['hashtags']])

def test_hashtag(tweet):
    data, tag = tweet
    assert hashtags(data) == tag
The test run yields two tests as before:
test_spam.py::test_hashtag['StandWithLouisiana'] PASSED
test_spam.py::test_hashtag[''] PASSED
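One caveat with pairing lines and expected tags by position: `zip` stops at the shorter input, so a tweet without a matching expected tag would be skipped silently. Here is a small sketch of a guard against that. `paired_cases` is a hypothetical helper name, and on Python 3.10+ `zip(lines, expected_tags, strict=True)` should give a similar check.

```python
def paired_cases(lines, expected_tags):
    """Pair each line with its expected tag, refusing mismatched lengths."""
    if len(lines) != len(expected_tags):
        raise ValueError(
            f'{len(lines)} lines but {len(expected_tags)} expected tags'
        )
    return list(zip(lines, expected_tags))


# Placeholder strings stand in for the raw JSON lines here.
cases = paired_cases(['line one', 'line two'], ['StandWithLouisiana', ''])
```

The returned list can be passed as `params` in place of the bare `zip(...)` call.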
Another, probably cleaner, approach would be to let the `tweet` fixture handle only the parsing of the tweet from the raw string, and to move the parametrization to the test itself. I'm using indirect parametrization to pass the raw line to the `tweet` fixture here:
import json
import pathlib

import pytest

# splitlines() avoids a trailing empty entry when the file ends with a newline
lines = pathlib.Path('data.json').read_text().splitlines()
expected_tags = ['StandWithLouisiana', '']

@pytest.fixture
def tweet(request):
    # With indirect parametrization, request.param is the raw line
    line = request.param
    return json.loads(line)

def hashtags(t):
    return ' '.join([h['text'] for h in t['entities']['hashtags']])

@pytest.mark.parametrize('tweet, tag',
                         list(zip(lines, expected_tags)),
                         ids=tuple(repr(tag) for tag in expected_tags),
                         indirect=('tweet',))
def test_hashtag(tweet, tag):
    assert hashtags(tweet) == tag
The test run now also yields two tests:
test_spam.py::test_hashtag['StandWithLouisiana'] PASSED
test_spam.py::test_hashtag[''] PASSED
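If the fixture is not needed for anything besides parsing, a simpler variant might be to parse the lines up front and parametrize the test directly, without `indirect`. This is only a sketch: `LINES`, `EXPECTED_TAGS`, and `CASES` are names made up for this illustration, and the sample data is inlined instead of being read from a file.

```python
import json

import pytest

# Inline stand-in for the lines read from the file (trimmed sample data).
LINES = [
    '{"entities": {"hashtags": [{"indices": [97, 116],'
    ' "text": "StandWithLouisiana"}]}}',
    '{"entities": {"hashtags": []}}',
]
EXPECTED_TAGS = ['StandWithLouisiana', '']

# Parse at import (collection) time; each case is (tweet dict, expected tag).
CASES = [
    pytest.param(json.loads(line), tag, id=repr(tag))
    for line, tag in zip(LINES, EXPECTED_TAGS)
]


def hashtags(t):
    return ' '.join([h['text'] for h in t['entities']['hashtags']])


@pytest.mark.parametrize('tweet, expected', CASES)
def test_hashtag(tweet, expected):
    assert hashtags(tweet) == expected
```

The trade-off is that a malformed line raises during collection rather than inside a single test, so one bad line stops the whole module from being collected.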