I am using snscrape to scrape users that have a certain keyword in their bio. My current approach is to search for matching tweets and then filter their authors (code below).
What I want to know is: is there a method that searches for users directly based on their bio, instead of what I am doing right now, i.e. simulating the Advanced Search feature of the Twitter web page?
I looked at the snscrape docs, but all the classes that deal with users appear to handle only a specific user; none of them search for users based on a query.
Here is the code I am currently running:
import itertools

import snscrape.modules.twitter as sntwt

query = "co founder (CEO OR Congrees OR CTO) lang:en"
tweets = []
limit = 5000  # stop after this many tweets

# Instead of searching for tweets, I want to search for users.
for tweet in itertools.islice(sntwt.TwitterSearchScraper(query).get_items(), limit):
    tweets.append(tweet)
    print(vars(tweet))
    print('\n\n\n\n')
    # some code that filters the users
Finally, I attached a screenshot of Twitter's Advanced Search, which simulates the behavior I want (image not available).
Check out https://github.com/JustAnotherArchivist/snscrape/issues/263. As of this writing, this is still an open issue, but JustAnotherArchivist (the repository owner) appears to have committed an update a few weeks ago that allows this functionality (it might not be documented yet, or might not be reliable).
I think this requires the development version of snscrape, so install or upgrade to it if you don't have it yet (command from a Medium article):
$ pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git
This should then allow the "--user" flag to work (I'm using snscrape from the command line and am not sure about the Python wrapper). For example:
$ snscrape --jsonl --max-results 10 twitter-search --user "go bananas since:2022-12-31" > out_file.json
This seems to search for the query string "go bananas" anywhere in the user object. For example, it returns a user object with 'username': 'gobananagoband' and 'displayname': 'Go Banana Go!'. It also returns a user object with 'description': "When games go BANANAS, we've got you covered in bunches. Tips? @ reply or bananasalert at gmail." (As far as I can tell, 'description', 'rawDescription', and 'renderedDescription' all contain the user bio.)
I am not sure whether you can restrict the match to "description" only; I have not experimented much yet.
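If the search itself cannot be restricted to the bio, one workaround is to post-filter the JSONL output. This is a minimal sketch, assuming each line of `out_file.json` is one user object with a `rawDescription` key (falling back to `description`); `filter_by_bio` is a hypothetical helper, not part of snscrape.

```python
import json
from typing import Iterable, List


def filter_by_bio(lines: Iterable[str], keyword: str) -> List[dict]:
    """Return the JSON user objects whose bio contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user = json.loads(line)
        bio = user.get("rawDescription") or user.get("description") or ""
        if needle in bio.lower():
            matches.append(user)
    return matches
```

For example, `with open("out_file.json", encoding="utf-8") as f: hits = filter_by_bio(f, "go bananas")` would keep the "When games go BANANAS" account and drop users that matched only on username or display name.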
It also supports some of the other search operators. For example, geolocation (from the list of operators; within 100 km of Twitter HQ):
$ snscrape --jsonl --max-results 10 twitter-search --user "elephant geocode:37.7,-122.4,100km lang:en since:2022-12-31" > out_file.json
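When building such queries programmatically, a small helper can assemble the operators and catch out-of-range coordinates. This is a sketch; `geocode_query` is a hypothetical helper that only formats the `geocode:lat,lon,radius`, `lang:` and `since:` operators used in the command above. It does not verify that Twitter honors them in user search.

```python
from typing import Optional


def geocode_query(terms: str, lat: float, lon: float, radius_km: float,
                  lang: Optional[str] = None, since: Optional[str] = None) -> str:
    """Build a search string of the form 'terms geocode:lat,lon,Rkm [lang:xx] [since:date]'."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    if radius_km <= 0:
        raise ValueError(f"radius must be positive: {radius_km}")
    parts = [terms, f"geocode:{lat},{lon},{radius_km:g}km"]
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since}")  # expected format: YYYY-MM-DD
    return " ".join(parts)
```

The returned string can be passed as the quoted query argument of `snscrape twitter-search --user`.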