image machine-learning deep-learning classification yolo

Should I use YOLO in this case?

Hi everyone. I'm new to deep learning. My task is to decide whether a soccer ball appears in an image (the images are extracted from videos) and simply output true or false.

In this case, is YOLO the best choice for this problem? I do not need bounding boxes, and there is only one class, the soccer ball. So it is a two-class classification problem (the image contains a ball or it does not).
If I use YOLO, do I need to include training images that do not contain a ball (and thus no object)?
What is a reasonable dataset size? I feel that 500000 is far too large a number.
What is the best way to annotate? I have thousands of images (in fact, 500000), so annotating them by hand is almost impossible. Are there any automatic annotation tools?
English is not my first language. I want to find similar projects and learn from them, but my description of the task is not good, so I cannot find proper answers. Could you please give me a more precise description of the task so that I can find similar projects?

It would be great if you could tell me what I can read to answer these questions. Thanks.

Solution

YOLO is overkill for this task, since you need image classification, not object detection. For the same reason, it is also likely to give worse results. There are plenty of good models suited to classification. You can see the leaderboards in this area here. Popular choices right now are Swin Transformer and EfficientNet.