I am creating an app that captures the webcam at a certain point (when an event is triggered), like taking a snapshot of the camera, and encodes the snapshot to base64. But looking at the examples online, they first draw the snapshot to a canvas and then convert that canvas to base64. Is there a way to skip the "drawing to canvas" part?
No. A canvas is the only way to read pixel data from a video element in the browser, which is why the video frame must be drawn to a canvas first before it can be encoded to base64.
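
For reference, the canvas step is only a few lines. Here is a minimal sketch, assuming the webcam stream is already attached to a `<video>` element with the id `webcam` (that id and the JPEG quality value are just examples):

```javascript
// Grab the current frame from a playing <video> element and
// return it as a base64-encoded data URL.
function captureFrame() {
  const video = document.getElementById('webcam');

  // Off-screen canvas sized to the current video frame;
  // it never needs to be added to the page.
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  // Draw the current frame onto the canvas.
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);

  // Serialize the canvas, e.g. "data:image/jpeg;base64,...."
  return canvas.toDataURL('image/jpeg', 0.9);
}
```

The canvas here is purely an off-screen scratch surface, so nothing visible changes on the page when the event fires.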