c# .net windows process screen-scraping

C# - reading text off of an existing process


We need to read text from an existing VB6 application. We use the FindWindow, GetWindowText, and EnumChildWindows functions from user32.dll to enumerate the windows of this process and read the text they display.
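For reference, the enumeration approach described above can be sketched roughly as follows. This is a minimal sketch, not our actual code: the class and method names (`WindowTextReader`, `ReadChildTexts`) are made up for illustration, and it assumes the target window can be found by its exact title.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper: collects the text of every child window of a top-level window.
static class WindowTextReader
{
    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumChildWindows(IntPtr hWndParent, EnumWindowsProc lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetWindowTextLength(IntPtr hWnd);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    // Returns the text of each child window; empty if the parent is not found.
    public static List<string> ReadChildTexts(string windowTitle)
    {
        var texts = new List<string>();
        IntPtr parent = FindWindow(null, windowTitle); // null class name: match by title only
        if (parent == IntPtr.Zero)
            return texts;

        EnumWindowsProc callback = (hWnd, lParam) =>
        {
            int length = GetWindowTextLength(hWnd);
            var buffer = new StringBuilder(length + 1); // +1 for the terminating null
            GetWindowText(hWnd, buffer, buffer.Capacity);
            texts.Add(buffer.ToString());
            return true; // keep enumerating
        };

        EnumChildWindows(parent, callback, IntPtr.Zero);
        GC.KeepAlive(callback); // keep the delegate alive for the duration of the native call
        return texts;
    }
}
```

Text that is painted directly onto a form never shows up in this list, because there is no child window that owns it. Note also that GetWindowText is documented as unable to retrieve the text of an edit control in another application; sending WM_GETTEXT with SendMessage is the usual workaround for those.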

We can read about 90% of the text this way, but there is one specific control (or box) that we cannot read.

We cannot target the text we need with UI Spy-type programs either, so I assume it is being rendered directly to the screen with GDI/GDI+ rather than through a control or window.

Is there a way to determine how they are rendering the text, and possibly read it?

We would rather not grab the hDC of the window, render it onto a bitmap, and somehow reverse-CAPTCHA the text... that could be a nightmare.

SOLUTION: We discovered that we only need to look for 2-3 phrases in this box rather than actually OCR the text. So we are going to render the box to a bitmap and compare it pixel by pixel with 2-3 pre-stored bitmaps.
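A rough sketch of that approach, under some assumptions: `PhraseDetector`, `CaptureWindow`, `SamePixels`, and `DetectPhrase` are illustrative names, the box is assumed to sit at a fixed rectangle inside the window, and PrintWindow is assumed to capture the application's custom drawing (it does not for every application; copying from the screen is an alternative when the window is visible).

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper: captures a window and matches a region against stored bitmaps.
static class PhraseDetector
{
    [StructLayout(LayoutKind.Sequential)]
    private struct RECT { public int Left, Top, Right, Bottom; }

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetWindowRect(IntPtr hWnd, out RECT lpRect);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool PrintWindow(IntPtr hWnd, IntPtr hdcBlt, uint nFlags);

    // Asks the window to paint itself into a new 24-bit bitmap.
    // Assumes the window is not minimized (a zero-sized bitmap would throw).
    public static Bitmap CaptureWindow(IntPtr hWnd)
    {
        if (!GetWindowRect(hWnd, out RECT r))
            throw new InvalidOperationException("GetWindowRect failed.");

        var bmp = new Bitmap(r.Right - r.Left, r.Bottom - r.Top, PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(bmp))
        {
            IntPtr hdc = g.GetHdc();
            try { PrintWindow(hWnd, hdc, 0); }
            finally { g.ReleaseHdc(hdc); }
        }
        return bmp;
    }

    // True when both bitmaps have the same size and identical RGB values everywhere.
    public static bool SamePixels(Bitmap a, Bitmap b)
    {
        if (a.Size != b.Size)
            return false;
        for (int y = 0; y < a.Height; y++)
            for (int x = 0; x < a.Width; x++)
                if ((a.GetPixel(x, y).ToArgb() & 0xFFFFFF) != (b.GetPixel(x, y).ToArgb() & 0xFFFFFF))
                    return false;
        return true;
    }

    // Crops the box out of the capture and returns the name of the first
    // matching reference bitmap, or null when nothing matches.
    public static string DetectPhrase(Bitmap window, Rectangle box, IDictionary<string, Bitmap> references)
    {
        using (Bitmap crop = window.Clone(box, window.PixelFormat))
        {
            foreach (KeyValuePair<string, Bitmap> pair in references)
                if (SamePixels(crop, pair.Value))
                    return pair.Key;
        }
        return null;
    }
}
```

GetPixel is slow, but for a small box and 2-3 reference bitmaps that should not matter; LockBits would be the thing to look at if it ever does.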

Top answer brought us to this solution.


Solution

  • If they're drawing direct to a surface, there's no way to get the text without some weird OCR stuff.

    Update: after thinking about your problem, I think that doing what you describe (grabbing the window's hDC and creating a bitmap from it) would be a relatively easy task (relative to trying to intercept the API calls that were rendering the text in the first place).

    It wouldn't be as difficult as doing OCR on handwriting, for example. As long as you can determine the font used by the Visual Basic 6 application to draw the text, and as long as the text you want to scrape is drawn to the same location on the form each time, it would be relatively easy to break the drawn text up into discrete characters (as tiny little bitmaps) and then compare each one to a pre-generated collection of characters that you've drawn with the same font at the same size. The characters would match perfectly on a pixel-by-pixel basis.

    There might be a problem if the program runs on different systems and draws the text with different fonts.
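The per-character matching described above could look something like the sketch below. It is only an illustration under several assumptions: `GlyphMatcher` and its methods are made-up names, the text is a single line on a solid background colour, the reference image and the scraped line have the same height and vertical alignment, and every character is separated from its neighbours by at least one blank pixel column.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text;

// Hypothetical helper: splits a one-line text bitmap into glyphs and looks each one up.
static class GlyphMatcher
{
    // A pixel counts as "ink" when its RGB value differs from the background.
    static bool IsInk(Bitmap bmp, int x, int y, Color background) =>
        (bmp.GetPixel(x, y).ToArgb() & 0xFFFFFF) != (background.ToArgb() & 0xFFFFFF);

    static bool ColumnHasInk(Bitmap bmp, int x, Color background)
    {
        for (int y = 0; y < bmp.Height; y++)
            if (IsInk(bmp, x, y, background))
                return true;
        return false;
    }

    // Splits the line at blank columns and encodes each glyph as "width:bits".
    public static List<string> GlyphKeys(Bitmap line, Color background)
    {
        var keys = new List<string>();
        int x = 0;
        while (x < line.Width)
        {
            if (!ColumnHasInk(line, x, background)) { x++; continue; }

            int start = x;
            while (x < line.Width && ColumnHasInk(line, x, background))
                x++;

            var key = new StringBuilder();
            key.Append(x - start).Append(':');
            for (int gx = start; gx < x; gx++)
                for (int gy = 0; gy < line.Height; gy++)
                    key.Append(IsInk(line, gx, gy, background) ? '1' : '0');
            keys.Add(key.ToString());
        }
        return keys;
    }

    // Records the glyphs of a reference image that is known to show 'characters'.
    public static void Learn(Bitmap reference, string characters, Color background,
                             IDictionary<string, char> table)
    {
        List<string> keys = GlyphKeys(reference, background);
        if (keys.Count != characters.Length)
            throw new ArgumentException("Glyph count does not match the character count.");
        for (int i = 0; i < keys.Count; i++)
            table[keys[i]] = characters[i];
    }

    // Reads a line by exact glyph lookup; unknown glyphs become '?'.
    public static string Read(Bitmap line, Color background, IDictionary<string, char> table)
    {
        var text = new StringBuilder();
        foreach (string key in GlyphKeys(line, background))
            text.Append(table.TryGetValue(key, out char c) ? c : '?');
        return text.ToString();
    }
}
```

This simple column split does not detect spaces, and it breaks on characters made of separate side-by-side parts (a double quote, for example) or on glyphs that touch each other, so it would need extra handling for those cases.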