Search code examples
htmlvbaexcelio

Read Local HTML File into String With VBA


This feels like it should be simple. I have a .HTML file stored on my computer, and I'd like to read the entire file into a string. When I try the super straightforward

Dim FileAsString as string 

Open "C:\Myfile.HTML" for input as #1
Input #1, FileAsString
Close #1

debug.print FileAsString

I don't get the whole file. I only get the first few lines (I know the immediate window cuts off, but that's not the issue. I'm definitely not getting the whole file into my string.) I also tried using an alternative method using the file system object, and got similar results, only this time with lots of weird characters and question marks thrown in. This makes me think it's probably some kind of encoding issue. (Although frankly, I don't fully understand what that means. I know there are different encoding formats and that this can cause issues with string parsing, but that's about it.)

So more generally, here's what I'd really like to know: How can I use vba to open a file of any extension (that can be viewed in a text editor) and length (that's doesn't exceed VBA's string limit), and be sure that whatever characters I would see in a basic text editor are what gets read into a string? (If that can't be (easily) done, I'd certainly appreciate being pointed towards a method that's likely to work with .html files) Thanks so much for your help

EDIT: Here's an example of what happens when I use the suggested method. Specifically

    Dim oFSO As Object
    Dim oFS As Object, sText As String

    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFS = oFSO.OpenTextFile(Path)

    Do Until oFS.AtEndOfStream
        sText = oFS.ReadAll()
    Loop
    FileToString = sText

    Set oFSO = Nothing
    Set oFS = Nothing

End Function

I'll show you both the beginning (via a message box) and the end (via the immediate window) because both are weird in different ways. In both cases I'll compare it to a screen capture of the html source displayed in chrome:

Beginning: enter image description here

enter image description here

End: enter image description here

enter image description here


Solution

  • Okay so I finally managed to figure this out. The VBA file system object can only read asciiII files, and I had saved mine as unicode. Sometimes, as in my case, saving an asciiII file can cause errors. You can get around this, however, by converting the file to binary, and then back to a string. The details are explained here http://bytes.com/topic/asp-classic/answers/521362-write-xmlhttp-result-text-file.