Tags: python, large-language-model, python-camelot, pdf-extraction

Text extraction from PDF: ConnectError: [WinError 10061] No connection could be made because the target machine actively refused it


I'm trying to extract tables and text from PDFs and then ask questions about them with the help of LLMs. However, when I run the code, it shows error 10061. I think this is because I'm using Camelot to extract tables; previously there was a Ghostscript error, and now this comes up.

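# Imports (omitted in the original post; assumed to be the LangChain community packages)
import camelot
import streamlit as st
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

st.title("Chat with Your PDF Documents - Enhanced Edition")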

# Initialize embeddings in session state
if "vector" not in st.session_state:
    st.session_state.embeddings = OllamaEmbeddings(model='nomic-embed-text')

uploaded_file = st.file_uploader("Upload a PDF document", type=["pdf"])

if uploaded_file is not None:
    with open("uploaded_document.pdf", "wb") as f:
        f.write(uploaded_file.read())
    try:
        loader = PyPDFLoader("uploaded_document.pdf")
        docs = loader.load()
        text_content = "\n\n".join([doc.page_content for doc in docs])
    except Exception as e:
        st.error(f"Failed to extract text from PDF: {e}")
        text_content = ""

    # Extract tables
    try:
        tables = camelot.read_pdf("uploaded_document.pdf", pages="all")
        table_content = ""
        if tables:
            for i, table in enumerate(tables):
                table_content += f"\n\nTable {i + 1}:\n{table.df.to_string(index=False)}"
        else:
            st.warning("No tables found in the PDF.")
    except Exception as e:
        st.error(f"Failed to extract tables from PDF: {e}")
        table_content = ""

    combined_content = text_content + table_content

    st.session_state.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    st.session_state.documents = st.session_state.text_splitter.split_text(combined_content)
    st.session_state.vector = FAISS.from_texts(st.session_state.documents, st.session_state.embeddings)

else:
    # Fallback to directory loading
    directory_path = st.text_input("Enter the path of your local document directory")
    if directory_path:
        loader = DirectoryLoader(directory_path, glob="*.txt")
        st.session_state.docs = loader.load()


Solution

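  • Error 10061 (WSAECONNREFUSED) means that a TCP connection attempt was refused: the target machine was reached, but it rejected the connection, usually because nothing on it is listening on the requested port.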

    In other words, your code is trying to connect to some machine (possibly the same machine, localhost), and the connection fails because nothing on that machine is listening on the requested port. This has nothing to do with PDFs: the connection is refused before any data is exchanged, so the other side never gets to see whether you are sending a PDF, let alone whether it is a valid one.
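    To see that the error is purely a networking one, here is a minimal sketch (standard library only; the function name `refused_port_demo` is hypothetical) that reproduces it without any PDF. It asks the OS for a free local port, releases it so nothing listens there, and then tries to connect. On Windows this raises the same `ConnectionRefusedError: [WinError 10061]` seen in the question; other platforms raise the same exception class with their own error code.

```python
import socket


def refused_port_demo(host: str = "127.0.0.1") -> bool:
    """Return True if connecting to an unused local port is refused.

    No file of any kind is involved: the OS rejects the TCP handshake
    because no program is listening on the chosen port.
    """
    # Let the OS pick a free port, then close the socket so nothing listens there.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        free_port = probe.getsockname()[1]

    try:
        # Generous timeout, since some platforms retry a refused connect before giving up.
        with socket.create_connection((host, free_port), timeout=5):
            return False  # Something started listening on that port in the meantime.
    except ConnectionRefusedError as exc:
        print(f"{host}:{free_port} refused the connection: {exc}")
        return True


if __name__ == "__main__":
    refused_port_demo()
```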

    Camelot itself does not need to "listen" on anything, so it is not the problem. Perhaps you have installed Excalibur (a separate web front end for Camelot), which should be listening on port 5000.

    If that is the case, then either the uploading script is connecting to something other than localhost:5000, or Excalibur is not configured to use port 5000, or it is not running at all (for example, because it was not installed properly or was never started).

    You can confirm this by opening a CMD terminal and running this command:

     netstat -na | find "5000"
    

    If Excalibur is running, you should see something like:

     TCP    127.0.0.1:5000      0.0.0.0:0      LISTENING
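    If you prefer to check from Python instead of netstat, here is a minimal sketch (standard library only; `is_listening` is a hypothetical helper name). It simply attempts a TCP connection: if the connection is accepted, something is listening; a refusal such as 10061 means nothing is. Port 5000 is the Excalibur port discussed above. Port 11434 is included on the assumption that it is the default port of an Ollama server, which matters because the question's code uses `OllamaEmbeddings`, and that class sends its requests to an Ollama server.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (WinError 10061 on Windows), timeouts, unreachable host...
        return False


if __name__ == "__main__":
    # 5000: Excalibur (per the answer above); 11434: assumed default Ollama port.
    for port in (5000, 11434):
        state = "listening" if is_listening("127.0.0.1", port) else "NOT listening"
        print(f"127.0.0.1:{port} is {state}")
```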
    

    Better yet, review your code and pinpoint which component is opening a connection, and to which host and port.
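    One way to pinpoint the caller is to read the whole traceback rather than only the final message. The sketch below uses the standard `traceback` module to list every frame an exception passed through, from the line where it was caught down to where it was raised; `build_index` and `embed_documents` are hypothetical stand-ins for calls in your script. In the posted code, a good place to look is wherever the embeddings are actually computed, such as `FAISS.from_texts(...)`, which is not wrapped in any `try` block.

```python
import traceback


def call_chain(exc: BaseException) -> list[str]:
    """Return 'file:line in function' for each frame the exception passed through.

    The list runs from the outermost frame (where the exception was caught)
    to the innermost one (where it was raised).
    """
    return [
        f"{frame.filename}:{frame.lineno} in {frame.name}"
        for frame in traceback.extract_tb(exc.__traceback__)
    ]


# Hypothetical stand-ins for the real calls in the question's script.
def embed_documents() -> None:
    # Simulates a client library whose server is not running.
    raise ConnectionRefusedError(10061, "No connection could be made")


def build_index() -> None:
    embed_documents()


try:
    build_index()
except Exception as exc:  # Works whatever the concrete class (ConnectError, etc.).
    chain = call_chain(exc)
    for line in chain:
        print(line)
```

    The last entry names the function that actually tried to connect; the entries above it lead back to the line in your own script that triggered it.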

    Check your actual setup and describe it in detail in the question; then we can better see what is happening.