php ms-word docx doc

PHP: reading a Word document to get only the email addresses


OK, here is the thing: I want a PHP script to open and read a user-uploaded Word document, take the email addresses that are in the document, and store them in a database.

Only the email addresses! They will be mixed in with other text, like

Email : [email protected] or like "Email is [email protected]"

Any format. One thing is for sure: there will be a space separating the email address from the other words. Can someone help me? :D


Solution

  • This is a bit broad really. Fundamentally, you need to handle these steps:

    Upload the Word document

    You'll need to let users upload a file. There's a tutorial at w3schools which should get you started.
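
    As a rough illustration of the upload step, here is a minimal sketch using PHP's built-in $_FILES array and move_uploaded_file(). The helper name checkWordUpload, the form field name "document" and the 5 MB size limit are all assumptions made for the example, not details from the question.

```php
<?php
// Hypothetical helper: inspects one entry of $_FILES and returns its
// lowercase extension, or throws if the upload should be rejected.
function checkWordUpload(array $file, int $maxBytes = 5242880): string
{
    $error = $file['error'] ?? UPLOAD_ERR_NO_FILE;
    if ($error !== UPLOAD_ERR_OK) {
        throw new RuntimeException('Upload failed with error code ' . $error);
    }
    if (($file['size'] ?? 0) > $maxBytes) {
        throw new RuntimeException('File is too large');
    }
    $ext = strtolower(pathinfo($file['name'] ?? '', PATHINFO_EXTENSION));
    if (!in_array($ext, ['doc', 'docx'], true)) {
        throw new RuntimeException('Only .doc and .docx files are accepted');
    }
    return $ext;
}

// Assumes the HTML form has <input type="file" name="document">.
if (isset($_FILES['document'])) {
    $ext = checkWordUpload($_FILES['document']);
    // Store under a generated name rather than the client-supplied one.
    $target = sys_get_temp_dir() . '/' . bin2hex(random_bytes(8)) . '.' . $ext;
    if (!move_uploaded_file($_FILES['document']['tmp_name'], $target)) {
        throw new RuntimeException('Could not store the uploaded file');
    }
}
```

    The extension check is only a first filter; a file named .docx is not guaranteed to be a real Word document, so the parsing step should still be prepared to fail.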

    Parse the contents

    Office files are complex: each one is technically an entire file system, as you can embed images, other documents, and so on. The newer .docx files are actually just zip archives with some XML inside; try renaming one to .zip and opening it. The old-style .doc is a proprietary MS format and, while equally complex, is far more obfuscated. This library appears to convert Word files to HTML, which may make reading them a lot easier.
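
    Since a .docx is a zip archive, one way to get at the text without a third-party library is to read word/document.xml with PHP's ZipArchive class (this needs the zip extension) and strip the markup. The sketch below is only that; the function names docxToText and documentXmlToText are made up for the example, it ignores headers, footers and other parts of the archive, and it does not handle old-style .doc files at all.

```php
<?php
// Turns the XML of word/document.xml into plain text.
function documentXmlToText(string $xml): string
{
    // End each paragraph with a newline and turn tabs and line breaks into
    // spaces, so words from neighbouring paragraphs do not run together.
    $xml = preg_replace('~</w:p>~', "\n", $xml);
    $xml = preg_replace('~<w:(tab|br)\b[^>]*/>~', ' ', $xml);
    // Removing the remaining tags also rejoins text that Word split
    // across several runs (which can happen in the middle of an address).
    return html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Reads the main document body out of a .docx file.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found; is this a .docx file?');
    }
    return documentXmlToText($xml);
}
```

    Calling docxToText() on the stored upload would then give a plain string that can be searched in the next step.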

    Find the email address

    I suspect your best chance here is to use a regex to extract the email address from the body. What if there are multiple email addresses? Here's an introduction to email regexes which may be of some help. This answer is for the same thing.
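
    To make the regex idea concrete, here is a small sketch using preg_match_all(), which also covers the case of several addresses in one document. The pattern is a deliberately simple approximation rather than a full email grammar, and the function name extractEmails and the example.com addresses are invented for the illustration.

```php
<?php
// Returns every distinct email-like string found in $text,
// in order of first appearance.
function extractEmails(string $text): array
{
    // Pragmatic pattern: local part, "@", domain labels, then a TLD of
    // two or more letters. It will not match every valid address.
    $pattern = '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i';
    preg_match_all($pattern, $text, $matches);

    // Let PHP's own validator discard matches that only look like emails.
    $valid = array_filter($matches[0], function ($candidate) {
        return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
    });

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($valid));
}

$text = 'Email : jane@example.com or like "Email is bob.smith@mail.example.org"';
print_r(extractEmails($text));
```

    Because the match is driven by the pattern rather than by splitting on spaces, it does not matter what punctuation or wording surrounds each address.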

    For a more detailed answer, you're going to have to provide a more specific question.