Q and A: How can I get my .EML (email) files crawled and indexed?


Hi Kalena,

The website I maintain is informational and features largely political news. Much material reaches me in the form of e-mails which I wish to upload and make available to visitors. Can you point me to a website search engine which will index the site’s contents, including the email (.eml) files? The Windows Search facility on my computer (Windows XP) does this quite competently, but I have been unable to trace a similar web search engine with the appropriate filter which will index the eml files, some of which have attachments (mainly Word or PDF). I should be grateful for any guidance.

With thanks Ezra

Hi Ezra,

As you are probably aware (but for the sake of other readers), the .EML file extension is used for mail messages saved from Outlook Express and other e-mail clients. The main purpose of an EML file is to store an e-mail message and, as you have highlighted, it may include attachment data as well. EML files can be opened by most e-mail clients, but web browsers will not display them under their own extension. However, since EML files are plain text and formatted much like MHT (MIME HTML) files, they can be opened in most popular browsers (Internet Explorer, Mozilla Firefox and Opera) by changing the file extension from .eml to .mht.
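Because an EML file is a plain-text MIME message, a standard mail-parsing library can read it. The following is a minimal sketch using the `email` module from Python's standard library. The sample message and the `summarize_eml` helper are made up for illustration; in practice you would read the text from a saved .eml file.

```python
from email import message_from_string, policy

# Hypothetical stand-in for the contents of a saved .eml file.
SAMPLE_EML = (
    "From: Sender <sender@example.com>\r\n"
    "To: Recipient <recipient@example.com>\r\n"
    "Subject: Council meeting notes\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "\r\n"
    "The meeting is on Tuesday.\r\n"
)


def summarize_eml(raw: str) -> dict:
    """Return the subject, text body and attachment names of an EML message."""
    # policy.default makes the parser return a modern EmailMessage object.
    msg = message_from_string(raw, policy=policy.default)

    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content().strip() if body is not None else ""

    # Attachments (e.g. Word or PDF files) are separate MIME parts.
    attachments = [part.get_filename() for part in msg.iter_attachments()]

    return {
        "subject": str(msg["Subject"] or ""),
        "body": text,
        "attachments": attachments,
    }


summary = summarize_eml(SAMPLE_EML)
```

Here `summary` holds the subject line, the message text and a list of attachment file names (empty for this sample), which is roughly the information a search filter for EML files would need to extract.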

Although search engines do crawl and index a wide variety of file types (see the file types that Google can index), as far as I am aware no search engines crawl or index EML files.

EML files typically include the e-mail addresses of the sender and the recipient, so from a privacy and security perspective I would expect that you wouldn’t want these files to be indexed anyway (and if I were one of your information sources, I’d probably be pretty annoyed if you published my email address).

I suggest that, if you wish to publish (and have indexed) information that you receive by email, you extract the relevant content and publish it in a format that is recognised by web browsers and search crawlers (e.g. HTML, PDF, DOC or even TXT).
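One way to do that extraction could be sketched as follows, again with Python's standard `email` module. This is an illustration under stated assumptions, not a finished tool: the `eml_to_html` helper and the sample message are hypothetical, only the subject and plain-text body are kept, and the From and To headers are deliberately left out. Any addresses quoted inside the body itself are not removed, so the output should still be checked before publishing.

```python
import html
from email import message_from_string, policy

# Hypothetical stand-in for the contents of a saved .eml file.
SAMPLE_EML = (
    "From: Sender <sender@example.com>\r\n"
    "To: Recipient <recipient@example.com>\r\n"
    "Subject: Council meeting notes\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "\r\n"
    "The meeting is on Tuesday.\r\n"
)


def eml_to_html(raw: str) -> str:
    """Build a simple HTML page from an EML message, omitting the headers
    that carry e-mail addresses."""
    msg = message_from_string(raw, policy=policy.default)
    subject = str(msg["Subject"] or "(no subject)")

    # Use the plain-text body only; attachments are ignored in this sketch.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    # Treat blank-line-separated blocks as paragraphs.
    blocks = text.replace("\r\n", "\n").split("\n\n")
    paragraphs = [block.strip() for block in blocks if block.strip()]

    # Escape everything so message text cannot inject markup into the page.
    title = html.escape(subject)
    body_html = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)

    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
        "<body>\n"
        f"<h1>{title}</h1>\n"
        f"{body_html}\n"
        "</body>\n"
        "</html>\n"
    )


page = eml_to_html(SAMPLE_EML)
```

The resulting `page` string can be saved as an ordinary .html file, which browsers can display and search crawlers can index.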

Andy Henderson
