CONVERT PDF FILE TO CONTENT-INDEXED HTML (full-text search)


Last modified (yyyy-mm-dd): 2020-09-09


The open-source pdftohtml program first converts the PDF file to HTML; our pdf_ndx_ml program then indexes every word of the result for full-text search.
Searching for a keyword is triggered by double-clicking on any word.
The program is intended especially for users (such as civil servants) who make extensive use of PDF files from government portals or in internal communication.
Because it also indexes more complex PDF documents, such as catalogs, satisfactorily, it is equally useful for examining those.
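
The indexing step described above can be illustrated with a minimal Python sketch. It is not the pdf_ndx_ml implementation, whose internals are not documented here; it only assumes that pdftohtml has already produced an HTML string, and the names TextExtractor, build_word_index and lookup are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def build_word_index(html):
    """Map each lowercased word to the list of its word positions."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks)
    index = defaultdict(list)
    for position, match in enumerate(re.finditer(r"\w+", text)):
        index[match.group().lower()].append(position)
    return dict(index)


def lookup(index, word):
    """Return all positions of a word; the search is case-insensitive."""
    return index.get(word.lower(), [])
```

In this sketch, a double-click on a word would correspond to a call such as lookup(index, "pdf"), which returns every position at which that word occurs in the converted document.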

In some cases (complex PDFs), the HTML output does not reproduce the appearance and content of the original exactly.
Users accept these risks by using the program, so we assume no liability for damage resulting from program errors.
The paid version of the program is stored on a USB stick and works only if it recognizes that stick as an external storage medium carrying the appropriate license information.
The demo version works without the presence of a USB stick and is freely available at the following link: pdf_ndx_ml.7z

Demo of the indexed document: EU_legislation
Demo of the indexed catalog: Greece_culture
Demo of the indexed document: (US) Guidance on Preparing Workplaces for COVID-19
Demo of the professional article: professional article


The program does not connect to the Internet, does not display ads, and does nothing that could compromise the security or privacy of the user's data.


WE ARE LOOKING FOR A PUBLISHER AND DISTRIBUTOR

Interested publishers are invited to participate, especially if they have experience with Amazon.
publish@pdf-htm-ndx.info

Useful links:
www.content-explorer.info (library offer)
www.txt-htm-ndx.info (converting TXT to HTML and indexing)
pdfmerger (merge PDF files)