Last modified (yyyy-mm-dd): 2020-05-04
PDF-HTM-NDX: Convert PDF to content-indexed HTML (full-text search)
The open source pdftohtml program first converts the PDF file to HTML, and then our ndx_ml program indexes every word in it for full-text search.
Searching for a keyword is triggered by double-clicking on any word.
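The indexing step can be illustrated with a minimal sketch. It assumes the HTML has already been produced by pdftohtml, and it is not the actual ndx_ml implementation: the names TextExtractor, build_index, and search are hypothetical, and the real program may store and look up words differently.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def build_index(html):
    """Map each lowercased word to the list of its word positions."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    index = defaultdict(list)
    for position, match in enumerate(re.finditer(r"\w+", text)):
        index[match.group().lower()].append(position)
    return dict(index)


def search(index, word):
    """Return the positions of a word (case-insensitive), or an empty list."""
    return index.get(word.lower(), [])
```

In this sketch, a double-click handler in the HTML page would pass the clicked word to something like search, which answers from the prebuilt index instead of rescanning the whole document.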
The program is intended especially for users (civil servants) who work extensively with PDF files from government portals or from internal communication.
Because the program also indexes more complex PDF documents, such as catalogs, satisfactorily, it is useful for searching through them as well.
In some cases (complex PDFs), the converted HTML does not exactly match the appearance and content of the original.
Users accept these risks by using the program, so we assume no liability for damage resulting from program errors.
The program (paid version) is stored on a USB stick and only works if it recognizes the USB stick as an external storage medium with the appropriate license information.
The demo version works without the presence of a USB stick and is freely available at the following link: ndx_ml.7z
Demo of the Indexed Document: EU_legislation
Demo of the Indexed Catalog: Greece_culture
Demo of the Indexed Document: (US) Guidance on Preparing Workplaces for COVID-19
The program does not connect to the Internet, display ads, or do anything else that could compromise the security and privacy of the user's data.
The program displays its own messages in English, but it provides the output files with instructions and comments in the following languages:
We are looking for a publisher and distributor
Interested publishers are invited to participate, especially if they have experience with Amazon.