Cookbook

Tips to convert HTML to Word

The embedHTML method and its counterpart for templates, replaceVariableByHTML, convert HTML with CSS to Word while preserving the contents and styles as closely as possible. To keep the result as close as possible to the original HTML and to avoid errors, follow the good practices below.

Supported tags and styles

javadocx supports nearly all the HTML tags and CSS styles that have an equivalent in MS Word.

On our website you can find the complete list of compatible tags and styles.

If you work with HTML5 tags (such as 'section' or 'main') and an old version of Tidy, some styles may not be applied to these tags. Upgrading to the latest release of Tidy lets these styles be set correctly.

Besides these HTML tags and CSS styles, when importing HTML you can also assign existing Word styles to classes, ids or specific tags with the 'wordStyles' option.

Tidy, incorrect tagging, accents and other non-ASCII characters

For a proper HTML import, all tags and styles must be correctly opened and closed. In other words, the structure of the code must be valid. javadocx uses the Java extension Tidy (http://php.net/manual/en/book.tidy.php) to correct the HTML and generate valid markup. You can install this extension on any operating system with Java.

To import HTML with accents, we also recommend installing the Java mbstring extension so that the character encoding is detected automatically.

Warning

If you haven't installed the Tidy extension, errors may occur: CSS styles may appear as text in the document, the HTML may be imported incorrectly, or accents and other non-ASCII characters may not be displayed.
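
As a complementary precaution for accents, the HTML can be normalized before it is imported. The following is a minimal sketch, not part of javadocx: the class HtmlEncoding and its methods are hypothetical helpers that use only the Java standard library. It assumes the HTML source is stored as UTF-8, and it rewrites every non-ASCII character as a numeric character reference so that the markup no longer depends on encoding detection.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of javadocx.
class HtmlEncoding {

    // Decode raw bytes explicitly as UTF-8 instead of relying on the
    // platform default charset (assumption: the source is UTF-8).
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Replace every non-ASCII code point with a numeric character
    // reference (for example U+00E9 becomes &#233;).
    static String escapeNonAscii(String html) {
        StringBuilder out = new StringBuilder(html.length());
        html.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The bytes 0xC3 0xA9 are the UTF-8 encoding of U+00E9 (e acute).
        byte[] raw = {'<', 'p', '>', 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9,
                      '<', '/', 'p', '>'};
        String html = escapeNonAscii(decodeUtf8(raw));
        System.out.println(html); // prints <p>caf&#233;</p>
    }
}
```

The resulting string contains only ASCII characters and can then be passed to embedHTML or replaceVariableByHTML in the usual way.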

Divide and Optimize

Although the import of HTML and CSS is optimized to the maximum, transforming thousands of lines with different tagging and styles may affect performance.

To achieve the best possible performance, divide the code you are importing. For example, instead of adding a 10000-line HTML file with a single embedHTML call, you could divide it into five HTML files and then call embedHTML once for each of them.

This simple step can significantly reduce CPU and memory consumption.
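
The division described above can be sketched as follows. This is an illustrative helper, not part of javadocx: the class HtmlChunker and its methods are hypothetical. Since the exact signature of embedHTML is not shown here, a Consumer<String> stands in for the call to it. The sketch also assumes that the chosen closing tag (for example "</p>") only ends top-level blocks, so cutting right after it never splits a nested element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of javadocx.
class HtmlChunker {

    // Splits html into at most `parts` pieces of roughly equal length,
    // cutting only right after an occurrence of closingTag.
    // The match is case-sensitive. Assumption: closingTag is never nested.
    static List<String> split(String html, String closingTag, int parts) {
        if (parts < 1 || closingTag.isEmpty()) {
            throw new IllegalArgumentException("parts >= 1 and a non-empty tag are required");
        }
        List<String> chunks = new ArrayList<>();
        int target = (html.length() + parts - 1) / parts; // ceiling division
        int start = 0; // start of the chunk being built
        int from = 0;  // where the next search begins
        while (true) {
            int idx = html.indexOf(closingTag, from);
            if (idx < 0) {
                break;
            }
            int cut = idx + closingTag.length();
            if (cut - start >= target) {
                chunks.add(html.substring(start, cut));
                start = cut;
            }
            from = cut;
        }
        if (start < html.length()) {
            chunks.add(html.substring(start)); // remainder after the last cut
        }
        return chunks;
    }

    // Calls `embed` once per chunk; `embed` is a stand-in for embedHTML.
    static void embedInChunks(String html, String closingTag, int parts,
                              Consumer<String> embed) {
        for (String chunk : split(html, closingTag, parts)) {
            embed.accept(chunk);
        }
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            html.append("<p>Paragraph ").append(i).append("</p>\n");
        }
        // Replace the lambda with the real embedHTML call in your project.
        embedInChunks(html.toString(), "</p>", 5,
                chunk -> System.out.println("embedding " + chunk.length() + " characters"));
    }
}
```

Joining the chunks in order gives back the original HTML, so the content of the document is unchanged. Only the size of each import is reduced.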