Create PDFs using Python and xhtml2pdf
Hi everyone! In this post, I want to share a short guide that shows how to create PDF files using Python and xhtml2pdf.
The xhtml2pdf library creates PDF files from HTML. I wrote this guide for myself, but I want to share it with you.
WARNING: I wrote this guide for experimental purposes and have not used it in a production environment, so keep in mind that using it in production could cause issues. A better alternative for creating PDF files is http://weasyprint.org/.
Requirements
- Python (I use Python 3.8.2)
- Pip (I use Pip 20.1.1)
Setup Project
To create our project we are going to use virtualenv, which gives us an isolated Python environment for this project. However, you can also use pyenv or venv.
So, first, we have to install virtualenv.
pip3 install virtualenv
Now, we have to create the project folder and set up the virtualenv.
# Creating a project folder
mkdir pdfs-example
cd pdfs-example
# Creating the virtual environment
virtualenv env
# Activate the virtual environment
source env/bin/activate
# Create our main file
touch main.py
NOTE: To exit the environment, just run deactivate.
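Since venv ships with Python 3, the same kind of environment can also be created from Python itself. The following is only a minimal sketch, assuming the standard-library venv module; the helper name create_environment and the default folder name are illustrative and not part of the original project.

```python
import venv
from pathlib import Path


def create_environment(path="env", with_pip=True):
    """Create an isolated Python environment in `path` (hypothetical helper)."""
    # with_pip=True bootstraps pip into the new environment
    venv.create(path, with_pip=with_pip)
    # pyvenv.cfg is the marker file written at the root of every venv
    return Path(path) / "pyvenv.cfg"
```

Calling create_environment("env") from the project folder does roughly what python3 -m venv env does. Activation still happens in the shell with source env/bin/activate.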
Install dependencies
To create our PDF files we need to install the xhtml2pdf library. This library also depends on html5lib and reportlab.
pip install reportlab # https://pypi.org/project/reportlab/
pip install html5lib # https://pypi.org/project/html5lib/
pip install xhtml2pdf
NOTE: We need an xhtml2pdf version higher than 0.1a1 to work with Python 3.
We can see the installed dependencies with the following command.
# Installed dependencies
pip freeze
# The above mentioned command will list something like the following
html5lib==1.1
Pillow==7.2.0
PyPDF2==1.26.0
reportlab==3.5.50
six==1.15.0
webencodings==0.5.1
xhtml2pdf==0.2.4
We can also export our dependencies.
pip freeze > requirements.txt
And install our dependencies from a requirements.txt file.
pip install -r requirements.txt
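The installed versions can also be checked from Python instead of reading the pip freeze output by eye. This is a small sketch using importlib.metadata (available since Python 3.8); the REQUIRED tuple and the installed_versions helper are names made up for this example.

```python
from importlib import metadata

# Packages this guide relies on
REQUIRED = ("xhtml2pdf", "reportlab", "html5lib")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A None value means the package is missing from the active environment, which usually means the virtual environment was not activated before running pip install.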
Generate PDF from string
Now that we have the necessary modules installed, we can start writing code. First, we must import the xhtml2pdf module, which will help us to create our PDF files.
# main.py
# import section ....
from xhtml2pdf import pisa # import python module
# ....
Now, we can define some constants.
# main.py
# Constants section ....
# Content to write in our PDF file.
SOURCE = "<html><body><p>PDF from string</p></body></html>"
# Filename for our PDF file.
OUTPUT_FILENAME = "test.pdf"
# ....
OK, we will create a base function that the other functions can reuse, to avoid code duplication.
# main.py
# Methods section ....
def html_to_pdf(content, output):
    """
    Generate a pdf using a string content
    Parameters
    ----------
    content : str
        content to write in the pdf file
    output : str
        name of the file to create
    """
    # Open file to write; "w+b" writes in binary mode.
    # The with block closes the file even if the conversion raises.
    with open(output, "w+b") as result_file:
        # convert HTML to PDF
        pisa_status = pisa.CreatePDF(
            content,           # the HTML to convert
            dest=result_file   # file handle to receive result
        )
    result = pisa_status.err
    if not result:
        print("Successfully created PDF")
    else:
        print("Error: unable to create the PDF")
    # return a falsy value on success and a truthy value on errors
    return result
# ....
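The base function writes straight to disk. When the PDF is needed in memory instead (for example to send it somewhere else), the same call can target a buffer. This is a hedged sketch: it assumes pisa.CreatePDF accepts any writable binary file-like object as dest, and html_to_pdf_bytes is a hypothetical helper, not part of xhtml2pdf.

```python
from io import BytesIO


def html_to_pdf_bytes(content):
    """Render an HTML string and return the PDF as bytes, or None on error.

    Hypothetical helper, not part of xhtml2pdf.
    """
    # Imported here so the module still loads when xhtml2pdf is missing
    from xhtml2pdf import pisa

    buffer = BytesIO()  # in-memory stand-in for the output file
    pisa_status = pisa.CreatePDF(content, dest=buffer)
    if pisa_status.err:
        return None
    return buffer.getvalue()
```

The returned bytes can later be written to a file with open(output, "wb"), so the result is the same as with html_to_pdf.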
Once we have the base function we can create our from_text function.
# main.py
# Methods section ....
def from_text(source, output):
    """
    Generate a pdf from a plain string
    Parameters
    ----------
    source : str
        content to write in the pdf file
    output : str
        name of the file to create
    """
    html_to_pdf(source, output)
# ....
Our main function will be the following.
# main.py
# import section ....
import sys
# Main section ...
if __name__ == "__main__":
    if len(sys.argv) > 1:
        if sys.argv[1] == '--help':
            print('Info: ')
            print('--help List the available options')
            print('--text Create a PDF file from a string')
            print('--template Create a PDF file from a template')
        elif sys.argv[1] == '--text':
            print("Creating a PDF file from a string")
            from_text(SOURCE, OUTPUT_FILENAME)
    else:
        print("Please give the type of PDF to create.")
        print("For help execute `python main.py --help`")
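Checking sys.argv by hand works for two options, but the standard-library argparse module can do the same parsing and generate the --help text for us. The following is only a sketch of that alternative; build_parser and the mode attribute are names chosen for this example.

```python
import argparse


def build_parser():
    """Build a parser exposing the same two modes as the sys.argv version."""
    parser = argparse.ArgumentParser(description="Create PDF files with xhtml2pdf")
    # Exactly one mode must be given; argparse adds --help automatically
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--text", dest="mode", action="store_const", const="text",
                       help="Create a PDF file from a string")
    group.add_argument("--template", dest="mode", action="store_const",
                       const="template",
                       help="Create a PDF file from a template")
    return parser
```

In the main section one would call args = build_parser().parse_args() and then branch on args.mode to decide which function to run.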
We can test our function by executing the following command in our terminal.
python main.py --text
# Creating a PDF file from a string
# Successfully created PDF
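The success message only reflects the error flag returned by pisa. As an extra sanity check, we could confirm that the output file starts with the %PDF- header that PDF files begin with. This is a sketch; looks_like_pdf is a hypothetical helper and it only inspects the header, it does not validate the whole document.

```python
def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF magic bytes."""
    with open(path, "rb") as pdf_file:
        return pdf_file.read(5) == b"%PDF-"
```

After running python main.py --text, calling looks_like_pdf("test.pdf") is a quick way to see whether a PDF was really written.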
Generate PDF from template
Here, we will generate a PDF file using an HTML template. Keep in mind that xhtml2pdf only supports a subset of HTML and CSS, so this guide sticks to simple HTML4-style markup. First, we have to create an HTML file that will serve as the template for our PDF file.
touch template.html
And we will define a simple HTML template.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta charset="utf-8">
<title>PDF Generator</title>
</head>
<body>
<h1 style="color:red;">First PDF</h1>
<h2 style="color:blue;">PDF with html template</h2>
<p>John</p>
<p>Snow</p>
<p>35</p>
</body>
</html>
Create a new constant to define our template file.
# main.py
# Constants section ....
# Template file name
TEMPLATE_FILE = "template.html"
# ....
Now, we can create our function to read the template and create the PDF file.
# main.py
# Methods section ....
def from_template(template, output):
    """
    Generate a pdf from a html file
    Parameters
    ----------
    template : str
        name of the html file to read
    output : str
        name of the file to create
    """
    # Reading our template; the with block closes the file for us
    with open(template, "r", encoding="utf-8") as source_html:
        content = source_html.read()  # the HTML to convert
    html_to_pdf(content, output)
# ....
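The template above has its values (John, Snow, 35) written directly in the HTML. If the values should come from Python instead, one simple option is string.Template from the standard library. This is only a sketch: the $first_name, $last_name and $age placeholders and the render_template helper are assumptions made for this example and are not in the original template.

```python
import html
from string import Template

# Hypothetical template text; $first_name, $last_name and $age are placeholders
TEMPLATE_HTML = """<html>
<body>
<h1 style="color:red;">First PDF</h1>
<p>$first_name</p>
<p>$last_name</p>
<p>$age</p>
</body>
</html>"""


def render_template(template_text, **values):
    """Replace $placeholders in the template text with escaped values."""
    # Escape the values so characters like < or & cannot break the markup
    escaped = {key: html.escape(str(value)) for key, value in values.items()}
    # substitute() raises KeyError if a placeholder has no value
    return Template(template_text).substitute(escaped)
```

The rendered string can then be passed to html_to_pdf(content, output) in the same way as the content read from template.html.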
Add the option to our main function.
# main.py
# Main section ...
if __name__ == "__main__":
    # ....
    if len(sys.argv) > 1:
        # if ....
        elif sys.argv[1] == '--template':
            print("Creating a PDF file from a template")
            from_template(TEMPLATE_FILE, OUTPUT_FILENAME)
    else:
        # ....
We can test our function by executing the following command in our terminal.
python main.py --template
# Creating a PDF file from a template
# Successfully created PDF
Final Words
Thanks for reading this post. You can find the code of this guide here.