3. Core APIs

This document describes the IVE_TANIM core API, which provides the low-level back-end for the CLI front ends described in Core Functionality. These modules live under ive_tanim.core.

3.1. autocrop_image module

This module provides low-level functionality that implements automatic cropping of raster (PNG, JPEG, TIFF) and PDF images.

The PDF autocropping functionality is copied over from this repo, which is based on pdfcrop.pl. It requires a working ghostscript (the gs executable) to calculate the BoundingBox, and the PyPDF2 module to read and manipulate PDF files.

https://upload.wikimedia.org/wikipedia/commons/2/2a/PDF_BOX_01.svg

Three methods – get_boundingbox, crop_pdf, and crop_pdf_singlepage – are the higher level hooks to the PDF autocropping functionality.

ive_tanim.core.autocrop_image.autocrop_image(inputfilename, outputfilename=None, color='white', newWidth=None, doShow=False, trans=False, fixEven=False)

Performs an autocropping of raster images, such as PNG, JPEG, or TIFF, with a defined background color. The default background color is white. The automatically cropped image is stored in a new file or overwrites the input file.

Parameters:
  • inputfilename (str) – the input image file’s name.

  • outputfilename (str) – the output image file’s name. If None, then stores image into inputfilename.

  • color (str) – the background color over which to perform automatic cropping. Default is white.

  • newWidth (int) – the optional new width of the output image. If None, then do not resize the image.

  • doShow (bool) – if True, then display the autocropped image using the computer’s default image viewer (for example Preview on Mac OS X); otherwise write the autocropped image into a file. Default is False.

  • trans (bool) – if True, then preserves the transparency of the input image. Default is False.

  • fixEven (bool) – if True, then changes the width and height of the autocropped image so that both are divisible by 2. This functionality exists in order to create a movie (using FFmpeg) from a sequence of image files; every image’s width and height should be the same, and both divisible by 2. Default is False.

Returns:

a bool of image processing status. True if able to perform the autocropping operation, False otherwise.

Return type:

bool

See also

autocrop_perproc
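
The cropping and fixEven ideas can be sketched without any imaging library. This is only an illustration under stated assumptions, not the module's implementation: bounding_box and make_even are hypothetical helpers, the image is modeled as a nested list of pixel values, and rounding odd sizes up (rather than down) is an assumption.

```python
def bounding_box(pixels, background):
    """Return (left, top, right, bottom) of the non-background region, or None.

    right and bottom are exclusive, as in most imaging libraries.
    """
    rows = [y for y, row in enumerate(pixels) if any(p != background for p in row)]
    cols = [x for row in pixels for x, p in enumerate(row) if p != background]
    if not rows:
        return None  # the image is entirely background
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def make_even(width, height):
    """Round each dimension up to the nearest even number (the fixEven idea)."""
    return (width + width % 2, height + height % 2)


# A 5x4 "image": 0 is background, 1 is content.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
left, top, right, bottom = bounding_box(image, background=0)
cropped = [row[left:right] for row in image[top:bottom]]
even_size = make_even(right - left, bottom - top)
```

Here the content occupies a 3x2 region, so an even-sized output would be 4x2.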

ive_tanim.core.autocrop_image.autocrop_perproc(input_tuple)

Designed for use with Python’s multiprocessing module, to autocrop raster images with one task per processor.

Parameters:

input_tuple (tuple) – a tuple of inputfilename (input file name), outputfilename (output file name), color (background color), and fixEven (if True, then resize the output image to have even-pixeled width and height).

Returns:

a tuple of inputfilename and success status (True for success, False otherwise).

See also

autocrop_image
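
A rough sketch of how a tuple-in, tuple-out worker like this one is typically driven by a process pool. The names build_inputs, stand_in_perproc, and crop_all are hypothetical; stand_in_perproc only mimics the documented contract and does no image processing.

```python
from multiprocessing import Pool


def build_inputs(filenames, color="white", fix_even=True):
    """Build one (inputfilename, outputfilename, color, fixEven) tuple per file."""
    # Reuse the input name as the output name, so each file is cropped in place.
    return [(name, name, color, fix_even) for name in filenames]


def stand_in_perproc(input_tuple):
    """Hypothetical stand-in with the same contract as autocrop_perproc."""
    inputfilename, outputfilename, color, fix_even = input_tuple
    return (inputfilename, True)


def crop_all(worker, inputs, processes=2):
    """Run the worker over all inputs, one tuple per task, in a process pool."""
    with Pool(processes=processes) as pool:
        return pool.map(worker, inputs)


inputs = build_inputs(["frame0000.png", "frame0001.png"])
# Serial equivalent of the pool call, useful for checking the contract.
serial_results = list(map(stand_in_perproc, inputs))
```

In a script, a call such as crop_all(autocrop_perproc, inputs) would normally sit under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that spawn rather than fork worker processes.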

ive_tanim.core.autocrop_image.crop_pdf(inputfile, outputfile=None)

Given a possibly multi-page PDF file that consists of \(N \ge 1\) pages, creates \(N\) separate single-page autocropped PDF files, one for each page in the input PDF file. Given a file with name inputfile, the output files are named outputfile<idx>, where <idx> is the page number. This uses PdfReader to read in, and PdfWriter to write out, PDF files.

The Python functionality is a port of the pdfcrop.pl Perl script.

Parameters:
  • inputfile (str) – the name of the input PDF file.

  • outputfile (str) – optional argument, the prefix of the output PDF files. If None, then the prefix is the part of inputfile with the .pdf suffix removed.

ive_tanim.core.autocrop_image.crop_pdf_singlepage(inputfile, outputfile=None)

Given a single-paged PDF file, creates an autocropped output PDF file. This uses PdfReader to read in, and PdfWriter to write out, PDF files.

The Python functionality is a port of the pdfcrop.pl Perl script.

Parameters:
  • inputfile (str) – the name of the input PDF file.

  • outputfile (str) – optional argument, the name of the output PDF file. If None, then the autocropped output PDF file replaces the input PDF file.

See also

crop_pdf.

ive_tanim.core.autocrop_image.get_boundingbox(pdfpath, hiresbb=False)

Given a PDF file, returns its BoundingBox. Requires working ghostscript (gs executable) to calculate it.

>>> get_boundingbox('/path/to/mypdf.pdf')   # doctest: +SKIP
[[23, 34, 300, 555], [0, 0, 300, 555]]

Parameters:
  • pdfpath (str) – the name of the PDF file.

  • hiresbb (bool) – if True, returns the hiresBoundingbox; otherwise returns Boundingbox.

Returns:

a list of BoundingBox, one per PDF page.

Return type:

list

Raises:

IOError – if the ghostscript executable could not be found.
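
As a sketch of what such a calculation can look like: ghostscript's bbox device prints %%BoundingBox and %%HiResBoundingBox lines, one pair per page, on standard error. The helper names parse_bbox_output and run_gs_bbox are hypothetical, and the exact gs flags used by this module are an assumption.

```python
import re
import shutil
import subprocess

_BBOX = re.compile(
    r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE)
_HIRES = re.compile(
    r"^%%HiResBoundingBox:\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)",
    re.MULTILINE)


def parse_bbox_output(text, hiresbb=False):
    """Extract one [llx, lly, urx, ury] list per page from bbox-device output."""
    if hiresbb:
        return [[float(v) for v in m.groups()] for m in _HIRES.finditer(text)]
    return [[int(v) for v in m.groups()] for m in _BBOX.finditer(text)]


def run_gs_bbox(pdfpath):
    """Run ghostscript's bbox device; it reports the boxes on stderr."""
    gs_exec = shutil.which("gs")
    if gs_exec is None:
        raise IOError("could not find the ghostscript executable 'gs'")
    proc = subprocess.run(
        [gs_exec, "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=bbox", pdfpath],
        capture_output=True, text=True, check=True)
    return proc.stderr


# Sample output for a two-page PDF (illustrative values).
sample = (
    "%%BoundingBox: 23 34 300 555\n"
    "%%HiResBoundingBox: 23.004000 34.002000 299.988000 554.994000\n"
    "%%BoundingBox: 0 0 300 555\n"
    "%%HiResBoundingBox: 0.000000 0.000000 299.988000 554.994000\n"
)
boxes = parse_bbox_output(sample)
```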

3.2. convert_image module

This module provides the low-level functionality that uses utility functions to create MP4 movies from a sequence of images, creates animated GIF files, and creates square movies (useful for upload to Instagram).

ive_tanim.core.convert_image.create_images2mp4dict(prefix, image_suffix='png', dirname='/mnt/software/sources/pythonics/ive_tanim/docsrc', fps=5, autocrop=False)

This method creates a complicated, low-level dict of setup information for creating an MP4 file from a collection of images. mp4fromimages uses this dict to create the MP4 file. Here are the things needed to make this work.

  1. The collection of image files exist in a directory named dirname.

  2. Each image file, as a frame of the movie, must have a name like PREFIX0000.<image_suffix> to PREFIX0401.<image_suffix>.

  3. The first image file must have a zero-padded value of zero. There must also be no number gaps in the sequence of image files as frames. For example, if there are image files PREFIX0200.<image_suffix> and PREFIX0202.<image_suffix> but no PREFIX0201.<image_suffix>, this process will fail.

In case of success, this method returns a dict with these five keys and values.

  • status: the string "SUCCESS".

  • files: the sorted list of image file names as movie frames.

  • autocrop: bool on whether to autocrop the image files.

  • fps: the int number of frames per second in the MP4 file.

  • actual prefix: the input (ffmpeg -i <arg>) argument that goes into FFmpeg when creating the MP4 from a collection of image files as frames.

In case of failure, the status key contains the reason for the failure. mp4fromimages returns this failure message and does nothing.
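
The naming rules above can be checked along the following lines. This is a sketch under assumptions, not the module's code: check_frame_sequence is a hypothetical helper that works on a list of file names rather than a directory, the failure messages are invented, and the printf-style form of the actual prefix value is an assumption based on the pattern syntax that FFmpeg accepts for image sequences.

```python
import re


def check_frame_sequence(filenames, prefix, image_suffix="png", fps=5, autocrop=False):
    """Validate frame names like PREFIX0000.png and summarize them in a dict."""
    pattern = re.compile(
        r"^%s(\d+)\.%s$" % (re.escape(prefix), re.escape(image_suffix)))
    matched = sorted(name for name in filenames if pattern.match(name))
    if not matched:
        return {"status": "no files matching the prefix and suffix"}
    digits = [pattern.match(name).group(1) for name in matched]
    widths = {len(d) for d in digits}
    if len(widths) != 1:
        return {"status": "frame numbers are not uniformly zero-padded"}
    numbers = sorted(int(d) for d in digits)
    if numbers != list(range(len(numbers))):
        return {"status": "frame numbers must start at 0 with no gaps"}
    width = widths.pop()
    return {
        "status": "SUCCESS",
        "files": matched,
        "autocrop": autocrop,
        "fps": fps,
        # printf-style pattern, e.g. movie%04d.png (assumed form)
        "actual prefix": "%s%%0%dd.%s" % (prefix, width, image_suffix),
    }


good = check_frame_sequence(
    ["movie0000.png", "movie0001.png", "movie0002.png", "notes.txt"], "movie")
gapped = check_frame_sequence(["movie0000.png", "movie0002.png"], "movie")
```

The first call succeeds with three frames; the second fails because frame 1 is missing.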

Parameters:
  • prefix (str) – the base name of each image file as frame, before the integer frame number and .<image_suffix> suffix.

  • image_suffix (str) – the image suffix through which to look. Default is png.

  • dirname (str) – the directory in which these image files live. Default is the current working directory.

  • fps (int) – the number of frames per second for the movie. Must be \(\ge 1\). Default is 5.

  • autocrop (bool) – whether to automatically crop out white space in the image files as frames. Default is False.

Returns:

the dict described above.

Return type:

dict

See also

mp4fromimages.

ive_tanim.core.convert_image.make_aspected_mp4video(input_mp4_file, output_mp4_file, aspect='square', background='white')

More FFmpeg voodoo, this time to create a square (or 9/16 aspect or 16/9 aspect) MP4 file for upload into Instagram.

This requires working ffmpeg and ffprobe executables. The input file must be MP4.

Here are resources that I used to get this working.

Parameters:
  • input_mp4_file (str) – the name of the valid input MP4 file.

  • output_mp4_file (str) – the name of the valid output MP4 file.

  • aspect (str) – the aspect ratio to choose. Must be one of “square”, “916” (9/16: width 9 units, height 16 units), or “169” (16/9: width 16 units, height 9 units). Default is “square”.

  • background (str) – the background color to use for padding. Must be either “white” or “black”. Default is “white”.
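
The arithmetic behind padding a movie to a target aspect ratio can be sketched as follows. This assumes the padding is centered and done with FFmpeg's pad filter, which the text above does not confirm; padded_geometry and pad_filter are hypothetical helpers.

```python
def padded_geometry(width, height, aspect="square"):
    """Return (out_width, out_height, x_offset, y_offset) for centered padding."""
    ratios = {"square": (1, 1), "916": (9, 16), "169": (16, 9)}
    if aspect not in ratios:
        raise ValueError("aspect must be one of %s" % sorted(ratios))
    num, den = ratios[aspect]  # target width : height
    # Grow whichever dimension is too small for the target ratio.
    if width * den >= height * num:
        out_width = width
        out_height = -(-width * den // num)  # ceiling division
    else:
        out_height = height
        out_width = -(-height * num // den)
    # Round up to even sizes, which common H.264 pixel formats require.
    out_width += out_width % 2
    out_height += out_height % 2
    return (out_width, out_height,
            (out_width - width) // 2, (out_height - height) // 2)


def pad_filter(width, height, aspect="square", background="white"):
    """Build an ffmpeg pad filter string for the padded geometry."""
    out_w, out_h, x_off, y_off = padded_geometry(width, height, aspect)
    return "pad=%d:%d:%d:%d:color=%s" % (out_w, out_h, x_off, y_off, background)


square_filter = pad_filter(640, 360)
```

For a 640x360 input and the square aspect, the frame grows to 640x640 with the original placed 140 pixels from the top.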

ive_tanim.core.convert_image.mp4fromimages(images2mp4dict)

Creates an MP4 file from the low-level input specification dict that create_images2mp4dict creates. Requires the ffmpeg executable, and the status value in the dict must be "SUCCESS"; otherwise, this method does not create a movie file.

If dirname is the directory in which the image files live, and PREFIX is the prefix of all the image files, the MP4 file is named dirname/PREFIX.mp4.

Parameters:

images2mp4dict (dict) – the dictionary specification for creating a specific MP4 file from a collection of image files as frames.

ive_tanim.core.convert_image.mp4togif(input_mp4_file, gif_file=None, duration=None, scale=1.0)

This consists of voodoo FFmpeg magic that converts MP4 to animated GIF reasonably well. Don’t ask me how most of it works, just be on-your-knees-kissing-the-dirt grateful that MILLIONS of people hack onto and into FFmpeg so that this information is available, and the workflow works.

This requires working ffmpeg and ffprobe executables. If the input file is named <input>.mp4, the output animated GIF file is named <input>.gif by default.

Here are resources that I used to get this working.

Parameters:
  • input_mp4_file (str) – the name of the valid MP4 file.

  • gif_file (str) – the (optional) name of the animated GIF file. If not provided, then the GIF file is named after the input file, with the .mp4 suffix replaced by .gif.

  • duration (float) – duration, in seconds, of MP4 file to use to make the animated GIF. If None is provided, use the full movie. If provided, then must be \(\ge 1\) seconds.

  • scale (float) – scaling of input width and height of MP4 file. Default is 1.0. Must be \(\ge 0\).
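
One widely used FFmpeg recipe for this conversion is a two-pass palette approach (palettegen, then paletteuse). The sketch below only builds the argument lists; whether this module uses exactly this recipe is an assumption, and gif_commands, default_gif_name, the palette file name, and the fps value are all hypothetical.

```python
import os


def default_gif_name(input_mp4_file):
    """Map <input>.mp4 to <input>.gif."""
    root, _ext = os.path.splitext(input_mp4_file)
    return root + ".gif"


def gif_commands(input_mp4_file, gif_file=None, duration=None, scale=1.0,
                 palette_file="palette.png", fps=10):
    """Return the two ffmpeg argument lists for a palette-based GIF."""
    gif_file = gif_file or default_gif_name(input_mp4_file)
    filters = "fps=%d,scale=iw*%g:ih*%g:flags=lanczos" % (fps, scale, scale)
    # -t before -i limits how much of the input is read.
    limit = [] if duration is None else ["-t", "%g" % duration]
    make_palette = (["ffmpeg", "-y"] + limit +
                    ["-i", input_mp4_file,
                     "-vf", filters + ",palettegen", palette_file])
    make_gif = (["ffmpeg", "-y"] + limit +
                ["-i", input_mp4_file, "-i", palette_file,
                 "-lavfi", filters + " [x]; [x][1:v] paletteuse", gif_file])
    return make_palette, make_gif


palette_cmd, gif_cmd = gif_commands("movie.mp4", duration=3, scale=0.5)
```

Each list could then be passed to subprocess.run; the first pass writes the palette image and the second uses it to produce the GIF.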

ive_tanim.core.convert_image.pdf2png(input_pdf_file, newWidth=None, verify=True)

Uploads an input PDF image file to the CloudConvert server, and returns an Image object of the PNG file it produces. The output PNG file has the same aspect ratio as the input file.

Parameters:
  • input_pdf_file (str) – the input PDF file. Filename must end in .pdf.

  • newWidth (int) – optional argument. If specified, the pixel width of the output image.

  • verify (bool) – optional argument, whether to verify SSL connections. Default is True.

Returns:

the Image object of the PNG file from the input PDF file.

ive_tanim.core.convert_image.png2png(input_png_file, newWidth=None, verify=True)

Uploads an input PNG file to the CloudConvert server, and returns an Image object of the PNG file it produces. The output PNG file has the same aspect ratio as the input file.

Parameters:
  • input_png_file (str) – the input PNG file. Filename must end in .png.

  • newWidth (int) – optional argument. If specified, the pixel width of the output image.

  • verify (bool) – optional argument, whether to verify SSL connections. Default is True.

Returns:

the Image object of the PNG file from the input PNG file.

3.3. rst2html module

This module provides low-level functionality that converts reStructuredText into HTML, with the option of using MathJax to render LaTeX formulae in HTML. It also includes methods to create a properly CID-converted MIMEMultipart low-level message object, for use by other, external modules that convert RST documents into HTML emails that most email servers can handle.

class ive_tanim.core.rst2html.MyHTMLTranslator(document)

I am copying this code without any real understanding from this GitHub gist. This class seems to extend HTMLTranslator, but I don’t know what that means.

The usage in that gist is as follows, because the gist is an enhancement of rst2html.py called myrst2html.py.

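# Imports assumed here: the html4css1 Writer and docutils' publish_cmdline.
from docutils.core import publish_cmdline
from docutils.writers.html4css1 import Writer

htmlwriter = Writer()
htmlwriter.translator_class = MyHTMLTranslator
publish_cmdline(writer=htmlwriter)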

I imagine I will have to modify the methods check_valid_RST and convert_string_RST to represent more correct functionality. There, I am following what I inferred from publish_parts and the included code block in this object’s description.

mathjax_url = 'https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'

URL of the MathJax javascript library.

The MathJax library ought to be installed on the same server as the rest of the deployed site files and specified in the math-output setting appended to “mathjax”. See Docutils Configuration.

The fallback tries a local MathJax installation at /usr/share/javascript/mathjax/MathJax.js.

ive_tanim.core.rst2html.check_valid_RST(myString, use_mathjax=False)

Checks to see whether the input string is valid reStructuredText.

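Parameters:
  • myString (str) – the candidate reStructuredText input.

  • use_mathjax (bool) – if True, then use MathJax for math formulae. Default is False.
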
Returns:

True if valid, otherwise False.

Return type:

bool

ive_tanim.core.rst2html.cid_out_mimeMultiMessage(msg, mainHTML)

Goes through the HTML email message, and creates a CID-enabled email with additional image attachments inside. Follows ideas found on this useful stackoverflow article.

Parameters:
  • msg (MIMEMultipart) – the message object that will be modified.

  • mainHTML (str) – the HTML message that we will parse through for images on disk rather than as external URLs.

Returns:

the dict of cid to image file name.

Return type:

dict
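
A minimal sketch of the CID technique using only the standard library, assuming a simple regular-expression scan for img tags and an invented cid naming scheme (image001, image002, ...). embed_local_images is a hypothetical stand-in, not this module's implementation.

```python
import mimetypes
import os
import re
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def embed_local_images(msg, main_html):
    """Attach the HTML and its on-disk images to msg; return {cid: file name}."""
    cid_map = {}

    def replace(match):
        src = match.group(2)
        if re.match(r"^(https?:|cid:|data:)", src) or not os.path.isfile(src):
            return match.group(0)  # leave external or missing images alone
        cid = "image%03d" % (len(cid_map) + 1)
        cid_map[cid] = src
        return "%scid:%s%s" % (match.group(1), cid, match.group(3))

    html = re.sub(r'(<img[^>]*\bsrc=")([^"]+)(")', replace, main_html)
    msg.attach(MIMEText(html, "html"))
    for cid, path in cid_map.items():
        # Derive the image subtype from the file extension, defaulting to PNG.
        subtype = (mimetypes.guess_type(path)[0] or "image/png").split("/")[1]
        with open(path, "rb") as handle:
            part = MIMEImage(handle.read(), _subtype=subtype)
        part.add_header("Content-ID", "<%s>" % cid)
        part.add_header("Content-Disposition", "inline",
                        filename=os.path.basename(path))
        msg.attach(part)
    return cid_map


message = MIMEMultipart("related")
```

Calling embed_local_images(message, html) rewrites each local image source to a cid: reference and attaches the image with a matching Content-ID header, so mail clients can display it inline.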

ive_tanim.core.rst2html.config_email_alias(alias, candidate_rfc5322_email)

Uses parse_rfc5322_email to validate an email address, then stores it under an alias in the configuration JSON file, ~/.config/ive_tanim/config.json. To make things simpler, the alias is converted to lower case before it is stored.

THIS IS NOT THREAD-SAFE!

Parameters:
  • alias (str) – a useful key that identifies the email alias so you don’t have to write out a full RFC 5322 qualified email address.

  • candidate_rfc5322_email (str) – the input RFC 5322 fully qualified email address.

Returns:

False if the candidate email address is not valid according to RFC 5322. Otherwise returns True.

Return type:

bool

Note

The alias is stored in lower case, and references in other methods and CLIs perform an implicit to-lower-case conversion if you want to specify a sender, recipient, CC recipient, or BCC recipient.
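
The validate-then-store flow can be sketched as below. The JSON layout (an "aliases" key) and the helper name store_alias are assumptions; the real configuration file may be organized differently. The read-modify-write sequence also shows why such a function is not thread-safe.

```python
import json
import os
from email.utils import parseaddr


def store_alias(config_path, alias, candidate_email):
    """Validate the address, then store it under the lower-cased alias."""
    _name, address = parseaddr(candidate_email)
    if "@" not in address:
        return False  # parseaddr could not find a usable address
    config = {}
    if os.path.isfile(config_path):
        with open(config_path) as handle:
            config = json.load(handle)
    # "aliases" is a hypothetical key name for this sketch.
    config.setdefault("aliases", {})[alias.strip().lower()] = candidate_email
    os.makedirs(os.path.dirname(config_path) or ".", exist_ok=True)
    # Two concurrent callers could both read the old file and overwrite
    # each other's change here.
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=1)
    return True
```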

ive_tanim.core.rst2html.config_email_default_sender(sender_rfc5322_email)

Uses parse_rfc5322_email to validate an email address, then stores it as the default SENDER in the configuration JSON file, ~/.config/ive_tanim/config.json.

THIS IS NOT THREAD-SAFE!

Parameters:

sender_rfc5322_email (str) – the input RFC 5322 fully qualified email address of the default sender.

Returns:

False if the candidate sender email address is not valid according to RFC 5322. Otherwise returns True and sets the default sender to this value.

Return type:

bool

ive_tanim.core.rst2html.config_email_default_smtp(server='localhost', port=25)

Sets the default SMTP server sending settings, and stores the default SMTP server settings into the configuration JSON file, ~/.config/ive_tanim/config.json.

THIS IS NOT THREAD-SAFE!

Parameters:
  • server (str) – the SMTP server to use. Default is localhost.

  • port (int) – the port number to use to send the email to the local SMTP server. Default is port 25.

ive_tanim.core.rst2html.convert_string_RST(myString, use_mathjax=False, outputfilename=None)

Converts a valid reStructuredText input string into rich HTML.

Parameters:
  • myString (str) – the candidate reStructuredText input.

  • use_mathjax (bool) – if True, then use MathJax for math formulae. Default is False.

  • outputfilename (str) – if not None, then sets the HTML document title to this value.

Returns:

If the input string is valid reStructuredText, returns the rich HTML as a string. Otherwise emits a logging error message and returns None.

Return type:

str

See also

check_valid_RST

ive_tanim.core.rst2html.create_collective_email_full(mainHTML, subject, fromEmail, to_emails, cc_emails=[], bcc_emails=[], attachments=[])

Creates a MIMEMultipart email that also contains the following:

  • the body of the message as an HTML document.

  • sender.

  • TO recipients.

  • CC recipients.

  • BCC recipients.

  • explicitly included attachments.

  • soft-conventioned cid of embedded and inlined images (PNG, JPEG, TIFF, GIF, etc.).

Go easy on me, it’s my first day!

Parameters:
  • mainHTML (str) – the email body as an HTML string document.

  • subject (str) – the email subject.

  • fromEmail (str) – the dict of email and optional full name of sender.

  • to_emails (list) – the list of dict of TO recipients. Each TO recipient is a dict of email and optionally full name.

  • cc_emails (list) – the list of dict of CC recipients. Each CC recipient is a dict of email and optionally full name.

  • bcc_emails (list) – the list of dict of BCC recipients. Each BCC recipient is a dict of email and optionally full name.

  • attachments (list) – the collection of attachments to send out.

Returns:

the message object, with soft-conventioned cid of images included.

Return type:

MIMEMultipart
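
A reduced sketch of assembling such a message with the standard library, leaving out attachments and inline images. build_message is a hypothetical helper; it assumes each sender or recipient is a dict with an email key and an optional full name key, as described above, and it uses tuples rather than lists as default arguments.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr


def build_message(main_html, subject, from_email, to_emails,
                  cc_emails=(), bcc_emails=()):
    """Build a MIMEMultipart message with an HTML body and address headers."""

    def fmt(entry):
        # formataddr returns the bare address when the name is empty.
        return formataddr((entry.get("full name", ""), entry["email"]))

    msg = MIMEMultipart("related")
    msg["Subject"] = subject
    msg["From"] = fmt(from_email)
    msg["To"] = ", ".join(fmt(entry) for entry in to_emails)
    if cc_emails:
        msg["Cc"] = ", ".join(fmt(entry) for entry in cc_emails)
    if bcc_emails:
        msg["Bcc"] = ", ".join(fmt(entry) for entry in bcc_emails)
    msg.attach(MIMEText(main_html, "html"))
    return msg


example = build_message(
    "<p>Hello</p>", "Greetings",
    {"email": "jane@example.com", "full name": "Jane Doe"},
    [{"email": "bob@example.com"}, {"email": "al@example.com", "full name": "Al"}])
```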

ive_tanim.core.rst2html.create_rfc5322_email(email_fullname_dict)

Given a dict containing email address and (optionally) full name, returns the RFC 5322 fully qualified email address.

Parameters:

email_fullname_dict (dict) – the dict that should contain the email and optionally the fully qualified name. Email address is in the email key, and optional full name is in the full name key.

Returns:

an RFC 5322 fully qualified email address under the following conditions:

  1. If there is an email address only; or

  2. If there is an email address AND a fully qualified name.

Otherwise returns None.

Return type:

str

ive_tanim.core.rst2html.get_attachment_object(full_file_path)

Create the attachment dict given the input file.

Parameters:

full_file_path (str) – the location of the file on-disk.

Returns:

a dict of name (which is file base name), mimetype, and filepath (which is full_file_path). If file does not exist, then function returns None.

Return type:

dict
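
A small sketch of building such a dict with the standard library. attachment_dict is a hypothetical name, and the fallback to application/octet-stream for unrecognized extensions is an assumption.

```python
import mimetypes
import os


def attachment_dict(full_file_path):
    """Return a dict describing the file, or None if it does not exist."""
    if not os.path.isfile(full_file_path):
        return None
    mimetype, _encoding = mimetypes.guess_type(full_file_path)
    return {
        "name": os.path.basename(full_file_path),
        # Fall back to a generic type when the extension is not recognized.
        "mimetype": mimetype or "application/octet-stream",
        "filepath": full_file_path,
    }
```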

ive_tanim.core.rst2html.parse_rfc5322_email(candidate_rfc5322_email)

Uses parseaddr to create a candidate email dict (keys are email and optionally full name).

Parameters:

candidate_rfc5322_email (str) – the input RFC 5322 fully qualified email address.

Returns:

the candidate email dict, only if there is a valid email address. Otherwise returns None.

Return type:

dict
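
The round trip between an RFC 5322 address string and an email/full name dict can be sketched with email.utils. parse_email and format_email are hypothetical helpers, and the "@" check is a deliberately simple stand-in for whatever validation the module performs.

```python
from email.utils import formataddr, parseaddr


def parse_email(candidate):
    """Return {'email': ..., 'full name': ...} or None if no address is found."""
    full_name, address = parseaddr(candidate)
    if "@" not in address:
        return None
    entry = {"email": address}
    if full_name:
        entry["full name"] = full_name
    return entry


def format_email(entry):
    """Inverse of parse_email: build the address string, or None without an email."""
    if not entry or not entry.get("email"):
        return None
    return formataddr((entry.get("full name", ""), entry["email"]))


parsed = parse_email("Jane Doe <jane@example.com>")
round_trip = format_email(parsed)
```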

ive_tanim.core.rst2html.send_email_localsmtp(msg, server='localhost', portnumber=25)

Sends the email through a local SMTP server, using Python’s SMTP functionality.

This blog post describes how I set up a GMail relay using my local SMTP server on my Ubuntu machine.

Parameters:
  • msg (MIMEMultipart) – the email message object to send. At a high level, this is an email with body, sender, recipients, and optional attachments.

  • server (str) – the SMTP server to use. Default is localhost.

  • portnumber (int) – the port number to use to send the email to the local SMTP server. Default is port 25.
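
Sending through a local server with smtplib can be sketched as follows. send_via_smtp is a hypothetical helper, and its smtp_factory parameter is an invented hook that lets the function be exercised without a running server.

```python
import smtplib
from email.mime.multipart import MIMEMultipart


def send_via_smtp(msg, server="localhost", portnumber=25,
                  smtp_factory=smtplib.SMTP):
    """Send msg through an SMTP server reachable at server:portnumber."""
    with smtp_factory(server, portnumber) as smtp:
        # send_message takes the sender and recipients from the headers
        # and does not transmit the Bcc header itself.
        smtp.send_message(msg)


# Build a message to send; nothing is sent until send_via_smtp is called.
outgoing = MIMEMultipart()
outgoing["Subject"] = "hello"
outgoing["From"] = "jane@example.com"
outgoing["To"] = "bob@example.com"
```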