bcftbx.Md5sum

Md5sum

Classes and functions for performing various MD5 checksum operations.

The core function is ‘md5sum’, which computes the MD5 hash for a file and is based on examples at:

http://www.python.org/getit/releases/2.0.1/md5sum.py

and

http://stackoverflow.com/questions/1131220/get-md5-hash-of-a-files-without-open-it-in-python

Usage:

>>> from bcftbx import Md5sum
>>> Md5sum.md5sum("myfile.txt")
'eacc9c036025f0e64fb724cacaadd8b4'

This module implements two methods for generating the MD5 digest of a file: the first is based on the hashlib module, while the second (a fallback for pre-2.5 Python) uses the now deprecated md5 module. Note, however, that the md5sum function itself determines which method to use.

There is also a high-level class ‘Md5Checker’ which implements various class methods for running MD5 checks across all files in a directory, and a wrapper class ‘Md5CheckReporter’ which provides generic reporting for the Md5Checker methods.

class bcftbx.Md5sum.Md5CheckReporter(results=None, verbose=False, fp=sys.stdout)

Provides a generic reporting class for Md5Checker methods

Typical usage modes are either:

>>> r = Md5CheckReporter()
>>> for f,s in Md5Checker.md5cmp_dirs(d1,d2):
...    r.add_result(f,s)

or more concisely:

>>> r = Md5CheckReporter(Md5Checker.md5cmp_dirs(d1,d2))

Use the ‘summary’ method to generate a summary of all the checks.

Use the ‘status’ method to get a single indicator of success or failure which is consistent with UNIX-style return codes.

To find out how many results were processed in total, how many failed etc use the following properties:

  • n_files : total number of results examined

  • n_ok : number that passed MD5 checks (MD5_OK)

  • n_failed : number that failed due to different MD5 sums (MD5_FAILED)

  • n_missing: number that failed due to a missing target file (MISSING_TARGET)

  • n_errors : number that had errors calculating their MD5 sums (MD5_ERROR)

add_result(f, status)

Add a result to the reporter

Takes a file and an Md5Checker status code and adds it to the results.

If the status code indicates a failed check then the file name is added to a list corresponding to the nature of the failure (e.g. MD5 sums didn’t match, target was missing etc).

property n_errors

Number of files with errors checking MD5 sums

property n_failed

Number of failed MD5 sum checks

property n_files

Total number of files checked

property n_missing

Number of missing files

property n_ok

Number of passed MD5 sum checks

property status

Return status code

Returns 0 if all files that were checked passed the MD5 check, or 1 if at least one file failed the check for whatever reason.

summary()

Write a summary of the results

Writes a summary of the number of files checked, how many passed or failed MD5 checks and so on, to the specified output stream.
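The counting and status behaviour described above can be illustrated with a small stand-in class. This is only a sketch and not the Md5CheckReporter implementation: the class name ‘ReporterSketch’, the numeric values of the status constants and the wording of the summary line are all assumptions made for the example.

```python
import sys

# Placeholder values (assumed): the real constants are attributes of Md5Checker
MD5_OK, MD5_FAILED, MD5_ERROR, MISSING_TARGET = range(4)


class ReporterSketch:
    """Tallies (file, status) pairs, loosely modelled on Md5CheckReporter."""

    def __init__(self, results=None, fp=sys.stdout):
        self.fp = fp
        self.n_files = 0
        # One list of file names per status code
        self.by_status = {MD5_OK: [], MD5_FAILED: [],
                          MD5_ERROR: [], MISSING_TARGET: []}
        for f, status in (results or ()):
            self.add_result(f, status)

    def add_result(self, f, status):
        # Count the file and remember which outcome it had
        self.n_files += 1
        self.by_status[status].append(f)

    @property
    def n_ok(self):
        return len(self.by_status[MD5_OK])

    @property
    def status(self):
        # 0 only if every file examined passed, as with UNIX exit codes
        return 0 if self.n_ok == self.n_files else 1

    def summary(self):
        # Hypothetical summary format, written to the chosen stream
        self.fp.write("%d files checked, %d okay\n" % (self.n_files, self.n_ok))
```

A script could pass the ‘status’ value straight to sys.exit() so that a failed check produces a non-zero exit code.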

class bcftbx.Md5sum.Md5Checker

Provides static methods for performing checks using MD5 sums

The Md5Checker class is a collection of static methods that can be used for performing checks using MD5 sums.

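It also provides a set of constants to represent the outcomes of checks (MD5_OK, MD5_FAILED, MISSING_TARGET, MD5_ERROR) and the handling of symbolic links (FOLLOW_LINKS, IGNORE_LINKS).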

classmethod compute_md5sums(d, links=0)

Calculate MD5 sums for all files in directory

Given a directory, traverses the structure underneath (including subdirectories) and yields the path and MD5 sum for each file that is found.

The ‘links’ option determines how symbolic links are handled, see the ‘walk’ function for details.

Parameters:
  • d – name of the top-level directory

  • links – (optional) specify how symbolic links are handled

Returns:

Yields a tuple (f,md5) where f is the path of a file relative to the top-level directory, and md5 is the calculated MD5 sum.

classmethod md5_walk(dirn, links=0)

Calculate MD5 sums for all files in directory

Given a directory, traverses the structure underneath (including subdirectories) and yields the path and MD5 sum for each file that is found.

The ‘links’ option determines how symbolic links are handled, see the ‘walk’ function for details.

Parameters:
  • dirn – name of the top-level directory

  • links – (optional) specify how symbolic links are handled

Returns:

Yields a tuple (f,md5) where f is the path of a file relative to the top-level directory, and md5 is the calculated MD5 sum.
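The traversal described here can be approximated with the standard library alone. The following is a minimal sketch rather than the md5_walk implementation: the name ‘md5_walk_sketch’ is hypothetical, the ‘links’ option is not modelled, and each file is read in one go for brevity.

```python
import hashlib
import os


def md5_walk_sketch(dirn):
    """Yield (relative_path, md5_hexdigest) for each file under dirn."""
    for dirpath, dirnames, filenames in os.walk(dirn):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Whole-file read keeps the sketch short; large files would
            # normally be hashed block by block
            with open(path, "rb") as fp:
                digest = hashlib.md5(fp.read()).hexdigest()
            yield os.path.relpath(path, dirn), digest
```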

classmethod md5cmp_dirs(d1, d2, links=0)

Compares the contents of one directory with another using MD5 sums

Given two directory names ‘d1’ and ‘d2’, compares the MD5 sum of each file found in ‘d1’ against that of the equivalent file in ‘d2’, and yields the result as an Md5Checker constant for each file pair, i.e.:

MD5_OK: if MD5 sums match; MD5_FAILED: if MD5 sums differ.

If the equivalent file doesn’t exist then yields MISSING_TARGET.

If one or both MD5 sums cannot be computed then yields MD5_ERROR.

How symbolic links are handled depends on the setting of the ‘links’ option:

FOLLOW_LINKS: (default) MD5 sums are computed and compared for the targets of symbolic links. Broken links are treated as if the file was missing.

IGNORE_LINKS: MD5 sums are not computed or compared if either file is a symbolic link, and links to directories are not followed.

Parameters:
  • d1 – ‘reference’ directory

  • d2 – ‘target’ directory to be compared with the reference

  • links – (optional) specify how symbolic links are handled.

Returns:

Yields a tuple (f,status) where f is the relative path of the file pair being compared, and status is the Md5Checker constant representing the outcome of the comparison.
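The comparison logic can be sketched as follows for the FOLLOW_LINKS case only. This is an illustration under stated assumptions, not the md5cmp_dirs code: the function names are hypothetical and the status constants are string placeholders standing in for the Md5Checker constants.

```python
import hashlib
import os

# Placeholder values (assumed) standing in for the Md5Checker constants
MD5_OK, MD5_FAILED, MD5_ERROR, MISSING_TARGET = (
    "MD5_OK", "MD5_FAILED", "MD5_ERROR", "MISSING_TARGET")


def md5_of(path):
    """Return the MD5 hex digest of the file at path."""
    with open(path, "rb") as fp:
        return hashlib.md5(fp.read()).hexdigest()


def md5cmp_dirs_sketch(d1, d2):
    """Yield (relative_path, status) for each file found under d1."""
    for dirpath, _, filenames in os.walk(d1, followlinks=True):
        for name in sorted(filenames):
            ref = os.path.join(dirpath, name)
            rel = os.path.relpath(ref, d1)
            target = os.path.join(d2, rel)
            # isfile() follows links, so a broken link counts as missing
            if not os.path.isfile(target):
                yield rel, MISSING_TARGET
                continue
            try:
                same = md5_of(ref) == md5_of(target)
            except OSError:
                yield rel, MD5_ERROR
                continue
            yield rel, (MD5_OK if same else MD5_FAILED)
```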

classmethod md5cmp_files(f1, f2)

Compares the MD5 sums of two files

Given two file names, attempts to compute and compare their MD5 sums.

If the MD5s match then returns MD5_OK, if they don’t match then returns MD5_FAILED.

If one or both MD5 sums cannot be computed then returns MD5_ERROR.

Note that if either file is a link then MD5 sums will be computed for the link target(s), if they exist and can be accessed.

Parameters:
  • f1 – name and path for reference file

  • f2 – name and path for file to be checked

Returns:

Md5Checker constant representing the outcome of the comparison.
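As an illustration of the three possible outcomes, a pairwise comparison might look like the sketch below. The function names and the string values of the constants are assumptions for the example; treating any OSError as MD5_ERROR is also a choice made here, not something the original documentation specifies.

```python
import hashlib

# Placeholder values (assumed) standing in for the Md5Checker constants
MD5_OK, MD5_FAILED, MD5_ERROR = "MD5_OK", "MD5_FAILED", "MD5_ERROR"


def md5_of(path):
    """Return the MD5 hex digest of the file at path."""
    # open() follows symbolic links, so link targets are what get hashed
    with open(path, "rb") as fp:
        return hashlib.md5(fp.read()).hexdigest()


def md5cmp_files_sketch(f1, f2):
    """Compare two files by MD5 sum and return a status constant."""
    try:
        same = md5_of(f1) == md5_of(f2)
    except OSError:
        # Unreadable or missing file: a digest could not be computed
        return MD5_ERROR
    return MD5_OK if same else MD5_FAILED
```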

classmethod verify_md5sums(filen=None, fp=None)

Verify md5sums from a file

Given a file (or a file-like object opened for reading), reads each line and attempts to interpret it as an md5sum line, i.e. of the form

<md5 sum> <path/to/file>

e.g.

66b201ae074c36ae9bffec7fb74ff03a md5checker.py

It then attempts to verify the MD5 sum against the file located on the file system, and yields the result as an Md5Checker constant for each line, i.e.:

MD5_OK: if MD5 sums match; MD5_FAILED: if MD5 sums differ.

If the file cannot be found then it yields MISSING_TARGET; if there is a problem computing the MD5 sum then it yields MD5_ERROR.

Parameters:
  • filen – name of the file containing md5sum output

  • fp – file-like object opened for reading, with md5sum output

Returns:

Yields a tuple (f,status) where f is the path of the file being verified (as it appears in the file), and status is the Md5Checker constant representing the outcome.
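The parse-and-verify loop can be sketched as below. This is a simplified illustration, not the verify_md5sums implementation: it accepts only a file-like object, skips lines that do not split into two fields, and uses string placeholders for the Md5Checker constants. The name ‘verify_md5sums_sketch’ is hypothetical.

```python
import hashlib
import os

# Placeholder values (assumed) standing in for the Md5Checker constants
MD5_OK, MD5_FAILED, MD5_ERROR, MISSING_TARGET = (
    "MD5_OK", "MD5_FAILED", "MD5_ERROR", "MISSING_TARGET")


def verify_md5sums_sketch(fp):
    """Yield (path, status) for each '<md5 sum> <path>' line read from fp."""
    for line in fp:
        # Split on the first run of whitespace so paths may contain spaces
        fields = line.strip().split(None, 1)
        if len(fields) != 2:
            continue  # blank or malformed line
        expected, path = fields
        if not os.path.isfile(path):
            yield path, MISSING_TARGET
            continue
        try:
            with open(path, "rb") as data:
                actual = hashlib.md5(data.read()).hexdigest()
        except OSError:
            yield path, MD5_ERROR
            continue
        yield path, (MD5_OK if actual == expected.lower() else MD5_FAILED)
```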

classmethod walk(dirn, links=0)

Traverse all files found in a directory structure

Given a directory, traverses the structure underneath (including subdirectories) and yields the path for each file that is found.

How symbolic links are handled depends on the setting of the ‘links’ option:

FOLLOW_LINKS: symbolic links to files are treated as files; links to directories are followed.

IGNORE_LINKS: symbolic links to files are ignored; links to directories are not followed.

Parameters:
  • dirn – name of the top-level directory

  • links – (optional) specify how symbolic links are handled

Returns:

Yields the name and full path for each file under ‘dirn’.
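The two link-handling modes map fairly directly onto os.walk, as the sketch below suggests. It is not the walk implementation: the name ‘walk_sketch’ is hypothetical, FOLLOW_LINKS = 0 is inferred from the documented default of links=0, and IGNORE_LINKS = 1 is an assumed value.

```python
import os

FOLLOW_LINKS = 0  # inferred from the documented default, links=0
IGNORE_LINKS = 1  # assumed value, for illustration only


def walk_sketch(dirn, links=FOLLOW_LINKS):
    """Yield the path of each file under dirn, honouring the links mode."""
    follow = (links == FOLLOW_LINKS)
    # followlinks controls whether links to directories are descended into
    for dirpath, _, filenames in os.walk(dirn, followlinks=follow):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not follow and os.path.islink(path):
                continue  # IGNORE_LINKS: skip symbolic links to files
            yield path
```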

bcftbx.Md5sum.md5sum(f)

Return md5sum digest for a file or stream

This implements the md5sum checksum generation using the hashlib module.

Parameters:

f – name of the file to generate the checksum from, or a file-like object opened for reading in binary mode.

Returns:

Md5sum digest for the named file.
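For reference, a hashlib-based digest that accepts either a file name or a binary stream could be written along the following lines. This is a minimal sketch and not the module's own code: the name ‘md5sum_sketch’ and the ‘blocksize’ parameter are assumptions introduced for the example.

```python
import hashlib


def md5sum_sketch(f, blocksize=1024 * 1024):
    """Return the MD5 hex digest of a file name or binary file-like object."""
    md5 = hashlib.md5()
    # Accept either an already-open binary stream or a path to open
    if hasattr(f, "read"):
        fp, opened_here = f, False
    else:
        fp, opened_here = open(f, "rb"), True
    try:
        # Read fixed-size blocks so large files are never held in memory
        for block in iter(lambda: fp.read(blocksize), b""):
            md5.update(block)
    finally:
        if opened_here:
            fp.close()
    return md5.hexdigest()
```

Reading in blocks keeps memory use bounded regardless of file size, which matters when checksumming large data files.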