Python package for detecting copy-move attack on a digital image

This is a Python package for detecting copy-move attacks on digital images.

This project is part of our paper published by Springer, which explains the theory and the individual steps in more detail.

Using the package

To install the package, use pip: pip3 install pimage. An example script for using this package is also provided here.

Configuring the algorithm

The algorithm can be configured dynamically with the Configuration class. If a parameter is omitted, the default value from the two source papers is used. The default value and a description of each parameter are documented in configuration.py.

```python
from pimage.configuration import Configuration

conf = Configuration(
    block_size=32,
    nn=2,
    nf=188,
    nd=50,
    p=(1.80, 1.80, 1.80, 0.0125, 0.0125, 0.0125, 0.0125),
    t1=2.80,
    t2=0.02
)
```

Choosing the block_size: the first algorithm uses a block size of 32 pixels, so this package uses the same value by default. Increasing the size gives a faster run time at reduced accuracy; conversely, decreasing the size gives a longer run time with increased accuracy.
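As a rough illustration of how block_size changes the amount of work, the sketch below counts how many square blocks fit in an image. It assumes the blocks overlap with a stride of one pixel (our assumption; check the package source for how blocks are actually extracted), and count_overlapping_blocks is a hypothetical helper, not part of pimage. The block count is only one factor in run time, since the cost of computing each block's features also depends on its size.

```python
def count_overlapping_blocks(width: int, height: int, block_size: int) -> int:
    """Count block_size x block_size windows in a width x height image.

    Assumes a stride of one pixel (hypothetical helper, not part of pimage).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if block_size > width or block_size > height:
        return 0  # the block does not fit in the image at all
    return (width - block_size + 1) * (height - block_size + 1)


if __name__ == "__main__":
    # An arbitrary 640 x 480 image, used only for illustration.
    for size in (16, 32, 64):
        print(size, count_overlapping_blocks(640, 480, size))
    # prints "16 290625", "32 273441" and "64 240609", one pair per line
```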

API for the detection process

The detection process is exposed through the copy_move.detect() function. For example:

```python
from pimage import copy_move
from pimage.configuration import Configuration

conf = Configuration(block_size=32)

fraud_list, ground_truth_image, result_image = copy_move.detect("dataset_example_blur.png", configuration=conf)
```

fraud_list is a list in which each entry pairs an (x, y) shift vector, shared by a group of matched blocks, with the total number of blocks in that group. If the list is not empty, we can assume that the image has been tampered with. For example, running the detection on the cattle dataset image with a block size of 32 px results in:
```
((-57, -123), 2178)
((-11, 140), 2178)
((-280, 114), 2178)
((-34, -305), 2178)
((-37, 148), 2178)
```
The output above means there are five possible matched (identical) regions, each formed by 2178 overlapping blocks.
ground_truth_image contains the black-and-white ground truth of the detection result. This is useful for computing accuracy, MSE, etc. against the ground truth provided with the dataset.
result_image is the input image with any possibly forged regions outlined by a colored border.
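Because ground_truth_image is a plain numpy.ndarray, comparing it with a dataset's ground-truth mask needs only NumPy. The sketch below is one possible way to do it, assuming both masks have the same height and width and that non-zero (white) pixels mark the forged region; compare_masks is a hypothetical helper, not part of pimage.

```python
import numpy as np


def _to_binary(mask: np.ndarray) -> np.ndarray:
    """Convert a grayscale or RGB mask to a boolean array (True = forged)."""
    mask = np.asarray(mask)
    if mask.ndim == 3:
        # Treat a pixel as forged if any of its channels is non-zero.
        mask = mask.any(axis=-1)
    return mask > 0


def compare_masks(predicted: np.ndarray, reference: np.ndarray) -> dict:
    """Pixel-wise accuracy, MSE, precision and recall of two binary masks."""
    pred = _to_binary(predicted)
    ref = _to_binary(reference)
    if pred.shape != ref.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {ref.shape}")
    tp = np.count_nonzero(pred & ref)    # forged in both masks
    fp = np.count_nonzero(pred & ~ref)   # flagged but not forged
    fn = np.count_nonzero(~pred & ref)   # forged but missed
    accuracy = np.count_nonzero(pred == ref) / pred.size
    # On 0/1 masks the mean squared error equals the fraction of mismatched pixels.
    mse = float(np.mean((pred.astype(float) - ref.astype(float)) ** 2))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": float(accuracy), "mse": mse,
            "precision": float(precision), "recall": float(recall)}
```

To use it with real data, load the dataset's ground-truth mask (for example with imageio) and pass it as reference together with ground_truth_image as predicted.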

ground_truth_image and result_image are returned as numpy.ndarray objects, so they can be processed further as needed. For example, they can be modified programmatically and then exported as images:

```python
import imageio

imageio.imwrite("result_image.png", result_image)
imageio.imwrite("ground_truth_image.png", ground_truth_image)
```

Quick command to detect an image

To run the detection on an image quickly, copy_move.detect_and_export() is also provided. It works like detect(), but also saves the results to the given output path.

```python
from pimage import copy_move

copy_move.detect_and_export('dataset_example_blur.png', 'output')
```

This code saves ground_truth_image and result_image inside the output folder.

Verbose mode

When running copy_move.detect() or copy_move.detect_and_export(), you can pass verbose=True to print the status of each step. The default is False, in which case nothing is printed.

Example output with verbose mode enabled:

```
Processing: dataset/multi_paste/cattle_gcs500_copy_rb5.png
Step 1 of 4: Object and variable initialization
Step 2 of 4: Computing characteristic features
100%|██████████| 609/609 [04:14<00:00,  2.39it/s]
Step 3 of 4:Pairing image blocks
100%|██████████| 241163/241163 [00:00<00:00, 816659.95it/s]
Step 4 of 4: Image reconstruction
Found pair(s) of possible fraud attack:
((-57, -123), 2178)
((-11, 140), 2178)
((-280, 114), 2178)
((-34, -305), 2178)
((-37, 148), 2178)
Computing time : 254.81 second
Sorting time   : 0.89 second
Analyzing time : 0.3 second
Image creation : 1.4 second
Total time    : 0:04:17 second
```

The algorithm

The implementation operates on overlapping image blocks and is built on two algorithms:

Duplication detection algorithm, taken from Exposing Digital Forgeries by Detecting Duplicated Image Regions (alternative link): a fast detection algorithm with smooth results, based on principal component analysis, but sensitive to noise and to any manipulation applied after the attack (referred to as the post region duplication process).
Robust detection algorithm, taken from Robust Detection of Region-Duplication Forgery in Digital Image: a relatively slower algorithm whose results are rough along the edges of the detected regions, but which is considered robust against noise and the post region duplication process.
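To make the first algorithm more concrete, here is a heavily simplified sketch of its general idea: extract overlapping blocks, project them onto their principal components, sort the projections lexicographically so that similar blocks become neighbors, and count the shift vectors between neighboring matches. It is not pimage's implementation. pca_duplicate_offsets is a hypothetical function; reusing the names nn, nf and nd from Configuration for the neighbor window, the minimum shift-vector frequency and the minimum shift distance is our assumption about their meaning, and rounding the projections to integers is a crude stand-in for the paper's quantization.

```python
from collections import Counter

import numpy as np


def pca_duplicate_offsets(gray, block_size=8, n_components=4, nn=2, nf=10, nd=4):
    """Return [((dx, dy), count), ...] for shift vectors shared by many identical blocks.

    Simplified sketch of PCA-based duplicate detection (not pimage's code).
    gray: 2-D array of pixel intensities.
    """
    height, width = gray.shape
    coords, rows = [], []
    # 1. Collect every overlapping block (stride of one pixel) as a flat vector.
    for y in range(height - block_size + 1):
        for x in range(width - block_size + 1):
            coords.append((x, y))
            rows.append(gray[y:y + block_size, x:x + block_size].ravel())
    blocks = np.asarray(rows, dtype=float)

    # 2. PCA via SVD of the mean-centered block matrix; keep the top components.
    centered = blocks - blocks.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Rounding is a crude stand-in for quantizing the projections.
    features = np.round(centered @ vt[:n_components].T)

    # 3. Lexicographic sort; np.lexsort uses the LAST key as primary, hence [::-1].
    order = np.lexsort(features.T[::-1])

    # 4. Compare each block with the next nn blocks in sorted order.
    offsets = Counter()
    for rank, a in enumerate(order):
        for b in order[rank + 1:rank + 1 + nn]:
            if not np.array_equal(features[a], features[b]):
                continue
            (xa, ya), (xb, yb) = coords[a], coords[b]
            dx, dy = xb - xa, yb - ya
            if dx < 0 or (dx == 0 and dy < 0):
                dx, dy = -dx, -dy  # give each shift vector one canonical sign
            if np.hypot(dx, dy) >= nd:  # ignore matches between nearby blocks
                offsets[(dx, dy)] += 1

    # 5. Keep only shift vectors that occur at least nf times.
    return [(shift, count) for shift, count in offsets.most_common() if count >= nf]
```

Because an exact copy produces many block pairs with the same shift vector, a frequently repeated (dx, dy) is strong evidence of duplication, which is why the sketch thresholds on frequency rather than on individual matches.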

How do we modify them?

The first algorithm uses the coordinate and principal_component features of each block, while the second algorithm uses the coordinate and seven_features.

Building on that, we add tolerance to the detection by merging all the features, as follows:

All attributes of a block are stored in a single object. The objects are then sorted lexicographically by the principal component and the seven features.

Sorting by the principal component brings similar blocks close together, while the seven features back up the detection of blocks that the principal component misses because a post region duplication process (for example, blurring) has been applied to them.

As a result, the combined algorithm tolerates a wider variety of input images. Its detection results are relatively smooth and accurate across image types, at the cost of a longer run time, since it essentially runs two algorithms.
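The merged per-block object and the lexicographic sort can be sketched as follows. BlockRecord and candidate_pairs are hypothetical names, and the rule for accepting a pair (an identical principal component, or every seven-feature difference below the matching entry of p) is our assumption about how the tolerances in Configuration are used; the actual pimage code may differ, and t1 and t2 are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(order=True)
class BlockRecord:
    """Hypothetical per-block object; dataclass ordering compares fields in order."""
    principal_component: Tuple[float, ...]
    seven_features: Tuple[float, ...]
    coordinate: Tuple[int, int] = field(compare=False)  # excluded from sorting


def candidate_pairs(records: Sequence[BlockRecord], p: Sequence[float],
                    nn: int = 2) -> List[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Pair each record with its next nn neighbors after a lexicographic sort."""
    ordered = sorted(records)  # by principal_component, then seven_features
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:i + 1 + nn]:
            same_pc = a.principal_component == b.principal_component
            # Assumed rule: every feature difference must stay below its tolerance in p.
            close = all(abs(fa - fb) < tol
                        for fa, fb, tol in zip(a.seven_features, b.seven_features, p))
            if same_pc or close:
                pairs.append((a.coordinate, b.coordinate))
    return pairs


if __name__ == "__main__":
    P = (1.80, 1.80, 1.80, 0.0125, 0.0125, 0.0125, 0.0125)  # default p from above
    blurred = BlockRecord((9.0, 9.0), (10.5, 10.0, 10.0, 0.505, 0.5, 0.5, 0.5), (80, 0))
    source = BlockRecord((1.0, 2.0), (10.0, 10.0, 10.0, 0.5, 0.5, 0.5, 0.5), (0, 0))
    copy = BlockRecord((1.0, 2.0), (50.0, 50.0, 50.0, 0.1, 0.1, 0.1, 0.1), (40, 40))
    print(candidate_pairs([blurred, source, copy], P))
```

In this toy input, source and copy pair up through their identical principal component, while blurred, whose principal component differs, is still paired with source because all seven features fall within tolerance.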

Example image

All results for the dataset are in the output directory of this repository.

The images for each dataset are shown in this order: original, attacked, and the resulting detection image.

Horse dataset

Cattle dataset

Clean walls dataset

Knight moves dataset

Additional note

The project was originally written in Python 2 for our undergraduate thesis; that version is now left unmaintained here. The original thesis, written in Indonesian, can also be downloaded from here.