imagedup

Version: v2.3.0
Published: Apr 17, 2024 License: GPL-3.0

ImageDup

Got a lot of images with many duplicates, maybe at different sizes? imagedup uses perceptual hashing to find images that are close in appearance but not identical. Once imagedup has finished, the verify tool can read the delete log and open the images in pairs so you can double-check them before they are deleted. This step is necessary because perceptual hashing is not perfect and will sometimes pair two completely different images.

A second tool, uniqdirs, can be used in place of nsquared with the same options (see Run) and will dedup within directories, each of which is considered unique. This is helpful with more organized directory layouts.

Run

./nsquared -cache-file cache.json -output-file delete.log -dir /path/to/images -threads 5 -dedup-file-pairs 
# OR
./uniqdirs -cache-file cache.json -output-file delete.log -dir /path/to/images -threads 5 -dedup-file-pairs

# this will create delete.log which will be used by the verify tool.

./verify -delete-file delete.log

print help:

imagedup -h

cache file

The cache contains hashes that correspond to the images in -dir, so if -dir changes, -cache-file should change too, e.g.

  • -cache-file one.json -dir /path/to/one
  • -cache-file two.json -dir /path/to/two

Passing a -cache-file with a different -dir will result in an error, e.g.

  • -cache-file one.json -dir /path/to/two

Deduping pairs of images

Deduping is done with a roaring bitmap, which halves the number of comparisons but increases memory usage. This is a tradeoff you will need to consider. The feature is disabled by default and can be enabled by passing -dedup-file-pairs.

Without deduping the pairs:

INFO[2022-09-15 11:29:32] Found 31722 dirs
INFO[2022-09-15 11:29:32] Started, go to grafana to monitor
INFO[2022-09-15 11:51:34] Shutting down
INFO[2022-09-15 11:51:34] Total time taken: 22m2.221316446s

Deduping the pairs:

INFO[2022-09-15 11:56:28] Found 31722 dirs
INFO[2022-09-15 11:56:28] Started, go to grafana to monitor
INFO[2022-09-15 12:13:52] Shutting down
INFO[2022-09-15 12:13:52] Total time taken: 17m24.991176074s

Compare Stats

The first run is without deduping file pairs; the second is with it.

[Image: grafana screenshot]
