# Compare Directories and Remove Duplicates

This Bash script compares the subdirectories in two locations, groups them by fuzzy name similarity, and automatically removes duplicates. It supports two removal mechanisms: removal based on undesirable words, and automatic pruning that keeps the first directory in each group.

## Features

- Normalization & Cleaning: Converts directory names to lowercase, removes punctuation, and strips out any undesirable words listed in a words file.
- Fuzzy Matching: Uses Python's difflib to compute a similarity ratio between cleaned directory names. A configurable threshold determines whether two names count as "similar."
- Automatic Removal:
  - Undesirable Words: Removes a directory whose original name contains one of the undesirable words when another similar directory does not.
  - Duplicate Pruning: After the initial pass, removes all remaining duplicates in a group, keeping only the first entry.
- Dry-Run Mode: Use the --dry-run flag to preview actions without deleting any directories.

## Requirements

- Bash (version 4+ recommended)
- Python 3
- bc for floating-point comparisons

## Installation

Clone the repository:

    git clone https://github.com/yourusername/compare-dirs.git
    cd compare-dirs

Make the script executable:

    chmod +x compare_dirs.sh

## Usage

    ./compare_dirs.sh [--dry-run] [--threshold ]

### Arguments

The options are followed by three positional arguments, in this order:

1. The first directory containing subdirectories to compare.
2. The second directory containing subdirectories to compare.
3. A text file with one undesirable word per line. These words are removed from directory names during the cleaning process.

### Options

- --dry-run: Print the actions without actually removing any directories.
- --threshold: Set the fuzzy similarity threshold (default is 0.8). A lower threshold (e.g., 0.7) will group more directories as duplicates.
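
To make the effect of the threshold more concrete, here is a minimal Python sketch of the cleaning and comparison steps. It is an illustration under stated assumptions, not the script's actual helper: the function names `clean_name` and `is_similar` are hypothetical, and treating undesirable words as whole whitespace-separated tokens is an assumption. Only `difflib.SequenceMatcher` and the 0.8 default come from the description above.

```python
import re
from difflib import SequenceMatcher


def clean_name(name, undesirable_words):
    """Lowercase a name, drop punctuation, and strip undesirable words.

    Hypothetical helper: the real script may tokenize differently.
    """
    # Replace every character that is not a letter, digit, or whitespace
    # (underscores included) with a space.
    text = re.sub(r"[^\w\s]|_", " ", name.lower())
    unwanted = {w.lower() for w in undesirable_words if w}
    # Assumption: undesirable words are removed as whole tokens.
    return " ".join(tok for tok in text.split() if tok not in unwanted)


def is_similar(a, b, undesirable_words, threshold=0.8):
    """Return True if the cleaned names meet or exceed the threshold."""
    ratio = SequenceMatcher(
        None,
        clean_name(a, undesirable_words),
        clean_name(b, undesirable_words),
    ).ratio()
    return ratio >= threshold


# The two names below have a similarity ratio of 0.75, so they are
# grouped at a threshold of 0.7 but not at the default of 0.8.
print(is_similar("abcdef", "abcdefwxyz", [], threshold=0.8))  # False
print(is_similar("abcdef", "abcdefwxyz", [], threshold=0.7))  # True
```

Because the ratio is compared with `>=`, a pair that scores exactly at the threshold is treated as a duplicate, which matches the "meets or exceeds" wording used for grouping.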
## Examples

Dry run with the default threshold (0.8):

    ./compare_dirs.sh --dry-run /mnt/dsnas /mnt/dsnas1 ./words

Dry run with a custom threshold (0.7):

    ./compare_dirs.sh --dry-run --threshold 0.7 /mnt/dsnas /mnt/dsnas1 ./words

Actual run (without --dry-run):

    ./compare_dirs.sh /mnt/dsnas /mnt/dsnas1 ./words

## How It Works

1. Scanning: The script scans for immediate subdirectories in the two specified directories.
2. Normalization & Cleaning: Each subdirectory name is normalized (converted to lowercase, punctuation removed) and then "cleaned" by stripping out undesirable words (one per line from the words file).
3. Grouping: Using a Python helper with difflib.SequenceMatcher, directories are grouped by comparing their cleaned names. If the similarity ratio meets or exceeds the threshold, they are considered duplicates.
4. Removal:
   - Automatic Removal Based on Undesirable Words: Within a duplicate group, if one directory's original name contains an undesirable word while an alternative does not, that directory is flagged for removal.
   - Duplicate Pruning: After the undesirable-word check, any remaining duplicate groups are pruned by keeping the first directory in each group and removing the rest.
5. Dry-Run: When run with the --dry-run flag, the script prints what it would remove without actually deleting any directories.
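
The grouping and removal steps can be sketched in Python as follows. This is a rough illustration rather than the script's real logic, and it rests on several assumptions that the description does not settle: each name is compared only against the first member of an existing group, an undesirable word counts as "contained" when it appears as a case-insensitive substring of the original name, and the dry-run message format is invented. All function names (`clean_name`, `group_similar`, `has_undesirable`, `plan_removals`, `remove_dirs`) are hypothetical.

```python
import re
import shutil
from difflib import SequenceMatcher


def clean_name(name, words):
    """Lowercase, drop punctuation, and strip undesirable words (sketch)."""
    text = re.sub(r"[^\w\s]|_", " ", name.lower())
    unwanted = {w.lower() for w in words if w}
    return " ".join(tok for tok in text.split() if tok not in unwanted)


def group_similar(names, words, threshold=0.8):
    """Greedily group names whose cleaned forms meet the threshold.

    Assumption: a name joins the first group whose first member is
    similar enough; otherwise it starts a new group.
    """
    groups = []
    for name in names:
        cleaned = clean_name(name, words)
        for group in groups:
            representative = clean_name(group[0], words)
            ratio = SequenceMatcher(None, cleaned, representative).ratio()
            if ratio >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def has_undesirable(name, words):
    """Assumption: case-insensitive substring match on the original name."""
    lowered = name.lower()
    return any(w and w.lower() in lowered for w in words)


def plan_removals(groups, words):
    """Return the entries to remove, in two passes per duplicate group."""
    to_remove = []
    for group in groups:
        if len(group) < 2:
            continue  # a single entry is not a duplicate
        flagged = [n for n in group if has_undesirable(n, words)]
        unflagged = [n for n in group if not has_undesirable(n, words)]
        if unflagged:
            # Pass 1: flagged names go because an alternative exists.
            to_remove.extend(flagged)
            survivors = unflagged
        else:
            survivors = group
        # Pass 2: keep the first survivor and prune the rest.
        to_remove.extend(survivors[1:])
    return to_remove


def remove_dirs(paths, dry_run=True):
    """Delete directories, or only report them when dry_run is set."""
    for path in paths:
        if dry_run:
            print(f"[dry-run] would remove: {path}")
        else:
            shutil.rmtree(path)


names = ["Trip Photos 2019", "trip.photos.2019 (backup)", "Tax Documents"]
groups = group_similar(names, ["backup"])
remove_dirs(plan_removals(groups, ["backup"]), dry_run=True)
```

In this example the two trip directories fall into one group, and only the one with "backup" in its name is reported for removal. Defaulting `dry_run` to `True` in the sketch is a deliberate safety choice. The script itself deletes unless --dry-run is passed.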