program updates
This commit is contained in:
parent
56d41fb487
commit
6f0781ea12
105
README.md
@@ -6,7 +6,7 @@ Normalization & Cleaning:
Converts directory names to lowercase, removes punctuation, and strips out any undesirable words specified in a words file.

Fuzzy Matching:
Uses Python's difflib to compute a similarity ratio between cleaned directory names. A configurable threshold determines if two names are "similar."

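As a rough illustration of the ratio described above, the sketch below computes it with `difflib.SequenceMatcher` and applies a threshold. The function names and the sample directory names are made up for this example and are not taken from the scripts.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio (0.0 to 1.0) for two cleaned names."""
    return SequenceMatcher(None, a, b).ratio()


def is_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Two names count as similar when the ratio meets or exceeds the threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical cleaned directory names, for illustration only.
    print(similarity("holiday photos 2019", "holiday photos 2019 backup"))
    print(is_similar("holiday photos 2019", "tax documents"))
```

The default of 0.8 matches the default threshold documented for the scripts; everything else here is an assumption.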
Automatic Removal:

@@ -15,65 +15,136 @@ Duplicate Pruning: After the initial pass, automatically removes all duplicates
Dry-Run Mode:
Preview actions without deleting any directories by using the --dry-run flag.

New Features (in compare_dirs_improved.sh):
- Parallel Processing: Both directory scanning and filtering are now processed in parallel for improved performance
- Configuration File: Support for persistent configuration via compare_dirs.conf file
- Comprehensive Logging: Detailed logging with configurable levels (DEBUG, INFO, WARNING, ERROR)
- Better Error Handling: Improved error detection and reporting
- Color-coded Console Output: Better visual distinction between different types of messages

Requirements
Bash (version 4+ recommended)
Python 3
bc for floating point comparisons

Installation
Clone the repository:

git clone https://github.com/yourusername/compare-dirs.git
cd compare-dirs
Make the scripts executable:

```bash
chmod +x compare_dirs.sh
chmod +x compare_dirs_improved.sh
```

Usage
## Original Script
```
./compare_dirs.sh [--dry-run] [--threshold <threshold>] <dir1> <dir2> <words_file>
```

## Improved Script
```
./compare_dirs_improved.sh [--dry-run] [--threshold <threshold>] [--config <config_file>] [--log-file <log_file>] [--log-level <level>] [--parallel <processes>] [<dir1> <dir2> <words_file>]
```

Arguments
<dir1>: The first directory containing subdirectories to compare.
<dir2>: The second directory containing subdirectories to compare.
<words_file>: A text file with one undesirable word per line. These words are removed from directory names during the cleaning process.

Options
### Common Options
`--dry-run`: Print the actions without actually removing any directories.
`--threshold <threshold>`: Set the fuzzy similarity threshold (default is 0.8). A lower threshold (e.g., 0.7) will group more directories as duplicates.

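To get a feel for how the threshold changes the outcome, the following sketch collects the pairs that would be treated as similar at two thresholds. `similar_pairs` and the sample names are hypothetical; only the use of difflib's ratio and the "meets or exceeds" comparison come from this README.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold):
    """Return every pair of names whose similarity ratio meets the threshold."""
    return {
        (a, b)
        for a, b in combinations(names, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    }


if __name__ == "__main__":
    # Hypothetical cleaned names, for illustration only.
    names = ["summer trip 2019", "summer trip 2019 copy", "summer trips", "invoices"]
    strict = similar_pairs(names, 0.8)
    loose = similar_pairs(names, 0.7)
    # Every pair accepted at 0.8 is also accepted at 0.7, never the other way round.
    print(len(strict), len(loose), strict <= loose)
```

Because the comparison is `ratio >= threshold`, lowering the threshold can only add pairs, which is why a lower value groups more directories as duplicates.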
### Improved Script Options
`--config <config_file>`: Specify a configuration file (default: ./compare_dirs.conf)
`--log-file <log_file>`: Specify a log file path (default: ./compare_dirs.log)
`--log-level <level>`: Set logging level (DEBUG, INFO, WARNING, ERROR)
`--parallel <processes>`: Number of parallel processes to use (0 = auto, uses all CPU cores)
`--help`: Display usage information

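A minimal sketch of how a `--parallel` value could be turned into a worker count, assuming 0 means "one worker per CPU core". `resolve_workers` is an illustrative name, and the fallback of 4 mirrors what the improved script does when `nproc` is unavailable.

```python
import os


def resolve_workers(requested: int) -> int:
    """Map a --parallel style value to a worker count; 0 means 'auto'."""
    if requested < 0:
        raise ValueError("the number of parallel processes must be 0 or greater")
    if requested == 0:
        # os.cpu_count() may return None; fall back to 4 in that case.
        return os.cpu_count() or 4
    return requested


if __name__ == "__main__":
    print(resolve_workers(0))  # number of CPU cores on this machine
    print(resolve_workers(8))
```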
Config File
The improved script supports a configuration file (default: compare_dirs.conf) with the following parameters:

```
# Configuration file for compare_dirs.sh

# Default directory paths
DIR1="/path/to/dir1"
DIR2="/path/to/dir2"

# Path to words file
WORDS_FILE="./words"

# Similarity threshold (0.0-1.0)
SIMILARITY_THRESHOLD=0.8

# Enable/disable dry run mode (true/false)
DRY_RUN=false

# Number of parallel processes to use (0 = auto)
PARALLEL_PROCESSES=0

# Logging configuration
LOG_ENABLED=true
LOG_FILE="./compare_dirs.log"
LOG_LEVEL="INFO" # DEBUG, INFO, WARNING, ERROR
```

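The improved script reads this file by sourcing it in Bash. For tooling outside the script, a small Python reader could look like the sketch below. `parse_config` is a hypothetical helper that only understands plain `KEY=value` lines and returns every value as a string.

```python
import shlex


def parse_config(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        # shlex removes surrounding quotes and trailing comments for simple assignments.
        tokens = shlex.split(line, comments=True)
        if len(tokens) != 1 or "=" not in tokens[0]:
            continue  # not a plain assignment; skipped in this sketch
        key, value = tokens[0].split("=", 1)
        settings[key] = value
    return settings


if __name__ == "__main__":
    sample = 'DRY_RUN=false\nLOG_LEVEL="INFO" # DEBUG, INFO, WARNING, ERROR\n'
    print(parse_config(sample))
```

Anything beyond plain assignments (variable expansion, command substitution) is valid in a sourced Bash file but is deliberately not handled here.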
Examples
### Original Script

Dry Run with Default Threshold (0.8):

```
./compare_dirs.sh --dry-run /mnt/dsnas /mnt/dsnas1 ./words
```

Dry Run with a Custom Threshold (0.7):

```
./compare_dirs.sh --dry-run --threshold 0.7 /mnt/dsnas /mnt/dsnas1 ./words
```

Actual Run (without dry-run):

```
./compare_dirs.sh /mnt/dsnas /mnt/dsnas1 ./words
```

### Improved Script

Using Configuration File:
```
./compare_dirs_improved.sh --config my_config.conf
```

With Parallel Processing and Custom Log Level:
```
./compare_dirs_improved.sh --parallel 8 --log-level DEBUG /mnt/dsnas /mnt/dsnas1 ./words
```

How It Works
Scanning:
The script scans for immediate subdirectories in the two specified directories. In the improved version, this is done in parallel for better performance.

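The scanning step can be pictured with the short sketch below, which lists the immediate subdirectories of two roots concurrently. It is not how the Bash scripts do it; `immediate_subdirs` and `scan_both` are illustrative names, and a thread pool is just one assumed way to run the two scans side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def immediate_subdirs(root):
    """Return the immediate subdirectories of root, sorted by name."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


def scan_both(dir1, dir2):
    """Scan both top-level directories concurrently and return one combined list."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first, second = pool.map(immediate_subdirs, [dir1, dir2])
    return first + second


if __name__ == "__main__":
    import sys

    # Usage: python scan_sketch.py <dir1> <dir2>
    for path in scan_both(sys.argv[1], sys.argv[2]):
        print(path)
```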
Normalization & Cleaning:
Each subdirectory name is normalized (converted to lowercase, punctuation removed) and then "cleaned" by stripping out undesirable words (one per line from the words file).

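A minimal sketch of this normalization, assuming that "punctuation" means every character other than lowercase letters, digits and spaces, and that undesirable words are matched as whole words. `clean_name` here is a Python stand-in written for illustration, not the scripts' own function.

```python
import re


def clean_name(name, undesirable_words):
    """Lowercase, drop punctuation, remove undesirable words, collapse spaces."""
    lowered = name.lower()
    # Keep only letters, digits and spaces (assumed definition of "punctuation removed").
    stripped = re.sub(r"[^a-z0-9 ]", "", lowered)
    unwanted = {word.lower() for word in undesirable_words}
    kept = [token for token in stripped.split() if token not in unwanted]
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical directory name and word list.
    print(clean_name("Holiday Photos (Backup)", ["backup"]))
```

Note that punctuation is deleted rather than replaced by a space, so a name such as `a.b` becomes `ab`; whether that is desirable depends on the naming scheme of the directories.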
Grouping:
Using a Python helper with difflib.SequenceMatcher, directories are grouped by comparing their cleaned names. If the similarity ratio meets or exceeds the threshold, they are considered duplicates.

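One way to read the grouping step is as a greedy pass: each cleaned name joins the first group whose representative is similar enough, and otherwise starts a new group. The sketch below follows that reading; `group_by_similarity` is an illustrative name and the greedy strategy is an assumption about the details.

```python
from difflib import SequenceMatcher


def group_by_similarity(cleaned_names, threshold=0.8):
    """Greedily group names around the first sufficiently similar representative."""
    groups = []  # list of (representative, members) pairs
    for name in cleaned_names:
        for representative, members in groups:
            if SequenceMatcher(None, name, representative).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing group was close enough: this name starts a new group.
            groups.append((name, [name]))
    return [members for _, members in groups]


if __name__ == "__main__":
    # Hypothetical cleaned names.
    print(group_by_similarity(["road trip", "road trip", "recipes"]))
```

A greedy pass depends on input order: the first name seen becomes the representative of its group, so sorting the input first makes the result reproducible.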
Removal:
Automatic Removal Based on Undesirable Words:
Within duplicate groups, if one directory's original name contains an undesirable word while an alternative does not, that directory is flagged for removal.

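The rule above can be sketched as follows, assuming a case-insensitive substring match against the original directory names. `flag_for_removal` is a hypothetical helper and the sample names are invented.

```python
def flag_for_removal(group, undesirable_words):
    """Return the directories of one duplicate group that should be removed.

    A directory is flagged only when its name contains an undesirable word and
    at least one other directory in the group does not.
    """
    words = [word.lower() for word in undesirable_words if word.strip()]

    def is_tainted(name):
        lowered = name.lower()
        return any(word in lowered for word in words)

    tainted = [name for name in group if is_tainted(name)]
    clean = [name for name in group if not is_tainted(name)]
    # Without a clean alternative nothing is flagged, so the group is left intact.
    return tainted if clean else []


if __name__ == "__main__":
    print(flag_for_removal(["Trip 2019", "Trip 2019 SAMPLE"], ["sample"]))
```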
Duplicate Pruning:
After the undesirable-word check, any remaining duplicate groups are pruned by keeping the first directory in each group and removing the rest.

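Pruning as described here is a one-liner per group. The sketch below only computes which paths would be removed; `prune_duplicates` and the example paths are hypothetical.

```python
def prune_duplicates(groups):
    """Keep the first directory of every group and return the rest for removal."""
    to_remove = []
    for members in groups:
        to_remove.extend(members[1:])
    return to_remove


if __name__ == "__main__":
    groups = [["/data/a/trip", "/data/b/trip copy"], ["/data/a/recipes"]]
    print(prune_duplicates(groups))
```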
Dry-Run:
When run with the --dry-run flag, the script will print what it would remove without actually deleting any directories.

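A dry-run switch is usually a guard around the destructive call. The following sketch shows the idea with `shutil.rmtree`; `remove_directory` is an illustrative name, and defaulting to `dry_run=True` is a cautious choice made for this example rather than the scripts' behavior.

```python
import shutil


def remove_directory(path, dry_run=True):
    """Delete a directory tree, or only report the action when dry_run is set."""
    if dry_run:
        print(f"Dry-run: would remove '{path}'")
        return False
    shutil.rmtree(path)
    print(f"Removed '{path}'")
    return True


if __name__ == "__main__":
    # Hypothetical path; nothing is deleted because dry_run defaults to True.
    remove_directory("/tmp/example-duplicate")
```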
Logging (Improved Script):
The improved script maintains a detailed log of all operations, which can be used for audit purposes or troubleshooting.
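
For comparison, the same idea (level filtering, console output, and a plain-text log file) can be sketched with Python's standard `logging` module. `build_logger` and the logger name are assumptions made for this example; the message format imitates the `[timestamp] [LEVEL] message` layout used by the improved script.

```python
import logging
import sys

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def build_logger(log_file, level="INFO"):
    """Create a logger that writes to the console (stderr) and to log_file."""
    logger = logging.getLogger("compare_dirs_sketch")
    logger.setLevel(LEVELS[level])
    logger.propagate = False
    # Drop handlers from earlier calls so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    formatter = logging.Formatter(
        "[%(asctime)s] [%(levelname)s] %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
    )
    for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("./compare_dirs_sketch.log", "INFO")
    log.debug("hidden at INFO level")
    log.info("Logging initialized")
```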
22
compare_dirs.conf
Normal file
@@ -0,0 +1,22 @@
# Configuration file for compare_dirs.sh

# Default directory paths
DIR1="/path/to/dir1"
DIR2="/path/to/dir2"

# Path to words file
WORDS_FILE="./words"

# Similarity threshold (0.0-1.0)
SIMILARITY_THRESHOLD=0.8

# Enable/disable dry run mode (true/false)
DRY_RUN=false

# Number of parallel processes to use (0 = auto)
PARALLEL_PROCESSES=0

# Logging configuration
LOG_ENABLED=true
LOG_FILE="./compare_dirs.log"
LOG_LEVEL="INFO" # DEBUG, INFO, WARNING, ERROR
475
compare_dirs_improved.sh
Executable file
@@ -0,0 +1,475 @@
#!/bin/bash
# compare_dirs_improved.sh
#
# Usage: ./compare_dirs_improved.sh [--dry-run] [--threshold <threshold>] [--config <config_file>] [<dir1> <dir2> <words_file>]
#
# This script:
#   1. Scans immediate subdirectories in <dir1> and <dir2> in parallel.
#   2. For each directory, if its name contains any undesirable word (one per line in <words_file>),
#      the directory is removed outright.
#   3. The remaining directories are "cleaned" (converted to lowercase, punctuation removed)
#      and then grouped by fuzzy similarity using a configurable threshold.
#      The fuzzy similarity process is optimized with a multiprocessing helper.
#   4. Within each group, if one directory's name contains "2160p" and another contains "1080p",
#      the 1080p directory(ies) are removed (or flagged in dry-run mode).
#   5. For any remaining duplicate groups, the user is prompted to select a directory to remove.
#   6. A --dry-run mode is available to preview removals without actually deleting any directories.
#   7. Supports configuration files for persistent settings.
#   8. Provides comprehensive logging of operations.

set -euo pipefail

# Default configuration file location
CONFIG_FILE="./compare_dirs.conf"

# Initialize log function
log() {
    local level="$1"
    local message="$2"
    local timestamp
    timestamp=$(date "+%Y-%m-%d %H:%M:%S")

    # Only log if logging is enabled
    if [[ "$LOG_ENABLED" == "true" ]]; then
        # Log level filtering
        case "$LOG_LEVEL" in
            "DEBUG")
                ;;
            "INFO")
                if [[ "$level" == "DEBUG" ]]; then return; fi
                ;;
            "WARNING")
                if [[ "$level" == "DEBUG" || "$level" == "INFO" ]]; then return; fi
                ;;
            "ERROR")
                if [[ "$level" == "DEBUG" || "$level" == "INFO" || "$level" == "WARNING" ]]; then return; fi
                ;;
        esac

        # Print to console with color. Messages go to stderr so that they are not
        # swallowed when a function's stdout is read by its caller.
        case "$level" in
            "DEBUG") echo -e "\033[36m[$timestamp] [$level] $message\033[0m" >&2 ;; # Cyan
            "INFO") echo -e "\033[32m[$timestamp] [$level] $message\033[0m" >&2 ;; # Green
            "WARNING") echo -e "\033[33m[$timestamp] [$level] $message\033[0m" >&2 ;; # Yellow
            "ERROR") echo -e "\033[31m[$timestamp] [$level] $message\033[0m" >&2 ;; # Red
            *) echo "[$timestamp] [$level] $message" >&2 ;;
        esac

        # Write to log file (without color codes)
        echo "[$timestamp] [$level] $message" >> "$LOG_FILE"
    fi
}

# Default options
DRY_RUN=false
SIMILARITY_THRESHOLD=0.8
LOG_ENABLED=true
LOG_FILE="./compare_dirs.log"
LOG_LEVEL="INFO"
PARALLEL_PROCESSES=0
# Empty defaults so that "set -u" does not abort before the path checks below
# can report a missing directory or words file.
DIR1=""
DIR2=""
WORDS_FILE=""

# Load configuration file if it exists
load_config() {
    local config_file="$1"

    if [[ -f "$config_file" ]]; then
        # Source the config file
        source "$config_file"
        echo "Configuration loaded from $config_file"
        return 0
    else
        echo "Warning: Configuration file '$config_file' not found. Using defaults."
        return 1
    fi
}

# Process directories in parallel
process_directories_parallel() {
    local dir="$1"
    local max_procs="${2:-4}" # Default to 4 processes if not specified
    local temp_dir=$(mktemp -d)
    local pids=()
    local count=0
    local result=()

    if [[ ! -d "$dir" ]]; then
        log "ERROR" "Directory '$dir' not found in parallel processing."
        return 1
    fi

    # If PARALLEL_PROCESSES is 0, use available CPU cores
    if [[ "$max_procs" -eq 0 ]]; then
        max_procs=$(nproc 2>/dev/null || echo 4)
    fi

    log "DEBUG" "Processing directory '$dir' with $max_procs parallel processes"

    # Get all directories to process
    local all_dirs=()
    for d in "$dir"/*; do
        if [[ -d "$d" ]]; then
            all_dirs+=("$d")
        fi
    done

    local total_dirs=${#all_dirs[@]}
    log "DEBUG" "Found $total_dirs directories to process"

    # Process in batches based on max_procs
    for ((i=0; i<total_dirs; i++)); do
        local d="${all_dirs[i]}"
        local base=$(basename "$d")
        local output_file="$temp_dir/$count"

        # Process directory in background
        {
            remove_flag=false
            # Check if the directory name contains any undesirable word (case-insensitive).
            # -F treats the word as a literal string rather than a regular expression.
            for word in "${words[@]}"; do
                if echo "$base" | grep -qiF -- "$word"; then
                    remove_flag=true
                    break
                fi
            done

            if $remove_flag; then
                echo "REMOVE:$d"
            else
                echo "KEEP:$d"
            fi
        } > "$output_file" &

        pids+=("$!")
        # ((count++)) returns a non-zero status when count is 0, which would abort under "set -e".
        count=$((count + 1))

        # If we've reached max_procs or this is the last directory, wait for processes to finish
        if [[ ${#pids[@]} -eq $max_procs || $i -eq $((total_dirs-1)) ]]; then
            for pid in "${pids[@]}"; do
                wait "$pid"
            done

            # Read results
            for ((j=0; j<${#pids[@]}; j++)); do
                local file="$temp_dir/$j"
                while IFS= read -r line; do
                    result+=("$line")
                done < "$file"
            done

            # Reset for next batch
            pids=()
            count=0
        fi
    done

    # Clean up temporary directory
    rm -rf "$temp_dir"

    # Output results
    for line in "${result[@]}"; do
        echo "$line"
    done
}

# Look for --config first: the configuration file is loaded before the other flags
# are parsed, so that explicit command-line flags (for example --dry-run) override
# values from the file instead of being overwritten by them.
cli_args=("$@")
for ((idx = 0; idx < ${#cli_args[@]} - 1; idx++)); do
    if [[ "${cli_args[idx]}" == "--config" ]]; then
        CONFIG_FILE="${cli_args[idx + 1]}"
    fi
done

# Load configuration file if it exists
if [[ -f "$CONFIG_FILE" ]]; then
    load_config "$CONFIG_FILE"
fi

# Process command-line flags
while [[ $# -gt 0 && "$1" == --* ]]; do
    case "$1" in
        --dry-run)
            DRY_RUN=true
            shift
            ;;
        --threshold)
            SIMILARITY_THRESHOLD="$2"
            shift 2
            ;;
        --config)
            CONFIG_FILE="$2"
            shift 2
            ;;
        --log-file)
            LOG_FILE="$2"
            shift 2
            ;;
        --log-level)
            LOG_LEVEL="$2"
            shift 2
            ;;
        --parallel)
            PARALLEL_PROCESSES="$2"
            shift 2
            ;;
        --help)
            echo "Usage: $0 [OPTIONS] [<dir1> <dir2> <words_file>]"
            echo
            echo "OPTIONS:"
            echo "  --dry-run             Preview actions without deleting any directories"
            echo "  --threshold <value>   Set the fuzzy similarity threshold (default: 0.8)"
            echo "  --config <file>       Specify a configuration file to use"
            echo "  --log-file <file>     Specify a log file to use"
            echo "  --log-level <level>   Set log level: DEBUG, INFO, WARNING, ERROR"
            echo "  --parallel <num>      Number of parallel processes (0 = auto)"
            echo "  --help                Display this help message"
            echo
            echo "If no directories and words file are specified, values from config file will be used."
            exit 0
            ;;
        *)
            echo "Unknown option: $1"
            echo "Use --help for usage information."
            exit 1
            ;;
    esac
done

# Initialize logging
if [[ "$LOG_ENABLED" == "true" ]]; then
    touch "$LOG_FILE"
    log "INFO" "Logging initialized"
fi

# Check if arguments were provided to override config values
if [ $# -eq 3 ]; then
    DIR1="$1"
    DIR2="$2"
    WORDS_FILE="$3"
    log "INFO" "Using command-line arguments for directories and words file"
elif [ $# -ne 0 ]; then
    log "ERROR" "Incorrect number of arguments"
    echo "Usage: $0 [--dry-run] [--threshold <threshold>] [--config <config_file>] [<dir1> <dir2> <words_file>]"
    exit 1
fi

log "INFO" "Script started with parameters: DIR1=$DIR1, DIR2=$DIR2, WORDS_FILE=$WORDS_FILE"
log "INFO" "Configuration: THRESHOLD=$SIMILARITY_THRESHOLD, DRY_RUN=$DRY_RUN, PARALLEL_PROCESSES=$PARALLEL_PROCESSES"

# Verify input paths
if [ ! -d "$DIR1" ]; then
    log "ERROR" "Directory '$DIR1' not found."
    echo "Error: Directory '$DIR1' not found."
    exit 1
fi
if [ ! -d "$DIR2" ]; then
    log "ERROR" "Directory '$DIR2' not found."
    echo "Error: Directory '$DIR2' not found."
    exit 1
fi
if [ ! -f "$WORDS_FILE" ]; then
    log "ERROR" "Words file '$WORDS_FILE' not found."
    echo "Error: Words file '$WORDS_FILE' not found."
    exit 1
fi

# Read undesirable words (one per line) into an array, ignoring blank lines.
mapfile -t words < <(grep -v '^[[:space:]]*$' "$WORDS_FILE")
log "INFO" "Loaded ${#words[@]} undesirable words from $WORDS_FILE"

echo "=== Pre-filtering Directories by Undesirable Words ==="
log "INFO" "Starting parallel directory filtering"

# Process directories in parallel and process results
filtered_dirs=()

# Process DIR1
log "INFO" "Processing directories in $DIR1"
while IFS= read -r line; do
    if [[ "$line" == KEEP:* ]]; then
        dir="${line#KEEP:}"
        filtered_dirs+=("$dir")
        log "DEBUG" "Keeping directory: $dir"
    elif [[ "$line" == REMOVE:* ]]; then
        dir="${line#REMOVE:}"
        log "INFO" "Removing directory with undesirable word: $dir"
        if $DRY_RUN; then
            echo "Dry-run: would remove '$dir'"
            log "INFO" "Dry-run: would remove '$dir'"
        else
            rm -rf "$dir"
            log "INFO" "Removed '$dir'"
            echo "Removed '$dir'"
        fi
    fi
done < <(process_directories_parallel "$DIR1" "$PARALLEL_PROCESSES")

# Process DIR2
log "INFO" "Processing directories in $DIR2"
while IFS= read -r line; do
    if [[ "$line" == KEEP:* ]]; then
        dir="${line#KEEP:}"
        filtered_dirs+=("$dir")
        log "DEBUG" "Keeping directory: $dir"
    elif [[ "$line" == REMOVE:* ]]; then
        dir="${line#REMOVE:}"
        log "INFO" "Removing directory with undesirable word: $dir"
        if $DRY_RUN; then
            echo "Dry-run: would remove '$dir'"
            log "INFO" "Dry-run: would remove '$dir'"
        else
            rm -rf "$dir"
            log "INFO" "Removed '$dir'"
            echo "Removed '$dir'"
        fi
    fi
done < <(process_directories_parallel "$DIR2" "$PARALLEL_PROCESSES")

log "INFO" "Filtered directories remaining: ${#filtered_dirs[@]}"

# Function: Normalize and clean a directory name.
clean_name() {
    local name="$1"
    echo "$name" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9 ]//g' | xargs
}

# Function: Compute fuzzy similarities between a target and a list of strings using multiprocessing.
compute_similarities() {
    local target="$1"
    shift
    # Pass target and the list of representatives as command-line arguments to Python.
    python3 - "$target" "$@" <<EOF
import sys
from difflib import SequenceMatcher
from multiprocessing import Pool, cpu_count

target = sys.argv[1]
reps = sys.argv[2:]

def similarity(rep):
    return SequenceMatcher(None, target, rep).ratio()

# Use all available CPUs
with Pool(processes=cpu_count()) as pool:
    results = pool.map(similarity, reps)

print(" ".join(map(str, results)))
EOF
}

echo "=== Grouping Remaining Directories by Fuzzy Similarity ==="
log "INFO" "Starting fuzzy similarity grouping"

# Initialize grouping arrays.
declare -a group_rep=() # Array for representative cleaned names.
declare -A groups=()    # Associative array: groups[i] holds newline-separated directory paths.

# Group the directories in filtered_dirs.
total_dirs=${#filtered_dirs[@]}
log "INFO" "Grouping $total_dirs directories based on similarity threshold $SIMILARITY_THRESHOLD"

for d in "${filtered_dirs[@]}"; do
    base=$(basename "$d")
    cleaned=$(clean_name "$base")
    added=false
    if [ "${#group_rep[@]}" -gt 0 ]; then
        # Compute similarities between the cleaned name and all group representatives concurrently.
        similarities=$(compute_similarities "$cleaned" "${group_rep[@]}")
        read -r -a sims <<< "$similarities"
        for i in "${!sims[@]}"; do
            if (( $(echo "${sims[$i]} >= $SIMILARITY_THRESHOLD" | bc -l) )); then
                groups["$i"]+=$'\n'"$d"
                log "DEBUG" "Added '$d' to group $i (${group_rep[$i]})"
                added=true
                break
            fi
        done
    fi
    if [ "$added" = false ]; then
        new_index=${#group_rep[@]}
        group_rep+=("$cleaned")
        groups["$new_index"]="$d"
        log "DEBUG" "Created new group $new_index with representative '$cleaned'"
    fi
done

log "INFO" "Created ${#group_rep[@]} groups after fuzzy similarity matching"

echo "=== Resolution Preference Filtering ==="
log "INFO" "Starting resolution preference filtering"

# For each group, if one directory contains "2160p" and another contains "1080p",
# remove the 1080p directory(ies).
for key in "${!groups[@]}"; do
    # mapfile splits on every newline; "read -a" would stop after the first path.
    mapfile -t paths <<< "${groups[$key]}"
    has_2160p=false
    has_1080p=false
    for path in "${paths[@]}"; do
        base=$(basename "$path")
        if echo "$base" | grep -qi "2160p"; then
            has_2160p=true
            log "DEBUG" "Found 2160p in group $key: $path"
        fi
        if echo "$base" | grep -qi "1080p"; then
            has_1080p=true
            log "DEBUG" "Found 1080p in group $key: $path"
        fi
    done
    if $has_2160p && $has_1080p; then
        log "INFO" "Group $key (representative: ${group_rep[$key]:-unknown}) has both 1080p and 2160p directories"
        echo "Group (representative: ${group_rep[$key]:-unknown}) has both 1080p and 2160p directories."
        new_group=()
        for path in "${paths[@]}"; do
            base=$(basename "$path")
            if echo "$base" | grep -qi "1080p"; then
                log "INFO" "Removing '$path' because a 2160p version is present"
                echo "Removing '$path' because a 2160p version is present."
                if $DRY_RUN; then
                    echo "Dry-run: would remove '$path'"
                    log "INFO" "Dry-run: would remove '$path'"
                else
                    rm -rf "$path"
                    log "INFO" "Removed '$path'"
                    echo "Removed '$path'"
                fi
            else
                new_group+=("$path")
            fi
        done
        groups["$key"]=$(printf "%s\n" "${new_group[@]}")
    fi
done

echo "=== Interactive Duplicate Resolution ==="
log "INFO" "Starting interactive duplicate resolution"

# For each group that still contains more than one directory, prompt the user to select one to remove.
for key in "${!groups[@]}"; do
    # mapfile splits on every newline; "read -a" would stop after the first path.
    mapfile -t paths <<< "${groups[$key]}"
    # Filter out directories that no longer exist.
    existing=()
    for path in "${paths[@]}"; do
        if [ -d "$path" ]; then
            existing+=("$path")
        fi
    done
    if [ "${#existing[@]}" -gt 1 ]; then
        log "INFO" "Prompting user for duplicate group: ${group_rep[$key]:-unknown}"
        echo "Duplicate group (representative: ${group_rep[$key]:-unknown}):"
        i=1
        for p in "${existing[@]}"; do
            echo "  [$i] $p"
            ((i++))
        done
        echo -n "Enter the number of the directory you want to remove (or 0 to skip): "
        read -r choice
        if [[ "$choice" =~ ^[0-9]+$ ]] && [ "$choice" -gt 0 ] && [ "$choice" -le "${#existing[@]}" ]; then
            dir_to_remove="${existing[$((choice-1))]}"
            log "INFO" "User selected to remove: $dir_to_remove"
            if $DRY_RUN; then
                echo "Dry-run: would remove '$dir_to_remove'"
                log "INFO" "Dry-run: would remove '$dir_to_remove'"
            else
                rm -rf "$dir_to_remove"
                log "INFO" "Removed '$dir_to_remove'"
                echo "Removed '$dir_to_remove'"
            fi
        else
            log "INFO" "User skipped removal for group: ${group_rep[$key]:-unknown}"
            echo "No removal selected for this group."
        fi
    fi
done

log "INFO" "Script completed successfully"
echo "Script completed. See $LOG_FILE for detailed log."