2021-05-17

Manually Creating a Rudimentary Searchable Image Tagging System

This post is the third in a series of three posts about some changes I have been making in my personal life with respect to how I interact with online social media platforms. When I published the second post in this series, I was on the lookout for secure, privacy-respecting cloud storage services. As of this post, I still haven't committed to a specific service. One of my requirements has been that the service should allow me to share certain files or folders securely with others. Unfortunately, unless I use a service like Google Drive, which has just as little respect for data privacy as Facebook does, it isn't clear how I can easily tag images with details about people, location, and other comments in a way that I or others can easily search. My proposed solution, involving BASH scripts, is far from perfect, is very much a work in progress, and is arguably somewhat specific to my particular situation. Follow the jump to see more details.

I don't take that many photos, so my collection of photos has been relatively easy to manage. In particular, I have grouped files according to the following folder structure. The top-level folder (which in my case is called "Pictures" but whose name is ultimately irrelevant) contains the scripts in question and a bunch of subfolders labeled only by a year number. Each subfolder numbered by a year contains more subfolders, which are generally organized according to particular events that year. Each of those lower-level subfolders in turn contains the pictures corresponding to that event (or those events, if pictures from multiple events were organized together in a single folder).
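To make the layout concrete, here is a minimal sketch that builds a throwaway example of it. Every folder and file name in it is hypothetical (my real folders are named differently), and the function name make_example_tree is just something I made up for illustration.

```shell
#!/bin/bash

# Build a throwaway copy of the folder layout described above.
# Every folder and file name here is hypothetical.
make_example_tree() {
  local top="$1"

  # year folders, each containing event folders (names may have spaces)
  mkdir -p "${top}/2019/Beach Trip" "${top}/2019/Graduation" "${top}/2020/Birthday"

  # empty placeholder files standing in for actual pictures
  : > "${top}/2019/Beach Trip/img001.jpg"
  : > "${top}/2019/Graduation/img002.jpg"
  : > "${top}/2020/Birthday/img003.png"
}

# Example: create the tree under a temporary folder and list it
# exampletop="$(mktemp -d)/Pictures"
# make_example_tree "${exampletop}"
# find "${exampletop}" | sort
```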

My rudimentary image tagging system depends on such a file structure. In particular, tags are not stored separately for each individual image file. Instead, tags are stored for all image files within a given lower-level subfolder in a single text file. These text files are created using the following BASH script. Note that the script, as currently written, only works if run from one of the upper-level subfolders in the aforementioned structure. It can easily be modified to work from the top-level folder; I've just been too lazy to do that.

#!/bin/bash

# get list of all folders, treating spaces in names carefully
IFS=$'\n'; dirlist=($(ls -d */ | sed '/\./d;s%/$%%')); unset IFS

# add current folder to the list too (off for now)
#dirlist+=(./)

# ensure consistent parent folder
parentdir=$(pwd)

# iterate over list of all folders
for currdir in "${dirlist[@]}"; do

  # change current folder to child folder
  cd "${parentdir}/${currdir}"

  # get all image files in current folder (with common extensions),
  # silencing the errors that ls prints for extensions with no match
  IFS=$'\n'; currimgfilelist=($(ls *.bmp *.gif *.jpg *.jpeg *.JPG *.JPEG *.png *.svg *.tiff 2>/dev/null)); unset IFS

  # if there are any image files in current folder
  if [ -n "${currimgfilelist[0]}" ]; then

    imgdatafile="${parentdir}/${currdir}/${currdir}_imgdata.txt"

    # if there isn't a file containing image data, create it as empty
    if [ ! -f "${imgdatafile}" ]; then
      :> "${imgdatafile}"
    fi
    
    for currimgfile in "${currimgfilelist[@]}"; do
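      # add an entry only if this exact file name is not already listed
      # (-F matches literally, so dots in names are not regex wildcards)
      if ! grep -qF -- "{FILE:${currimgfile}}" "${imgdatafile}"; then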
        printf "{FILE:%s} {PEOPLE:} {DATE:} {LOCATION:} {OCCASION:} {COMMENTS:} {PHOTOGRAPHER:}\n" "${currimgfile}" >> "${imgdatafile}"
      fi
    done
  fi
 
  # change current folder to consistent parent folder
  cd "${parentdir}"
done
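As for running the tagging script from the top-level folder, one possible approach is a small wrapper rather than a change to the script itself. The sketch below assumes that year folders have four-digit names and that the tagging script is passed by absolute path; the function name run_in_each_year is hypothetical.

```shell
#!/bin/bash

# Run a script once inside each four-digit year folder of a top-level
# folder. Assumes the script path is absolute, since the working
# directory changes before the script is called.
run_in_each_year() {
  local topdir="$1" script="$2" yeardir

  for yeardir in "${topdir}"/[0-9][0-9][0-9][0-9]/; do
    # skip the unexpanded pattern if there are no year folders at all
    [ -d "${yeardir}" ] || continue

    # the subshell keeps the cd from leaking into later iterations
    ( cd "${yeardir}" && bash "${script}" )
  done
}

# Example, run from the top-level folder:
# run_in_each_year "$(pwd)" "$(pwd)/psv_imagetagcreation.sh"
```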

Each line of an image data text file created by this script has the form "{FILE:filename} {PEOPLE:} {DATE:} {LOCATION:} {OCCASION:} {COMMENTS:} {PHOTOGRAPHER:}", in which "filename" is replaced by the name (including the file type extension) of the corresponding image file. None of the other fields is filled automatically; it is incumbent upon the user to manually edit the text file to fill in these fields, though things like the date, location, and occasion can perhaps be filled in more easily by doing find-and-replace operations on large chunks of text.
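Such a find-and-replace operation can also be done from the command line. The following is only a sketch under a few assumptions: it relies on the -i (edit in place) option of GNU sed, the function name fill_empty_field is hypothetical, and the value being filled in must not contain a newline. It fills a given field on every line where that field is still empty and leaves already-filled fields alone.

```shell
#!/bin/bash

# Fill one field with the same value on every line of an image data
# file where that field is still empty, e.g. {LOCATION:} becomes
# {LOCATION:Boston, MA}. Assumes GNU sed (for -i).
fill_empty_field() {
  local field="$1" value="$2" datafile="$3"
  local escaped

  # escape the characters that are special in a sed replacement:
  # backslash, ampersand, and the | used as delimiter below
  escaped=$(printf '%s' "${value}" | sed 's/[\\&|]/\\&/g')

  # only the empty form {FIELD:} matches, so filled fields are untouched
  sed -i "s|{${field}:}|{${field}:${escaped}}|" "${datafile}"
}

# Example with a hypothetical folder name and location:
# fill_empty_field LOCATION "Boston, MA" "Beach Trip/Beach Trip_imgdata.txt"
```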

It is critical that the created text files containing image tags stay associated with the same set of images in the same folder, because searching for images later based on those tags would otherwise become difficult (if not impossible). In particular, searching is done through the following BASH script. Note that the script, as currently written, only works if run from one of the upper-level subfolders in the aforementioned structure. It can easily be modified to work from the top-level folder; I've just been too lazy to do that. Furthermore, it depends on using the image viewer Viewnior (which is what I use in my installation of Linux Mint 20 "Ulyana" MATE). It can easily be modified to work with a different image viewer or with a user-specified program; I've just been too lazy to do that.

#!/bin/bash

# this script should be placed in a folder which contains
# multiple folders, each of which contains at least one
# image file and a corresponding image data text file which
# should have been created by psv_imagetagcreation.sh
# (i.e. it should be in the same format)

# $mystr should be a combination of strings between slashes
# & logical operators, all between a pair of double quotes,
# like "/person1/" or "/person1/ && /person2/" or
# "/person1/ && (! (/person2/) || /person3/)"

# this is now in working order, but currently lacks the
# ability to distinguish searches tagged as PEOPLE, DATE,
# LOCATION, OCCASION, or COMMENTS

# get the logical combination of people
mystr="$1"
echo "$mystr"

# get list of all folders, treating spaces in names
# carefully
IFS=$'\n'; dirlist=($(ls -d */ | sed '/\./d;s%/$%%')); unset IFS

# ensure consistent parent folder
parentdir=$(pwd)

declare -a finalfilearr=()

# iterate over list of all folders
for currdir in "${dirlist[@]}"; do

  # change current folder to child folder
  cd "${parentdir}/${currdir}"

  # get image data file
  currfile=${currdir}_imgdata.txt

  # prepend all file names from the current folder with the
  # current folder
  toprepend="${currdir}/"

  # read the file listing and store all lines corresponding
  # to the logic of $mystr into the array ${mylinearr[@]}
  # (one element per line)
  IFS=$'\n' read -r -d '' -a mylinearr < <(awk "$mystr" "${currfile}" && printf '\0')

  # extract file names from corresponding lines, and
  # prepend appropriately
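  currfilearr=()
  for currline in "${mylinearr[@]}"; do
    # strip the leading "{FILE:" and everything from the first "}" on;
    # quoting keeps file and folder names with spaces in one piece
    currname=${currline#'{FILE:'}
    currname=${currname%%'}'*}
    currfilearr+=( "${toprepend}${currname}" )
  done
  finalfilearr+=( "${currfilearr[@]}" )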

  # change current folder to consistent parent folder
  cd "${parentdir}"
done

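# open all files as selected in the loop (if any matched)
if [ "${#finalfilearr[@]}" -gt 0 ]; then
  viewnior "${finalfilearr[@]}"
else
  echo "no matching images found" >&2
fi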

As described in the comments of this script, its single argument is passed to AWK as a pattern expression that specifies which strings to search for or exclude. (Typically, these would be people's names.) Apart from the aforementioned limitation with respect to folder structure (owing to my laziness), this search program matches against whole lines, so it cannot tell whether a name occurs in the PEOPLE, LOCATION, OCCASION, or COMMENTS field. As a result, searching for a person's name (without specifying intersections to narrow the search results) may yield many pictures that are related to the person but don't actually include the person directly. I haven't added field-specific searching because it is in principle possible to have arbitrary Boolean logical combinations of these fields in a search, and I haven't figured out how to allow for all of these combinations or whether to restrict the allowable combinations in a particular way.
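One possible starting point for field-specific searching is to have AWK pull out the contents of a single field and match only against that. The sketch below is not part of my scripts: the function name search_field is hypothetical, it assumes that field values never contain a "}" character, and it handles only one field and one pattern at a time, so it does nothing to solve the question of Boolean combinations across fields.

```shell
#!/bin/bash

# Print the lines of an image data file in which the given field
# (e.g. PEOPLE) matches the given regular expression. Assumes that
# field values never contain "}".
search_field() {
  local field="$1" pattern="$2" datafile="$3"

  awk -v field="${field}" -v pat="${pattern}" '
    {
      # find where "{FIELD:" begins on this line
      startpos = index($0, "{" field ":")
      if (startpos == 0) next

      # keep everything after the colon, then cut at the closing brace
      rest = substr($0, startpos + length(field) + 2)
      endpos = index(rest, "}")
      if (endpos == 0) next
      value = substr(rest, 1, endpos - 1)

      # match against the contents of this one field only
      if (value ~ pat) print
    }' "${datafile}"
}

# Example with a hypothetical folder name and person:
# search_field PEOPLE "person1" "Beach Trip/Beach Trip_imgdata.txt"
```

Note that AWK processes backslash escapes in values passed with -v, so a pattern containing backslashes would need them doubled.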

Thus, much work remains to be done on both of these BASH scripts. I welcome any suggestions that readers may have for improvements to these scripts or to the image tagging system in general (as long as the result remains readable with free software across different machines without much else to install).