Hey Linux community,

I’m struggling with a file management issue and hoping you can help. I have a large media collection spread across multiple external hard drives. Often, when I’m looking for a specific file, I can’t remember which drive it’s on.

I’m looking for a file indexing and search tool that meets the following requirements:

  • Ability to scan multiple locations
  • Option to exclude specific folders or subfolders from both scan and search
  • File indexing for quicker searches
  • Capability to search indexed files even when the original drive is disconnected
  • Real-time updates as files change

Any recommendations for tools that meet most or all of these criteria? It would be a huge help in organizing and finding my media files.

Thanks in advance for any suggestions!

  • @solrize
    1 month ago

    > [search indexed files that are offline] One would hope this is not possible.

    I think the idea is to store the search index in a separate place from the files. For indexing text, though, I’ve found that the index is comparable in size to the text itself. It’s not entirely clear to me what OP wants to search. Something like email? Obviously, if it’s just metadata for media files (a kilobyte text description of a gigabyte video), then the search index can be tiny.
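    To make the "index stored separately from the files" idea concrete, here is a minimal sketch, assuming the index only needs file names, sizes and mtimes. The table layout, the function names (open_index, index_drive, search) and the drive labels are all made up for illustration; this is not how any particular tool does it.

```python
import os
import sqlite3

def open_index(db_path):
    """Open (or create) the index database, kept apart from the media drives."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " drive TEXT NOT NULL, path TEXT NOT NULL,"
        " size INTEGER, mtime REAL,"
        " PRIMARY KEY (drive, path))"
    )
    return db

def index_drive(db, drive_label, root, exclude=()):
    """Record every file under root, skipping directories named in exclude."""
    # Drop the old entries for this drive so deleted files leave the index.
    db.execute("DELETE FROM files WHERE drive = ?", (drive_label,))
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable or vanished file
            rel = os.path.relpath(full, root)
            db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (drive_label, rel, st.st_size, st.st_mtime),
            )
    db.commit()

def search(db, text):
    """Return (drive, path) pairs whose path contains text.

    Only the database is read, so the drives can be unplugged.
    Assumes text contains no LIKE wildcards (% or _).
    """
    rows = db.execute(
        "SELECT drive, path FROM files WHERE path LIKE ? ORDER BY drive, path",
        ("%" + text + "%",),
    )
    return rows.fetchall()
```

    You would run index_drive once per drive while it is mounted, with a label you choose yourself, and call search later against the database alone. Since only names and a couple of numbers are stored per file, the index stays small relative to the media.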

    > Real-time updates as files change

    > Would require a non-portable script that stores each file’s mtime in an array, compares the old mtime against the new one using stat, and then loops. Maybe implement it as a daemon.

    That is what inotify is for.
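    For comparison, the polling approach quoted above can be sketched roughly like this. It is only an illustration of the mtime-comparison idea, with hypothetical function names (snapshot, diff); inotify exists precisely so that you do not have to rescan the tree like this.

```python
import os

def snapshot(root):
    """Map each file path under root to its mtime, as reported by stat."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtimes[full] = os.stat(full).st_mtime_ns
            except OSError:
                pass  # file vanished between listing and stat
    return mtimes

def diff(old, new):
    """Compare two snapshots and return (added, removed, changed) path lists."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, changed
```

    A daemon would take a snapshot, sleep, take another, and feed the diff to the indexer. Every pass walks the whole tree, which is why this gets expensive on large collections.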

    I realize your overall answer was mostly snark, but the problems mentioned really do take some work to solve. For example, if you want to index email, you want the indexer to understand email headers so it can do the right things with the timestamps and other fields. You can’t just chuck everything into a big generic search engine and press “blend”.

    I will mention git-annex which is for sort of a different problem, but it can help you manually track where your offline files are, more or less.

    • @[email protected]
      1 month ago

      Sorry, I have .world blocked, so I didn’t see your reply until now (I wish I could block instances without blocking replies from them, but whatever).

      > It’s not entirely clear to me what OP wants to search. Something like email? Obviously if it’s just metadata for media files (kilobyte text description of a gigabyte video) then the search index can be tiny.

      Yeah, I amended my post earlier to recommend logging with a domain-specific unmount script, but I don’t know why they want to do this.

      > I realize your overall answer was mostly snark

      Apparently I’m so good at trolling I troll people even when I’m not trying to troll. :<

      > That is what inotify is for.

      If inotify works for you, that’s fine. I don’t have any experience with it; maybe I’ll look into it after this, if the use case ever comes up.

      > You can’t just chuck everything into a big generic search engine and press “blend”

      Eh, regex (EREs) is good enough for 99% of use cases, honestly. For the other 1%, consider using an easier-to-parse file format.

      • @solrize
        1 month ago

        > Yeah I amended my post earlier to recommend logging with a domain specific unmount script, but I don’t know why they want to do this.

        They have umpty jillion terabytes of video on a shelf full of external HDDs and they want to know which files are on which drives. In the old days we had racks full of mag tapes and had the same issue. It’s not something new.

        For info about inotify, try web search.

        For text search, you start needing real indexing once you’re over maybe a GB of text. Before that, you can live with grep or SQL tables or whatever.
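        As a rough illustration of the "live with grep" end of that range, here is a sketch that searches a plain-text catalogue with a regex. The catalogue format (one "drive<TAB>path" line per file) and the function name grep_catalogue are assumptions made up for this example, and Python's re syntax differs from POSIX ERE in some details.

```python
import re

def grep_catalogue(lines, pattern):
    """Yield (drive, path) for catalogue lines whose path matches pattern.

    Each line is assumed to look like "drive<TAB>path"; lines without a
    tab are skipped.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        drive, sep, path = line.rstrip("\n").partition("\t")
        if sep and rx.search(path):
            yield drive, path
```

        Because it accepts any iterable of lines, you can pass an open file object directly. This is a linear scan, so it fits small catalogues of file names and not the multi-gigabyte text case.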