Does anyone know of an off-the-shelf tool (online or offline) that can find duplicates across several DNS blocklists and merge them into one?

Context: I am running AdGuard on a GL.iNet router with ~10 blocklists, some of them pretty huge. Most of the time when the lists are updated, the router grinds to a halt, and I often have to reboot it the old-fashioned way by power-cycling it.

I would rather download the lists myself from time to time and merge them into one file, with the duplicates removed somehow.

  • @easeKItMAn
    8 months ago

    If I’m understanding you correctly, you could use a shell script for this: use wget to download the lists, combine them into a single large file, and finally create a new file without duplicates using awk '!visited[$0]++', which prints each line only the first time it appears.

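    rm -rf lists                                      # start from a clean download folder
    wget -P lists URL1 URL2 URL3                      # save every list into ./lists/
    cat lists/* > all.txt                             # combine them (overwrites all.txt)
    awk '!visited[$0]++' all.txt > no_duplicates.txt  # keep only the first copy of each line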
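    One caveat with the plain awk step: it only removes lines that are byte-for-byte identical, so the same entry can survive twice if one list uses Windows (CRLF) line endings, has stray spaces, or writes "127.0.0.1 host" where another writes "0.0.0.0 host". Below is a minimal POSIX shell sketch of a cleanup pass before deduplication. The function name clean_list is hypothetical, and it assumes that lines starting with "!" or "#" are comments and that the 127.0.0.1 and 0.0.0.0 prefixes are interchangeable blocking targets in your setup. It deliberately does not convert between hosts-style and adblock-style rules, since AdGuard can match those differently.

```shell
#!/bin/sh
# clean_list (hypothetical helper): normalize blocklist lines from stdin so
# that the awk dedup step also catches near-identical entries.
# Assumptions: "!" and "#" start comment lines; 127.0.0.1 and 0.0.0.0 are
# treated as equivalent hosts-style blocking prefixes.
clean_list() {
  tr -d '\r' |                                      # drop CRLF (Windows) line endings
    sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' | # trim surrounding whitespace
    sed -E 's/^(127\.0\.0\.1|0\.0\.0\.0)[[:space:]]+/0.0.0.0 /' |  # one hosts prefix
    grep -v -e '^$' -e '^[!#]' |                    # drop blank and comment lines
    awk '!visited[$0]++'                            # keep the first copy of each line
}
```

    Usage, with the combined file from the commands above and the function defined in the same shell or script: clean_list < all.txt > no_duplicates.txt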

    • @BinaryUnitOP
      8 months ago

      When no tool is available, bash to the rescue! Thank you for this; it actually seems simpler than I thought :)

  • @CarbonatedPastaSauce
    8 months ago

    I doubt you’ll find something off the shelf for this. I wrote a PowerShell script that deduplicates lists and also makes a pass over the results to convert any IP address blocks to CIDR notation. If you’re interested, I’ll share it.

    But honestly, you could probably have ChatGPT whip this up for you in your language of choice. It’s pretty straightforward.
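    For the CIDR part, here is a rough sketch of the idea in POSIX shell plus awk. This is not CarbonatedPastaSauce's PowerShell script (that one is only available through the link further down); it is a hypothetical illustration that assumes the input is plain IPv4 addresses, one per line, and does no validation. It sorts the addresses, groups consecutive ones into ranges, and splits each range into the largest aligned CIDR blocks. The function name collapse_ips is made up for this example.

```shell
#!/bin/sh
# collapse_ips (hypothetical helper): read IPv4 addresses (one per line) on
# stdin and print the smallest set of CIDR blocks covering exactly them.
# Assumes well-formed dotted quads; lines without 4 fields are skipped.
collapse_ips() {
  # Step 1: dotted quad -> integer (awk numbers are doubles, exact for 32 bits)
  awk -F. 'NF == 4 { printf "%.0f\n", (($1 * 256 + $2) * 256 + $3) * 256 + $4 }' |
    sort -n -u |   # step 2: numeric sort, dropping duplicate addresses
    awk '
      # Print the aligned CIDR blocks that exactly cover [lo, hi].
      function emit(lo, hi,    size, prefix, a, b, c) {
        while (lo <= hi) {
          size = 1; prefix = 32
          # Grow the block while it stays aligned on lo and inside the range.
          while (prefix > 0 && lo % (size * 2) == 0 && lo + size * 2 - 1 <= hi) {
            size *= 2; prefix--
          }
          a = int(lo / 16777216)
          b = int(lo / 65536) % 256
          c = int(lo / 256) % 256
          printf "%d.%d.%d.%d/%d\n", a, b, c, lo % 256, prefix
          lo += size
        }
      }
      # Step 3: extend the current run while addresses are consecutive.
      NR == 1              { run_start = $1 + 0; run_end = run_start; next }
      $1 == run_end + 1    { run_end = $1 + 0; next }
                           { emit(run_start, run_end); run_start = $1 + 0; run_end = run_start }
      END { if (NR > 0) emit(run_start, run_end) }
    '
}
```

    Usage (file names are placeholders): collapse_ips < ips.txt > cidrs.txt. For example, 10.0.0.0 through 10.0.0.3 collapse to 10.0.0.0/30, while a lone 10.0.0.5 stays 10.0.0.5/32. Two consecutive addresses that straddle a block boundary, such as 192.168.1.255 and 192.168.2.0, stay as two /32 entries because no single aligned block covers exactly those two.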

    • @nyar
      8 months ago

      I’d like to see your script.

      • @CarbonatedPastaSauce
        8 months ago

        Sorry it took a while; I’m currently on vacation! But I had some time to reread it and sanitize it for public sharing. Here you go:

        OK, yikes, Lemmy really didn’t like me pasting all that code, even in a code block. I’ll have to put it up somewhere else; stand by.

        Hopefully this works better: Pastebin link

    • @BinaryUnitOP
      8 months ago

      Thank you, this looks promising.

    • @BinaryUnitOP
      8 months ago

      This is very helpful, thank you :)