I just developed and published a script to clear potential CSAM from your pict-rs object storage.

db0 · edit-2 2 years ago

@[email protected] · 2 years ago

Well, we have hashing algorithms that do exactly that, such as pHash.
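
For anyone who hasn't seen perceptual hashing before, here is a rough sketch of the idea. It uses average hash, a simpler relative of pHash (which works on a DCT of the image instead), and it is not what any particular library does. The pixel grids and the function names `average_hash` and `hamming` are made up for illustration; a real pipeline would first shrink the image to a small grayscale grid.

```python
def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the mean, else '0'."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale image (values 0-255), standing in for a downscaled photo.
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 28, 225],
    [11, 215, 27, 235],
]
# Simulated re-encoding: every pixel shifted slightly brighter.
recompressed = [[min(255, p + 3) for p in row] for row in original]
# A clearly different image: the inverted original.
different = [[255 - p for p in row] for row in original]

h_original = average_hash(original)
h_recompressed = average_hash(recompressed)
h_different = average_hash(different)

print(hamming(h_original, h_recompressed))  # small distance: likely the same image
print(hamming(h_original, h_different))     # large distance: different image
```

A small Hamming distance between two hashes suggests the same picture, and the cutoff you pick is a tuning decision rather than a fixed rule.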

@[email protected] · 2 years ago

Definitely. A lot of the good algorithms used by big services are proprietary though, unfortunately.

@[email protected] · 2 years ago

Can you point me to some of them? I’m quite interested in visual hashing.

@[email protected] · edit-2 2 years ago

Microsoft’s PhotoDNA is probably the most well-known. Every major service that has user-generated content uses it. Last I checked, it wasn’t open-source. It was built for detecting CSAM, but it’s really just a general-purpose similarity hashing algorithm.

Meta has some algorithms that are open-source: https://about.fb.com/news/2019/08/open-source-photo-video-matching/

Google has CSAI Match for hash-matching of videos and Google Content Safety API for classification of new content, but both are proprietary.

db0 · edit-2 2 years ago

There are better approaches than hashing. To compare images, I calculate the “distance” between their tensors. This can still match when compression artifacts are involved or the images have been slightly altered.
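
A minimal sketch of what comparing by distance can look like, assuming each image has already been turned into a vector by some model. The vectors, the `THRESHOLD` value and the helper names below are hypothetical, and cosine distance is only one possible choice of metric; this is not necessarily how the script does it.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical cutoff; a real value would have to be tuned on real data.
THRESHOLD = 0.05


def is_match(a, b, threshold=THRESHOLD):
    """Treat two vectors as the same image when their distance is under the cutoff."""
    return cosine_distance(a, b) <= threshold


# Made-up 4-dimensional vectors; real image embeddings have hundreds of dimensions.
known = [0.90, 0.10, 0.40, 0.20]
altered = [0.88, 0.12, 0.41, 0.19]    # same image after slight changes
unrelated = [0.10, 0.90, 0.00, 0.50]  # a different image

print(is_match(known, altered))    # True
print(is_match(known, unrelated))  # False
```

Because small pixel-level changes move the vector only slightly, the altered copy stays within the cutoff while the unrelated image does not.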

GitHub - Haidra-Org/lemmy-safety: A script that goes through a lemmy pict-rs object storage and tries to prevent illegal or unethical content