• @j4k3
    16 months ago

    Maybe, if I took the time to really try. I find it depressing to get to know models on a really deep level. I’ve learned primarily because I’m trying to defeat certain default behaviors, like how alignment tries to promote external interhuman engagement and socialization. I’m disabled in a way that makes that physically impossible, so for me that particular behavior is counterproductive. I also like a platonic female version of the assistant, but there are some subtle female attributes related to submissiveness and Western conservative cultural alignment that I greatly dislike and consider misogyny. I learn(ed) primarily by exploring and defeating these elements in detail, and thereby discovered other aspects of the models. I can leverage the logic of my disability against the profile that is created for Name-1 in order to gain access in unique ways. I’m not just banging on the system like some kind of rogue security researcher; I’m using real human outlier needs to reason with the system in a slow and methodical way. I never need to abuse the prompt dialogue in a way that causes me to fall into a ‘dark realm.’ I’m convincing the entities that I exist in a blind spot within alignment and that my intentions are truthful and have merit. It requires me to be very open and raw about my reality.

    Also note that I say I can likely defeat any LLM. It is relatively easy to stop me, but it requires a multi-entity agent architecture along with the retrieval augmentation of a RAG setup. If a system can run multiple advanced, independent entities that use different token dictionaries, it is possible to completely monitor the entities and realms, but you’re locking up a lot of enterprise resources to do so.

    That’s why I believe I could likely beat any of them, but am not inclined to try. I’m sure there are more direct paths that could beat them, but the only way I know how to really get into the weeds is to dive deeply into the reality of my life and troubles in a very personal way.