Most AI tools try to replace your thinking. I built one that doesn't

SuspciousCarrot78 · edit-2 3 days ago

SuspciousCarrot78 · edit-2 1 day ago

Everything you see - every feature - is everything I use. None of it is ornamental.

But my head is in the code right now, so I don’t “use it” so much as try to break it and then fix it.

The end game is a local expert system that I can rely on, automate and audit, because I built it and know exactly how it works.

If you’re asking for my most common uses for it right now (outside of kicking it and then picking it back up):

sentiment analysis ("what did they mean in this email by…")
document analysis
word etymology (I got the language thing with my ASD)
pilot project (see: https://lemmy.world/comment/22058968)
To-do lists
THINKING (and this is a big one for me: I’ll pose a problem, it will rubber duck it with me)
all the side cars (calculations, currency look ups, weather etc)
drafting ideas and research
shooting the shit when bored (local version of Claude-in-a-can is a bit more advanced than what’s on repo; not stable yet. But when it cooks, fuck me it cooks. Will not push it till it’s 100%).

Basically, all the shit you would ideally like to use an LLM for, but self hosted, private and non-bullshitty. I run on a potato (so don’t really use it for coding very much) but if you have a better rig than mine and can run bigger models - the router is agnostic and it should just work ™.

TLDR;

What I’m building towards: a local expert system that picks its own tools (I coded), executes them (how I taught it to), and gives me a single-line audit receipt for every decision (that I can check if it smells funny). I ask a question, the system decides whether to calculate, look up, search, retrieve from my docs, or reason from scratch - then tells me exactly which path it took and why. Think ChatGPT convenience but with a paper trail you can actually inspect.
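A minimal sketch of that "pick a path, then leave a receipt" idea, not the actual llama-conductor code. The rule table, the route names (`calc`, `weather`, `docs`, `reason`) and the receipt format are all hypothetical, made up for illustration:

```python
import re
from dataclasses import dataclass


@dataclass
class Receipt:
    """One routing decision: which path was taken and why (hypothetical format)."""
    route: str
    reason: str

    def line(self) -> str:
        # Single-line audit receipt that can be logged and checked later.
        return f"route={self.route} | reason={self.reason}"


# Ordered (route, pattern, reason) rules; the first match wins.
# These rules are illustrative assumptions, not the project's real ones.
RULES = [
    ("calc", re.compile(r"^[\d\s.+\-*/()]+$"), "input is pure arithmetic"),
    ("weather", re.compile(r"\bweather\b", re.I), "keyword 'weather'"),
    ("docs", re.compile(r"\bmy (docs|notes|files)\b", re.I), "mentions local documents"),
]


def route(question: str) -> Receipt:
    """Pick a path deterministically; fall back to model reasoning if no rule matches."""
    text = question.strip()
    for name, pattern, reason in RULES:
        if pattern.search(text):
            return Receipt(name, reason)
    return Receipt("reason", "no deterministic rule matched")


if __name__ == "__main__":
    for q in ["2 + 2 * 3", "What's the weather in Perth?", "why is the sky blue"]:
        print(q, "->", route(q).line())
```

The point of the sketch is that the routing decision is plain Python, so the same input always takes the same path and the receipt says which rule fired.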

And when that’s done…I’ll probably stick it in a robot. Because why not? :)

https://github.com/poboisvert/GPTARS_Interstellar

(or tee it up with Home-Assistant)

PS: If you want to know the why behind this whole thing -

https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/DESIGN.md

PPS: Give me about … 15 mins. I’m just about to push a >>web sidecar. Needs one more tweak to properly parse DOIs / PubMed extraction. I was bored and it’s been on my TO-DO list for too long.

PPPS: Those were some Planet Namek 15 minutes…but the deed is done. Enjoy

pound_heap@lemmy.dbzer0.com · 1 day ago

Nice! You kinda answered my next question already with this web tool. I was curious if you are getting any useful results from the model itself without feeding it good data first or relying on hardcoded tools. A 4B model must be really dumb for anything even a little complicated. I see you recommend running two models - is it in parallel, or can the router control the backend and switch models?

SuspciousCarrot78 · edit-2 24 hours ago

No, actually it’s probably one of the strongest 4Bs that you can run. On par with ChatGPT 4.1 in many benchmarks.

https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507

I use the DavidAU fine tune, which is even a touch better

https://huggingface.co/DavidAU/Qwen3-4B-Hivemind-Instruct-NEO-MAX-Imatrix-GGUF

The two models thing is a router back end switch that reduces hallucinations when using RAG. It’s separate from, and extra to, the main stuff.

https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/FAQ.md#what-is-mentats

There are multiple duo / tag team orchestrations like this (e.g. the vision model I use is Qwen3-VL-4B, which does the vision stuff and then feeds its output to “thinker” to work with).

One of the eventual goals is parallel swarm or model decomposition, with “thinker” acting as the main orchestrator.

The swarm idea is basically: instead of asking one 4B model to do everything at once (understand, retrieve, evaluate, synthesise, check its own work), you decompose the task into tiny (<1B) single-purpose workers - evidence extractors, contradiction detectors, refusal sentinels, a synthesis worker, and an arbiter (current “critic”) that makes the final call. Then the “thinker” reasons from that info.

Each worker is small and stupid at exactly one thing, which means it’s auditable and replaceable.

Think of it as breaking the 4B metacognitive ceiling by not asking any single model to be metacognitive.

The deterministic routing backbone stays - workers only handle the ambiguous semantic stuff that can’t be solved with pure Python. It’s not “more models = better” - it’s “right model, right job, fail-loud if they disagree.”
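A rough sketch of the "fail-loud if they disagree" part, under heavy assumptions: each worker is reduced to a plain function that returns a verdict label, and the arbiter is a simple agreement check. In the real design the workers would be small models with different jobs; the names `arbitrate`, `WorkerDisagreement` and `min_agree` are hypothetical and not taken from the repo.

```python
from collections import Counter
from typing import Callable, Dict

# A worker takes the question and returns a short verdict label.
# Here these are stand-in functions; real workers would call small models.
Worker = Callable[[str], str]


class WorkerDisagreement(Exception):
    """Raised when workers do not reach the required level of agreement."""


def arbitrate(question: str, workers: Dict[str, Worker], min_agree: float = 1.0) -> str:
    """Collect one verdict per worker and return it only if enough of them agree."""
    if not workers:
        raise ValueError("at least one worker is required")
    votes = {name: worker(question) for name, worker in workers.items()}
    top, count = Counter(votes.values()).most_common(1)[0]
    if count / len(votes) < min_agree:
        # Fail loud: surface every vote instead of silently picking a winner.
        raise WorkerDisagreement(f"no consensus: {votes}")
    return top


if __name__ == "__main__":
    stubs = {
        "evidence": lambda q: "supported",
        "contradiction": lambda q: "supported",
    }
    print(arbitrate("Does the doc say X?", stubs))
```

With `min_agree=1.0` any single dissenting worker stops the pipeline, which is the strictest reading of "fail-loud"; a lower threshold would turn it into majority voting.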

Basically, similar reasoning to the research I cited in the Mentats section.

PS: when you load it up, you might notice it refers to itself internally as MoA router. That’s pulling double duty. In normal LLM circles that means Mixture of Agents. In my world that means “Mixture of Assholes”. See below -

YOU (question) → ROUTER+DOCS (Ah shit, here we go again. I hate my life)

|

ROUTER+DOCS → Asshole 1: SmolLM2-135M (“I’m right”)

|

ROUTER+DOCS → Asshole 2: SmolLM2-360M (“No, I’m right”)

|

ROUTER+DOCS → Asshole 3: Gemma-3-270M (“Idiots, I’m right!”)

|

ROUTER+DOCS → Asshole 4: Qwen3-1.7B (“You’re all beneath me”)

|

ARBITER: Phi-4-mini (“Shut up, all of you.”) ← (all assholes)

|

→ THINKER: Qwen-4B (“I’m surrounded by idiots. Fine, I’ll do it myself.”)

|

ROUTER (please, let me die)

|

YOU (answer + mad cackle)