I’ve been using Qwen 2.5 Coder (bartowski/Qwen2.5.1-Coder-7B-Instruct-GGUF) for some time now, and it is a significant improvement over previous open-weights models.

Notably, it is the first open-weights model I have been able to use with Aider. It also edits files reliably, without needing frequent retries to produce output in the proper format.

One area where most models struggle, this one included, is long prompts. Here the model appears to forget the system prompt once the prompt exceeds roughly 2000 tokens.

  • @[email protected]OP
    38 days ago

    I have found the cause of the cutoff: by default, aider only sends 2048 tokens to ollama. That is why I had not noticed it anywhere except when coding.

    When running /tokens in aider:

    $ 0.0000   16,836 tokens total
               15,932 tokens remaining in context window
               32,768 tokens max context window size
    

    Aider reports a 32,768-token context window here, even though it will only send 2048 tokens to ollama.

    To fix it, I needed to add a file named .aider.model.settings.yml to the repository:

    - name: aider/extra_params
      extra_params:
        num_ctx: 32768
    
    • @brucethemoose
      17 days ago

      That’s because ollama’s default max ctx is 2048, as far as I know.
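
The same 2048-token default applies when calling ollama directly rather than through aider. Below is a minimal sketch of overriding it per request via the `options.num_ctx` field of ollama's `/api/generate` endpoint. It assumes ollama is listening on its default local port (11434); the helper names `build_request` and `generate` are made up for illustration, and the model tag in the usage note is an assumption about what is pulled locally.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, num_ctx: int = 32768) -> dict:
    """Build a /api/generate payload that raises the context window.

    Without options.num_ctx, ollama falls back to its default context size.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 32768) -> str:
    """Send the request to ollama and return the generated text."""
    body = json.dumps(build_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling something like `generate("qwen2.5-coder:7b", long_prompt)` should then let the model see the full prompt, provided the machine has enough memory for the larger context.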