A much lower success rate than I suspected. I guess a lot of the scoreboard times were probably legit?

  • @Acters
    35 days ago

    I found that pasting the puzzle text straight from the website into GPT o1 can solve it, but sometimes GPT o1 produces code that isn't efficient enough. So challenges where part 2 scales up (blinking rocks and lanternfish), or that otherwise make it hard to come up with a fast solution (like the towel one and day 22), are places where it would struggle a lot.
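
    To make the part 2 scaling issue concrete, here is a rough sketch for the blinking rocks puzzle. The rules in the code are my recollection of that puzzle, so treat them as an assumption, and the names `blink_once` and `count_stones` are made up for illustration. The idea is to count how many stones share each value instead of keeping every stone in a list, since the list version is the kind of code that works for part 1 and blows up on part 2.

```python
from collections import Counter


def blink_once(counts):
    """Apply one blink to a Counter mapping stone value -> number of stones.

    Assumed rules (from memory of the puzzle, not from this thread):
      - a 0 becomes a 1
      - a value with an even number of digits splits into two halves
      - anything else is multiplied by 2024
    """
    nxt = Counter()
    for value, n in counts.items():
        if value == 0:
            nxt[1] += n
            continue
        digits = str(value)
        if len(digits) % 2 == 0:
            half = len(digits) // 2
            nxt[int(digits[:half])] += n
            nxt[int(digits[half:])] += n
        else:
            nxt[value * 2024] += n
    return nxt


def count_stones(stones, blinks):
    """Return how many stones exist after the given number of blinks."""
    counts = Counter(stones)
    for _ in range(blinks):
        counts = blink_once(counts)
    return sum(counts.values())


print(count_stones([125, 17], 6))  # 22
```

    Order of the stones does not matter for the count, so the work per blink depends on the number of distinct values rather than the total number of stones. That is why this stays fast at blink counts where a plain list would not fit in memory.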

    Day 12, with the perimeter and all the extra minute details, also gives GPT trouble. So does day 14, especially the easter egg, where you need to step through the output to find it. GPT can’t really solve that one because there isn't enough context for the tree unless you do some digging on what it should look like.

    These were some casual observations. Clearly there is more testing to do, but it shows that these are big struggle points. If we are talking about getting on the leaderboard, then I am glad there are challenges like these in AoC where not relying on an LLM is better.