Day 11: Plutonian Pebbles

Megathread guidelines

  • Keep top-level comments as only solutions; if you want to say something other than a solution, put it in a new post (replies to comments can be whatever).
  • You can send code in code blocks by using three backticks, the code, and then three backticks, or use something such as https://topaz.github.io/paste/ if you prefer sending it through a URL.


  • @Acters
    30 days ago

    Python

    Part 1: ~2 milliseconds, Part 2: ~32 milliseconds, Total Time: ~32 milliseconds
    You end up computing part 1 in the same pass as part 2, but because of how Advent of Code works, you need to rerun the code after part 1 is solved, so the Part 2 time is effectively the total time.

    Fast Code
    from time import perf_counter_ns
    
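    # Maps a stone value to the stones it becomes after one blink. '0' -> '1' is seeded
    # here, so transform() only has to handle the even-length split and the *2024 rule.
    transform_cache = {'0': ['1']}
    
    def transform(current_stone):
        if len(current_stone) % 2 == 0:
            # Even number of digits: split into halves; strip leading zeros from the
            # right half, but keep a single '0' if it was all zeros.
            mid = len(current_stone) // 2
            res = [current_stone[:mid], current_stone[mid:].lstrip('0').rjust(1, '0')]
        else:
            res = [str(int(current_stone) * 2024)]
        transform_cache[current_stone] = res
        return res
    
    def main(initial_stones):
        # Track how many stones carry each value; stones with equal values evolve identically.
        stones_count = {}
        for stone in initial_stones.split():
            stones_count[stone] = stones_count.get(stone, 0) + 1
        part1 = 0
        for i in range(75):
            new_stones_count = {}
            for stone, count in stones_count.items():
                # Each distinct value is looked up once per blink and its count carried along.
                for r in (transform_cache.get(stone) if stone in transform_cache else transform(stone)):
                    new_stones_count[r] = new_stones_count.get(r, 0) + count
            stones_count = new_stones_count
            if i == 24:
                # i is 0-based, so this is the total after 25 blinks (part 1).
                part1 = sum(stones_count.values())
        return part1, sum(stones_count.values())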
    
    if __name__ == "__main__":
        with open('input', 'r') as f:
            input_data = f.read().replace('\r', '').replace('\n', '')
        start_time, part_one, part_two = perf_counter_ns(),*main(input_data)
        stop_time = perf_counter_ns() - start_time
        time_len = min(9, ((len(str(stop_time))-1)//3)*3)
        time_conversion = {9: 'seconds', 6: 'milliseconds', 3: 'microseconds', 0: 'nanoseconds'}
        print(f"Part 1: {part_one}\nPart 2: {part_two}\nProcessing Time: {stop_time / (10**time_len)} {time_conversion[time_len]}")
    
    
    • @[email protected] (OP)
      1 month ago

      Stepping through this code is what made it click for me, thanks. I was mentally locked in on “memoizing” the transform function, instead of realizing that the transform only needs to be applied once per distinct stone value.

      Yours is still a lot faster than my Rust version, so I’ll have to work out what is happening there.
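
For comparison, here is a minimal sketch of the other common approach: memoizing a count function on the pair (stone value, blinks remaining) instead of memoizing the one-blink transform. This is a swapped-in technique, not code from either commenter; `stones_after` and `total_stones` are hypothetical names, and stones are handled as ints rather than the strings used in the solution above.

```python
from functools import cache


@cache
def stones_after(stone: int, blinks: int) -> int:
    """Number of stones a single stone turns into after `blinks` more blinks."""
    if blinks == 0:
        return 1
    if stone == 0:
        return stones_after(1, blinks - 1)
    digits = str(stone)
    if len(digits) % 2 == 0:
        # Even digit count: split into two stones; int() drops leading zeros.
        mid = len(digits) // 2
        left = stones_after(int(digits[:mid]), blinks - 1)
        right = stones_after(int(digits[mid:]), blinks - 1)
        return left + right
    return stones_after(stone * 2024, blinks - 1)


def total_stones(line: str, blinks: int) -> int:
    # Stones never interact, so the total is a sum over the starting stones.
    return sum(stones_after(int(s), blinks) for s in line.split())
```

With the puzzle's own example, `total_stones("125 17", 6)` gives 22 and `total_stones("125 17", 25)` gives 55312. The cache key includes the blink count, so it can hold up to (distinct values × blinks) entries, whereas the count-by-value loop only ever needs one transform per distinct value.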

      • @Acters
        30 days ago

        Learning to profile code gives you a chance to learn what makes code inefficient! I definitely like to spend some time looking at it, but at the end of the day, I really need to learn more languages. For now, I am sticking with trusty Python. The image below is in microseconds.

        [screenshots: profiler output, times in microseconds]

        If the Python process were kept alive, we would only go from ~250 to ~235 milliseconds! However, in real-world testing, the profiler does not seem to be a good enough test, likely because it adds some overhead to every line of code.
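
One way to check how much a profiler inflates the numbers is to time the same call with and without it. A minimal sketch, assuming the standard library's `cProfile` (a function-level profiler; the tool in the screenshots is not named, so this is an assumption). `count_stones` is a hypothetical stand-in workload using the same count-by-value idea as the solution above, and `compare_with_profiler` is a hypothetical helper.

```python
import cProfile
import io
import pstats
from collections import Counter
from time import perf_counter_ns


def count_stones(line: str, blinks: int) -> int:
    """Hypothetical workload: count stones by value, blinking `blinks` times."""
    counts = Counter(line.split())
    for _ in range(blinks):
        new_counts = Counter()
        for stone, n in counts.items():
            if stone == '0':
                children = ['1']
            elif len(stone) % 2 == 0:
                mid = len(stone) // 2
                children = [stone[:mid], str(int(stone[mid:]))]
            else:
                children = [str(int(stone) * 2024)]
            for child in children:
                new_counts[child] += n
        counts = new_counts
    return sum(counts.values())


def compare_with_profiler(func, *args, top: int = 5):
    """Run func(*args) once plainly and once under cProfile; return both timings."""
    start = perf_counter_ns()
    plain_result = func(*args)
    plain_ns = perf_counter_ns() - start

    profiler = cProfile.Profile()
    start = perf_counter_ns()
    profiled_result = profiler.runcall(func, *args)
    profiled_ns = perf_counter_ns() - start

    # Capture the top entries of the profile report as text instead of printing it.
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(top)
    return plain_result == profiled_result, plain_ns, profiled_ns, report.getvalue()
```

For example, `compare_with_profiler(count_stones, "125 17", 75)` returns whether both runs agreed, the plain and profiled times in nanoseconds, and the report text. Expect the profiled time to come out larger, since the profiler does extra bookkeeping on every call it records.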

        Notice what happens if I add these lines of code:

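        from ast import literal_eval
        from os.path import dirname, isfile, join, realpath
        
        transform_cache = {}
        BASE_DIR = dirname(realpath(__file__))
        CACHE_PATH = join(BASE_DIR, 'transform_cache')  # cache file written by a previous run
        if isfile(CACHE_PATH):
            # open the same path that isfile() checked, not one relative to the working directory
            with open(CACHE_PATH, 'r') as f:
                transform_cache = literal_eval(f.read().replace('\r', '').replace('\n', ''))
        transform_cache = {} # this makes the code faster???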
        

        Notice that I load a transform_cache saved from a previous run. However, because of the “if stone in transform_cache” check, the preloaded transform_cache is, for some reason, slower than letting it fill up again. Loading it and then clearing it is faster, though, probably because the CPU, RAM, and OS are doing their own caching too. If we remove the “if stone in transform_cache” check and keep the transform_cache fully loaded, it is faster by ~1 millisecond, down to 29 milliseconds! These are the niche problems with caching things and squeezing all the performance out of code.
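
The “remove the check and keep the cache fully loaded” variant can be sketched as follows. It assumes the cache is built up front by walking every value reachable from the starting stones within the given number of blinks; the post loads it from a file instead, so the builder is an assumption. `build_full_cache` and `count_with_full_cache` are hypothetical names. Once the cache is complete, the hot loop can index `cache[stone]` directly with no membership test.

```python
from __future__ import annotations

from collections import Counter


def transform(stone: str) -> list[str]:
    """One blink for one stone, using string values like the solution above."""
    if stone == '0':
        return ['1']
    if len(stone) % 2 == 0:
        mid = len(stone) // 2
        return [stone[:mid], str(int(stone[mid:]))]  # int() drops leading zeros
    return [str(int(stone) * 2024)]


def build_full_cache(stones: list[str], blinks: int) -> dict[str, list[str]]:
    """Hypothetical helper: precompute transform() for every value present
    before each of the `blinks` blinks, starting from `stones`."""
    cache = {}
    frontier = set(stones)
    for _ in range(blinks):
        next_frontier = set()
        for stone in frontier:
            if stone not in cache:
                cache[stone] = transform(stone)
            next_frontier.update(cache[stone])
        frontier = next_frontier
    return cache


def count_with_full_cache(stones: list[str], blinks: int, cache: dict[str, list[str]]) -> int:
    """Count stones after `blinks` blinks, indexing the cache with no membership check.
    Raises KeyError if the cache was built for fewer blinks or other starting stones."""
    counts = Counter(stones)
    for _ in range(blinks):
        new_counts = Counter()
        for stone, n in counts.items():
            for r in cache[stone]:
                new_counts[r] += n
        counts = new_counts
    return sum(counts.values())
```

The trade-off is that the cache must really be complete for the inputs and blink count used; a missing value now raises `KeyError` instead of being computed on demand.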

        • @[email protected] (OP)
          30 days ago

          Yeah, I have been using these slow challenges to improve my profiling ability. It is a bit of a dark art though, especially with compiled languages.

          My slowest part seems to be the hashmap, but there isn’t much I can do about that, I think.

          Also, if I do a release run, I can get 10 ms, but that feels like cheating :D

          • @Acters
            30 days ago

            Hey, that is what release mode is for! Optimizing by unrolling and other tricks is needed, but in your case, I think I remember someone mentioning that their Rust release build runs closer to 2 ms.

            I did try PyPy3, but it seems to be slower by ~3 milliseconds! So I don’t know where I could improve my code any further without unrolling the range(75) loop, though that would only bring it closer to ~29 ms on average.

            Edit: apparently they updated their code to run in closer to 250 microseconds. That is blazing fast! [ link ]

            • @[email protected] (OP)
              30 days ago

              Using release mode to beat Python code is just a very hollow victory :D

              It does somewhat depend on how we measure as well: you are benchmarking the algorithm itself, and I’m timing the entire execution.

              • @Acters
                30 days ago

                You are right. I don’t think loading the file from disk should be part of it, because OS process priority and scheduling, disk access, caching, and more add headaches when figuring out what is slow. If you want to compare the entire execution, including Python startup overhead, reading from the file, and anything extra, it is closer to 50 to 60 ms on Linux and 80 to 90 ms on Windows (both on hosts, not virtual machines).

                My reasoning is that the input will eventually come either from the website or from disk, and that is not part of the challenge. You could simply hard-code it.

                So maybe you should look into adding code to your debug mode, or both modes, that measures only the solve rather than the entire process, including loading.
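
A minimal sketch of timing only the solve step in Python, assuming the input has already been read into a string so that interpreter startup and disk I/O stay outside the measurement. `benchmark` is a hypothetical helper built on the standard library's `timeit.repeat`; it reports the minimum and median of several runs, since a single run is noisy.

```python
import statistics
import timeit
from typing import Any, Callable, Dict


def benchmark(solve: Callable[[str], Any], data: str, runs: int = 20) -> Dict[str, Any]:
    """Time solve(data) `runs` times; `data` is already in memory, so no file I/O is timed."""
    result = solve(data)  # untimed warm-up call; its return value is reported as the answer
    seconds = timeit.repeat(lambda: solve(data), number=1, repeat=runs)
    return {
        "result": result,
        "min_ms": min(seconds) * 1000,
        "median_ms": statistics.median(seconds) * 1000,
    }
```

With the Python solution earlier in the thread, something like `benchmark(main, input_data)` would time only `main` after the file has been read. Note that its `transform_cache` is module-level, so every run after the first starts with a warm cache, which ties back to the cache-loading experiment above.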

                However, someone claims their Rust code can do it in 250 microseconds, so I doubt you have much excuse aside from having “inefficient” code (you are still fast, just not at the limit of your language’s performance). Measuring only my Python algorithm, it finishes in about 32,000 microseconds.

                https://github.com/maneatingape/advent-of-code-rust/blob/main/src/year2024/day11.rs

                However, now that I am looking at their main.rs file, they do measure the time to completion only for the algorithm, after process startup.

                • @[email protected] (OP)
                  30 days ago

                  Yeah, disk loading definitely shouldn’t count if I were timing properly; I’m just lazy and don’t want to do proper timing. :D

                  Most of my slowdown is in the hashmap. It looks like that answer deconstructs the hashmap and builds it from a fast map and a Vec. Not sure I want to go down that road at this stage.

                  Thanks again for your code and help :)
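
The “fast map plus a Vec” idea mentioned above can be sketched in Python too: give each distinct stone value a small integer id the first time it is seen, record each id's successor ids once, and keep the per-blink counts in plain lists indexed by id, so the hot loop does list indexing instead of hashing stone values. This only illustrates that general technique under those assumptions; it is not a port of the linked Rust solution, and `count_stones_dense` and `blink_value` are hypothetical names.

```python
from __future__ import annotations

from typing import Optional


def blink_value(value: int) -> list[int]:
    """Stones produced by one stone in a single blink."""
    if value == 0:
        return [1]
    digits = str(value)
    if len(digits) % 2 == 0:
        mid = len(digits) // 2
        return [int(digits[:mid]), int(digits[mid:])]
    return [value * 2024]


def count_stones_dense(initial: list[int], blinks: int) -> int:
    ids: dict[int, int] = {}               # stone value -> dense id (hashed only when interning)
    values: list[int] = []                 # dense id -> stone value
    succ: list[Optional[list[int]]] = []   # dense id -> successor ids, filled lazily

    def intern(value: int) -> int:
        # Assign the next free id the first time a value is seen.
        if value not in ids:
            ids[value] = len(values)
            values.append(value)
            succ.append(None)
        return ids[value]

    for v in initial:
        intern(v)
    counts = [0] * len(values)
    for v in initial:
        counts[ids[v]] += 1

    for _ in range(blinks):
        # Compute successors once per id, the first time that id holds any stones.
        for i, n in enumerate(counts):
            if n and succ[i] is None:
                succ[i] = [intern(r) for r in blink_value(values[i])]
        new_counts = [0] * len(values)
        for i, n in enumerate(counts):
            if n:
                for j in succ[i]:
                    new_counts[j] += n
        counts = new_counts
    return sum(counts)
```

For the puzzle's example, `count_stones_dense([125, 17], 25)` returns 55312. Whether this beats a well-tuned dict in CPython is not a given; the approach pays off mainly in compiled languages, where list indexing is far cheaper than hashing.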