Day 11: Plutonian Pebbles
Megathread guidelines
- Keep top-level comments to solutions only; if you want to say something other than a solution, put it in a new post. (Replies to comments can be whatever.)
- You can send code in code blocks by using three backticks, the code, and then three backticks, or use something such as https://topaz.github.io/paste/ if you prefer sending it through a URL.
FAQ
- What is this?: Here is a post with a lot of detail: https://programming.dev/post/6637268
- Where do I participate?: https://adventofcode.com/
- Is there a leaderboard for the community?: We have a programming.dev leaderboard with the info on how to join in this post: https://programming.dev/post/6631465
Python
Part 1: ~2 milliseconds, Part 2: ~32 milliseconds, Total Time: ~32 milliseconds
You end up doing part 1 at the same time as part 2, but because of how Advent of Code works, you need to rerun the code after part 1 is solved. So the Part 2 time is technically the total time.
Fast Code
from time import perf_counter_ns

# Cache of stone -> resulting stones. '0' -> '1' is seeded so transform() never sees '0'.
transform_cache = {'0': ['1']}

def transform(current_stone):
    if len(current_stone) % 2 == 0:
        # Even digit count: split in half, dropping leading zeros from the right half.
        mid = len(current_stone) // 2
        res = [current_stone[:mid], current_stone[mid:].lstrip('0').rjust(1, '0')]
    else:
        res = [str(int(current_stone) * 2024)]
    transform_cache[current_stone] = res
    return res

def main(initial_stones):
    # Track how many stones carry each value instead of listing every stone.
    stones_count = {}
    for stone in initial_stones.split():
        stones_count[stone] = stones_count.get(stone, 0) + 1
    part1 = 0
    for i in range(75):
        new_stones_count = {}
        for stone, count in stones_count.items():
            for r in (transform_cache.get(stone) if stone in transform_cache else transform(stone)):
                new_stones_count[r] = new_stones_count.get(r, 0) + count
        stones_count = new_stones_count
        if i == 24:
            # 25 blinks done: this is the part 1 answer.
            part1 = sum(stones_count.values())
    return part1, sum(stones_count.values())

if __name__ == "__main__":
    with open('input', 'r') as f:
        input_data = f.read().replace('\r', '').replace('\n', '')
    start_time, part_one, part_two = perf_counter_ns(), *main(input_data)
    stop_time = perf_counter_ns() - start_time
    # Pick the largest unit (ns, us, ms, s) that fits the elapsed time.
    time_len = min(9, ((len(str(stop_time)) - 1) // 3) * 3)
    time_conversion = {9: 'seconds', 6: 'milliseconds', 3: 'microseconds', 0: 'nanoseconds'}
    print(f"Part 1: {part_one}\nPart 2: {part_two}\nProcessing Time: {stop_time / (10**time_len)} {time_conversion[time_len]}")
Stepping through this code is what made it click for me, thanks. I was really mentally locked in on “memoizing” of the transform function, instead of realizing that the transform function only needs to be applied once per stone value.
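A minimal sketch of that idea, using the puzzle's published example input `125 17`. The names `blink` and `count_stones` are made up for illustration and are not from the code above. Each blink applies the rule once per distinct stone value and carries the count along:

```python
from collections import Counter


def blink(stone: str) -> list[str]:
    """Apply the puzzle rules to a single stone value (kept as a string)."""
    if stone == "0":
        return ["1"]
    if len(stone) % 2 == 0:
        mid = len(stone) // 2
        # int() drops leading zeros from the right half, e.g. "00" -> "0"
        return [stone[:mid], str(int(stone[mid:]))]
    return [str(int(stone) * 2024)]


def count_stones(initial: str, blinks: int) -> int:
    """Count stones after `blinks` steps, tracking counts per distinct value."""
    counts = Counter(initial.split())
    for _ in range(blinks):
        new_counts = Counter()
        for stone, n in counts.items():
            # One rule application per distinct value, no matter how many copies exist.
            for result in blink(stone):
                new_counts[result] += n
        counts = new_counts
    return sum(counts.values())


print(count_stones("125 17", 6))   # 22, per the puzzle's worked example
print(count_stones("125 17", 25))  # 55312, per the puzzle's worked example
```

The dictionary never holds more entries than there are distinct stone values, which is why 75 blinks stays cheap even though the stone count itself explodes.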
Yours is still a lot faster than my Rust version, so I’ll have to work out what is happening there.
Learning to profile code gives you a chance to learn which code is inefficient! I definitely like to spend some time looking at it, but at the end of the day I really need to learn more languages. For now, I am sticking with trusty Python. The image below is in microseconds.
screenshots
If the Python process were kept alive, we would only be saving 25 milliseconds, from ~250 to ~235! However, in real-world testing, the profiler does not seem to be a reliable benchmark, likely because it adds some overhead to each line of code.
Notice what happens if I add this code:
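For anyone who has not profiled Python before, the standard library's `cProfile` and `pstats` modules are enough to get started. This is only a minimal sketch: `busy_work` and `profile_call` are placeholder names for illustration, and this is not necessarily the tool used for the screenshots.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Placeholder workload standing in for the real solver."""
    total = 0
    for i in range(n):
        total += len(str(i * 2024))
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(busy_work, 10_000)
print(report)
```

The times in the report include the profiler's own overhead, so they are better for finding hot spots than for quoting absolute runtimes.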
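from ast import literal_eval
from os.path import dirname, isfile, join, realpath

transform_cache = {}
BASE_DIR = dirname(realpath(__file__))
cache_path = join(BASE_DIR, 'transform_cache')
if isfile(cache_path):
    # Load the cache saved by a previous run (a dict literal stored as text).
    with open(cache_path, 'r') as f:
        transform_cache = literal_eval(f.read().replace('\r', '').replace('\n', ''))
transform_cache = {}  # this makes the code faster???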
Notice that I load a transform_cache from a previous run out of a file. However, because of the “if stone in transform_cache” check, the loaded transform_cache is for some reason slower than letting it fill up again. Loading it and then clearing it is faster, probably because the CPU/RAM/OS are doing their own caching, too. If we remove the “if stone in transform_cache” check and keep the transform_cache fully loaded, it is faster by ~1 millisecond, down to 29 milliseconds! These are the niche problems you run into when caching things and squeezing all the performance out of code.
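One way to test the double-lookup theory is to replace the `in` check plus `.get` with a single `.get` call. The sketch below is an assumption about what that variant might look like, not the code that was timed above; `transform_once` is a hypothetical name.

```python
# Seeded the same way as in the solution: '0' always becomes '1'.
transform_cache = {"0": ["1"]}


def transform(stone):
    """Compute the result for an uncached stone and store it."""
    if len(stone) % 2 == 0:
        mid = len(stone) // 2
        res = [stone[:mid], stone[mid:].lstrip("0").rjust(1, "0")]
    else:
        res = [str(int(stone) * 2024)]
    transform_cache[stone] = res
    return res


def transform_once(stone):
    """Single dictionary lookup on a hit; fall back to transform() on a miss."""
    res = transform_cache.get(stone)
    if res is None:
        res = transform(stone)
    return res


print(transform_once("17"))  # miss: computed and cached
print(transform_once("17"))  # hit: served from the cache
```

Whether this is measurably faster would need to be timed on the real input; at ~1 millisecond the difference is close to run-to-run noise.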
Yeah, I have been using these slow challenges to improve my profiling ability. It is a bit of a dark art though, especially with compiled languages.
My slowest part seems to be the hashmap, but I don’t think there is much I can do about that.
Also, if I do a release run I can get 10ms, but that feels like cheating :D
Hey, that is what release mode is for! Optimizing by unrolling and other tricks is needed, but in your case, I think I remember someone mentioning that their Rust release build is closer to 2 ms.
I did try PyPy3, but it seems to be slower by ~3 milliseconds! So I don’t know where I could improve my code any further without unrolling the range(75) loop, though that would only bring it closer to ~29 ms on average.
Edit: apparently they updated their code to be closer to 250 microseconds. That is blazing fast [ link ]
Using release to beat python code is just a very hollow victory :D
It does somewhat depend on how we measure as well: you are benchmarking the algorithm itself, and I’m timing the entire execution.
You are right. I don’t think loading the file from disk should be part of it, because OS process priority/queueing, disk, cache and more make it a headache to figure out what is slow. If you want to compare the entire execution, including Python startup overhead, reading from file and anything extra, it is closer to 50-60 ms on Linux and 80-90 ms on Windows (both hosts, not virtual machines).
My reasoning is that loading the input will eventually either pull from the website or from disk. That is not part of the challenge; you could simply hard-code it.
So maybe you should look into adding code to your debug mode, or to both modes, that measures only the solving instead of the entire process.
However, someone claims their Rust code can do 250 microseconds, so I doubt you have much excuse aside from having “inefficient” code. (You are still fast, just not at the limit of your language’s performance.) Measuring only my Python algorithm, it finishes in about 32000 microseconds.
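A sketch of what measuring only the solve could look like, with the input hard-coded so that disk and process startup stay out of the number. `solve` is a stand-in for whatever function does the real work and `time_solve` is a hypothetical helper. Taking the best of several runs is an assumption about how to reduce noise, not something done in the code above.

```python
from time import perf_counter_ns


def solve(data):
    """Stand-in for the real solver; here it just sums the numbers."""
    return sum(int(x) for x in data.split())


def time_solve(func, data, repeats=5):
    """Return (result, best elapsed nanoseconds) over `repeats` runs."""
    best = None
    result = None
    for _ in range(repeats):
        start = perf_counter_ns()
        result = func(data)
        elapsed = perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return result, best


# Input is hard-coded, so no file or network access is timed.
answer, best_ns = time_solve(solve, "125 17")
print(f"answer={answer}, best={best_ns} ns")
```

Note that repeated runs in one process benefit from any module-level cache that is already warm, so the first run and the best run can differ a lot.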
https://github.com/maneatingape/advent-of-code-rust/blob/main/src/year2024/day11.rs
However, now that I am looking at their main.rs file, they do measure the completion time after process startup, timing only the algorithm.
Yeah, disk loading definitely shouldn’t count if I was timing properly. I’m just lazy and don’t want to do proper timing. :D
Most of my slowdown is in the hashmap. It looks like that answer deconstructs the hashmap and builds it from a fastmap and a vec. Not sure I want to go down that road at this stage.
Thanks again for your code and help :)