Why Python Is So Slow (And What Is Being Done About It)

@dominiquec · 7 months ago

Why Python Is So Slow (And What Is Being Done About It)

@[email protected] · edit-2 7 months ago

While I agree with most of what you say, I have a personal anecdote that highlights the importance of performance as a feature.

I have a friend that studies economics and uses python for his day to day. Since computer science is not his domain, he finds it difficult to optimize his code, and learning a new language (C in this case) is not really an option.

Some of his experiments take days to run, and this is becoming a major bottleneck in his workflow. Being able to write faster code without relying on C is going to have a significant impact on his research.

Of course, there are other ways to achieve similar results, for example another friend is working on DIAS a framework that optimizes pandas in the runtime. But, the point still stands, there are a tonne of researchers relying on python to get quick and dirty results, and performance plays a significant in that when the load of data is huge.

@[email protected] · 7 months ago

Sure, I was being mildly facetious, but pointing to a better pattern, the nature of python means it is, barring some extreme development, always going to be an order of magnitude slower than compiled. If you’re not going to write even a little C, then you need to look for already written C / FORTRAN / (SQL for data) / whatever that you can adapt to reap those benefits. Perhaps a general understanding of C and a good knowledge of what your Python is doing is enough to get a usable result from a LLM.

@[email protected] · edit-2 7 months ago

I have an alternative anecdote.

My coworker is a Ph.D in something domain specific, and he wrote an app to do some complex simulation. The simulation worked well on small inputs (like 10), but took minutes on larger inputs (~100), and we want to support very large inputs (1000+) but the program would get killed with out of memory errors.

I (CS background) looked over the code and pointed out two issues:

bubble sort in a hot path
allocated all working memory at the start and used 4D arrays, when 3D arrays and a 1D result array would’ve sufficed (O(n⁴) -> O(n³))

Both problems would have been solved had they used Python, but they used Fortran because “fast,” but it doesn’t have builtin sort or data structures. Python provides classes, sortable lists (with quicksort!), etc, so they could’ve structured their code better and avoided the architectural mistakes that caused runtime and memory to explode. Had they done that, I could’ve solved performance problems by switching lists to numpy arrays and throwing numba on the hot loops and been done in a day, but instead we spent weeks rewriting it (nobody understands Fortran, and that apparently included the original dev).

Python lets you focus on the architecture. Compiled languages often get you stuck in the weeds, especially if you don’t have a strong CS background and just hack things until it works.