Any tips to help a scientist become a better programmer?

@[email protected] · 11 months ago

Any tips to help a scientist become a better programmer?

MxM111 · 11 months ago

As one physicist to another, the most important thing in the code are long variable names (descriptive) and comments.

We usually do not do multi-people multi year projects, so all other comments in this page especially the ones coming from programmers are not that relevant. Classes are cool, but they are not needed and often obscure clarity of algorithmic/functional programming.

S. Wolfram (creator of Mathematica) said something along these lines (paraphrasing) if you are writing real code in Mathematica - you are doing something wrong.

@abhibeckert · edit-2 11 months ago

We usually do not do multi-people multi year projects

Seriously - why not?

Say you’re doing an experiment, wouldn’t it be nice if someone else could repeat that experiment? Maybe in 3 years? in 30 years? in 3,000 years time? And maybe they could use your code instead of writing it themselves and possibly getting it wrong?

If something is worth doing, then it is worth doing properly.

Classes are cool, but they are not needed and often obscure clarity

I write code all day professionally. A lot of my code doesn’t use classes. I agree they often “obscure clarity”.

But sometimes they do the opposite - they make things crystal clear. It’s important to know how to use classes and even more important to know when to use them. I guarantee some of the work you do could benefit from a few simple classes. They don’t need to be complex - I wrote a class the earlier today that is only four lines of code. And yes, a class was apropriate.

Fushuan [he/him] · 11 months ago

The reason they don’t do multi people and multi year coding projects has nothing to do with repeatability of the experiments, most science coding is done through simple-ish code that uses existing libraries, doesn’t code them. That code is usually stored in notebooks (jupyter, zeppelin) or simple scripts.

For science code, it usually falls in the realm of data analysis, and as a data engineer, let me tell you that the analysis part of the job is usually very ad-hoc modifications of the script and live coding though notebooks and such.

The part where whatever conclusion of the research is then transformed into a functioning application, taking care of naming conventions, the architecture of the system where the input data, the transformations, the postprocessing and such is done, is usually done by another team of dedicated data engineers or software developers.

I guess that it would be helpful for the analysis part to have standardized templates for data extraction and such, but usually the tools used in the research portion of the process and the implementation portion are completely different (python with tensorflow vs C++ with openvino or whatever cloud based) so it’s not really fair to load the architecture design since the beginning.

Turun · 11 months ago

You know how changing requirements is the bane of real™ production grade™ software?

In science requirements change all the time. You write some 50-100 lines to plot your results. You realize that the effect is not visible, so you go back to the lab, change 5 variables and run the test again. Some quick code changes and you see the effect. Perfect. Now you do the measurement as a function of temperature. You adapt your script, you indent the data processing code to turn your list of files into a list of characteristics parameters and adapt the plotting. You run the experiment for the three samples you have prepared and compare their plots. Some more experiments tuning and corresponding script tuning is required. You take the characteristic parameters that your code (grew to 200 lines now, but whatever) calculated and write a new script to take that array and plot it nicely.

Now someone wants to repeat the experiment 4 years later. The measurement equipment changed and the data format is slightly different. It’s impossible to document the exact state of the hardware in your code anyway. They are actually interested in a different effect and want to plot that as well, but they need the effect/characteristics parameters that are shown by your code as a sanity check. They need to rewrite 125 of the 200 lines.

There never is a finished product that is worth maintaining long term. Everyone using the script has to understand the domain precisely anyway. Is it worth it to reuse the old code when you need to rewrite more than half of it anyway?

Don’t get me wrong, code reuse does happen. But it’s much more “oh, I wrote that three months ago somewhere else” ctrl-c ctrl-v. “Ok, now I need to change five lines in this function to adapt to the new thing I’m trying to do.” It makes absolutely no sense to write that function in a abstracted way. Every time you use it the requirements changed and the abstraction is no longer valid anyway.

MxM111 · 11 months ago

They need to rewrite 125 of the 200 lines.

And I guarantee, it is much easier to write 200 new lines than change 125 out of 200 lines in somebodies code. No matter how nicely that code is written.

UFO · 11 months ago

Great potatoes… This is not very good advice. Ok for prototypes that are intended to be discarded shortly after writing. Nothing more.

Turun · 11 months ago

Yes, those prototypes are the goal here.

UFO · 11 months ago

Cool! Have fun! I wouldn’t worry about a lot of code quality opinions then. Especially if somebody is looking at prototypes and thinking they are not prototypes haha