A company not making self-serving predictions & studies.

  • entwine@programming.dev · 2 points · 5 hours ago

    In a randomized controlled trial, we examined 1) how quickly software developers picked up a new skill (in this case, a Python library) with and without AI assistance; and 2) whether using AI made them less likely to understand the code they’d just written.

    We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.
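
    For scale, here is a minimal sketch of how that kind of group comparison is typically tested, using invented numbers rather than the study’s data:

      # Hypothetical illustration: invented scores, not the study's data.
      # Welch's two-sample t-test, a standard way to check whether a quiz-score
      # gap between an AI-assisted group and a hand-coding group is significant.
      from scipy import stats

      hand_coded = [78, 85, 72, 90, 81, 76, 88, 79]   # made-up quiz scores (%)
      ai_assisted = [61, 70, 58, 74, 66, 59, 72, 64]  # made-up quiz scores (%)

      t_stat, p_value = stats.ttest_ind(hand_coded, ai_assisted, equal_var=False)
      gap = sum(hand_coded) / len(hand_coded) - sum(ai_assisted) / len(ai_assisted)
      print(f"mean gap: {gap:.1f} points, t = {t_stat:.2f}, p = {p_value:.4f}")
      # By convention, p < 0.05 is read as "statistically significant".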

    Who designed this study? I assume it wasn’t a software engineer, because this doesn’t reflect real-world “coding skills”. This is just a programming-flavored memory test. Obviously the people who coded by hand remembered more about the library, in the same way that students who take notes by hand rather than typing tend to remember more.

    A proper study would need to evaluate critical thinking and problem-solving skills on real-world software engineering tasks. Maybe find an obscure, already-solved bug in an open-source project and have participants try to fix it in a controlled environment (so they can’t just look up the existing solution).

  • PolarKraken@programming.dev · 2 points · 5 hours ago

    Interesting read, and it feels intuitively plausible. It also matches my growing personal sense that people are using these things wildly differently and having completely different outcomes as a result. Some other random, disconnected thoughts:

    1. I’m surprised they’re publishing this; it seems like a pretty stark condemnation of the technology. What benefits do they anticipate that made them decide this should be published, rather than quietly keeping it aside “pending further research”? Obviously, people knowing how to use the tools better is good for longevity, but that’s just not what our idiotic investment cycles prioritize.

    2. I’m no scientist or expert in experimental design, but this seems like far too few people for the level of detail they’re bringing to the conclusions they’re drawing. That, plus the way it all feels intuitively plausible, gives the interpretation a very “just so” feeling rather than one of true exploration. I mean, c’mon: the behavioral buckets they describe range from 2 to 7 people apiece, most commonly just 4 individuals. “Four junior engineers behaved kinda like this and had that average outcome” MIGHT reflect a broader pattern, but it sure doesn’t feel compelling or scientific (see the sketch below for how wide the uncertainty is at n = 4).
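
    For a sense of how little four data points can pin down, here’s a quick sketch (invented numbers, not the paper’s) of a 95% confidence interval on a mean quiz score with n = 4:

      # Invented numbers, not the paper's. With n = 4, a 95% confidence
      # interval on a mean quiz score is extremely wide.
      import statistics
      from scipy import stats

      scores = [55, 70, 40, 65]  # a made-up "behavioral bucket" of 4 people
      n = len(scores)
      mean = statistics.mean(scores)
      sem = statistics.stdev(scores) / n ** 0.5  # standard error of the mean
      t_crit = stats.t.ppf(0.975, df=n - 1)      # two-sided 95% critical value
      low, high = mean - t_crit * sem, mean + t_crit * sem
      print(f"mean = {mean:.1f}, 95% CI = ({low:.1f}, {high:.1f})")
      # The interval spans roughly 42 points, so a per-bucket "average
      # outcome" says very little on its own.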

    Nonetheless, I selfishly enjoyed having my own vague subconscious observations validated, lol. I’d like to see more of this (and anything else that works against the crazy bubble being inflated).

  • Kissaki@programming.dev · 12 points · 9 hours ago

    From the paper abstract:

    […] Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI.

    We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library.

    We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence and AI assistance should be carefully adopted into workflows to preserve skill formation – particularly in safety-critical domains.
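
    The excerpt doesn’t name the library participants had to learn, so purely as an illustration of the kind of concept such a mastery quiz might probe, here is a small example using the standard-library asyncio:

      # Illustrative only: the abstract doesn't name the library studied, so
      # this uses the standard-library asyncio to show the flavor of concept
      # (sequential awaiting vs. concurrent scheduling) a quiz might cover.
      import asyncio

      async def fetch(name: str, delay: float) -> str:
          await asyncio.sleep(delay)  # stand-in for real I/O
          return f"{name} done"

      async def main() -> None:
          # Sequential: total time is roughly 0.2s + 0.3s
          a = await fetch("a", 0.2)
          b = await fetch("b", 0.3)
          # Concurrent: total time is roughly max(0.2s, 0.3s)
          c, d = await asyncio.gather(fetch("c", 0.2), fetch("d", 0.3))
          print(a, b, c, d)

      asyncio.run(main())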

  • eleijeep@piefed.social · 7 points · 10 hours ago

    Discussion
    Our main finding is that using AI to complete tasks that require a new skill (i.e., knowledge of a new Python library) reduces skill formation.

    The erosion of conceptual understanding, code reading, and debugging skills that we measured among participants using AI assistance suggests that workers acquiring new skills should be mindful of their reliance on AI during the learning process.

    Among participants who use AI, we find a stark divide in skill formation outcomes between high-scoring interaction patterns (65%-86% quiz score) vs low-scoring interaction patterns (24%-39% quiz score). The high scorers only asked AI conceptual questions instead of code generation or asked for explanations to accompany generated code; these usage patterns demonstrate a high level of cognitive engagement.

    Contrary to our initial hypothesis, we did not observe a significant performance boost in task completion in our main study.

    Our qualitative analysis reveals that our finding is largely due to the heterogeneity in how participants decide to use AI during the task.

    These contrasting patterns of AI usage suggest that accomplishing a task with new knowledge or skills does not necessarily lead to the same productive gains as tasks that require only existing knowledge.

    Together, our results suggest that the aggressive incorporation of AI into the workplace can have negative impacts on the professional development [of] workers if they do not remain cognitatively [sic] engaged. Given time constraints and organizational pressures, junior developers or other professionals may rely on AI to complete tasks as fast as possible at the cost of real skill development.

    Furthermore, we found that the biggest difference in test scores is between the debugging questions. This suggests that as companies transition to more AI code writing with human supervision, humans may not possess the necessary skills to validate and debug AI-written code if their skill formation was inhibited by using AI in the first place.
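
    As a toy illustration of that last point (not taken from the paper), here is the kind of subtle bug that is easy to miss when validating AI-written async code:

      # Toy example, not from the paper: a bug that's easy to miss when
      # reviewing AI-written async code.
      import asyncio

      async def save_record(record: dict) -> None:
          await asyncio.sleep(0.1)  # stand-in for a real database write
          print(f"saved {record['id']}")

      async def main() -> None:
          record = {"id": 1}
          save_record(record)        # BUG: coroutine created but never awaited;
                                     # the "write" never runs (Python only emits
                                     # a RuntimeWarning, typically at shutdown)
          await save_record(record)  # fix: await the coroutine

      asyncio.run(main())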

    • idriss@lemmy.ml (OP) · 4 points · 10 hours ago

      Yep, they’re selling learning models, but they’re not pretending medical doctors will be out of work next week like OpenAI is doing.

  • troi@techhub.social · 1 point · 12 hours ago

    @idriss Seems predictable to me. Programmers on the left or middle of whatever distribution identifies “good” programmers or engineers will use AI and be comfortable having completed some task. Those on the right of the distribution may or may not use AI, but will insist on understanding what has been created.

    Now, an interesting question for me unrelated to the post is “what would be a good metric to identify really good programmers?”