Is there some formal way(s) of quantifying potential flaws, or risk, and ensuring there’s sufficient spread of tests to cover them? Perhaps using some kind of complexity measure? Or a risk assessment of some kind?

Experience tells me I need to be extra careful around certain things - user input, code generation, anything with a publicly exposed surface, third-party libraries/services, financial data, personal information (especially of minors), batch data manipulation/migration, and so on.

But is there any accepted means of formally measuring a system and ensuring that some level of test quality exists?

  • @[email protected]
    34
    edited
    2 years ago

    Mutation testing is useful. It makes small changes to your code and checks whether your tests notice, which shows how effective the tests are and which conditions they fail to check.

    For Java: https://pitest.org

    Edit: corrected to the more general name instead of a specific implementation.
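
    To make the idea concrete, here is a hand-rolled sketch in Python. It is not how PIT works internally: real tools generate and run the mutants automatically, and the names `is_adult`, `weak_suite` and `strong_suite` are made up for illustration. It only shows what it means for a mutant to "survive" a weak test suite.

```python
# Hand-rolled illustration of mutation testing (all names are hypothetical).

def is_adult(age):
    """Original code under test."""
    return age >= 18

def is_adult_mutant(age):
    """Mutant: the boundary operator >= has been changed to >."""
    return age > 18

def weak_suite(fn):
    """Passes if fn handles two values far away from the boundary."""
    return fn(30) is True and fn(5) is False

def strong_suite(fn):
    """Additionally checks the boundary value 18."""
    return weak_suite(fn) and fn(18) is True

def killed(suite, mutant):
    """A mutant is 'killed' when the suite fails against it."""
    return not suite(mutant)

print(killed(weak_suite, is_adult_mutant))    # False: the mutant survives
print(killed(strong_suite, is_adult_mutant))  # True: the mutant is killed
```

    Both suites pass against the original function, so a coverage report would treat them the same. Only the surviving mutant reveals that the weak suite never checks the boundary.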

    • @[email protected]
      8
      2 years ago

      The most extreme examples of the problem are tests with no assertions. Fortunately these are uncommon in most code bases.

      Every enterprise I’ve consulted for that had code coverage requirements was full of elaborate mock-heavy tests with a single Assert.NotNull at the end. Basically just testing that you wrote the right mocks!
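
      A minimal Python sketch of that pattern, assuming a hypothetical `apply_discount` function with a deliberate bug and a mocked repository. The checks are plain functions rather than real test-framework tests so that the failing one can be shown safely.

```python
from unittest.mock import Mock

def apply_discount(price, repo):
    """Hypothetical function under test; contains a deliberate bug."""
    percent = repo.discount_percent()
    return price + price * percent // 100  # bug: the discount should be subtracted

def make_repo():
    """Mocked repository that always reports a 10% discount."""
    repo = Mock()
    repo.discount_percent.return_value = 10
    return repo

def check_not_none():
    """Executes every line of apply_discount but only checks that something came back."""
    assert apply_discount(100, make_repo()) is not None

def check_actual_value():
    """Pins the expected result, so the bug is caught."""
    assert apply_discount(100, make_repo()) == 90

check_not_none()  # passes even though apply_discount is wrong
try:
    check_actual_value()
except AssertionError:
    print("value check caught the bug")
```

      Both checks give the same line coverage for `apply_discount`; only the second one can fail when the arithmetic is wrong.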

      • @[email protected]
        6
        2 years ago

        That’s exactly the sort of shit tests mutation testing is designed to address. Believe me, it sucks when Sonar requires a 90% PIT test pass rate. Sometimes the tests can get extremely elaborate, which should be a red flag for the design (not necessarily bad code).

        Anyway, I love what PIT testing does. I hate being required to do it, but it’s a good thing.

    • @[email protected] (OP)
      5
      2 years ago

      This is really interesting, I’ve never heard of such an approach before; clearly I need to spend more time reading up on testing methodologies. Thank you!

    • @[email protected]
      4
      2 years ago

      I’d never heard of mutation testing before either, and it seems really interesting. It reminds me of fuzzing, except for the code instead of the input. Maybe a little impractical for some codebases with long build times though. Still, I’ll have to give it a try for a future project. It looks like there are several tools for mutation testing C/C++.

      The most useful tests I write are generally regression tests. Every time I find a bug, I’ll replicate it in a test case, then fix the bug. I think this is just basic test-driven development practice, but it’s very useful to verify that your tests actually fail when they should. Mutation/PIT testing seems like it addresses that nicely.
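
      A small Python sketch of that workflow, assuming a made-up `average` function whose original version crashed on an empty list. The regression check is run against both the buggy and the fixed version to confirm it fails when it should.

```python
def average_buggy(values):
    """Hypothetical original version: raises ZeroDivisionError on an empty list."""
    return sum(values) / len(values)

def average(values):
    """Fixed version: an empty list yields 0.0."""
    if not values:
        return 0.0
    return sum(values) / len(values)

def regression_empty_list(fn):
    """Replicates the bug report: an empty list must yield 0.0."""
    try:
        return fn([]) == 0.0
    except ZeroDivisionError:
        return False

print(regression_empty_list(average_buggy))  # False: the check fails on the bug
print(regression_empty_list(average))        # True: the check passes on the fix
```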

      • @[email protected]
        2
        2 years ago

        We are running the PIT tests mentioned above with an extra Gradle-based build plugin so that mutations only run for the lines changed in that pull request. That drastically reduces runtime and still ensures that new code is covered to the mutation test level we want. Maybe something similar can be done for C or C++ projects.

        • @[email protected]
          2
          2 years ago

          I’m currently working on a C++ project that takes about 10 minutes to do a clean build (plus another 5 minutes in CI to actually run the tests). Incremental builds are set up and work quite well, but any header change can easily result in a 5-minute incremental build.

          As much as I’d like to try, I don’t see mutation testing being worthwhile for this project outside of maybe a few isolated modules that could be tested independently. It’s a highly interconnected codebase, and I’ve personally reviewed (or written) every test, so I already know they’re of fairly high quality, but it would be nice to be able to measure.

    • robotdna
      3
      2 years ago

      Does something like this exist for Python?

  • @[email protected]
    18
    2 years ago

    80%. Much beyond that and you get diminishing returns on the effort of writing the tests.

    • @[email protected]
      10
      edited
      2 years ago

      I think this is a good rule-of-thumb in general. But I think the best way to decide on the correct coverage is to go through uncovered code and make a conscious decision about it. In some classes it may be OK to have 30%, in others one wants to go all the way up to 100%. That’s why I’m against having a coverage percentage as a build/deployment gate.

      • @[email protected]
        5
        2 years ago

        Bingo, exactly this. I said 80 because that’s typically what I see our projects get to after writing actually useful tests. But if your coverage is 80% and it’s all just tests verifying that a constant is still set to whatever value, then yeah, that’s a useless metric.

    • @[email protected]
      1
      2 years ago

      The 80-20 rule is for everything. Don’t waste 80% of effort to get that last 20% of coverage.

  • @[email protected]
    9
    2 years ago

    “But is there any accepted means of formally measuring a system and ensuring that some level of test quality exists?”

    Formally? No, this is basically impossible by Rice’s Theorem. There is not even a guarantee that if you have 100% test coverage, the program is good (the tests could be flawed).

    This is just a natural limitation of Turing completeness. You can’t decide these properties while also having full computational power. In order to decide such things, you need a less powerful model of computation (something not Turing complete) that can be analyzed more thoroughly and with more guarantees.

    • @[email protected] (OP)
      2
      2 years ago

      That makes sense, thank you. Yes, it’s specifically “test quality” I’m looking to measure, as 100% coverage is effectively meaningless if the tests are poor.

      • @[email protected]
        5
        2 years ago

        Yeah, I’m afraid the only real way to “measure” that is to read through the tests and the code and make a good ol’ human value judgement on the state of the code and tests. But it won’t give you a number.

  • snoweM
    9
    2 years ago

    Mutation testing. Someone else mentioned it as PIT testing, but its actual name is mutation testing. It accomplishes exactly what you’re looking for here.

  • fades
    6
    2 years ago

    So true lol. Mgmt just announced a directive at my work last week that code must have 95-100% coverage.

    Meanwhile they hire contractors from India that write the dumbest, most useless tests possible. I’ve worked with many great Indian devs, but the contractors we use today all seem like a step down in quality. More work for me, I guess.

    • @phoneymouse
      4
      2 years ago

      It’s always fun to hear management pushing code coverage. It’s a fairly useless metric. It’s easy to get coverage without actually testing anything. I’ve seen unit tests that consist simply of starting the whole program and running it without asserting anything or checking outputs.

  • Zoe Codez
    4
    2 years ago

    There are tools to detail the code coverage of your tests. I’ve worked with Istanbul in the past, and it’s helped to point out parts of the code that could use more attention.

    https://istanbul.js.org/

    • @[email protected] (OP)
      2
      2 years ago

      I use coverage tools like nyc/c8, but I can easily get 100% coverage on buggy, exploitable, and unstable code. You can have two projects, both with 100% coverage, and one be a shit show and the other be rock solid - so I was wondering if there’s a way to measure quality of tests, or to identify code that really needs extra attention (despite being 100%). Mutation testing has been suggested and that’s really interesting, I’m going to give it a go tomorrow and see what it throws up!

  • @lordxakio
    4
    2 years ago

    I am not sure what the common/agreed upon rules are. Seems like it depends on the team lead or manager to decide. Some orgs have better engineers, experience, systems and others don’t.

    I used to follow the 100% coverage rule because I was told to when I started out. I found myself chasing semi-colons rather than null references. Luckily, I had a teammate with whom I argued a lot about what we did, do, and will do, and he helped me. (In a friendly manner, not like Dinesh and Gilfoyle from Silicon Valley.)

    Now, I start my tests by going over how the user will use it, e.g. the happy path. Then happy path away. It seems to cover most cases. It helps if you know the business too. (Think messaging system that is intentionally and strictly simple, or one that has a lot of Unicode and language support… fucking emojis hurt me cause I forget they exist even though I use them all the time, I always forget).

    Alas, no matter what, I always miss some test case or a very imaginative user will find a way to show me how wrong I am.

    In the end, I think the best approach, no matter how big or small the project you’re building is, is to do many small PRs (with tests) to your team. This way, things are tested in increments, which helps prevent PR burnout. This is something I need to get better at myself.

  • @MR_GABARISE
    3
    2 years ago

    On top of, or better, in addition to mutation testing, some amount of property-based testing is always great where it counts.
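
    As a rough illustration of the idea in Python: instead of pinning a few hand-picked inputs, generate many random inputs and assert properties that must hold for all of them. Libraries such as Hypothesis do this far more thoroughly (including shrinking failing inputs); the sketch below uses only the standard library, and `dedupe` is a hypothetical function under test.

```python
import random

def dedupe(items):
    """Hypothetical function under test: drop repeats, keep first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

def check_dedupe_properties(trials=200, seed=0):
    """Assert general properties of dedupe on randomly generated lists."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        out = dedupe(items)
        assert len(out) == len(set(out))  # no duplicates remain
        assert set(out) == set(items)     # nothing lost, nothing invented
        assert dedupe(out) == out         # applying it twice changes nothing
    return True

check_dedupe_properties()
```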

    • @[email protected]
      2
      2 years ago

      Additionally I like roundtrip tests.

      For example we have two data formats and support conversion between both of them.

      So I have tests that convert from A to B and back to A. Then I can go and call assertEquals on them.

      It’s a very cheap test that exercises all the functionality of the conversion itself.
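
      A minimal sketch of such a roundtrip check in Python. The two formats here are invented for illustration: format A is a dict of strings and format B is a `k=v;k=v` string. The sketch assumes keys and values contain no `=` or `;`.

```python
def to_b(record):
    """Format A (dict of str -> str) to format B ('k=v;k=v' string)."""
    return ";".join(f"{key}={value}" for key, value in record.items())

def to_a(text):
    """Format B back to format A."""
    if not text:
        return {}
    return dict(pair.split("=", 1) for pair in text.split(";"))

def roundtrip_ok(record):
    """A -> B -> A must give back the original record."""
    return to_a(to_b(record)) == record

print(roundtrip_ok({"name": "Ada", "lang": "en"}))  # True
print(roundtrip_ok({}))                             # True
```

      One caveat: a roundtrip check can miss mistakes that cancel out in both directions, so it is worth pairing with at least one check against a known format B string.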

  • @[email protected]
    3
    2 years ago

    Different applications require different tests, so no measure is going to please everyone. If you’re making embedded devices for an airplane, the buyer might ask you to provide a formal proof that the program works. In contrast, web apps tend to simply use end users as testers, since it’s cheaper.

  • UFO
    3
    2 years ago

    I’d like to see state space coverage instead of line coverage. That, at least, catches silly “100%” cases.

    I don’t know of a tool that provides this metric. I don’t even think such a thing could be made for most languages. Still, it’s useful to think about when reviewing code.
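
    This is not such a tool, just a tiny Python illustration of the gap. The hypothetical `ratio` function has two independent flags; two tests execute every line and both outcomes of each `if`, yet only half of the flag combinations are ever reached.

```python
from itertools import product

def ratio(total, clear, as_fraction):
    """Hypothetical function with two independent flags."""
    count = 4
    if clear:
        count = 0
    if as_fraction:
        return total / count
    return total

# Two tests that together execute every line and both outcomes of each `if`.
tested = {(True, False), (False, True)}  # (clear, as_fraction) combinations
assert ratio(8, True, False) == 8
assert ratio(8, False, True) == 2.0

# Enumerate the flag combinations the tests never reached.
untested = [s for s in product([False, True], repeat=2) if s not in tested]
print(untested)  # [(False, False), (True, True)]

# One of the unreached combinations is exactly where the bug lives.
try:
    ratio(8, True, True)
except ZeroDivisionError:
    print("clear + as_fraction divides by zero")
```

    Enumerating combinations only works for tiny state spaces like this one, which fits the doubt above about doing it for real programs.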

  • bluGill
    2
    2 years ago

    I prefer to count and report total tests run as part of each build. We get impressively large numbers, but there is no way to put a specific goal on the exact number; we can always go higher.

  • @[email protected]
    2
    2 years ago

    This might not be exactly what you’re looking for, but there is verifiably correct software. You can use proof assistants or work in limited computational models (i.e. always-terminating, non-Turing-complete).

    One example: https://statebox.org/what-is/

  • Sibbo
    2
    2 years ago

    Maybe the ratio of money spent on writing code to money spent on testing code?