• @cheese_greater
    English
    109
    9 months ago

    I would be in trouble if this was a thing. My writing naturally resembles the output of a ChatGPT prompt when I’m not joke answering or shitposting.

    • @TropicalDingdong
      English
      25
      9 months ago

      I would be in trouble if this was a thing. My writing naturally resembles the output of a ChatGPT prompt when I’m not joke answering.

      It’s not unusual for well-constructed human writing to resemble the output of advanced language models like ChatGPT. After all, language models like GPT-4 are trained on vast amounts of human text, and their main goal is to replicate and generate human-like text based on the patterns they’ve observed.

      /gpt-4

      • @cheese_greater
        English
        3
        9 months ago

        I need a lotta help, just not from a friend and about anything robot-related 😮‍💨

  • @cheesorist
    English
    72
    9 months ago

    they never did, they never will.

    • stevedidWHAT
      English
      6
      9 months ago

      Why tho or are you trying to be vague on purpose

      • bioemerl
        72
        9 months ago

        Because you’re training a detector on something that is designed to emulate regular language as closely as possible, and human speech has so much incredible variability that it’s almost impossible to identify whether something was written by an AI.

        You can maybe detect your typical generic ChatGPT-type outputs, but you can characterize a conversation with ChatGPT or any of the other much better local models (privacy and control are aspects which make them better), and after doing that you can get radically human-seeming outputs that are totally different from anything ChatGPT will output.

        In short, given a static block of text it’s going to be nearly impossible to detect if it’s coming from an AI. It’s just too difficult a problem, and even if you do solve it, the solution is going to be immediately obsolete the next time someone fine-tunes their own model.

        • stevedidWHAT
          English
          6
          9 months ago

          Yeah this makes a lot of sense considering the vastness of language and its imperfections (English I’m mostly looking at you, ya inbred fuck)

          Are there any other detection techniques that you know of? What about forcing AI models to have a signature that is guaranteed to be identifiable, permanent, and unique for each tuning produced? It’d have to be not directly noticeable but easy to calculate in order to prevent any “distractions” for the users.

          • @Grimy
            English
            18
            9 months ago

            The output is pure text, so you would have to hide the signature in the response itself. On top of being useless, since most users slightly modify the text after receiving it, it would probably have a negative effect on the quality. It’s also insanely complicated to train that kind of behavior into an LLM.

            • stevedidWHAT
              English
              2
              9 months ago

              Your implementation of my concept might be useless, but that doesn’t mean the concept is.

              One possible solution would be to look at how responses are structured, letter frequencies, etc. The flexibility and ambiguity of natural language mean that you can word things in many, many different ways, which allows for some creative meta techniques to accomplish a fingerprint.

              • Terrasque
                English
                3
                9 months ago

                It is a valid idea, and not impossible. When generating text, a language model gives a list of possible tokens… or more correctly it gives a weight to every possible token, where most would be close to 0 weight. Then there are multiple ways to pick the next token, from always picking the top one, to selecting randomly from the top X tokens, to mirostat and so on. You could probably do some extra weighting to embed a sort of signature, at some quality loss.
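For what it's worth, here is a minimal sketch of the "extra weighting" idea described above. Everything in it is an assumption made for illustration: the ten-word `VOCAB`, the random logits standing in for a real model, the hash-based `green_set` rule, and the `bias` value. It is not how any particular model or vendor does it, just one way a statistical signature could be embedded at sampling time and checked afterwards.

```python
import hashlib
import math
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]


def green_set(prev_token):
    """Pick half of the vocabulary, pseudo-randomly, keyed on the previous token.

    This hash rule is an assumption for the sketch; it plays the role of the
    secret "signature" that both the generator and the checker know.
    """
    ranked = sorted(
        VOCAB,
        key=lambda tok: hashlib.sha256(f"{prev_token}|{tok}".encode()).hexdigest(),
    )
    return set(ranked[: len(VOCAB) // 2])


def sample_next(logits, prev_token, bias, rng):
    """Softmax sampling, with `bias` added to the logits of 'green' tokens."""
    greens = green_set(prev_token)
    adjusted = [l + (bias if tok in greens else 0.0) for tok, l in zip(VOCAB, logits)]
    peak = max(adjusted)  # subtract the max for numerical stability
    weights = [math.exp(a - peak) for a in adjusted]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n_tokens, bias, seed):
    """Generate tokens from random logits (a stand-in for a real model)."""
    rng = random.Random(seed)
    tokens = ["the"]
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in VOCAB]
        tokens.append(sample_next(logits, tokens[-1], bias, rng))
    return tokens


def green_fraction(tokens):
    """Checker: what share of tokens fall in the green set of their predecessor?"""
    hits = sum(tokens[i] in green_set(tokens[i - 1]) for i in range(1, len(tokens)))
    return hits / (len(tokens) - 1)


if __name__ == "__main__":
    print("no signature:  ", green_fraction(generate(2000, bias=0.0, seed=1)))
    print("with signature:", green_fraction(generate(2000, bias=4.0, seed=1)))
```

With no bias, about half the tokens land in the green set by chance; with a large bias nearly all of them do, and that gap is the detectable signal. It also shows the trade-off raised in this thread: the bias overrides what the model would otherwise have preferred, and anyone who rewrites the text afterwards dilutes the signal.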

              • Balder
                English
                2
                9 months ago

                The idea itself is valid, but wouldn’t that just make it more dangerous when malicious agents use the technology without fingerprinting?

                • stevedidWHAT
                  English
                  1
                  9 months ago

                  Cat’s out of the bag, my friend. Just like the nuke, the ideas are always out there. Once it’s been discovered and shared, that’s that.

                  We can huff and puff and come up with all the cute little laws we want but the fact of the matter is we know the recipe now. All we can do is dive deeper into the technology to understand it even better, make new findings and adapt as we always do.

          • bioemerl
            10
            9 months ago

            forcing AI models to have a signature that is guaranteed to be identifiable, permanent, and unique for each tuning produced

            Either AI remains entirely in the hands of fucks like OpenAI or this is impossible and easily removed. AI should be a free common use tool, not an extension of corporate control.

            • stevedidWHAT
              English
              4
              9 months ago

              Agreed, such power should belong to everyone or has yet to be discovered. Even Oppenheimer knew, once the cat’s out of the bag…

            • roguetrick
              2
              9 months ago

              Owning the means of AI production huh? I guess anarchists will win after all.

              • bioemerl
                6
                9 months ago

                It’s no different than owning your computer. Something as absolutely central and productivity-boosting as artificial intelligence should not be kept in the hands of the few.

                The only way that it could be is through government intervention; you don’t need to be an anarchist to be against an OpenAI monopoly.

      • @[email protected]
        English
        22
        9 months ago

        Because AIs are (partly) trained by making AI detectors. If an AI can be distinguished from a natural intelligence, it’s not good enough at emulating intelligence. If an AI detector can reliably distinguish AI from humans, the AI companies will use that detector to train their next AI.

        • stevedidWHAT
          English
          -2
          9 months ago

          I’m not sure I’m following your argument here - you keep switching between talking about AI and AI detectors. Each of the below are just numbered according to the order of your prior responses as sentences:

          1. Can you provide any articles or blog posts from AI companies for this or point me in the right direction?
          2. Agreed
          3. Right…

          I’m having trouble finding your support for your claim

          • TheHarpyEagle
            English
            8
            9 months ago

            See Generative Adversarial Network (GAN). Basically, making new AI detectors will always be harder than beating current ones. AI detectors have to somehow find a new “tell”, the target AI need only train itself on the output of the detector to figure out how to trick it.

            • stevedidWHAT
              English
              3
              9 months ago

              ChatGPT isn’t a GAN.

          • @dack
            English
            7
            9 months ago

            At a very high level, training is something like:

            • generate some output
            • give the output a score based on how much it looks like real human text
            • adjust the parameters slightly to improve the score
            • repeat

            Step #2 is also exactly what an “AI detector” does. If someone is able to write code that reliably distinguishes between AI and human text, then AI developers would plug it into that training step in order to improve their AI.

            In other words, if some theoretical machine perfectly “knows” the difference between generated and human text, then the same machine can also be used to make text that is indistinguishable from human text.
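The loop above can be sketched in a few lines. This is a toy under loud assumptions: the "text" is reduced to a single number (average sentence length), `HUMAN_MEAN` and `detector_score` are made-up stand-ins for an AI detector, and the update rule is plain hill climbing rather than how real models are actually trained. It only illustrates the argument that a reliable detector doubles as a training signal.

```python
import random

HUMAN_MEAN = 17.0  # assumed average sentence length of "human" text (made up)


def detector_score(sample_mean_len):
    """Hypothetical detector: score in (0, 1], higher means 'looks more human'."""
    return 1.0 / (1.0 + abs(sample_mean_len - HUMAN_MEAN))


def generate(param, rng, n=200):
    """Toy generator: emits n sentence lengths centred on `param`, returns the mean."""
    lengths = [max(1.0, rng.gauss(param, 3.0)) for _ in range(n)]
    return sum(lengths) / n


def train(start_param, steps=300, step_size=0.5, seed=0):
    """Generate, score with the detector, nudge the parameter, repeat."""
    rng = random.Random(seed)
    param = start_param
    for _ in range(steps):
        # Adjust the parameter slightly in a random direction.
        candidate = param + rng.choice([-step_size, step_size])
        # Keep the change only if the detector rates the new output as more human.
        if detector_score(generate(candidate, rng)) > detector_score(generate(param, rng)):
            param = candidate
    return param


if __name__ == "__main__":
    # Start far from "human" and let the detector pull the generator in.
    print(train(start_param=30.0))
```

Starting from a clearly non-human setting, the generator drifts to the point where the detector can no longer tell it apart, which is the "same machine" point made above.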

            • stevedidWHAT
              English
              3
              9 months ago

              Exactly right, I mentioned this in a comment elsewhere but basically we can’t have our cake and eat it too.

              We can’t have a perfect NL impersonator that can also be detected as not NL. (Best case, since no AI model is perfect, detecting its mistakes could perhaps be used to help identify it, but who’s to say what the FP rate would look like!)

              Ultimately the cat is out of the bag and I’m not quite sure there is anything we can do now. Ultimately some smart fingerprinting solution would be ideal but I just don’t know how feasible that would remain.

              Edit: source: I took a few 600 level ai classes in college and have made several of my own of varying types and what not

      • sebi
        English
        -1
        9 months ago

        Because generative Neural Networks always have some random noise. Read more about it here

        • stevedidWHAT
          English
          3
          9 months ago

          Isn’t that article about GANs?

          Isn’t GPT not a GAN?

          • @PetDinosaurs
            English
            6
            9 months ago

            It almost certainly has some GAN-like pieces.

            GANs are part of the NN toolbox, like CNNs and RNNs and such.

            Basically all commercial algorithms (not just NNs, everything) are what I like to call “hybrid” methods, which means keep throwing different tools at it until things work well enough.

            • stevedidWHAT
              English
              3
              9 months ago

              The findings were for GAN models, not GAN-like components though.

              • @PetDinosaurs
                English
                1
                9 months ago

                It doesn’t matter. Even the training process makes it pretty much impossible to tell these things apart.

                And if we do find a way to distinguish, we’ll immediately incorporate that into the model design in a GAN-like manner, and we’ll soon be unable to distinguish again.

                • stevedidWHAT
                  English
                  0
                  9 months ago

                  Which is why hardcoded fingerprints/identifications are required to identify the individual as a speaker rather than as an AI vs Human. Which is what we’re ultimately agreeing on here outside of the pedantics of the article and scientific findings:

                  Trying to detect as an AI a model that is supposed to pass as human is counterintuitive. They’re direct opposites: if one works, the other can’t, so both can’t exist in this implementation.

                  The hard part will obviously be making sure that such a “fingerprint” wouldn’t be removable which will take some wild math and out of the box thinking I’m sure.

                  Tough problem!

          • bioemerl
            2
            9 months ago

            It’s not even about diffusion models. Adversarial networks are basically obsolete.

  • ReallyKinda
    58
    9 months ago

    I know a couple of teachers (college level) who have caught several GPT papers over the summer. It’s a great cheating tool, but as with all cheating in the past you still have to basically learn the material (at least for narrative papers) to proofread GPT’s output properly. It doesn’t get jargon right, it makes things up, it makes no attempt to adhere to reason when it’s making an argument.

    Using translation tools is extra obvious—have a native speaker proof your paper if you attempt to use an AI translator on a paper for credit!!

    • @[email protected]
      English
      14
      9 months ago

      it makes things up, it makes no attempt to adhere to reason when it’s making an argument.

      It hardly understands logic. I’m using it to generate content and it will continuously assert information in ways that don’t make sense, relate things that aren’t connected, and forget facts that don’t flow into the response.

      • @[email protected]
        English
        10
        9 months ago

        As I understand it as a layman who uses GPT-4 quite a lot to generate code and formulas, it doesn’t understand logic at all. Afaik, there is currently no rational process which considers whether what it’s about to say makes sense and is correct.

        It just sort of bullshits its way to an answer based on whether words seem likely according to its model.

        That’s why you can point it in the right direction and it will sometimes appear to apply reasoning and correct itself. But you can just as easily point it in the wrong direction and it will do that just as confidently too.

        • @Aceticon
          English
          7
          9 months ago

          It has no notion of logic at all.

          It roughly works by piecing together sentences based on the probability of the various elements (mainly words, but also more complex units) being there in various relations to each other, the “probability curves” (not quite probability curves but that’s a good enough analog) having been derived from the very large language training sets used to train them (hence LLM - Large Language Model).

          This is why you might get things like pieces of argumentation which are internally consistent (or merely familiar segments from actual human posts where people are making an argument) but not consistent with each other - the thing is not building an argument following a logic thread, it’s just putting together language tokens in common ways which in its training set were found to be associated with each other and with language token structures similar to those in your question.
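To make the "putting tokens together by probability" description concrete, here is a deliberately tiny sketch. It is an assumption-heavy stand-in: a word-level bigram counter over a one-line corpus, whereas real LLMs are neural networks over subword tokens with far longer context. The function names and corpus are invented for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word(counts, prev, rng):
    """Sample the next word in proportion to how often it followed `prev`."""
    options = counts.get(prev)
    if not options:
        return None  # nothing ever followed this word in training
    words = list(options)
    return rng.choices(words, weights=[options[w] for w in words], k=1)[0]


def generate(counts, start, length, seed=0):
    """Chain sampled words together; no step checks whether the result makes sense."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        nxt = next_word(counts, out[-1], rng)
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


if __name__ == "__main__":
    model = train_bigrams("the cat sat on the mat the cat ran")
    print(generate(model, "the", 8))
```

Every adjacent pair in the output was seen in training, so each fragment looks familiar, yet nothing in the loop tracks whether the whole hangs together. That is the local-consistency-without-an-argument effect described above, in miniature.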

          • Cosmic Cleric
            English
            2
            9 months ago

            That’s a great summary of how it works. Well done.

    • @[email protected]
      English
      -27
      9 months ago

      Any teacher still issuing out of class homework or assignments is doing a disservice IMO.

      Of course people will just GPT it… you need to get them off the computer and into an exam room.

      • @SmoothLiquidation
        English
        39
        9 months ago

        GPT is a tool that the students will have access to their entire professional lives. It should be treated as such and worked into the curriculum.

        Forbidding it would be like saying you can’t use Photoshop in a photography class.

        • @[email protected]
          English
          23
          9 months ago

          It can definitely be a good tool for studying or for organizing your thoughts, but it’s also easily abused. School is there to teach you how to take in and analyze information, and chat AIs can basically do that for you (whether or not their analysis is correct is another story). I’ve heard a lot of people compare it to the advent of the calculator but I think that’s wrong. A calculator spits out an objective truth and will always say the same thing. ChatGPT can take your input and add analysis and context in a way that circumvents the point of the assignment, which is to figure out what you personally learned.

          • @[email protected]
            English
            -8
            9 months ago

            Where it gets really challenging is that LLMs can take the assignment input and generate an answer that is actually more educational for the student than what they learned in class. A good education system would instruct students in how to structure their prompts in a way that helps them learn the material - because the LLMs can construct virtually limitless examples and analogies and write in any kind of style, you can tailor them to each student with the correct prompts and get a level of engagement equal to a private tutor for every student.

            So the act of using the tool to generate an assignment response could, if done correctly and with guidance, be more educational than anything the student picked up in class - but if it’s not monitored, if students don’t use the tool the right way, it is just going to be seen as a shortcut for answers. The education system needs to move quickly to adapt to the new tech but I don’t have a lot of hope - some individual teachers will do great as they always have, others will be shitty, and the education departments will lag behind a decade or two as usual.

            • @[email protected]
              English
              5
              9 months ago

              Where it gets really challenging is that LLMs can take the assignment input and generate an answer that is actually more educational for the student than what they learned in class.

              That’s if the LLM is right. If you don’t know the material, you have no idea if what it’s spitting out is correct or not. That’s especially dangerous once you get to undergrad level when learning about more specialized subjects. Also, how can reading a paper be more informative than doing research and reading relevant sources? The paper is just the summary of the research.

              and get a level of engagement equal to a private tutor for every student.

              Eh. Even assuming it’s always 100% correct, there’s so much more value to talking to a knowledgeable human being about the subject. There’s so much more nuance to in person conversations than speaking with an AI.

              Look, again, I do think that LLMs can be great resources and should be taken advantage of. Where we disagree is that I think the point of the assignment is to gain the skills to do research, analysis, and generally think critically about the material. You seem to think that the goal is to hand something in.

        • @MrMcGasion
          English
          10
          9 months ago

          I’ve been in photography classes where Photoshop wasn’t allowed, although it was pretty easily enforced because we were required to use school-provided film cameras. Half the semester was 35mm film, and the other half was 3x5 graphic press cameras where we were allowed to do some editing - provided we could do the edits while developing our own film and prints in the lab. It was a great way to learn the fundamentals and learn to take better pictures in the first place. There were plenty of other classes where Photoshop was allowed, but sometimes restricting which tools can be used can help push us to be better.

        • ReallyKinda
          6
          9 months ago

          Depends on how it’s used of course. Using it to help brainstorm phrasing is very useful. Asking it to write a paper and then editing and turning it in is no different than regular plagiarism imo. Bans will apply to the latter case and the former case should be undetectable.

      • ReallyKinda
        10
        9 months ago

        Even in college? I never had a college course that allowed you to work on assignments in class

        • @[email protected]
          English
          1
          9 months ago

          I studied engineering. Most classes were split into 2 hours of theory, followed by 2 hours of practical assignments. Both within the official class hours, so teachers could assist with the assignments. The best college-class structure by far imo.

  • @[email protected]
    English
    35
    9 months ago

    I have to hand in a short report.

    I wrote parts of it and asked ChatGPT for a conclusion.

    So I read that, adjusted a few points. Added another couple points…

    Then rewrote it all in my own wording. (ChatGPT gave me 10 lines out of 10 pages)

    We are allowed to use ChatGPT though. Because we would always have internet access for our job anyway. (Computer science)

    • @TropicalDingdong
      English
      13
      9 months ago

      I found out on the last screen of a travel grant application that I needed a cover letter.

      I pasted in the requirements for the cover letter and what I had put in my application.

      I pasted the results in as the cover letter without review.

      I got the travel grant.

      • @Blurrg
        English
        8
        9 months ago

        Who reads cover letters? At most they are skimmed over.

        • @TropicalDingdong
          English
          10
          9 months ago

          Exactly. But they still need to exist. That’s what ChatGPT is for. Letters, bullshit emails, applications. The shit that’s just tedious.

  • @Boddhisatva
    English
    28
    9 months ago

    OpenAI discontinued its AI Classifier, which was an experimental tool designed to detect AI-written text. It had an abysmal 26 percent accuracy rate.

    If you ask this thing whether or not some given text is AI generated, and it is only right 26% of the time, then I can think of a real quick way to make it 74% accurate.

    • @[email protected]
      English
      14
      9 months ago

      I feel like this must stem from a misunderstanding of what 26% accuracy means, but for the life of me, I can’t figure out what it would be.

      • @[email protected]
        English
        10
        9 months ago

        Looks like they got that number from this quote from another Ars Technica article: “…OpenAI admitted that its AI Classifier was not ‘fully reliable,’ correctly identifying only 26 percent of AI-written text as ‘likely AI-written’ and incorrectly labeling human-written works 9 percent of the time.”

        Seems like it mostly wasn’t confident enough to make a judgement, but it correctly flagged 26% of AI text and incorrectly identified 9% of human text as AI text. It doesn’t tell us how often it labeled AI text as human text or how often it was just unsure.

        EDIT: this article https://arstechnica.com/information-technology/2023/07/openai-discontinues-its-ai-writing-detector-due-to-low-rate-of-accuracy/

        • @cmfhsu
          English
          2
          9 months ago

          In statistics, everything is based off probability / likelihood - even binary yes or no decisions. For example, you might say “this predictive algorithm must be at least 95% statistically confident of an answer, else you default to unknown or another safe answer”.

          What this likely means is only 26% of the answers were confident enough to say “yes” (because falsely accusing somebody of cheating is much worse than giving the benefit of the doubt) and were correct.

          There is likely a large portion of answers which could have been predicted correctly if the company was willing to chance more false positives (potentially getting students mistakenly expelled).
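A small sketch of the threshold trade-off described above, and of why flipping the answers does not turn 26% into 74% accuracy. The detector scores below are invented for illustration; only the 26% and 9% figures come from the article quoted in this thread.

```python
def evaluate(scores_ai, scores_human, threshold):
    """Flag a text as 'likely AI-written' only when its score reaches the threshold.

    scores_* are hypothetical detector probabilities that a text is AI-written.
    """
    flagged_ai = sum(s >= threshold for s in scores_ai)
    flagged_human = sum(s >= threshold for s in scores_human)
    return {
        "ai_flagged": flagged_ai / len(scores_ai),           # true positive rate
        "human_flagged": flagged_human / len(scores_human),  # false positive rate
    }


def inverted(rates):
    """What happens to both rates if every verdict is simply flipped."""
    return {name: round(1.0 - rate, 2) for name, rate in rates.items()}


# Made-up scores for ten AI-written and ten human-written texts.
SCORES_AI = [0.99, 0.97, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.2]
SCORES_HUMAN = [0.96, 0.7, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1, 0.05]

if __name__ == "__main__":
    print("cautious threshold:", evaluate(SCORES_AI, SCORES_HUMAN, 0.95))
    print("lenient threshold: ", evaluate(SCORES_AI, SCORES_HUMAN, 0.5))
    print("flipped verdicts:  ", inverted({"ai_flagged": 0.26, "human_flagged": 0.09}))
```

On the made-up data, lowering the threshold catches more AI text but also flags four times as many human texts. And flipping the reported figures would flag 74% of AI text but also 91% of human text, so the "just invert it" trick does not give a 74% accurate detector.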

    • @notatoad
      English
      4
      9 months ago

      it seemed like a really weird decision for OpenAI to have an AI classifier in the first place. their whole business is to generate output that’s good enough that it can’t be distinguished from what a human might produce, and then they went and made a tool to try and point out where they failed.

      • @Boddhisatva
        English
        2
        9 months ago

        That may have been the goal. Look how good our AI is, even we can’t tell if its output is human generated or not.

  • @doublejay1999
    English
    27
    9 months ago

    AI company says their AI is smart, but other companies are selling snake oil.

    Gottit

    • @canihasaccount
      English
      26
      9 months ago

      They tried training an AI to detect AI, too, and failed

      • @[email protected]
        English
        5
        9 months ago

        Typical for generative AI. I think during the training of their model, they must have developed another model that detects whether GPT produces more natural language. I think that other model may have reached the point where it couldn’t flag GPT output with an acceptable false positive rate.

    • Max Demon
      English
      12
      9 months ago

      True -

      1. Write points/summary
      2. Have AI expand in many words
      3. Post
      4. Reader uses AI to summarize the post, preferably in points
      5. Profit??
    • @[email protected]
      English
      3
      9 months ago

      Terence Tao just did a thread on Mathstodon talking about how ChatGPT helped him program an algorithm for looking for numbers.

  • @Matriks404
    English
    22
    9 months ago

    Did human-generated content really become so low quality that it is distinguishable from AI-generated content?

    • tech
      English
      17
      9 months ago

      Should I be able to detect whether or not this is an AI generated comment?

      • @nodsocket
        English
        17
        3 months ago

        deleted by creator

    • @[email protected]
      English
      8
      9 months ago

      Not necessarily. It’s just that AIs can’t tell the difference.

      Although I don’t know whether humans can.

    • @[email protected]
      English
      8
      9 months ago

      People kind of just suck at writing in general. It’s not a skill that’s valued so much, otherwise writers, editors, and proofreaders would be paid more.

    • @mind
      English
      4
      9 months ago

      deleted by creator

  • HelloThere
    English
    20
    9 months ago

    Regardless of if they do or don’t, surely it’s in the interests of the people making the “AI” to claim that their tool is so good it’s indistinguishable from humans?

    • stevedidWHAT
      English
      15
      9 months ago

      Depends if they’re more researchers or a business imo. Scientists generally speaking are very cautious about making shit claims bc if they get called out that’s their career really.

      • HelloThere
        English
        6
        9 months ago

        It’s literally a marketing blog posted by OpenAI on their site, not a study in a journal.

      • @BetaDoggo_
        English
        5
        9 months ago

        OpenAI hasn’t been focused on the science since the Microsoft investment. A science focused company doesn’t release a technical report that doesn’t contain any of the specs of the model they’re reporting on.

      • @Zeth0s
        English
        3
        9 months ago

        A few decades ago, probably. Nowadays “scientists” make a lot of BS claims to get published. I was in the room when a “scientist” publishing several Nature papers per year asked her student to write up research that had no results in a way that made it look like it had something important, for a publication with a relatively good impact factor (IF).

        That day I decided I was done with academia. I had seen enough.

    • pewter
      English
      0
      9 months ago

      Yes, but it’s such a falsifiable claim that anyone is more than welcome to prove them wrong. There’s a lot of slightly different LLMs out there. If you or anyone else can definitively show there’s a machine that can identify AI writing vs human writing, it will either result in better AI writing or it would be an amazing breakthrough in understanding the limits of AI.

      • HelloThere
        English
        2
        9 months ago

        People like to view the problem as a paradox - can an all powerful God create a rock they cannot lift? - but I feel that’s too generous, it’s more marking your own homework.

        If a system can both write text, and detect whether it or another system wrote that text, then “all” it needs to do is change that text to be outside of the bounds of detection. That is to say, it just needs to convince itself.

        I’m not wanting to imply that that is easy, because it isn’t, but it’s a very different thing to convincing someone else, especially a human, that understands the topic.

        There is also a false narrative involved here, that we need an AI to detect AI which again serves as a marketing benefit to OpenAI.

        We don’t, because they aren’t that good, at least, not yet anyway.

  • @irotsoma
    English
    18
    9 months ago

    A lot of these relied on common mistakes that “AI” algorithms make but humans generally don’t. As language models are improving, it’s harder to detect.

    • Cethin
      English
      14
      9 months ago

      They’re also likely training on the detector’s output. That’s why they build detectors. It isn’t for the good of other people. It’s to improve their assets. A detector is used to discard some inputs it knows are written by AI so it doesn’t train on that data, which leads to it outcompeting the detection AI.

  • shameless
    English
    16
    9 months ago

    deleted by creator

    • Turun
      English
      15
      9 months ago

      Or, because you can’t rely on computers to tell you the truth. Which is exactly the issue with LLMs as well.

      • @sfgifz
        English
        5
        9 months ago

        You can’t rely on books or people to tell you the truth either.

        • Turun
          English
          2
          9 months ago

          I was mostly referring to the top comment. If you need to write an essay on Hamlet, the book can in fact not lie, because the entire exercise is to read the book and write about the contents of it.

          But in general, you are right. (Which is why it is proper journalistic procedure to talk to multiple experts about a topic you write about. Also, a good article does not present a foregone conclusion, but instead lets readers form their own opinion on a topic by providing the necessary context and facts without the author’s judgement. LLMs as a one-stop-shop do not provide this and are less reliable than listening to a single expert would be.)

        • @atrielienz
          English
          1
          9 months ago

          Which is why bibliographies exist.

  • @Absolutemehperson
    English
    12
    9 months ago

    mfw just asking ChatGPT to write an undetectable essay.

    Later, losers!

  • @Jargus
    English
    6
    8 months ago

    deleted by creator

    • @robbotlove
      English
      6
      9 months ago

      this comment could have been written in 2005 and still have been true.

    • @[email protected]
      English
      2
      9 months ago

      AI might democratize grifting. You no longer will have to have the resources that Russia and China have devoted to this kind of thing. Anyone will be able to generate vast amounts of fake inflammatory rhetoric.

      Then once there’s a 99.9% chance that the person you’re talking to on social media is an AI, people might realize how stupid it is to believe anything they read on the internet.

  • @[email protected]
    English
    3
    9 months ago

    Aren’t there very few student priced ai writers? And isn’t the writing done on their servers? And aren’t they saving all the outputs?

    Can’t the ai companies sell to schools the ability to check paper submissions against recent outputs?

    • @dyc3
      English
      9
      9 months ago

      ChatGPT 3.5 is free. Can’t get more student priced than that.

      Regarding the second part about outputs: that’s not practical. Suppose you ignore students running their own LLMs offline on their gaming GPUs, where these corps wouldn’t have access to the info. It’s still wildly impractical because students can paraphrase LLM output into something that doesn’t look like the original output.
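A rough sketch of why checking submissions against saved outputs breaks down under paraphrasing. It assumes the simplest possible matcher, shared three-word sequences, and invented example sentences; a real plagiarism checker would be more sophisticated, so treat this as an illustration of the problem rather than of any actual product.

```python
def shingles(text, n=3):
    """All runs of n consecutive words in the text (lowercased, no punctuation handling)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(submission, stored_output, n=3):
    """Share of the submission's word runs that also appear in the stored output."""
    sub, stored = shingles(submission, n), shingles(stored_output, n)
    return len(sub & stored) / len(sub) if sub else 0.0


# Hypothetical saved model output and two hypothetical student submissions.
STORED = "the industrial revolution changed how people worked and lived in cities"
COPIED = "the industrial revolution changed how people worked and lived in cities"
PARAPHRASED = "city life and labor were transformed by the industrial revolution"

if __name__ == "__main__":
    print("copied:     ", overlap(COPIED, STORED))
    print("paraphrased:", overlap(PARAPHRASED, STORED))
```

The verbatim copy matches completely, while the paraphrase shares only one three-word run with the stored output despite saying the same thing.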

      • @[email protected]
        English
        2
        9 months ago

        ChatGPT 3.5 is free. Can’t get more student priced than that.

        Yeah, my point was I don’t think there are many offering the service for free. And they are probably looking for revenue streams.

        Suppose you ignore students running their own LLMs offline on their gaming GPUs

        I actually feel like this is the one that shouldn’t be ignored. But I don’t have a good sense of the computational power vs quality output.

        It’s still wildly impractical because students can paraphrase LLM output into something that doesn’t look like the original output.

        At least doing that is likely to result in the student internalizing the information to some degree. It’s also not so different (not at all different?) from the most benign academic dishonesty that existed when I was a student.

        One issue with the approach I suggested is the copyright issue of profs submitting students’ original work for AI processing without understanding/caring about copyright implications.

        • @dyc3
          English
          1
          9 months ago

          And they are probably looking for revenue streams.

          Yeah of course. As it stands right now GPT-3.5 is free, but GPT-4, which has been demonstrated to produce better output and do more, costs a monthly subscription.

          At least doing that is likely to result in the student internalizing the information to some degree.

          This is a good point, and I agree.