Leaking industry secrets is a much bigger concern that boosting productivity a little bit.
We’re talking about very specialized engineering work, it’s not something you can totally rely on a bot to do, though it might help sometimes, it’s fully understandable for specialized companies to want to ban GPT internally, until there’s a way for them to host a totally internal one.
I don’t think being a customer would work either, language models are still on the training, noone knows exactly how users queries are used, that’s a big no no for every company having to protect their secrets.
A self-hosted instance is a much better solution, if not the only “safe” one from that point of view, we’ll get there.
Interaction data does not become training data, unless you want it to.
I know that how a piece of software created using machine learning works, is an unknowable, but training data and interaction data are not the same thing. ChatGPT in particular is designed to be restored to a known good start state, only using query data for context awareness within a given sessions. Not to train itself.
Each query simply includes all previous queries, for context. That’s part of why it becomes increasingly erratic the longer a session goes on.
And unless you do train with a given piece of data, that data is not entered into the LLM in any way. Not even the undefined unknowable way.
We’re talking about very specialized engineering work,
We’re not though. This isn’t a policy preventing them from disclosing them from talking about specific company IP (which is almost certainly covered by existing NDAs already). This prevents them from using it internally at all.
I use ChatGPT at work all the time, usually for getting very specific information about products I have to integrate with, quickly parsing new API documentation, and learning about unfamiliar processes at a conceptual level before I have to dive deeper for a project. It’s more the context around which I’ll be building the specialized IP. It’s the sort of stuff I can learn via Googling (or sometimes Stack Exchange), but can learn it faster in a more targeted manner by asking detailed questions to the chatbot.
This prevents them from using it internally at all
That’s because it’s too easy to accidentally feed it information you shouldn’t.
Everyone working at big companies knows very well they must not talk about company’s stuff outside of it, but it’s too easy to underestimate how much info you actually give up with queries you think might be “innocent” ones.
Leaking industry secrets is a much bigger concern that boosting productivity a little bit.
We’re talking about very specialized engineering work, it’s not something you can totally rely on a bot to do, though it might help sometimes, it’s fully understandable for specialized companies to want to ban GPT internally, until there’s a way for them to host a totally internal one.
On this I agree entirely. The potential for corporate espionage because of unwitting employees using an LLM through unofficial means is huge.
At the very least, the corporation itself would have to be the customer, so that watertight terms might be negotiated, not the employee.
I don’t think being a customer would work either, language models are still on the training, noone knows exactly how users queries are used, that’s a big no no for every company having to protect their secrets.
A self-hosted instance is a much better solution, if not the only “safe” one from that point of view, we’ll get there.
Interaction data does not become training data, unless you want it to.
I know that how a piece of software created using machine learning works, is an unknowable, but training data and interaction data are not the same thing. ChatGPT in particular is designed to be restored to a known good start state, only using query data for context awareness within a given sessions. Not to train itself.
Each query simply includes all previous queries, for context. That’s part of why it becomes increasingly erratic the longer a session goes on.
And unless you do train with a given piece of data, that data is not entered into the LLM in any way. Not even the undefined unknowable way.
We’re not though. This isn’t a policy preventing them from disclosing them from talking about specific company IP (which is almost certainly covered by existing NDAs already). This prevents them from using it internally at all.
I use ChatGPT at work all the time, usually for getting very specific information about products I have to integrate with, quickly parsing new API documentation, and learning about unfamiliar processes at a conceptual level before I have to dive deeper for a project. It’s more the context around which I’ll be building the specialized IP. It’s the sort of stuff I can learn via Googling (or sometimes Stack Exchange), but can learn it faster in a more targeted manner by asking detailed questions to the chatbot.
That’s because it’s too easy to accidentally feed it information you shouldn’t.
Everyone working at big companies knows very well they must not talk about company’s stuff outside of it, but it’s too easy to underestimate how much info you actually give up with queries you think might be “innocent” ones.