Public-facing open-source code and media are going to be learned by language models because the models are exposed to them. That's the simple truth. Nothing can stop that, short of making every public repo private. Thanks to open source, anyone can build their own GPT. OpenAI is not actually very far ahead of open source anymore.

The US seems well enough informed. As the following report puts it, "AI tools are diffusing broadly and rapidly" and "AI is the quintessential “dual use” technology—it can be used for civilian and military purposes."

https://www.nscai.gov/wp-content/uploads/2021/03/Full-Report...

I fully expect that if I begin a story, put it on my blog or on GitHub, and go away for a couple of years, I'll find it completed for me when I return. I can use foresight to my advantage, or I can pretend it's still the 1990s, as if placing some text at the top of code I've published is going to prevent people from training on it.

One thing is for sure, though: I don't think a large company such as Microsoft should be profiting from training its language model on open-source code.

The best way to release Copilot, in my opinion, would be to make the entire thing open source and offer separate models, even a private paid-for model, so long as that one is trained on their own code.

An open-source model trained on code under specific licenses sounds fine, but then the model should also carry the same license as the code it was trained on.

There's just something deeply unsettling about having a computer complete your thoughts for you without being able to question how or why.