Our gitea instance had roughly five minutes of downtime in total over the past year, just to upgrade gitea itself. All in the middle of the night. How much downtime has GitHub seen over the same period, and how many people's work was affected by that?
I've been hosting a git service for quite a while now and it's maybe around half an hour of maintenance work per year. Totally worth it in my opinion; it's so much better in almost every way. One big reason is decentralization: full control of your data, the freedom to change whatever you want, and things like the current npmjs attack showing the downside of everyone depending on the same service. And there's much more.
The concerns are valid, but I'd like to point out that managing all that isn't as frightening as it sounds.
If you do small scale (we're talking self-hosted git here, after all), all of these are either a non-issue or a one-time issue.
Figuring out backups and a firewall is the latter: once set up, you don't worry about them at all. And figuring them out isn't rocket science either.
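For a sense of scale, on a stock Debian box the firewall side can be a handful of ufw rules and the backup side a cron'd restic run. The tools, ports and paths here are just one illustrative combination, not a prescription:

    apt install ufw restic
    ufw default deny incoming
    ufw allow 22/tcp      # git over SSH
    ufw allow 443/tcp     # web UI behind HTTPS
    ufw enable

    # nightly: push the service's data directory to an off-site restic repo
    restic -r sftp:backup-host:/srv/restic backup /srv/gitea

Set it up once, check that restores actually work, and it rarely needs touching again.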
As for the former: for minimum maintenance, I often run services in Docker containers, one service (as in one Compose stack) per Debian VM. This makes OS upgrades very stable, and since Docker is the only "third-party" package, upgrades are very unlikely to break the system. That makes it safe to let unattended-upgrades handle everything.
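A minimal sketch of that Debian side, assuming the stock packages (this is the standard unattended-upgrades setup, nothing exotic):

    apt install unattended-upgrades
    # writes /etc/apt/apt.conf.d/20auto-upgrades enabling daily runs
    dpkg-reconfigure -plow unattended-upgrades

With only Debian's own packages plus Docker installed, that's effectively the whole OS maintenance story.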
With this approach, most of the maintenance is managing container versions. It's good practice to pin container versions, which does mean some labor when it's time to upgrade, but you don't always have to pin an exact version. Many images publish major-version tags, and those are fairly safe to rely on for automatic upgrades. The manual part, when a new major release comes out, ends up being a pretty rare occasion.
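As a sketch: assuming the Compose file pins a major-version tag (for Gitea that would be something like image: gitea/gitea:1), the automatic part can be as simple as a nightly cron job; the directory path is illustrative:

    # re-pull the pinned tags and recreate any container whose image changed
    cd /srv/gitea && docker compose pull && docker compose up -d

Minor and patch releases then apply themselves; a human only shows up when a new major tag is worth moving to.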
If your services' images don't do that kind of versioning (GitLab and YouTrack are examples), you aren't as lucky, but bumping a version every few months or so shouldn't be too laborious either.
Now, if DDoS is a genuine concern for you, there is probably already staff in place to deal with it. DDoS is mostly something popular public services have to worry about, not a private Gitea instance. Such attacks cost money to launch, so nobody fires them off at random; they need some actual incentive behind them.
But why keep a private instance out in the open anyway? Put it behind a VPN and you don't have to worry nearly as much about security and upgrades.
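As a sketch, with WireGuard the public exposure shrinks to one UDP port (the interface name, port and the contents of wg0.conf are the usual defaults here, adjust to taste):

    ufw default deny incoming
    ufw allow 51820/udp     # WireGuard; nothing else is reachable from outside
    wg-quick up wg0         # tunnel defined in /etc/wireguard/wg0.conf
    # the git service then listens only on the VPN address, so HTTP/SSH never face the internet

Security updates still matter, but the attack surface becomes a single, well-audited UDP service instead of a whole web application.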
One answer might be to avoid LLMs training off the intellectual property that your humans typed out for you. But as LLM code generation tools take off, it's a losing battle for most orgs to prevent staff from using LLMs to generate the code in the first place, so this particular advantage is being subverted.
Especially as self-hosting means losing the community aspect of GitHub. Every potential contributor already has an account. Every new team member already knows how to use it.
You’re assuming people are self-hosting open source projects on their git servers. That’s often not the case. Even if it were, GitHub irked a lot of people by using their code to train Copilot.
I self-host Gitea. It took maybe 5 minutes to set up on TrueNAS, and even that was only because I wanted separate datasets so I could snapshot them independently. I love it. I have privacy. Integrating it into a backup strategy is easy: it goes along with the rest of my off-site NAS backup without me needing to retain local clones on my desktop. And my CI runners are substantially faster than what I get through GitHub Actions.
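Roughly what that looks like on the ZFS side (pool, dataset and host names here are made up; in practice TrueNAS schedules this through its snapshot and replication tasks):

    # gitea in its own dataset, so it snapshots independently of everything else
    zfs snapshot tank/apps/gitea@tuesday
    # incremental send of the new snapshot to the off-site NAS
    zfs send -i tank/apps/gitea@monday tank/apps/gitea@tuesday | ssh backup-nas zfs recv backup/gitea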
The complexity and maintenance burden of self-hosting are way overblown. The benefits are often understated, and the deficiencies of whatever hosted service you'd use instead go unmentioned.
When I publish open source code, I don't mind if people or companies use it, or maybe even learn from it. What I don't like is feeding it into a giant plagiarism machine that is perpetuating the centralization of power on the internet.
To me, plagiarism is a 100% copy of intellectual property, or maybe a very high percentage, like 80%+.
LLMs don't store the code, only probabilities over chains of tokens (words). AFAIK that is not plagiarism.
I remember the late 2000s, when a German company called "Rocket Internet" was copycatting companies like AirBnB, Zappos and others. Many considered this lame and a kind of moral freeloading, but it's not prohibited.
Whether you agree with why someone may be opting to self-host a git server is immaterial to why they've done so. Likewise, I'm not going to rehash the debate over fair use vs software licenses. Pretending you don't understand why someone who published code under a copyleft license is displeased to see it locked into a proprietary model that is used to build proprietary software is willful ignorance. But, again, it makes no difference whether you're right or they're right; no one is obligated to continue pushing open source code to GitHub or any other service.
A better question is why it takes any time at all to maintain a tool like this. I spend zero time maintaining my open-source browser (Firefox). It just periodically updates itself and everything just works. I maybe spend a bit of time maintaining my IDE by updating settings and installing plugins for it, but nothing onerous.
A tool like this is not fundamentally more complex than a browser or a full-fledged IDE.
I am using 14 different extensions in Firefox. I don't think any of them have broken due to a Firefox update for at least the past 3 years.
The only maintenance I have had to do was when the "I don't care about cookies" extension got sold, so I had to switch to a fork [1]. That was 2-3 years ago.
because:
1) privacy - don't want projects leaving a closed circle of people
2) compliance - you have to self-host and gitlab/github are way too expensive for what they provide when open-source alternatives exist
3) you just want to say fuck you to corporate (nothing is free) and join the clippy movement.