I had registered for alerts on https://aurorasaurus.org/, but the alert came way too late for me (the strongest lights were yesterday around 10-11 PM, and the notification was sent at 2 AM today). But I was very lucky and just noticed the lights by accident on my way home.
GPT-2 at 774M is considered an LLM. I wouldn't say there's much difference between that and 700M, or even 123M.
Having said that, looking up "small language model" these days returns tons of results calling 7B models small language models.
------
My understanding of small language models is that they're generally intended for specific purposes, like analysis and classification (whatever you'd call the text equivalent of image interrogation with CLIP models), translation, etc. They're small because they don't need to be big to do their intended functions, not because they're just smaller versions of bigger models.
Looking through the README, it seems this is a "matching decomp" type project which does not bundle assets (i.e., the assets get pulled from your local copy during the build).
I'm no lawyer, and this is no claim that it is or isn't one way or the other, but Super Mario 64 had a CC0 1.0-licensed decomp in 2019, with a PC port in 2020. Nintendo vehemently chased compiled copies of the game being shared and videos of the ports on YouTube, but never went after the actual source code repo for either the decomp or the port (neither of which contains any of the assets). Of course, there is nothing saying Nintendo can't wait 6 years and then take action (just look how long they put up with Yuzu/Ryujinx before going after the decryption and other arguments just before the Switch 2 launched), but they were certainly aware of the repos when they took action against the resulting binaries/videos and didn't try to touch the code yet for one reason or another.
I expect some big court case to happen about this style of project within the next decade. Maybe not as big as Google LLC v. Oracle America, Inc., but still one that makes the news a few times and gives direct precedent rather than comparisons to merely similar cases.
It's not legal unless the person had the rights to begin with. It may be legal for a clean-room reimplementation, but not for a decompilation project like this. id/Apogee can totally request a takedown, so I wouldn't recommend republishing that...
Copyright violation. If you write a book and I translate it to a different language, you own the copyright on my translation (except poetry, which is artistic enough that it cannot be directly translated, so your version can inspire me but I can't just translate it). Decompilation doesn't involve enough creative work to call it anything other than a translation.
I'm not a lawyer. I'm reasonably sure I'm right so the above is good enough for discussion, but if you need legal advice see a lawyer.
Actually, you should look that up: you don't, which is what a lot of fansubs rely on. They will mostly only own your translation if they chose to formally translate and commercially publish a translation in that country. If they don't, you can distribute your translation for free. There is a lot of variability on this per country too, with very interesting laws in Greece and Germany in particular.
I think the point is that for most things, you don't need to call any external tools. Python's standard library already comes with lots of features, and there are many packages you can install.
I am always amused when folks rediscover the bad idea that is `pthread_cancel()` — it’s amazing that it was ever part of the standard.
We knew it was a bad idea at the time it was standardized in the 1990s, but politics — and the inevitable allure of a very convenient sounding (but very bad) idea — meant that the bad idea won.
Funny enough, while Java has deprecated their version of thread cancellation for the same reasons, Haskell still has theirs. When you’re writing code in IO, you have to be prepared for async cancellation anywhere, at any time.
This leads to common bugs in the standard library that you really wouldn’t expect from a language like Haskell; e.g. https://github.com/haskell/process/issues/183 (withCreateProcess async exception safety)
What's crazy is that it's almost good. All they had to do was make the next syscall return ECANCELED (already a defined error code!) rather than terminating the thread.
Musl has an undocumented extension that does exactly this: PTHREAD_CANCEL_MASKED passed to pthread_setcancelstate.
That would have been fantastic. My worry is if we standardized it now, a lot of library code would be unexpectedly dealing with ECANCELED from APIs that previously were guaranteed to never fail outside of programmer error, e.g. `pthread_mutex_lock()`.
Looking at some of my shipping code, there's a fair bit that triggers a runtime `assert()` if `pthread_mutex_lock()` fails, as that should never occur outside of a locking bug of my own making.
You can sort of emulate that with pthread_kill and EINTR, but you need to control all code that can call interruptible syscalls to correctly return without retry (or longjmp/throw from the signal handler, but then we are back in pthread_cancel territory).
There's a second problem here that musl also solves. If the signal is delivered between the check for cancellation and the syscall machine code instruction, the interruption is missed. This can cause a deadlock if the syscall was going to wait indefinitely and the application relies on cancellation for interruption.
Musl solves this problem by inspecting the program counter in the interrupt handler and checking if it falls specifically in that range, and if so, modifying registers such that when it returns from the signal, it returns to instructions that cause ECANCELED to be returned.
Introspection windows from an interrupting context are a neat technique. You can use it to implement "atomic transaction" guarantees for the interruptee as long as you control all potential interrupters. You can also implement "non-interruption" sections and bailout logic.
It always surprised me that the paths of so many glibc functions include calls to open() files in /etc and then parse their contents into some kind of value to use or possibly return.
The initialization of these objects should have been separate and then used as a parameter to the functions that operate on them. Then you could load the /etc/gai.conf configuration, parse it, then pass that to getaddrinfo(). The fact that multiple cancellation points are discreetly buried in the paths of these functions is an element of unfortunate design.
It’s extremely easy to write application code in Haskell that handles async cancellation correctly without even thinking about it; the async library provides high-level abstractions. However, your point is still valid: I do think that if you write library code at a low level of abstraction (as the standard library must), it is just as error-prone as in Java or C.
`pthread_cancel()` is necessary _only_ to interrupt compute-only code without killing the entire process. That's it. The moment you try to use it to interrupt _I/O_ you lose -- you lose BIG.
there is a better way - in any unbounded compute loop, add some code to check for cancellation. it can be very very very cheap
this is not possible if you are calling third party code that you can't modify. in this case it's probably a better idea to run it on another process and use shared memory to communicate back results. this can even be done in an airtight sandboxed manner (browsers do this for example), something that can't really be done with threads
Right, and then you can kill it, but that's essentially what `pthread_cancel()` is. `pthread_cancel()` is just fine as long as that's all you use it for. The moment you go beyond interruption of 100% compute-bound work, you're in for a world of hurt.
Yes, you are right that it's applied to the parameters, but other models (like ngcm) applied it to the inputs. IMO it shouldn't make a huge difference; the main point is that you maximize the differences between models.