

I had registered for alerts on https://aurorasaurus.org/. But that alert was sent way too late for me (the strongest lights were yesterday around 10-11 PM, and the notification was only sent at 2 AM today). But I was very lucky and just noticed the lights by accident on my way home.


v0: 16M Parameters

v0.5: 123M Parameters

v1: 700M Parameters

v2mini-eval1: 300M Parameters

I would not call this an LLM. This is not large. It's just a normal-sized LM. Or even small.

(It's also not a small LLM.)


GPT-2 at 774M is considered an LLM. I wouldn't say there's much difference between that and 700M, or even 123M.

Having said that, looking up "small language model" these days returns tons of results calling 7B models small language models.

------

My understanding of small language models is that they're generally intended for specific purposes, like analysis and classification (whatever you'd call the text equivalent of image interrogation with CLIP models), translation, etc.; they're small because they don't need to be big to do their intended functions, not because they're just smaller versions of bigger models.


Very nice!

This is released under GPL.

I wonder, who is K1n9_Duk3? Does he have the rights to actually release this, and put it under GPL?

What does "reconstructed" mean? Is this disassembled? And if so, is it really ok to put this under GPL then?


Looking through the readme, it seems this is a "matching decomp" type project which does not bundle assets (i.e. those get pulled from your local copy during the build).

I'm no lawyer, and this is no claim that it is or isn't one way or the other, but Super Mario 64 got a CC0 1.0 licensed decomp in 2019 and a PC port in 2020. Nintendo vehemently chased compiled copies of the game being shared and videos of the ports on YouTube, but never went after the actual source code repo for either the decomp or the port (neither of which contains any of the assets). Of course there is nothing saying Nintendo can't wait 6 years and then take action (just look how long they put up with Yuzu/Ryujinx before going after the decryption and other arguments right before the Switch 2 launched), but they were certainly aware of the repos when they went after the resulting binaries/videos, and for one reason or another they haven't tried to touch the code yet.

I expect some big court case to happen about this style of project within the next decade. Maybe not as big as Google LLC v. Oracle America, Inc., but still one that makes the news a few times and gives direct precedent rather than comparisons to similar-ish cases.


It's not legal unless the person had the rights to begin with. It may be legal for a clean room reimplementation, but not a decompilation project like this. iD/Apogee can totally request a takedown, so I wouldn't recommend republishing that...


> It's not legal

Based on what? Afaik decompilation is a grey area and projects that enforce clean-room design do it to stay out of this grey area.


Copyright violation. If you write a book and I translate it into a different language, you own the copyright on my translation (except for poetry, which is artistic enough that it cannot really be translated, so your version only inspires me and I can't just translate it). Decompilation doesn't involve enough creative work to call it anything other than a translation.

I'm not a lawyer. I'm reasonably sure I'm right so the above is good enough for discussion, but if you need legal advice see a lawyer.


> If you write a book and I translate it to a different language you own the copyright on my translation.

Not quite true. The original author and the translator both hold copyright in the translation. Neither can publish it without the other's permission.


Actually, you should look up the details: you often don't, which is what a lot of fansubs rely on. The rights holder will mostly only own your translation if they chose to formally translate and commercially publish a translation in that country; if they didn't, you can distribute your translation for free. There is a lot of variability per country too, with very interesting laws in Greece and Germany in particular.


This project compiles to a binary identical to the original one. In practice, distributing this source is the same as distributing the game.


I think the point is that for most things, you don't need to call any external tools. Python's standard library already comes with lots of features, and there are many packages you can install.


The first linked article was recently discussed here: RIP pthread_cancel (https://news.ycombinator.com/item?id=45233713)

That discussion already covered most of the same points as this article, specifically some async DNS alternatives.

See also the discussion here: https://github.com/crystal-lang/crystal/issues/13619


I am always amused when folks rediscover the bad idea that is `pthread_cancel()` — it’s amazing that it was ever part of the standard.

We knew it was a bad idea at the time it was standardized in the 1990s, but politics — and the inevitable allure of a very convenient sounding (but very bad) idea — meant that the bad idea won.

Funny enough, while Java has deprecated their version of thread cancellation for the same reasons, Haskell still has theirs. When you’re writing code in IO, you have to be prepared for async cancellation anywhere, at any time.

This leads to common bugs in the standard library that you really wouldn’t expect from a language like Haskell; e.g. https://github.com/haskell/process/issues/183 (withCreateProcess async exception safety)


What's crazy is that it's almost good. All they had to do was make the next syscall return ECANCELED (already a defined error code!) rather than terminating the thread.

Musl has an undocumented extension that does exactly this: PTHREAD_CANCEL_MASKED passed to pthread_setcancelstate.

It's great and it should be standardized.
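For illustration, a minimal sketch of how that extension is used, assuming musl's pthread.h (PTHREAD_CANCEL_MASKED is not in POSIX, so this won't build against glibc):

    /* Sketch only: PTHREAD_CANCEL_MASKED is a musl-specific extension.
     * With it, a pending cancellation makes blocking cancellation points
     * fail with ECANCELED instead of terminating the thread. */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        (void)arg;
        int old;
        pthread_setcancelstate(PTHREAD_CANCEL_MASKED, &old);

        char buf[128];
        ssize_t n = read(STDIN_FILENO, buf, sizeof buf);  /* cancellation point */
        if (n < 0 && errno == ECANCELED) {
            /* Unwind normally: release locks, free buffers, then return. */
            fprintf(stderr, "read cancelled, cleaning up\n");
        }
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        sleep(1);
        pthread_cancel(t);   /* request cancellation */
        pthread_join(t, NULL);
        return 0;
    }

The thread sees an ordinary error return and decides for itself how to unwind, which is exactly the behavior being wished for above.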


That would have been fantastic. My worry is that if we standardized it now, a lot of library code would unexpectedly have to deal with ECANCELED from APIs that were previously guaranteed to never fail outside of programmer error, e.g. `pthread_mutex_lock()`.

Looking at some of my shipping code, there's a fair bit that triggers a runtime `assert()` if `pthread_mutex_lock()` fails, as that should never occur outside of a locking bug of my own making.
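The pattern in question looks roughly like this (a hypothetical helper, not the actual shipping code):

    #include <assert.h>
    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static long counter;

    void bump_counter(void) {
        /* On a correctly used default mutex, pthread_mutex_lock() only fails
         * on programmer error (EINVAL, EDEADLK, ...), so any failure is fatal. */
        int rc = pthread_mutex_lock(&m);
        assert(rc == 0);

        counter++;

        rc = pthread_mutex_unlock(&m);
        assert(rc == 0);
    }

If `pthread_mutex_lock()` could suddenly return ECANCELED, every call site written in this style would abort (or, with NDEBUG, silently proceed without the lock).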


You can sort of emulate that with pthread_kill and EINTR, but you need to control all code that can call interruptible syscalls so that it correctly returns without retrying (or longjmp/throw from the signal handler, but then we are back in pthread_cancel territory).
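A rough sketch of that emulation, assuming SIGUSR1 is free for this purpose; the key details are installing a handler (so the signal isn't fatal) without SA_RESTART (so the blocked syscall really returns EINTR), and having the caller treat EINTR as cancellation instead of retrying:

    #include <errno.h>
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    #define WAKE_SIG SIGUSR1              /* assumed to be unused elsewhere */

    static void wake_handler(int sig) { (void)sig; /* no-op: exists only to cause EINTR */ }

    static void install_wake_handler(void) {
        struct sigaction sa = {0};
        sa.sa_handler = wake_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;                  /* deliberately NOT SA_RESTART */
        sigaction(WAKE_SIG, &sa, NULL);
    }

    static void *worker(void *arg) {
        (void)arg;
        char buf[128];
        ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
        if (n < 0 && errno == EINTR) {
            /* Treat the interruption as "cancelled" and unwind, no retry loop. */
            fprintf(stderr, "interrupted, unwinding\n");
        }
        return NULL;
    }

    int main(void) {
        install_wake_handler();
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        sleep(1);
        pthread_kill(t, WAKE_SIG);        /* interrupt the blocking read */
        pthread_join(t, NULL);
        return 0;
    }

This only works if every piece of code in the thread agrees not to restart on EINTR, which is why controlling all the interruptible calls matters.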


There's a second problem here that musl also solves. If the signal is delivered between the check for cancellation and the actual syscall instruction, the cancellation is missed. This can cause a deadlock if the syscall was going to wait indefinitely and the application relies on cancellation for interruption.

Musl solves this by inspecting the program counter in the signal handler and checking whether it falls within exactly that window; if so, it modifies the registers such that when the handler returns, execution resumes at instructions that cause ECANCELED to be returned.

Blew my mind when I learned this last month.


Introspection windows from an interrupting context are a neat technique. You can use them to implement “atomic transaction” guarantees for the interruptee as long as you control all potential interrupters. You can also implement “non-interruption” sections and bailout logic.


In particular you need to control the signal handlers. You can't do that easily in a library.


`pthread_cancel()` was meant for interrupting long computations, not I/O.


It always surprised me that the paths of so many glibc functions contain calls that open() files in /etc and then parse their contents into some kind of value to use or possibly return.

The initialization of these objects should have been separate, with the result passed as a parameter to the functions that operate on them. Then you could load the /etc/gai.conf configuration, parse it, and pass that to getaddrinfo(). The fact that multiple cancellation points are quietly buried in the paths of these functions is an unfortunate piece of design.
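A shape-only sketch of that separation; all names here (resolv_conf, resolv_conf_load, getaddrinfo_with_conf) are hypothetical and not glibc APIs:

    #include <netdb.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical: holds the configuration, loaded once by the caller. */
    struct resolv_conf {
        char text[4096];   /* raw file contents; a real version would parse fields */
    };

    /* Explicit, caller-controlled initialization: the only place file I/O
     * (and thus any cancellation point) lives. */
    struct resolv_conf *resolv_conf_load(const char *path) {
        struct resolv_conf *conf = calloc(1, sizeof *conf);
        if (!conf) return NULL;
        FILE *f = fopen(path, "r");
        if (f) {
            fread(conf->text, 1, sizeof conf->text - 1, f);
            fclose(f);
        }
        return conf;
    }

    /* Lookup takes the configuration as a parameter instead of opening
     * /etc/gai.conf internally. (It just forwards to the real getaddrinfo
     * here, since this is only an interface sketch.) */
    int getaddrinfo_with_conf(const struct resolv_conf *conf,
                              const char *node, const char *service,
                              const struct addrinfo *hints, struct addrinfo **res) {
        (void)conf;
        return getaddrinfo(node, service, hints, res);
    }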


It’s extremely easy to write application code in Haskell that handles async cancellation correctly without even thinking about it; the async library provides high-level abstractions. However, your point is still valid: if you write library code at a low level of abstraction (as the standard library must), it is just as error-prone as in Java or C.


`pthread_cancel()` is necessary _only_ to interrupt compute-only code without killing the entire process. That's it. The moment you try to use it to interrupt _I/O_ you lose -- you lose BIG.


There is a better way: in any unbounded compute loop, add some code to check for cancellation. It can be very cheap.

This is not possible if you are calling third-party code that you can't modify; in that case it's probably a better idea to run it in another process and use shared memory to communicate the results back. This can even be done in an airtight, sandboxed manner (browsers do this, for example), something that can't really be done with threads.
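A minimal sketch of that kind of cooperative check, with do_one_unit_of_work() standing in for the real (hypothetical) computation:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool cancel_requested;

    /* Hypothetical placeholder for one bounded chunk of the real computation. */
    static void do_one_unit_of_work(void) { /* ... */ }

    /* The worker polls the flag once per iteration; an uncontended atomic
     * load is a handful of cycles, so the check is essentially free. */
    static void *worker(void *arg) {
        (void)arg;
        while (!atomic_load_explicit(&cancel_requested, memory_order_acquire))
            do_one_unit_of_work();
        /* Normal return: cleanup runs as usual, unlike with pthread_cancel(). */
        return NULL;
    }

    /* Called from another thread to request a clean stop. */
    void request_cancel(void) {
        atomic_store_explicit(&cancel_requested, true, memory_order_release);
    }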


Right, and then you can kill it, but that's essentially what `pthread_cancel()` is. `pthread_cancel()` is just fine as long as that's all you use it for. The moment you go beyond interruption of 100% compute-bound work, you're in for a world of hurt.


IO can fail at any point though, so that’s not particularly bad.


It's particularly bad because thread interruptions are funneled into the same system as IO errors, so it's easy to consume them by mistake.

Java has that same issue.


Which paper is that?


This idea is often used for self-supervised learning (SSL). E.g. see DINO (https://arxiv.org/abs/2104.14294).


The random noise is added to the model parameters, not the inputs, right?

This reminds me of variational noise (https://www.cs.toronto.edu/~graves/nips_2011.pdf).

If it is random noise on the input, it would be like many of the SSL methods, e.g. DINO (https://arxiv.org/abs/2104.14294), right?


Yes, you are right, it's applied to the parameters, but other models (like ngcm) applied it to the inputs. IMO it shouldn't make a huge difference; the main point is that you maximize the differences between models.


> Presumably we still wouldn't enable Modules by default.

