Can you tell us more about the motivation for this project? I'm very curious if it was driven by a specific use case.
I know there are specialized trading firms that have implemented projects like this, but most industry workflows I know of still involve data pipelines with scientists doing intermediate data transformations before feeding them into these models. Even the C-backed libraries like numpy/pandas still explicitly depend on the CPython API and can't be compiled away, and this data-feed step tends to be the bottleneck in my experience.
That isn't to say this isn't a worthy project - I've explored similar initiatives myself - but my conclusion was that unless your data source is pre-configured to feed directly into your specific model without any intermediate transformation steps, optimizing inference time has marginal benefit in the overall pipeline. I lament this as an engineer who loves making things go fast but has to work with scientists who love the convenience of Jupyter notebooks and the APIs of numpy/pandas.
The motivation was edge and latency-critical use cases on a product I consulted on. Feature vectors arrived pre-formed, and a Python runtime in the hot path was a non-starter. You're right that for most pipelines the transformation step is the bottleneck, not inference, and Timber doesn't solve that (though the Pipeline Fusion pass compiles sklearn scalers away entirely if your preprocessing is that simple). Timber is explicitly a tool for deployments where you've already solved the data plumbing and the model call itself is what's left to optimize.
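To make the "compiles scalers away" idea concrete: a standardization step followed by a linear model is itself just an affine map, so the two can be folded into a single weight vector and bias with no scaler left in the hot path. This is a minimal sketch of that algebra in plain Python; the function name is illustrative, not Timber's actual API.

```python
def fuse_scaler_into_linear(mu, sigma, w, b):
    """Fold standardization (x - mu) / sigma into the weights of a
    downstream linear model: w.((x - mu)/sigma) + b == w_f.x + b_f."""
    w_fused = [wi / si for wi, si in zip(w, sigma)]
    b_fused = b - sum(wf * mi for wf, mi in zip(w_fused, mu))
    return w_fused, b_fused

# Toy scaler statistics and linear-model parameters.
mu, sigma = [1.0, 2.0], [2.0, 4.0]
w, b = [3.0, -1.0], 0.5

x = [5.0, 6.0]

# Reference path: scale first, then apply the linear model.
y_scaled = sum(wi * (xi - mi) / si
               for wi, xi, mi, si in zip(w, x, mu, sigma)) + b

# Fused path: one affine map, no scaler at inference time.
w_f, b_f = fuse_scaler_into_linear(mu, sigma, w, b)
y_fused = sum(wf * xi for wf, xi in zip(w_f, x)) + b_f
```

Both paths produce the same output, which is why this kind of fusion is a pure win when the preprocessing really is that simple.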
Highly doubt this. Have you read a translated book? Are you looking for literal translations or a translation from someone who's an expert in both languages and makes subjective adjustments based on their experience?
No, I agree with the other commenter. I'd rather read broken English than the fake tone AI injects on everything (and the suspicion of fabrications, too).
In my new domain, photography, the most common "advice" for beginners is to learn the exposure triangle, shoot in manual, and get everything done in camera. This kind of advice comes from beginners still near the peak of the Dunning-Kruger curve. I'm working towards a distinction from one of the most respected photography organizations in the world, and nobody involved with it who gave me guidance ever asked how I took the images.
Maybe, or most likely, it's the same for writing: there are people who think that correct grammar and punctuation, achieved without any help, is what writing means.
Why? Would an incorrect but literal translation be closer to or further from what the author is trying to communicate?
I've been seeing this take on HN a lot recently, but when it comes to translation current AI is far, far superior to what we had previously with Google Translate, etc.
If the Substack post were written in broken English, there's no way it would even appear on the front page here, even less so if it were written in Chinese.
An incorrect but more authentic translation would seem more real, like a human earnestly trying to tell a story. We would accept the imperfections and have a subjective feeling of greater authenticity.
When the translation differs so much from what the author is trying to say in their native language, it loses its earnestness.
That's why translation is a job in the first place, and you don't see publishers running whole books through Google Translate. No one, least of all the authors, would accept that.
We don't know how much an imperfect translation would differ from the author's intent, but we would surely try to meet him halfway. Nobody would criticize his broken English.
Contrast this with the faux polite, irritating tone of the AI, complete with fabrications and phrases the author didn't even intend to write.
Authenticity has value. AI speech is anything but authentic.
I mean, you're making assumptions about the author's intent going one way, but not the other. What if the polite tone is what they intended? And how do you know they didn't review the output for phrasing and fabrications?
The author acknowledged they used AI to translate. Is the translation they decided to publish, given the tools available to them, not by definition the most authentic and intentional version that exists?
All of this aside, how do you think tools like Google Translate even work? Language isn't a lookup table with a 1:1 mapping. Even these other translation tools that are being suggested still incorporate AI. Should the author manually look up words in dictionaries and translate word by word, when dictionaries themselves are notoriously politicized and policed, too?
That's not true from a mechanical perspective. Most SUVs use the same frame and parts as trucks by the same manufacturer (which is why they handle so poorly compared to sedans - it isn't just the center of gravity).
If you define SUV as body-on-frame, sure. But most people think of crossovers as SUVs, and most of those are unibody. It's a big umbrella, and how a vehicle is made isn't how the mainstream thinks about buying one.
If uv figures out a way to capture the scientific community by adding support for conda-forge that'll be the killshot for other similar projects, imo. Pixi is too half-baked currently and suffers from some questionable design decisions.
The key thing about conda-forge is that it's language- (Rust/Go/C++/Ruby/Java/...) and platform- (Linux/macOS/Windows/ppc64le/aarch64/...) agnostic rather than Python-only.
If you want, you can depend on a C++ and Fortran compiler at runtime and (fairly) reliably expect it to work.
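As a concrete sketch of what that looks like, here is a hedged example of a conda-forge environment file that treats compilers and multiple language stacks as ordinary runtime dependencies. The `cxx-compiler` and `fortran-compiler` entries are real conda-forge metapackages that resolve to an appropriate toolchain per platform; the environment name and the exact package selection are illustrative.

```yaml
# Illustrative environment.yml: compilers as runtime dependencies.
name: mixed-stack
channels:
  - conda-forge
dependencies:
  - python=3.12
  - cxx-compiler      # metapackage; resolves to a platform-appropriate C++ toolchain
  - fortran-compiler  # likewise for Fortran
  - rust
  - go
  - openjdk
```

This is exactly the kind of cross-language, cross-platform dependency graph that a Python-only resolver can't express today.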
I used to work at amazon and had a medical exception for working from home. While obtaining the exception, the HR person in charge of my case would repeatedly call my personal cell phone to ask me questions about my disability. They did this 4-5 times despite my insistence that we keep all correspondence written and over email and despite me fulfilling all listed documentation requirements. Once my exception for my chronic condition was approved, they noted that I would need to renew every 6 months, because I guess lifelong conditions you're born with warrant constant validation.
Maybe someone will finally highlight how ridiculous the gridlock is on the B44-SBS route, particularly through South Williamsburg. I regularly see convoys of 4-5 buses arriving at the same time because the traffic through that neighborhood is so bad that the buses eventually catch up to each other, and I regularly have to wait 30+ minutes for it on either end of the route.