
tomekf · 2025-12-23T14:13:34 1766499214

How is it done, from a technical point of view?

mmh0000 · 2025-12-23T16:30:53 1766507453

Layers.

PDF is an absurdly complex file format. It's part of the reason there is no single "good" PDF reader, just a lot of mediocre PDF readers that are all terrible in their own way. Which is a topic for another day.

There are several ways to remove data in a PDF:

- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.

- Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.

- Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway.

zauguin · 2025-12-24T02:32:51 1766543571

This seems highly misleading.

> - Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.

Compared to other formats, this is actually relatively easy in a PDF, because the text drawing operators don't influence the state used by arbitrary other content. A lot of positioning in a PDF is absolute (or relative to an explicitly defined matrix which has hardcoded values). Usually this makes editing a PDF harder (since when changing text the related text does not adapt automatically), but when removing data it makes it much easier since you can mostly just delete it without affecting anything else. (There are exceptions for text immediately after the removed data, but that's limited and relatively easy to control.)
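A minimal sketch of the "just delete it" approach, assuming an uncompressed content stream with one operator per line and literal strings whose bytes match the visible characters (real files with subset fonts often use hex strings of glyph IDs, which would first need mapping through the font). `remove_text` is a hypothetical helper, not part of any library. Because each following line is positioned by its own `Td`, dropping the `Tj` leaves it where it was:

```python
def remove_text(content: str, secret: str) -> str:
    """Delete simple `(...) Tj` show-text lines whose string contains `secret`.

    Assumes an uncompressed stream, one operator per line, and literal
    strings whose bytes equal the visible characters.
    """
    kept = []
    for line in content.splitlines():
        stripped = line.strip()
        if stripped.startswith("(") and stripped.endswith(") Tj") and secret in stripped:
            continue  # drop the operator; the positioning operators around it stay
        kept.append(line)
    return "\n".join(kept)


PAGE = """BT
/F1 12 Tf
72 700 Td
(Public sentence.) Tj
0 -14 Td
(Secret name) Tj
0 -14 Td
(Next line stays put.) Tj
ET"""

print(remove_text(PAGE, "Secret"))
```

The exception mentioned above shows up when the next string is drawn on the same line with no positioning operator of its own: its start point comes from the removed string's advance width, so a real tool would have to replace the deleted text with an equivalent horizontal offset.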

> - Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement.

That's actually rather tricky in PDFs since they usually contain embedded subset fonts, and these usually do not have "🮋" as part of the subset. Also, doing this would break the layout since "🮋" has a different width than most letters in a typical font, so it would not lead to fewer formatting issues than the previous option. Unless the "🮋" is stretched for each letter to have the same dimensions, but then the stretched characters allow the text to be recovered.

> The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.

PDF does not have a concept of a background color. If it looks like a background color in PDF, you have a rectangle drawn in one color and something in the foreground color in front of it. What you usually see in badly redacted PDF files is exactly this, but the other way around: someone just draws a black box on top of the characters. You could argue that this is smarter since it would still work even if someone changed the colors, but of course, PDF is a vector format. If you just add a rectangle, someone else can remove it again. (And copy & paste doesn't care about your rectangle anyway.)
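To illustrate why a drawn box does nothing for copy & paste: in this hypothetical content stream a black rectangle (`0 0 0 rg`, then `re` and `f`) is painted after the name, yet a naive extractor that only reads `Tj` operands never looks at painting operators at all. The regex is a simplification (no escaped parentheses, `TJ` arrays, or font encodings), and `extract_text` is a hypothetical helper:

```python
import re

# Hypothetical uncompressed content stream: the name is drawn first,
# then a black box is painted over the same area.
CONTENT = """BT
/F1 12 Tf
72 684 Td
(Jane Doe) Tj
ET
0 0 0 rg
70 680 80 16 re
f"""


def extract_text(content: str) -> list[str]:
    """Collect literal-string operands of Tj; painting operators are ignored."""
    return re.findall(r"\(([^()]*)\)\s*Tj", content)


print(extract_text(CONTENT))  # ['Jane Doe']
```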

gruez · 2025-12-24T04:10:35 1766549435

>- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.

>- Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.

You're making it sound way harder than it is, when both Adobe Acrobat and the built-in Preview app on Mac can competently redact documents. I'm not aware of instances of either (or any other purpose-made redaction tool) failing. I wouldn't homebrew a Python script to do my redaction either, but that doesn't mean doing redactions properly is some insurmountable task for some intern.

array_key_first · 2025-12-24T07:48:04 1766562484

I would not trust either tool to adequately redact documents, though I'm sure it works under normal levels of scrutiny.

The most reliable way is to just screenshot the document or print and scan it, effectively burning it down and recreating it in a new format that has no concept of the past. This works across basically all formats, too, and against all tools.

JumpCrisscross · 2025-12-24T03:55:51 1766548551

> Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway

To be fair, this works if you print out those copies and then re-scan them.

hallole · 2025-12-23T21:51:41 1766526701

Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!

brailsafe · 2025-12-24T01:13:13 1766538793

Heh, have at it, here's the full spec: https://developer.adobe.com/document-services/docs/assets/5b...

Should take... a weekend tops? ;) PDF is crazy and scary

marcosdumay · 2025-12-24T03:15:26 1766546126

> PDF includes eight basic types of objects: Boolean values, Integer and Real numbers, Strings, Names, Arrays, Dictionaries, Streams, and the null object

Wait, this is more complete than SOAP. It may be a good idea to redo the IPC protocol with a different serialization format!
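That object model really is small. A sketch that serializes Python values into those eight types; `Name` and `Stream` are hypothetical wrapper classes, names are assumed to need no `#` escaping, and reals are written in fixed-point since PDF number syntax has no exponent form:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A PDF name object such as /Type (hypothetical wrapper)."""
    value: str


@dataclass
class Stream:
    """A stream: a dictionary followed by raw bytes (hypothetical wrapper)."""
    attrs: dict
    data: bytes


def serialize(obj) -> str:
    """Serialize a Python value as one of the eight basic PDF object types."""
    if obj is None:
        return "null"
    if isinstance(obj, bool):  # check before int: bool is a subclass of int
        return "true" if obj else "false"
    if isinstance(obj, int):
        return str(obj)
    if isinstance(obj, float):
        # No exponent form in PDF, so write fixed-point and trim trailing zeros.
        return f"{obj:.6f}".rstrip("0").rstrip(".")
    if isinstance(obj, str):
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(obj, Name):
        return "/" + obj.value
    if isinstance(obj, list):
        return "[" + " ".join(serialize(x) for x in obj) + "]"
    if isinstance(obj, dict):
        # Dictionary keys are always names.
        return "<< " + " ".join(f"/{k} {serialize(v)}" for k, v in obj.items()) + " >>"
    if isinstance(obj, Stream):
        header = serialize({**obj.attrs, "Length": len(obj.data)})
        return header + "\nstream\n" + obj.data.decode("latin-1") + "\nendstream"
    raise TypeError(f"no PDF equivalent for {type(obj).__name__}")


print(serialize({"Type": Name("Page"), "Rotate": 0, "UserUnit": 1.5}))
```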

jaggederest · 2025-12-24T06:04:07 1766556247

Well, it's a descendant of PostScript (much like JSON is a descendant of JavaScript, loosely)

Society would probably never recover if we started implementing RPC-in-Postscript though.

embedding-shape · 2025-12-24T01:54:56 1766541296

7.5.6 "Incremental updates" from the specification is an interesting section too, speaking about accessing data people didn't think to remove from PDF files properly.
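An incremental update appends new objects, a new cross-reference section, and a trailer ending in `%%EOF` to the end of the file, leaving the original bytes in place. So earlier revisions can often be recovered just by cutting the file after each `%%EOF`. A heuristic sketch: `revisions` is a hypothetical helper, the sample is toy bytes rather than a valid PDF, and stray `%%EOF` byte sequences (e.g. inside binary streams) can produce false splits:

```python
import re


def revisions(pdf: bytes) -> list[bytes]:
    """Return the file as it stood after each %%EOF marker, oldest first."""
    return [pdf[: m.end()] for m in re.finditer(rb"%%EOF\r?\n?", pdf)]


# Toy bytes standing in for a file "redacted" by an incremental update.
sample = (
    b"%PDF-1.7\n4 0 obj (secret) endobj\n%%EOF\n"
    b"4 0 obj (redacted) endobj\n%%EOF\n"
)

for i, rev in enumerate(revisions(sample)):
    print(i, b"secret" in rev)
```

Note that even the latest revision still contains the original bytes; a viewer simply follows the newest cross-reference data to the replacement object.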

CamperBob2 · 2025-12-24T01:56:59 1766541419

We will be able to say that AGI has arrived when we can hand that spec off to a model and tell it to build an Acrobat clone.

exasperaited · 2025-12-24T09:57:26 1766570246

We will be able to say that AGI has arrived when the AI hands it back and says "no".

CamperBob2 · 2025-12-24T17:27:47 1766597267

Or goes on strike.

gregsadetsky · 2025-12-23T23:19:51 1766531991

Don't stop yourself before getting started. I believe in you - maybe you could write the one editor that would actually work!

Not kidding - it's a ~~~billion dollar market haha

Make an MVP/Show HN :-)

kayodelycaon · 2025-12-24T02:24:14 1766543054

I did a bunch of work creating pdfs using a low-level API, object goes here stuff.

As far as I understand it, at its core, PDF is just a stream of instructions that is continually modifying the document. You can insert a thousand objects before you start the next word in a paragraph. And this is just the most basic stuff. Anything on a page can be anywhere in the stream. I don't know if you can go back and edit previous pages; you might have a shot at least trying to understand one page at a time.
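The "stream of instructions" model can be made concrete with a toy parser: operands pile up until an operator keyword consumes them. A simplified sketch (`parse_ops` is hypothetical); the regex doesn't handle nested parentheses in strings, nested arrays, inline images, or dictionaries:

```python
import re

# Rough token classes: literal string, array, hex string, name, number/keyword.
TOKEN = re.compile(
    r"\((?:\\.|[^\\)])*\)"   # (literal string) with backslash escapes
    r"|\[[^\]]*\]"           # [array], e.g. TJ operands; not nested
    r"|<[0-9A-Fa-f\s]*>"     # <hex string>
    r"|/[^\s/\[\]()<>]+"     # /Name
    r"|[^\s/\[\]()<>]+"      # number or operator keyword
)
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def parse_ops(content: str) -> list[tuple[list[str], str]]:
    """Group operands with the operator that consumes them."""
    ops, operands = [], []
    for tok in TOKEN.findall(content):
        if tok[0] in "(/[<" or NUMBER.fullmatch(tok):
            operands.append(tok)  # operands accumulate...
        else:
            ops.append((operands, tok))  # ...until an operator uses them
            operands = []
    return ops


for operands, op in parse_ops("BT /F1 12 Tf 72 700 Td (Hello) Tj ET"):
    print(op, operands)
```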

Did you know you can have embedded XML in PDFs? You can have a paper form with all the data filled in and include an XML version of that for any computer systems that would like an easier way to read it.

TRiG_Ireland · 2025-12-24T02:34:24 1766543664

The blog post about adding colour gradients to Typst dives into some of the weirdness of the format. https://typst.app/blog/2023/color-gradients

NamTaf · 2025-12-24T01:34:45 1766540085

Bravo to you for recognising the load-bearing 'just' before you threw it around :)

sigwinch · 2025-12-24T06:36:27 1766558187

qpdf has a redaction option. It’s routinely used to anonymize medical records for studies.

general1465 · 2025-12-23T14:40:31 1766500831

They mistook the black highlighter (which adds a black square as another layer) for the redaction tool (which replaces the data with a black square). If the people doing redactions are computer-illiterate, they won't see the difference.

3eb7988a1663 · 2025-12-23T22:40:11 1766529611

I remember reading the recommendation for journalists to redact documents is to black them out in the digital version, print it out, and re-scan it. Anything else has too many potential ways by which it might be possible to smuggle data.

dmurray · 2025-12-24T01:19:14 1766539154

Even that might be vulnerable to length attacks: one reasonable plaintext would lead to black bars of 1135 px, another to 1138 px, and with enough redactions you can converge on what the plaintext might be.
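A sketch of that width-matching attack: compute each candidate's rendered width from per-glyph advance widths and keep the ones that fit the measured bar. The width table is made up for illustration (a real attempt would read the embedded font's widths and also account for kerning and word spacing), and `matching_candidates` is a hypothetical helper:

```python
# Made-up advance widths in 1/1000 text-space units; not from any real font.
WIDTHS = {"i": 250, "l": 250, "m": 800}
DEFAULT_WIDTH = 500


def text_width(s: str, widths: dict[str, int], font_size: float) -> float:
    """Rendered width in points, ignoring kerning and spacing adjustments."""
    return sum(widths.get(c, DEFAULT_WIDTH) for c in s) * font_size / 1000


def matching_candidates(bar_width, candidates, widths, font_size, tol=0.5):
    """Keep candidates whose rendered width is within `tol` points of the bar."""
    return [c for c in candidates
            if abs(text_width(c, widths, font_size) - bar_width) <= tol]


print(matching_candidates(20.5, ["Tim", "Liam", "Will"], WIDTHS, 10))  # ['Liam']
```

Each further redaction of the same name gives another constraint, which is how the candidate set converges.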

The only safe way for journalists is to paraphrase what the document said and to say "an unnamed source claims that ..." and to guarantee with your reputation, and the reputation of your publisher, that you are being faithful to what the original source said. For even better results, combine multiple sources.

Unfortunately paraphrasing things and taking editorial responsibility have both been deprecated in favour of rereleasing press releases in the house style, so it's difficult to get the actual journalism these days.

eviks · 2025-12-24T04:29:39 1766550579

Can't you use constant/variable-length replacement to avoid length leaks?
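In reflowed plain text, constant-length replacement does close that channel: every redacted span becomes the same placeholder, so the output no longer depends on the secret's length. A sketch, with `redact_fixed` and the `[REDACTED]` placeholder as hypothetical choices:

```python
def redact_fixed(text: str, secrets: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace every occurrence of each secret with the same fixed placeholder."""
    # Longest first, so a secret containing a shorter one is replaced whole.
    for secret in sorted(secrets, key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text


a = redact_fixed("Signed by Al.", ["Al"])
b = redact_fixed("Signed by Bartholomew.", ["Bartholomew"])
print(a)
print(len(a) == len(b))  # True: the output length no longer reflects the name
```

In a fixed PDF layout this isn't enough on its own: unless the surrounding text is laid out again, the gap between the neighbouring words still reveals the original width.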

oliwarner · 2025-12-23T20:03:15 1766520195

They drew black boxes over the text. The text is still underneath. On OCR'd scanned documents, the text you'd copy is actually stored in metadata and just linked by position to the image.

Anyway, if you click on a "redaction", you're clicking on the box and can't select the text underneath, but if you just highlight the text around it, you can copy all the original text.

It's a bizarre oversight.
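A sketch of how such a searchable-scan page is commonly built: the scanned image is painted with `Do`, then the OCR text is drawn over it in text rendering mode 3 (`3 Tr`, neither fill nor stroke), so it is invisible but still selectable and extractable. A black box drawn over the image, or even burned into it, leaves that text untouched. `ocr_page_content`, the `/F1` font resource, and the `Im0` image name are hypothetical:

```python
def escape(s: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def ocr_page_content(image: str, words: list[tuple[float, float, str]],
                     width: float, height: float) -> str:
    """Content stream for a scanned page with an invisible OCR text layer."""
    lines = [
        "q",
        f"{width} 0 0 {height} 0 0 cm",  # scale the unit-square image to the page
        f"/{image} Do",                  # paint the scan
        "Q",
        "BT",
        "3 Tr",                          # rendering mode 3: invisible text
        "/F1 10 Tf",
    ]
    for x, y, word in words:
        # Place each recognized word over where it appears in the image.
        lines.append(f"1 0 0 1 {x} {y} Tm ({escape(word)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


print(ocr_page_content("Im0", [(72, 700, "Jane"), (110, 700, "Doe")], 612, 792))
```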

Gigachad · 2025-12-24T03:57:34 1766548654

PDF is less like an image, and more like a web page where elements can be stacked on top of each other. You can visually obscure things by sticking a black rectangle over the top, but anyone who inspects inside the pdf can remove it or see the text in the source.

There would also be a mix of text documents, and image scans. The way to censor each is different.

Perfectly censoring documents, particularly digital ones, is actually surprisingly difficult.

stronglikedan · 2025-12-24T04:09:07 1766549347

> Perfectly censoring documents, particularly digital ones, is actually surprisingly difficult.

But the difficult part is easily repeatable once it's figured out, which is why it surprises me that it's not built into Acrobat as a tool already.

etskinner · 2025-12-24T05:53:07 1766555587

In fact it is already built into Acrobat: https://helpx.adobe.com/acrobat/desktop/protect-documents/re...

tomekf · 2025-10-23T07:22:58 1761204178

Funny thing about Poland is that each new gov screams about unlawful things the previous gov did, especially during election campaigns, but then almost nothing happens. Rinse and repeat.

tomekf · 2025-10-16T08:12:17 1760602337

There are very nice Thinkpads running on Snapdragon now. But no Linux is available…

tomekf · 2025-10-09T05:26:27 1759987587

Is there a way to use it offline on iOS? I’ve saved a file to my Files app but then I’m unable to save… or am I missing something completely…?

chunqiuyiyu · 2025-10-09T05:36:58 1759988218

What browser are you using? Could it be that your browser is blocking the content from being saved? I'm able to save normally using Safari on my iPad.

tomekf · 2025-09-15T19:14:43 1757963683

What battery do you have in mind? Can you share the name/model?

thomas8787 · 2025-09-15T19:43:01 1757965381

Marstek Venus E (5.12kWh). This plug-in model has gained a lot of popularity in the last couple of months in some European markets like Belgium, Germany, and the Netherlands. It has a maximum charge rate of 2500W and can discharge at up to 2500W (it should be connected to a separate power group, and only if local regulations allow it). These kinds of batteries plug into a regular AC socket and do not require an electrical inspection. But they aren't legal everywhere.

In my case, it's configured to track the readings of my digital electricity meter. The battery charges itself when my solar panels produce excess power and discharges when it detects grid consumption. Throughout the day, it buffers the intermittent solar power, and during the evening, night, and early morning hours, it keeps my grid power consumption close to zero.
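The behaviour described (charge on export, discharge on import, tracking the meter) can be sketched as a simple zero-export control step. This is not the Marstek firmware: the sign convention, the function name, and the 10% minimum state of charge are assumptions, and real devices typically add smoothing or deadbands so the setpoint doesn't oscillate. Only the 2500 W limits come from the comment above.

```python
MAX_CHARGE_W = 2500     # from the comment: maximum charge rate
MAX_DISCHARGE_W = 2500  # from the comment: maximum discharge rate


def battery_setpoint(grid_w: float, current_w: float, soc: float,
                     min_soc: float = 0.10, max_soc: float = 1.0) -> float:
    """Next battery power setpoint (hypothetical controller).

    grid_w:    meter reading, positive = importing, negative = exporting.
    current_w: battery's present setpoint, positive = discharging, negative = charging.
    soc:       state of charge, 0.0 to 1.0.
    """
    # The meter already reflects the battery's current output, so add whatever
    # still flows through the meter to push the grid reading toward zero.
    target = current_w + grid_w
    target = max(-MAX_CHARGE_W, min(MAX_DISCHARGE_W, target))
    if soc >= max_soc:
        target = max(target, 0)  # full: stop charging
    if soc <= min_soc:
        target = min(target, 0)  # nearly empty: stop discharging
    return target


print(battery_setpoint(-1200, 0, 0.5))  # solar surplus: charge at 1200 W
print(battery_setpoint(300, 0, 0.5))    # evening load: discharge at 300 W
```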

tomekf · on Dec 2, 2024

Scammers do ;-)

tomekf · on Sept 27, 2024

FYI: Delivering terminals is one thing, paying for the subscriptions is another. Poland pays the subscriptions for over 20k Starlink terminals. Every month:

https://www.barrons.com/news/poland-says-funding-20-000-star...

Multiple nations delivered them: https://en.wikipedia.org/wiki/Starlink_in_the_Russo-Ukrainia...

tomekf · on Sept 4, 2024

Has anyone used the new ARM-based devices in an enterprise environment? We have around 5 devices so far, and apart from a cloud printing issue, our users love them. Great battery and performance for office tasks.

Has anyone else had a chance to play with them?

tomekf · on Dec 4, 2023

China is producing more CO2 in a week than many other countries do in a year. Same with coal production. They are also building many new coal plants (together with nuclear ones). China should be the first place to address emissions.

dragonwriter · on Dec 4, 2023

> China is producing more CO2 in a week than many other countries do in a year.

Well, yeah, China has more than 52 times the population of many other countries, so that's hardly surprising.
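The "52" works out neatly if you write it down. With E for annual emissions and P for population of countries A and B, and assuming a 52-week year:

```latex
\begin{aligned}
\frac{E_A}{52} > E_B &\iff \frac{E_A}{E_B} > 52 \\
\frac{E_A}{P_A} \le \frac{E_B}{P_B} &\iff \frac{E_A}{E_B} \le \frac{P_A}{P_B}
\end{aligned}
```

So whenever the population ratio P_A/P_B exceeds 52, any emissions ratio between 52 and P_A/P_B means A emits more in a week than B does in a year while still having equal or lower per-capita emissions.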

corethree · on Dec 5, 2023

Right. It needs to be measured per capita.

https://en.wikipedia.org/wiki/List_of_countries_by_greenhous...

From here it's easily seen that an individual in the US is on average one of the worst offenders, while an individual from China is doing extremely well.

corethree · on Dec 4, 2023

You're right. We should be glad it's centrally controlled. If business interests in China were able to insert their own democratic opinion into the debate, the problem would be much worse.

It raises the question of how good democracy is when huge, powerful corporate entities with only business interests in mind are able to participate in a democracy. There's no clear answer here.

tomekf · on Aug 22, 2023

This is worth highlighting as it is often omitted in the news and even in official UA communication: Starlink was enabled in Ukraine by Musk, but around 20-40k terminals were purchased, and the monthly fees are continuously being paid by donor countries, mainly Poland with over 20k units.

There is no free lunch.

inemesitaffia · on Aug 22, 2023

And sometimes you pay part of your bill