Further, unemployment benefits are managed by the states, and those states are running web services which typically see a few hundred hits a day. They are now trying to process tens of thousands of new records each day, and at least in MI the service is absolutely not up to the task.
My wife managed to get her filing completed a little after 1am this morning. She was the only one of her 20 coworkers to file successfully; the rest are still trying to get the state website to work today while more people pile in.
Somewhere there is an architect saying "I told you so!" I can almost guarantee the requirement was to handle several hundred requests per day; an architect pointed out that a deluge would overwhelm the system, and maybe managed to get the requirement raised to one or two thousand requests per day.
Now, of course, we don't know what the architecture of this system is or what the cost delta would have been to let it scale out further. But I do know that all too often the more robust solution, the one giving much greater protection and lower cost down the road, is discarded if it costs even 5%-10% more. Then the day comes when the people making these decisions get caught flat-footed, and they try to blame everyone but themselves. It doesn't always happen like this, but it happens a lot.
This reminds me of an old story about an engineer who took initiative and automated the accounts receivable process at his company; now they get paid 25% faster! He shows his boss and gets a promotion.
He decides to do it again, this time with accounts payable, and is promptly fired.
I think that is small-think. The technical solution is only part of the problem, and scaling up all systems to meet the 0.1% case seldom makes sense. They were smart to save 5-10%.
Eh.... On the flip side, a service that processes and stores some simple text forms should be able to handle thousands of simultaneous users on one box.
So, as with most software of this nature, the reason it's not scaling is probably just that the people who made it weren't the greatest engineers on the block.
These are the same kinds of assumptions that lead engineers to think they can build a [any product] clone in a weekend. It's unlikely that the problem or constraints are nearly as simple as one may think.
Consider: single auth across all the state's services, external APIs, identity verification, address verification, employer ID verification, federal/military ID verification, income/tax verification, phone verification, bank account information, translation into multiple languages, accessibility features, etc. Also, there's probably a lot of legacy infrastructure and process.
Also, if "ability to burst to 10x normal filings per week that might happen once every 40 years" wasn't in the spec, I think they were right not to engineer for it.
Admittedly it's a value call. My thought is generally if it's a small incremental cost that greatly increases the robustness then you should go for it. But - sometimes the money or time just isn't there. I'm bothered more by the people not even wanting to have the discussion than by those who do a summary analysis and decide it's not worth it.
The 0.1% case happens. And if it’s going to seriously wreck lives when it happens then you should solve for it. Does Instagram need to handle the 0.1% case? No. But the unemployment website should.
Wow, just found what my state (CO) is doing to help manage the influx. Talk about a low-tech workaround.
>IMPORTANT NOTICE: Because of the high volume of claims, we are asking that you help us help you and our greater community.
>If you need to file an unemployment claim and your last name begins with the letter A - M, file a claim on Sunday, Tuesday, Thursday, or after 12 noon on Saturday.
>If you need to file an unemployment claim and your last name begins with the letter N - Z, file a claim on Monday, Wednesday, Friday or before 12 noon on Saturday.
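The rules above amount to sharding filings by last name. As a toy illustration, a minimal sketch of the scheduling rule (this function and its handling of the Saturday noon cutoff are my own reading of the notice, not anything CO actually publishes):

```python
from datetime import datetime

def may_file(last_name: str, when: datetime) -> bool:
    """True if this applicant may file at the given time, per the
    A-M / N-Z day-splitting rules in the CO notice."""
    first_half = last_name[0].upper() <= "M"
    day = when.strftime("%A")
    if day == "Saturday":
        # A-M files after 12 noon on Saturday, N-Z before.
        return first_half == (when.hour >= 12)
    a_m_days = {"Sunday", "Tuesday", "Thursday"}
    n_z_days = {"Monday", "Wednesday", "Friday"}
    return day in (a_m_days if first_half else n_z_days)
```

Assuming roughly even distribution of last names, this halves the peak load without any new infrastructure, which is presumably the whole point.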
> Wow, just found what my state (CO) is doing to help manage the influx. Talk about a low-tech workaround.
> Ooooh, like gas rationing in the 70s.
I wasn't born back then, but I'd heard from the older guys at a few car meetups that it was based on license plates, and I had the same thought when I heard this on CPR.
Odd, but it could work if you have total compliance; let's see how that pans out.
As a developer I immediately thought of the power of queues. Twenty people trying to submit the same form at once doesn't work for anyone, but a queue processing one person at a time might let all twenty submit within a short window. It's flattening the curve! If I were contracted to fix this ASAP, I would set up an nginx front-end proxy config that allows no more than X sessions and suggests a time in the future when the others could try again.
Having worked on this type of application in the past, I'd say they should find a new company to work with if this one can't handle the traffic. We were handling hundreds of requests per second with ease 10 years ago, with MySQL and the app running on the same server.
It doesn't take many resources to show the user a form, validate it, and save to a DB.
A bunch of armchair developers seem to have been summoned to tell the Federal government how to handle form submissions for an extremely security- and privacy-intensive application using their fancy modern techniques.
You are comparing a basic web form with an application for unemployment benefits, which must go into a federal tax database and be processed by what I assume is a garbage mainframe system.
It not only needs to be validated, it needs to securely store records, be able to compare them, and hook up to the system that handles payments, etc.
They can't just circumvent it, dump the data into some silly Amazon or MySQL database, and call it a day. That would require employees to basically copy and paste the data into the actual warehouse, and considering they have 3+ million applications to go through as it is, making them easy to process is just as important as letting people submit.
For the time being the correct response is a queue gate.
Yep, USDS and 18F folks would have to agree with you here. The arcane crap we have to deal with in payment and government information systems is beyond frustrating and makes this extremely tough. I read an article about USDS/18F having to fix a multi-decade-old Cisco router bug to get CI/CD and automated deployments working, and even then still needing to figure out how to deal with legacy stateful DB connections.
The reality of government paperwork systems on the backend is much, much closer to this hell, and it's part of why so many like myself ran screaming from the public sector. When you see so many peers doing so well at FAANGs, why would you subject yourself to something that resists change and wants to keep everything the same? https://www.washingtonpost.com/news/federal-eye/wp/2014/03/2...
The point is that backend pain shouldn't stop you from accepting it on the front end and putting it into a queue. Making the problem of getting the application through backend systems the states' to deal with, not the applicants'.
Yeah, even in the SF/SJ locality, which has the highest Locality Pay Adjustment (41.44%)[1], the position would likely have to start at GS-12/GS-13 to be competitive.
There is the option of going to some area with a much lower cost of living and trying to hire there, but the problem might be getting enough people together to form a team. If you can easily get enough people with skill and experience, the area probably has jobs for them that pay better, and if those jobs don't exist, it might be hard to find the people.
Eh, USDS and 18F jobs are kind of contract-based and did hit past six figures last I saw. However, they've been defunded a lot since then under POTUS45, so it's not clear what the state of comp is. DC-area tech is a mishmash of rather enterprise-centric businesses and can be challenging if you're in the wrong domains of expertise.
Unemployment services are run by the states. The entry-level software engineer salary paid by the state of California is around $64k, with senior-level salaries between about $75k and $105k in Sacramento. I do not know if this is normal, above average, or below average compared with other states.
Virginia, DC, and Maryland have similar costs of living, but drastically different governments, tax rates, rights, and laws, despite people working in roughly the same 40 square miles. Even a federal employee fresh out of school writing software should make more than that. Senior salaries are between $110k and $140k without a lot of outliers on either end (the distribution matters more to me than a median when talking about salaries for white-collar jobs these days).
California is a huge state, and even within the same industry the stats will differ drastically between the Bay Area, San Diego, Los Angeles, Sacramento, and San Luis Obispo (yep, there are software jobs there too).
> The point is that backend pain shouldn't stop you from accepting it on the front end and putting it into a queue.
What if the backend rejects the form? The user's already moved on before their form made it through the queue. So then you're stuck re-implementing all the validations the backend needs in order to give the user feedback (which you may not even be able to do) or trying to get the user to come back later to try again.
> Making the problem of getting the application through backend systems the states' to deal with, not the applicants'.
Reducing permanent staff involved in processing applications is probably one of the main reasons the automated system was built in the first place. If they still have to do that, then you might as well just replace the frontend with a printable PDF.
You can pick a balance between some validations and 100%, and I don't think it's that hard unless you're invested in saying this is just UNPOSSIBLE.
There are already processes (a workforce and/or outbound written letters) to reach out to applicants in the case of, e.g., a dispute (terminated for cause vs. laid off).
> You can pick a balance between some validations and 100%, and I don't think it's that hard unless you're invested in saying this is just UNPOSSIBLE.
The point is that it's easy to say things should be easy when you don't know anything except the very surface details of the problem, and it's not your job to actually solve it.
Maybe the team that built the system in question were a bunch of dumb-dumbs who just needed a rockstar developer to show them how easy it is to scale, or maybe the problem is actually more complicated than it seems due to some hidden complexities or constraints none of us actually knows anything about (either technical or business).
Put it in a workflow where a form is filled out until it reaches a point where the back-end needs to do some heavy lifting, queue the form for processing, and then notify the user to continue to the next form in the workflow.
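That hand-off might look something like the following sketch, with a plain in-memory queue and a callback standing in for whatever notification channel (email, SMS, a status page) a real system would use; the function names and messages are all invented for illustration:

```python
import queue

submissions = queue.Queue()

def submit(form, notify):
    """Accept the form immediately; heavy validation happens later."""
    submissions.put((form, notify))
    return "received -- you'll be notified when the next step is ready"

def worker(validate):
    """Drain the queue at whatever rate the backend tolerates,
    notifying each applicant of the outcome."""
    while not submissions.empty():
        form, notify = submissions.get()
        ok = validate(form)
        notify("continue to the next form" if ok else "please fix and resubmit")
```

The trade-off, as noted elsewhere in the thread, is that rejections now arrive asynchronously, so the notification path has to be reliable enough to pull people back in.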
They could, they just don't want to pay for it. The government has no interest in being known for easily handling a huge spike of traffic during a crisis. They can just take the lower road and get by with less and saying 'try again later'. There's no repercussions here because it's the government.
Hence mainframe maintainers should really move to charging $1 million/year in a decade or two.
They aren't choosing to have crap infrastructure; their infrastructure is intentionally defunded as part of a political campaign to engender distrust in government functions and increase privatization. Government is incompetent because if it is, it's easy to justify selling off the country to the incredibly wealthy so they can get wealthier.
It's a bit worse than that. The infrastructure isn't actually defunded; there are huge funds allocated to projects, but they're being consumed by managers at Deloitte, Lockheed, Booz Allen, Accenture, etc. The times we see success are when enough funding trickles down to the few engineers who can make it work with what they get, or when there is enough public oversight by sufficiently independent stakeholders. I see this in many small local government agencies, with projects accountable to the city council, and so on.
So, legitimately, how do we make it so the government does face repercussions? I see a lot of people making jokes about guillotines and nooses, but is there no better way?
I suggest by campaigning to bring logic and critical thinking into early childhood education. Then philosophy, the classics. Science education.
Once you have more people who can understand that there are scientific and moral issues with manifest destiny, and religion isn't going to solve global warming, there will be some shifts in the public discourse and public policies.
It's the pseudo-scientific explanations invented for the coincidental advantages Europeans had that created the extreme intellectual complacency and bias holding back progress.
Validation is the problem. If someone thinks they’ve successfully applied, rejecting them asynchronously is often worse than not letting them apply in the first place.
I'm sure, but I'd wager there's still an order of magnitude difference between the amount of paperwork rejected now and what would be rejected if users couldn't get immediate feedback to correct their input.
You need to think of all of these things and many, many more to run a robust online service that can handle spikes hundreds of times bigger than the usual level. It's really not straightforward or simple.
Or there's a much simpler explanation: they didn't make it semi-fast because it never needed to be semi-fast.
When "hundreds of times the usual level" is still only 50 page loads per second, and 10 milliseconds of CPU per page would be extreme overkill for anything written in a reasonable way, it actually is straightforward.
Even 5 seconds will work if the actions can overlap. If it can't do things in parallel then we have issues much more fundamental than "performance", and there's no defending it as a competent system.
(That is not to say it's necessarily the devs' fault.)
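For what it's worth, the arithmetic behind that claim, spelled out (both inputs are the figures assumed in the comment above, not measurements):

```python
requests_per_sec = 50    # "hundreds of times the usual level" of filings
cpu_ms_per_request = 10  # the stated "extreme overkill" per-page CPU budget

# Fraction of one CPU core needed to keep up with the surge:
core_utilization = requests_per_sec * cpu_ms_per_request / 1000
# 50 req/s * 10 ms = half of one core, so a single box has ample headroom,
# provided requests can actually be served in parallel.
```

Which is the point: at these numbers the bottleneck is almost certainly serialization or downstream dependencies, not raw compute.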
I don't mean to defend it too much, because realistically it should be possible with relative ease to handle much more traffic than that - but my point is that in the enterprise and government worlds, things are often not as simple as you think.
Aside from potentially having to interface with dozens of unreliable, painfully slow SOAP-based web services, everything is often hosted on creaking, over-subscribed VMware hosts, in VMs that would be under-specced regardless.
There is also often a "governing body" that severely restricts your tech stack choices.
Want to use Postgres? Nope, our standard is SQL Server - 2008 edition, actually!
Want to use Python/Ruby/Elixir/Clojure/Kotlin? None of that hipster nonsense here, we use good ole Java/VB.NET here!
Message queue, you say? It's MSMQ with distributed COM all the way down here!
"Containers"? What's one of those? You'll get a crappy VM with 1 vCPU and 1GB of RAM, and you'll thank me for it! Etc...
As a dev, it's horrible and soul-destroying to work under such limitations, but if you have no choice...
All of those items are manageable. Some are simple setup or programming errors, some require a bit of added complexity but are normal in modern web apps.
Completely agree with the sentiment. Most often it's an inadequate default configuration that bottlenecks somewhere and never got tested with more than a handful of users at a time. Going to a hundred users highlights some bugs; going to a thousand, others. On the other hand, I worked on a project for the USDA where they had 10-year-old servers running 15-year-old software and didn't allow any system administration, while the sysadmins were unknown government employees who were completely inaccessible.
I have had to build a Python distribution entirely in home/user space in some cases, working on conservatively managed servers.
Usually it's not so much the form that causes things to fall down but some validation step that they are trying to do synchronously, that might have to access an IBM mainframe, and things time out. When you're getting a few an hour, it's not a big deal.
At this point introducing a new company could cause more problems than it solves, and I think it's understandable to not be prepared for a volume of jobless claims that is almost an order of magnitude more than at any point in US history.
Put the web form (plain static JS/CSS/HTML assets) on a globally accessible CDN. Then use an SQS intake for each unemployment application form. Then firehose it out, wherever it needs to go, at a rate you can realistically handle.
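A rough sketch of that intake pattern, with Python's `queue.Queue` standing in for SQS and a throttled drain on the far side (the rate limit and function names are placeholders, not anything from a real deployment):

```python
import queue
import time

intake = queue.Queue()  # SQS stand-in: absorbs submissions as fast as they arrive

def accept(application):
    """The write path stays cheap: enqueue and acknowledge immediately."""
    intake.put(application)
    return "accepted"

def drain(process, max_per_sec=5):
    """Feed the slow downstream system at a rate it can survive,
    regardless of how fast applications arrived."""
    interval = 1.0 / max_per_sec
    while not intake.empty():
        process(intake.get())
        time.sleep(interval)
```

The key property is that the intake and the drain rates are decoupled: a 100x surge on the front never translates into a 100x surge against the mainframe.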
Queuing access to the form itself and telling someone to wake up at 4:52 AM so they can then merely access the static assets is a less-than-desirable user experience.
It is more desirable than a 504, and it's the first thing I would do in 15 minutes with zero context. With more context, of course, something like your solution is more desirable, depending on the issue. It would take some time to figure out whether it's necessary to bring in AWS or just a database connection pooler, or whatever.
Ocado (the IaaS-for-online-supermarkets company and, in the UK, an online-only supermarket itself) has done this in response to the increased demand: it makes you wait in a 'virtual queue' (the virtual version of what we in the UK call a 'queue' at a physical supermarket, and what in America you call a 'line-up') before you can place or edit your order.
You’re assuming that the people who built it in the first place (or the people that may or may not be contracted to fix it later) know or care. Remember, this is government contracting we’re talking about - lowest bidder wins. How do you win the lowest bid? By doing it as cheap and quick as you can. That means hiring inexperienced/cheap developers who can build something that looks like it will work for far less money than you can build something that actually will.
I briefly interned with a state judiciary's IT department around 2015 and got to get lunch with the CIO. He described to me how most court filings in the state had been manual prior to 2008, when the mortgage crisis hit and judges in the tax courts got _slammed_ with cases surrounding foreclosures. This, in turn, drove a need to develop a platform to automate the process of filing a case. It started with the tax court and gradually expanded to automate filings for other court divisions as well (e.g. Family, Civil).
I wouldn't be shocked if the revelation of "holy shit, no one can file for unemployment" drove such an investment. I honestly think the next generation of politicians should take a page from product owners: isolate some shitty process they'd have jurisdiction over and find some way to automate it. Bonus points if it's right before a watershed moment: imagine if someone had considered the problem you described prior to the coronavirus epidemic.
I mean, you can tell the numbers are extremely inaccurate via just a simple, cursory glance at the report.
Pennsylvania reported 378k claims.
California reported... 186k claims.
Yesterday, California's governor said they've received more than 1 million claims since March 13th (so, over a 12 day period from the 13th to the 25th). This DOL report covers March 14th through the 21st.
Are we to believe that the remaining 800k+ people all filed on March 13th, or March 22nd through the 25th?
But there's more. Utah reported an increase of only 9 claims compared to the week before. They went from 1,305 to 1,314.
Then, New York, where more than half of Covid-19 cases in the US are, reported only 80k?
They also tend to have some... interesting features dictated by the state UI office. When I applied in Wisconsin about 6 years back, the site stopped accepting form submissions outside business hours.
I assume some less computer-literate higher up thought that someone needed to be around to actually accept the form, same as in-person submissions.
> My wife managed to get her filing completed a little after 1am this morning. She was the only one of her 20 coworkers to successfully file, the rest are continuing to attempt to get the state web site to work today, while more people pile in.
These numbers are going to get much, much worse.