Further, unemployment benefits are managed by the states, and those states are running web services which typically see a few hundred hits a day. They are now trying to process tens of thousands of new records each day, and at least in MI the service is absolutely not up to the task.
My wife managed to get her filing completed a little after 1am this morning. She was the only one of her 20 coworkers to file successfully; the rest are still trying to get the state website to work today while more people pile in.
Somewhere there is an architect saying "I told you so!" I can almost guarantee the requirement was to handle several hundred requests per day; an architect pointed out that a deluge would overwhelm the system, and maybe managed to get the requirement raised to one or two thousand requests per day.
Now, of course, we don't know what the architecture of this system is or what the cost delta would have been to let it scale out further. But I do know that all too often the more robust solution, the one giving much greater protection and lower cost down the road, is discarded if it costs even 5%-10% more. Then the day comes when the people making these decisions get caught flat-footed, and they try to blame everyone but themselves. It doesn't always happen like this, but it happens a lot.
This reminds me of an old story about an engineer who took initiative and automated the accounts receivable process at his company; now they get paid 25% faster! He shows his boss and gets a promotion.
He decides to do it again, this time with accounts payable, and is promptly fired.
I think that is small-think. The technical solution is only part of the problem, and scaling up all systems to meet the 0.1% case seldom makes sense. They were smart to save 5-10%.
Eh.... On the flip side, a service that processes and stores some simple text forms should be able to handle thousands of simultaneous users on one box.
So, as with most software of this nature, the reason it's not scaling is probably just that the people who made it weren't the greatest engineers on the block.
These are the same kinds of assumptions that lead engineers to think they can build a [any product] clone in a weekend. It's unlikely that the problem or constraints are nearly as simple as one may think.
Consider: single auth across all the state's services, external APIs, identity verification, address verification, employer ID verification, federal/military ID verification, income/tax verification, phone verification, bank account information, translation into multiple languages, accessibility features, etc. Also, there's probably a lot of legacy infrastructure and process.
Also, if "ability to burst to 10x normal filings per week that might happen once every 40 years" wasn't in the spec, I think they were right not to engineer for it.
Admittedly it's a value call. My thought is generally if it's a small incremental cost that greatly increases the robustness then you should go for it. But - sometimes the money or time just isn't there. I'm bothered more by the people not even wanting to have the discussion than by those who do a summary analysis and decide it's not worth it.
The 0.1% case happens. And if it’s going to seriously wreck lives when it happens then you should solve for it. Does Instagram need to handle the 0.1% case? No. But the unemployment website should.
Wow, just found what my state (CO) is doing to help manage the influx. Talk about a low-tech workaround.
>IMPORTANT NOTICE: Because of the high volume of claims, we are asking that you help us help you and our greater community.
>If you need to file an unemployment claim and your last name begins with the letter A - M, file a claim on Sunday, Tuesday, Thursday, or after 12 noon on Saturday.
>If you need to file an unemployment claim and your last name begins with the letter N - Z, file a claim on Monday, Wednesday, Friday or before 12 noon on Saturday.
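The rules above amount to sharding filings by last name. As a toy illustration, a minimal sketch of the scheduling rule (this function and its handling of the Saturday noon cutoff are my own reading of the notice, not anything CO actually publishes):

```python
from datetime import datetime

def may_file(last_name: str, when: datetime) -> bool:
    """True if this applicant may file at the given time, per the
    A-M / N-Z day-splitting rules in the CO notice."""
    first_half = last_name[0].upper() <= "M"
    day = when.strftime("%A")
    if day == "Saturday":
        # A-M files after 12 noon on Saturday, N-Z before.
        return first_half == (when.hour >= 12)
    a_m_days = {"Sunday", "Tuesday", "Thursday"}
    n_z_days = {"Monday", "Wednesday", "Friday"}
    return day in (a_m_days if first_half else n_z_days)
```

Assuming roughly even distribution of last names, this halves the peak load without any new infrastructure, which is presumably the whole point.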
> Wow, just found what my state (CO) is doing to help manage the influx. Talk about a low-tech workaround.
> Ooooh, like gas rationing in the 70s.
I wasn't born back then, but I'd heard from the older guys at a few car meetups that it was based on license plates, and I had the same thought when I heard this on CPR.
Odd, but it could work if you have total compliance; let's see how that pans out.
As a developer I immediately thought of the power of queues. Twenty people trying to submit the same form at once doesn't work for anyone, but a queue processing one person at a time might let all twenty submit within a short window. It's flattening the curve! If I were contracted to fix this ASAP, I would set up an nginx front-end proxy config that allows no more than X sessions and suggests a time in the future when the others could try again.
Having worked on this type of application in the past, I'd say they should find a new company to work with if this one can't handle the traffic. We were handling hundreds of requests per second with ease 10 years ago, with MySQL and the app running on the same server.
It doesn't take many resources to show the user a form, validate it, and save to a DB.
A bunch of armchair developers seem to have been summoned to tell the Federal government how to handle form submissions for an extremely security- and privacy-intensive application using their fancy modern techniques.
You are comparing a basic web form with an application for unemployment benefits, which must go into a federal tax database and be processed by what I assume is a garbage mainframe system.
It not only needs to be validated, it needs to securely store records, be able to compare them, and hook up to the system that handles payments, etc.
They can't just circumvent it, dump the data into some silly Amazon or MySQL database, and call it a day. That would require employees to basically copy and paste the data into the actual warehouse, and considering they have 3+ million applications to go through as it is, making them easy to process is just as important as letting people submit.
For the time being the correct response is a queue gate.
Yep, USDS and 18F folks would have to agree with you here. The arcane crap we have to deal with in payment and government information systems is beyond frustrating and makes this extremely tough. I read an article about USDS/18F having to fix a multi-decade-old Cisco router bug to get CI/CD and automated deployments working, and even then still needing to figure out how to deal with legacy stateful DB connections.
The reality of government paperwork systems on the backend is much, much closer to this hell, and it's part of why so many like myself ran screaming from the public sector. When you see so many peers doing so well at FAANGs, why would you subject yourself to something that resists change and wants to keep everything the same? https://www.washingtonpost.com/news/federal-eye/wp/2014/03/2...
The point is that backend pain shouldn't stop you from accepting it on the front end and putting it into a queue. Making the problem of getting the application through backend systems the states' to deal with, not the applicants'.
Yeah, even in the SF/SJ locality, which has the highest Locality Pay Adjustment (41.44%)[1], the position would likely have to start at GS-12/GS-13 to be competitive.
There is the option of going to some area with a much lower cost of living and trying to hire there, but the problem might be getting enough people together to form a team. If you can easily get enough people with skill and experience, the area probably has jobs for them that pay better, and if those jobs don't exist, it might be hard to find the people.
Eh, USDS and 18F jobs are kind of contract-based and did hit past six figures last I saw. However, they've been defunded a lot since then under POTUS45, so it's not clear what the state of comp is. DC-area tech is a mishmash of rather enterprise-centric businesses and can be challenging if you're in the wrong domains of expertise.
Unemployment services are run by the states. The entry-level software engineer salary paid by the state of California is around $64k, with senior-level salaries between about $75k and $105k in Sacramento. I do not know if this is normal, above average, or below average compared with other states.
Virginia, DC, and Maryland have similar costs of living, but drastically different governments, tax rates, rights, and laws, despite people working in roughly the same 40 square miles. Even a federal employee fresh out of school writing software should make more than that. Senior salaries are between $110k and $140k without a lot of outliers on either end (the distribution matters more to me than a median when talking about salaries for white-collar jobs these days).
California is a huge state, and even within the same industry the stats will differ drastically between the Bay Area, San Diego, Los Angeles, Sacramento, and San Luis Obispo (yep, there are software jobs there too).
> The point is that backend pain shouldn't stop you from accepting it on the front end and putting it into a queue.
What if the backend rejects the form? The user's already moved on before their form made it through the queue. So then you're stuck re-implementing all the validations the backend needs in order to give the user feedback (which you may not even be able to do) or trying to get the user to come back later to try again.
> Making the problem of getting the application through backend systems the states' to deal with, not the applicants'.
Reducing permanent staff involved in processing applications is probably one of the main reasons the automated system was built in the first place. If they still have to do that, then you might as well just replace the frontend with a printable PDF.
You can pick a balance between some validations and 100%, and I don't think it's that hard unless you're invested in saying this is just UNPOSSIBLE.
There are already processes (a workforce and/or outbound written letters) to reach out to applicants in the case of, e.g., a dispute (terminated for cause vs. laid off).
> You can pick a balance between some validations and 100%, and I don't think it's that hard unless you're invested in saying this is just UNPOSSIBLE.
The point is that it's easy to say things should be easy when you don't know anything except the very surface details of the problem, and it's not your job to actually solve it.
Maybe the team that built the system in question were a bunch of dumb-dumbs who just needed a rockstar developer to show them how easy it is to scale, or maybe the problem is actually more complicated than it seems due to some hidden complexities or constraints none of us actually knows anything about (either technical or business).
Put it in a workflow where a form is filled out until it reaches a point where the back-end needs to do some heavy lifting, queue the form for processing, and then notify the user to continue to the next form in the workflow.
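That hand-off might look something like the following sketch, with a plain in-memory queue and a callback standing in for whatever notification channel (email, SMS, a status page) a real system would use; the function names and messages are all invented for illustration:

```python
import queue

submissions = queue.Queue()

def submit(form, notify):
    """Accept the form immediately; heavy validation happens later."""
    submissions.put((form, notify))
    return "received -- you'll be notified when the next step is ready"

def worker(validate):
    """Drain the queue at whatever rate the backend tolerates,
    notifying each applicant of the outcome."""
    while not submissions.empty():
        form, notify = submissions.get()
        ok = validate(form)
        notify("continue to the next form" if ok else "please fix and resubmit")
```

The trade-off, as noted elsewhere in the thread, is that rejections now arrive asynchronously, so the notification path has to be reliable enough to pull people back in.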
They could, they just don't want to pay for it. The government has no interest in being known for easily handling a huge spike of traffic during a crisis. They can just take the lower road and get by with less and saying 'try again later'. There's no repercussions here because it's the government.
Hence mainframe maintainers should really move to charging $1 million/year in a decade or two.
They aren't choosing to have crap infrastructure; their infrastructure is intentionally defunded as part of a political campaign to engender distrust in government functions and increase privatization. Government is incompetent because if it is, it's easy to justify selling off the country to the incredibly wealthy so they can get wealthier.
It's a bit worse than that. The infrastructure isn't actually defunded; there are huge funds allocated to projects, but they're being consumed by managers at Deloitte, Lockheed, Booz Allen, Accenture, etc. The times we see success are when enough funding trickles down to the few engineers who can make it work with what they get, or when there is enough public oversight by sufficiently independent stakeholders. I see this in many small local government agencies, with projects accountable to the city council, and so on.
So, legitimately, how do we make it so the government does face repercussions? I see a lot of people making jokes about guillotines and nooses, but is there no better way?
I suggest by campaigning to bring logic and critical thinking into early childhood education. Then philosophy, the classics. Science education.
Once you have more people who can understand that there are scientific and moral issues with manifest destiny, and religion isn't going to solve global warming, there will be some shifts in the public discourse and public policies.
It's the pseudo-scientific explanations invented for the coincidental advantages Europeans had that created the extreme intellectual complacency and bias holding back progress.
Validation is the problem. If someone thinks they’ve successfully applied, rejecting them asynchronously is often worse than not letting them apply in the first place.
I'm sure, but I'd wager there's still an order of magnitude difference between the amount of paperwork rejected now and what would be rejected if users couldn't get immediate feedback to correct their input.
You need to think of all of these things and many, many more to run a robust online service that can handle spikes hundreds of times bigger than the usual level. It's really not straightforward or simple.
Or there's a much simpler explanation: they didn't make it semi-fast because it never needed to be semi-fast.
When "hundreds of times the usual level" is still only 50 page loads per second, and 10 milliseconds of CPU per page would be extreme overkill for anything written in a reasonable way, it actually is straightforward.
Even 5 seconds will work if the actions can overlap. If it can't do things in parallel then we have issues much more fundamental than "performance", and there's no defending it as a competent system.
(That is not to say it's necessarily the devs' fault.)
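For what it's worth, the arithmetic behind that claim, spelled out (both inputs are the figures assumed in the comment above, not measurements):

```python
requests_per_sec = 50    # "hundreds of times the usual level" of filings
cpu_ms_per_request = 10  # the stated "extreme overkill" per-page CPU budget

# Fraction of one CPU core needed to keep up with the surge:
core_utilization = requests_per_sec * cpu_ms_per_request / 1000
# 50 req/s * 10 ms = half of one core, so a single box has ample headroom,
# provided requests can actually be served in parallel.
```

Which is the point: at these numbers the bottleneck is almost certainly serialization or downstream dependencies, not raw compute.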
I don't mean to defend it too much, because realistically it should be possible with relative ease to handle much more traffic than that - but my point is that in the enterprise and government worlds, things are often not as simple as you think.
Aside from potentially having to interface with dozens of unreliable, painfully slow SOAP-based web services, everything is often hosted on creaking, over-subscribed VMware hosts, in VMs that would be under-specced regardless.
There is also often a "governing body" that severely restricts your tech stack choices.
Want to use Postgres? Nope, our standard is SQL Server - 2008 edition, actually!
Want to use Python/Ruby/Elixir/Clojure/Kotlin? None of that hipster nonsense here, we use good ole Java/VB.NET here!
Message queue, you say? It's MSMQ with distributed COM all the way down here!
"Containers"? What's one of those? You'll get a crappy VM with 1 vCPU and 1GB of RAM, and you'll thank me for it! Etc...
As a dev, it's horrible and soul-destroying to work under such limitations, but if you have no choice...
All of those items are manageable. Some are simple setup or programming errors, some require a bit of added complexity but are normal in modern web apps.
Completely agree with the sentiment. Most often it's an inadequate default configuration that bottlenecks somewhere and never got tested with more than a handful of users at a time. Going to a hundred users highlights some bugs; going to a thousand, others. On the other hand, I worked on a project for the USDA where they had 10-year-old servers running 15-year-old software and didn't allow any system administration, while the sysadmins were unknown government employees who were completely inaccessible.
I have had to build a Python distribution entirely in home/user space in some cases, working on conservatively managed servers.
Usually it's not so much the form that causes things to fall down but some validation step that they are trying to do synchronously, that might have to access an IBM mainframe, and things time out. When you're getting a few an hour, it's not a big deal.
At this point introducing a new company could cause more problems than it solves, and I think it's understandable to not be prepared for a volume of jobless claims that is almost an order of magnitude more than at any point in US history.
Put the web form (plain static JS/CSS/HTML assets) on a globally accessible CDN. Then use an SQS intake for each unemployment application form. Then firehose it out, wherever it needs to go, at a rate you can realistically handle.
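A rough sketch of that intake pattern, with Python's `queue.Queue` standing in for SQS and a throttled drain on the far side (the rate limit and function names are placeholders, not anything from a real deployment):

```python
import queue
import time

intake = queue.Queue()  # SQS stand-in: absorbs submissions as fast as they arrive

def accept(application):
    """The write path stays cheap: enqueue and acknowledge immediately."""
    intake.put(application)
    return "accepted"

def drain(process, max_per_sec=5):
    """Feed the slow downstream system at a rate it can survive,
    regardless of how fast applications arrived."""
    interval = 1.0 / max_per_sec
    while not intake.empty():
        process(intake.get())
        time.sleep(interval)
```

The key property is that the intake and the drain rates are decoupled: a 100x surge on the front never translates into a 100x surge against the mainframe.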
Queuing access to the form itself and telling someone to wake up at 4:52 AM so they can then merely access the static assets is a less-than-desirable user experience.
It is more desirable than a 504, and it's the first thing I would do in 15 minutes with zero context. With more context, of course, something like your solution is more desirable, depending on the issue. It would take some time to figure out whether it's necessary to bring in AWS or just a database connection pooler, or whatever.
Ocado (the IaaS-for-online-supermarkets company and, in the UK, an online-only supermarket itself) has done this in response to the increased demand: it makes you wait in a 'virtual queue' (the virtual version of what we in the UK call a 'queue' at a physical supermarket, and what in America you call a 'line-up') before you can place or edit your order.
You’re assuming that the people who built it in the first place (or the people that may or may not be contracted to fix it later) know or care. Remember, this is government contracting we’re talking about - lowest bidder wins. How do you win the lowest bid? By doing it as cheap and quick as you can. That means hiring inexperienced/cheap developers who can build something that looks like it will work for far less money than you can build something that actually will.
I briefly interned with a state judiciary's IT department around 2015 and got to get lunch with the CIO. He described to me how most court filings in the state had been manual prior to 2008, when the mortgage crisis hit and judges in the tax courts got _slammed_ with cases surrounding foreclosures. This, in turn, drove a need to develop a platform to automate the process of filing a case. It started with the tax court and gradually expanded to automate filings for other court divisions as well (e.g. Family, Civil).
I wouldn't be shocked if the revelation of "holy shit, no one can file for unemployment" drove such an investment. I honestly think the next generation of politicians should take a page from product owners: isolate some shitty process they'd have jurisdiction over and find some way to automate it. Bonus points if it's right before a watershed moment: imagine if someone had considered the problem you described prior to the coronavirus epidemic.
I mean, you can tell the numbers are extremely inaccurate via just a simple, cursory glance at the report.
Pennsylvania reported 378k claims.
California reported... 186k claims.
Yesterday, California's governor said they've received more than 1 million claims since March 13th (so, over a 12 day period from the 13th to the 25th). This DOL report covers March 14th through the 21st.
Are we to believe that the remaining 800k+ people all filed on March 13th, or March 22nd through the 25th?
But there's more. Utah reported an increase of only 9 claims compared to the week before. They went from 1,305 to 1,314.
Then, New York, where more than half of Covid-19 cases in the US are, reported only 80k?
They also tend to have some... interesting features dictated by the state UI office. When I applied in Wisconsin about 6 years back, the site stopped accepting form submissions outside business hours.
I assume some less computer-literate higher up thought that someone needed to be around to actually accept the form, same as in-person submissions.
> My wife managed to get her filing completed a little after 1am this morning. She was the only one of her 20 coworkers to successfully file, the rest are continuing to attempt to get the state web site to work today, while more people pile in.
These numbers are going to get much, much worse.