HTTP 200 {"status": "success", "payload": "{\"error\": true}"}
Ah, the 200 Go Fuck Yourself pattern.
I use HTTP error codes in my API, and still occasionally see a GET /resource/{“error”:“invalid branchID provided”} from people who don’t seem to know what they are.
I know an architect who designs APIs this way. Also includes a status code in the response object because why have one status code when you can have two, potentially contradictory, status codes?
I may have run into your acquaintance’s work, stuff along the lines of
200 OK
{ error_code: s23, error_msg: "An error was encountered when performing the operation" }
If you happen to run into him, kindly tackle him in the groin for me.
Thanks!
Well, looking at your example, I think a good case can even be made for it.
“s23” doesn’t look like an HTTP status code, so including it can make total sense. After all, there’s plenty of reasons why you could want custom error codes that don’t really align with HTTP codes, and customised error messages are also a sensible use case for that.
Of course duplicating the actual HTTP status code in your body is just silly. And if you use custom error codes, it often still makes sense to use the closest matching HTTP status code in addition to it (so yeah, I agree the 200 in your example doesn’t make a lot of sense). But neither of those preclude good reasons for custom codes.
Still, 200 should not be returned. If you have your own codes, just return 500 alongside that custom code.
But WHY are you trying to make a case for a bad practice? Don’t enable this kind of bullshit, please.
If there’s an error, don’t say it’s 200 OK. Give me something, a 4xx, or at least a 500. Sure, add all you want to the body, but respect the goddamn headers!
This fucks up so many things - starting right with API specs and documentation, s23 (or any other code this crap spits out) are not a part of the pdf file, which is the ONLY available documentation for this 3rd party service. If it serves any internal purpose, I have no clue, but for me it’s useless.
Log analytics is a mess, and you can forget about auto-generating a client, of course…
This is just a huge red flag for me. If their public interfaces look like this, I don’t want to know what’s under the hood, and I’m actively lobbying for us to change to another provider.
I’m making a case for custom codes, not for using a 200 status code with it. My reply said the 200 didn’t make sense.
Of course once you use custom codes, the actual HTTP status codes do become less important, because there’s some redundancy there. That’s not an argument to do it wrong, but it is an argument that accurate HTTP status codes are less of a priority. So understandably some people will take shortcuts.
Apparently you find this very frustrating, but in the end it’s just an implementation detail. But it also sounds like you’re more frustrated with the service API as a whole than the fact it uses custom error codes specifically, so I’m just going to leave it at that.
I inherited a project where it was essentially impossible to get anything other than 200 OK. Trying to use a private endpoint without logging in? 200 OK unauthorized. Sent gibberish instead of actual request body format? 200 OK bad request. Database connection down? You get the point…
It’s the HTTP version of “great job.”
You get the point…
Computer version of dude wincing through the pain, tears in eyes, giving you a thumbs up.
Lmao do they work at Oracle???
When I used to work at Oracle every so often a customer would call and complain some function was throwing error “ORA-00000 normal successful completion” and they wanted it filing as a bug and for us to fix it.
I was never quite sure how we were supposed to fix stupid.
Ugh this just reminded me that I ran into this exact issue a couple years ago. We were running jobs every hour to ingest data from an API into our data warehouse. Eventually we got reports from users about having gaps in our data. We dug into it for days trying to find a pattern, but couldn’t pinpoint anything. We were just missing random pieces of data, but our jobs never reported any failures.
Eventually we were able to determine the issue. HTTP 200 with “error: true” in the response. Fml
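For anyone stuck consuming an API like this, a minimal sketch of the defensive check that would have caught it. The `error` field name comes from the story above; `ApiError` and `parse_payload` are made-up names, and a real API may bury the flag somewhere else.

```python
import json


class ApiError(Exception):
    """Raised when a response is a failure, whatever the status line claims."""


def parse_payload(status: int, body: str) -> dict:
    # First trust the status code when it does report a failure.
    if not 200 <= status < 300:
        raise ApiError(f"HTTP {status}")
    data = json.loads(body)
    # Then distrust the 200: assumed convention is a truthy top-level "error" key.
    if isinstance(data, dict) and data.get("error"):
        raise ApiError(f"HTTP {status} but body reports an error: {data['error']!r}")
    return data
```

With this in the ingestion job, the hourly run fails loudly instead of silently writing a gap.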
I’ve seen the status code in a JSON response before: https://cloud.google.com/storage/docs/json_api/v1/status-codes#401-unauthorized
One reason I can think of for including it is that it may make it easier for the consumer to check the status code if it’s in the JSON. Depending on how many layers of abstraction you have, your app may not have access to the raw HTTP response.
Although, yeah you lose the single source of truth though.
Depending on how many layers of abstraction you have, your app may not have access to the raw HTTP response.
That sounds like either over-abstraction or bad abstraction then
Yea, I don’t really see a scenario where you are both, making http requests (and therefore care about http responses), and also not able to see the response.
If you are using some wrapper client for an API, you wouldn’t be dealing with the response anyway so it being in json isn’t particularly helpful
Here I am preferring 200, with success boolean / message string…
I like HTTP error codes for real fuck-ups: if I see a 500, something’s fucked in the app. Otherwise a standardised JSON response body seems way easier
What about both? User supplies bad input? HTTP 400 with response body json describing the error in a standard format?
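A rough sketch of what "both" could look like, assuming a hypothetical create-user handler that needs a non-empty `name`. The body shape (`error`, `field`, `message`) is just one possible convention, not a standard.

```python
import json


def create_user(raw_body: str) -> tuple[int, str]:
    """Return (status, json_body): 400 plus details for bad input, 201 otherwise."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps(
            {"error": "invalid_json", "message": "Body is not valid JSON."}
        )
    name = data.get("name") if isinstance(data, dict) else None
    if not isinstance(name, str) or not name.strip():
        return 400, json.dumps(
            {
                "error": "invalid_field",
                "field": "name",
                "message": "name must be a non-empty string.",
            }
        )
    return 201, json.dumps({"name": name.strip()})
```

The status code tells generic tooling (logs, client libraries, monitoring) that the request failed, and the body tells the human what to fix.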
when you are too lazy to ask your request library to not throw exception on non-200 responses.
Throwing exceptions is fine since errors are an exceptional circumstance (not expected during normal use of the app), and you probably want errors to follow a different code path so that they can be logged, alerts triggered if needed, etc.
I always loved how Sierra took its error message and turned it into an intentionally quitting the game message because every time they closed the game, instead of closing properly it crashed.
And no error message…
I guess that’s how it’s done. Yeah.
This is always how graphql works :)
Doesn’t matter, the client ignores the error anyways.
This is very frustrating! I get so many requests from customers asking why we returned response code 400 when we gave a description of the problem in the response body.
Ah, I see you too have run code in Azure Functions…
Getting only a message with no error indicator isn’t much better either
This became a religious war at my last role.
I had a similar one at a past work too. A test which was asserting a response status 500.
Like, instead of the test asserting the correct error/status code was being returned, it was instead asserting any error would simply get masked as a 500.
Basically, asserting the code was buggy…
That made me angry a couple of times but I still miss that place sometimes.
Someone GraphQLs
me with gRPC error codes: nil, parameter error, app error – OK, you fucked up, we fucked up. Edit: forgot NotFound.
I really should read about the various ones that exist at some point, but I’ve always got bigger fires to put out.
Edit, since it seems unclear, gRPC != HTTP and does not use the same status codes. I meant that I felt like I was using fewer than I should, though I just checked and basically not.
This is basically the difference between HTTP 4xx and 5xx error codes. 4xx means the client did something wrong (invalid request, tried to load something that doesn’t exist, doesn’t have access), whereas 5xx means the request was OK but something broke on the server.
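If it helps, the split boils down to something like this. `who_is_at_fault` is a made-up helper, and it only looks at the status class, nothing finer.

```python
def who_is_at_fault(status: int) -> str:
    """Map an HTTP status code to the side that has to change something."""
    if 400 <= status <= 499:
        return "client"   # invalid request, missing resource, no access
    if 500 <= status <= 599:
        return "server"   # request was fine, something broke on the server
    if 100 <= status <= 399:
        return "nobody"   # informational, success, or redirect
    raise ValueError(f"not an HTTP status code: {status}")
```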
Yeah, I know how http status codes work. I just followed the existing pattern at my current place with gRPC and this post made me realize I don’t know most gRPC error codes and best practices.
The way you phrased it made it sound like you definitely don’t
gRPC does not use HTTP status codes. I meant that I might be making a similar mistake with gRPC status codes though, after checking just now, not so much (there are only 17 total codes, not all of which apply to my APIs).
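For reference, the 17 codes written out as a plain enum. This is only an illustration with the standard numeric values; in real code you would use the enum your gRPC library ships (`grpc.StatusCode` in the Python package) rather than a hand-rolled `GrpcCode` like this one.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """Illustrative copy of the gRPC status codes and their numeric values."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16
```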
Several Favicon APIs do this. Even Google’s Favicon endpoint does it, because they return a fallback image. It’s pretty annoying.
At a prior job, our API load balancers would swallow all errors and return an HTTP 200 response with no content. It was because we had one or two clients with shitty integrations that couldn’t handle anything but 200. Of course, they brought in enough money that we couldn’t ever force them to fix it on their end.
I once worked on a project where the main function would run the entire code in a try-catch block. The catch block did nothing. Just returned 200 OK. Didn’t even log the error anywhere. Never seen anything so incredibly frustrating to work on.
Why not POST /to/the/api?withCorrectErrorCodes? Assuming there was some API key system in place, could just check on the key to see if it belongs to one of those clients. If yes, 200. Else, real APIs.
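Roughly what I mean, as a sketch. The key values and the `LEGACY_200_ONLY_KEYS` set are invented, and it assumes the API key is available at the point where the status gets written.

```python
# Hypothetical allow-list of the clients whose integrations only survive a 200.
LEGACY_200_ONLY_KEYS = {"key-legacy-a", "key-legacy-b"}


def outward_status(api_key: str, real_status: int) -> int:
    """Mask the status for legacy clients only; everyone else sees the truth."""
    if api_key in LEGACY_200_ONLY_KEYS:
        return 200
    return real_status
```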
There was nothing RESTful or well planned about this API’s interfaces, and the work to do something like that would have been nontrivial. Management never prioritized the work.
I think we might have worked at the same company. Did it begin with a K?
Welcome to graphQL. The REST abstraction few need, but everyone wants for some reason.
I looked into it once at my last company, but none of us knew it and we had a tight deadline. For our scale and usecase, it definitely seemed like needless complication for most things compared to any payoff of switching.
My team recently migrated to graphql and they don’t even do it right. The graphql layer still makes REST calls and then translates them to a gql format, so not only do we get no time or computing savings, we also get the bullshit errors
Funny how it’s your team but they did it poorly.
The royal “my team”. I’m on qa, no say in development architecture unfortunately
It makes sense for a wrapper layer to do this, and I had to fight against APIs that didn’t. If I make a single HTTP call that wraps multiple independent API calls into one, then the overall HTTP code should reflect the status of the wrapper service, and the individual responses should each have their own code as returned by the underlying services.
For example on one app we needed to get user names by user id for a bunch of users. To optimize this, we batched calls into groups. The API would fail with an error code if one of the user ids in the batch was bad or couldn’t be found. That meant we wouldn’t be getting data for any of the users in the batch and we didn’t know which userId was bad either. Such a call should return 200 for the overall call and individual result for each id, some of which could be errors.
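A sketch of the shape I would have wanted from that batch endpoint. The `USERS` dict stands in for the real user store, and the field names (`results`, `status`, `error`) are made up.

```python
# Hypothetical stand-in for the real user store.
USERS = {1: "ada", 2: "grace"}


def lookup_users(user_ids: list[int]) -> tuple[int, dict]:
    """Return (overall_status, body); each id carries its own status."""
    results = []
    for uid in user_ids:
        if uid in USERS:
            results.append({"id": uid, "status": 200, "name": USERS[uid]})
        else:
            results.append({"id": uid, "status": 404, "error": "user not found"})
    # The wrapper itself worked, so the overall call is a 200.
    return 200, {"results": results}
```

One bad id then costs you one entry, not the whole batch, and you can see which id it was.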
I’ve got better news:
- notice 200 error:true story on our side
- fix it
- fix it better: add detailed description, add message on what needs to be done on client side
Client to mutual users: meh, we see an error, not our problem. Me: screams in swear language
Honestly makes perfect sense.
- Message received and successfully parsed.
- An error occurred while processing the request. Ideally they would have a message in the response saying what went wrong if it is relevant for the user.
The problem with only reacting with 500 Internal Server Error is that the user will never improve their input data, even if they can do something about it. Responding with 404 is just mean, as they won’t know if the endpoint is not found or the database couldn’t find any data. Differentiating the communication from the processing is IMO the best way to do it.
That’s not what HTTP errors are about, HTTP is a high level application protocol and its errors are supposed to be around access to resources, the underlying QUIC or TCP will handle most lower level networking nuances.
Also, 5xx errors are not about incorrect inputs, that’s 4xx.
…HTTP is a high level application protocol and its errors are supposed to be around access to resources…
I’ve had fellow developers fight me on this point, in much the same way as your parent post.
“If you return a 404 for a record not found, how will I know I have the right endpoint?”
You’ll know you have the right endpoint because I advertised it—in Open API, in docs, etc.
“But, if /users/123 returns a 404, does that mean that the endpoint can’t be found or the record can’t be found?”
Doesn’t matter. That resource doesn’t exist. So, act appropriately.
It’s not like you can’t return a body with the 404 that specifies that the user itself is not found versus the endpoint being wrong.
Standardize a response body across your APIs that specifies the cause of the non-2xx response. Have an enum per API/service for causes. Include them in the API doc.
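Something along these lines, as a sketch. The `Cause` values and the `cause`/`detail` field names are invented for illustration; the point is only that the enum is small, documented, and shared.

```python
import json
from enum import Enum


class Cause(str, Enum):
    """Per-service list of reasons for a non-2xx response (hypothetical values)."""
    ENDPOINT_NOT_FOUND = "endpoint_not_found"
    RECORD_NOT_FOUND = "record_not_found"
    VALIDATION_FAILED = "validation_failed"


def error_body(status: int, cause: Cause, detail: str) -> tuple[int, str]:
    """Same body shape for every failure, whichever status code it rides on."""
    return status, json.dumps({"cause": cause.value, "detail": detail})


# Both are 404s, but the client can tell them apart without guessing.
missing_user = error_body(404, Cause.RECORD_NOT_FOUND, "No user with id 123.")
missing_route = error_body(404, Cause.ENDPOINT_NOT_FOUND, "No such path.")
```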
If anyone still doesn’t get it, quietly dispose of them at your friend’s pig farm.
And it’s not even always a simple case of “that resource doesn’t exist”. A 404 could also mean that the resource does exist but the current authenticated user doesn’t have the correct permissions to access it, so it’s more like “as far as you know that resource doesn’t exist”. Some people might argue that 403 should be used for that, but then you’re telling potential bad actors that maybe shouldn’t even have access to your documentation that they have indeed found a valid endpoint.
Avoiding 403 seems like a security through obscurity approach to me.
I suppose there might be some special admin only endpoints you’d want to 404 on if the user is not an admin. But for most cases it’s really hell integrating an API that 404s on everything… is my token invalid, did I set a parameter wrong, or did I get the path wrong? I guess I gotta spend all day doing trial and error to figure it out. Fun!
Also makes integration tests on your security unreliable. Someone renames an endpoint and suddenly your integration tests aren’t actually testing security anymore. Checking for 403 and getting a 404 because someone renamed something will indicate the test needs to be updated to use the new path. Checking for 404 (because the user isn’t supposed to have access) and getting 404 (because the path was changed) means your test is useless but you won’t know it was rendered useless.
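To make that concrete, a toy router and "test" showing the failure mode. Everything here (`make_app`, `security_check`, the paths) is hypothetical.

```python
def make_app(routes: dict[str, str], hide_forbidden: bool = False):
    """routes maps path -> required role; hide_forbidden answers 404 instead of 403."""
    def status_for(path: str, is_admin: bool) -> int:
        if path not in routes:
            return 404
        if routes[path] == "admin" and not is_admin:
            return 404 if hide_forbidden else 403
        return 200
    return status_for


def security_check(app, path: str, expected: int) -> bool:
    """The integration test: a non-admin must get `expected` for `path`."""
    return app(path, is_admin=False) == expected


old_routes = {"/admin/report": "admin"}
new_routes = {"/admin/reports": "admin"}  # someone renamed the endpoint

# Expecting 403: passes before the rename, fails after it, so the stale test gets noticed.
honest = (security_check(make_app(old_routes), "/admin/report", 403),
          security_check(make_app(new_routes), "/admin/report", 403))

# Expecting 404: passes before AND after, even though it no longer tests any security.
hidden = (security_check(make_app(old_routes, True), "/admin/report", 404),
          security_check(make_app(new_routes, True), "/admin/report", 404))
```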
Some osint tools use this : they test an email on thousands of services, and use the error result (403/404) to know if the person has an account there.
It depends on the context. If it’s a URL that is easy to guess and reflects user-created content, your system is leaking information about its users if it returns 403. The example that comes to mind is GitHub returning 404s for both nonexistent and private repos when the authenticated user doesn’t have access to them.
No.
404 is for “I can not confirm this resource exists”
For example, a private github repo must return 404 for unauthorized users, API requests must act as if that repository doesn’t exist (including returning 404 status codes).
403 is for “I can confirm this resource exists, you cannot access it”
I usually treat a path as a series of dereference operations, each with a potential security precondition. You could protect /secure/… with credential checks, and report 403 at that point, before even looking at the rest of the resource path. It exposes the prefix but not the multiple endpoints that might exist below that point.
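A minimal sketch of that idea, assuming a nested dict as the resource tree and a made-up `_requires_auth` marker on guarded nodes.

```python
# Hypothetical resource tree; "_requires_auth" marks a guarded prefix.
TREE = {
    "public": {"docs": {}},
    "secure": {"_requires_auth": True, "reports": {}, "keys": {}},
}


def resolve(path: str, authenticated: bool) -> int:
    """Walk the path one segment at a time, checking preconditions as we go."""
    node = TREE
    for segment in path.strip("/").split("/"):
        # Marker keys are not addressable resources.
        if segment.startswith("_") or segment not in node:
            return 404
        node = node[segment]
        # Stop at the first guarded node without looking at what lies below it.
        if node.get("_requires_auth") and not authenticated:
            return 403
    return 200
```

An unauthenticated caller gets 403 for anything under /secure, whether or not the rest of the path exists, so only the prefix is exposed.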
The parser in most APIs will automatically handle parsing responses for 400 errors, but if the logic fails due to data being wrong, what do you respond with? E.g. you send a valid SSN but the database could not find the person, or you send a valid email, but no such email was found.
You can send 4xx errors yourself too. If the client needs to change something about the request, that’s a 4xx, like 400 Bad Request. If the server has an error and it’s not the client’s fault, that’s a 5xx like 502 Bad Gateway.
The wikipedia listing of all HTTP codes is quite helpful. People also forget you can send a custom response body with a 4xx or 5xx error. So you can still make a custom JSON error message.
Obviously you can, and I do return 4xx codes if the initial parsing, authentication or something else goes wrong in the controller, but once I’m in the next API, or any number of systems down the chain, I’m probably going to return a 200 with a status and a tracking code. It’s proven very helpful, at least for us, for finding issues fast on both sides. To me, getting a 4xx back when step 6 out of 13 in the API is the problem but the request itself is fine doesn’t seem meaningful and just makes customers assume things. I guess if every endpoint only did one thing, I’d probably do it like you.
A 2xx means success to its requester. If you have an error in step 6 out of 13 that breaks the resource action, you shouldn’t be returning a success.
You might argue what to return and what kind of information to include in the response (like tracking numbers), but it shouldn’t be a 2xx and I don’t see how a misleading 200 would be more helpful than a 400 bad request.
I mean sure, by the strict meaning of the code guides you are probably correct. Most of our problems, at least, stem from cross-reference issues, which are normally configuration problems in the underlying system or other data-related issues. Those are often the responsibility of neither the server nor the client, and sometimes of both. There is often no code that is suitable to respond with, and to just send “Bad request” when it’s a good request - does not make sense. Therefore I think it’s better to keep Bad Request for bad requests, and instead tell the client: sure, this is a good request, but for this reason it didn’t work this time, and this has to happen for it to work. Either I can do that with a simple structure in JSON with maybe 5 status codes and a message, or I have to figure out which 20 HTTP status codes both I and the client have to implement, and give them meanings that aren’t their intended meaning.
and to just send “Bad request” when it’s a good request - does not make sense
That’s when you use a 5xx status, then. The client doesn’t care how many other services you reach out to in order to fulfill their request. A 5xx code also covers failures in other parts of the system.
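One way to read that, sketched with made-up names (`UpstreamError`, `handle`, `tracking_code`): keep the tracking code from the earlier comment, but put it in the body of a 5xx instead of a 200. Whether 502 is the right 5xx depends on the setup.

```python
import json
import uuid


class UpstreamError(Exception):
    """Hypothetical: a system further down the chain failed."""


def handle(request_is_valid: bool, call_downstream) -> tuple[int, str]:
    # The client's own mistakes stay 4xx.
    if not request_is_valid:
        return 400, json.dumps({"error": "bad request"})
    tracking_code = uuid.uuid4().hex  # lets both sides find the failure in logs
    try:
        result = call_downstream()
    except UpstreamError as exc:
        # Good request, broken chain: a 5xx with the details in the body.
        return 502, json.dumps({"error": str(exc), "tracking_code": tracking_code})
    return 200, json.dumps({"result": result})
```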
That’s still a 4xx situation.
Except of course that http has a myriad of response codes that are more useful than a 200 with an error body. This was a serious mistake of GraphQL imo
What’s wrong with graphql over a web socket? Graphql doesn’t necessitate http or any other transport method, it can be done via pigeons. Graphql has zero control over how http works when you use graphql over http, it doesn’t force implementors to use http at all
Aww a whole new generation of devs get to make the same mistakes SOAP made. Makes me feel all fuzzy inside.
I used SOAP in my first web dev job over a decade ago when I was making flight search software and connecting to horrific APIs owned by the airline industry to get flight details and purchase tickets. Why are these two things even remotely the same? It’s closer to SQL than SOAP, and I’d choose graphql over any soap api. I still wouldn’t do it over http if I could avoid it though.
Meanwhile, in the real world…
Then complain to Apollo or whoever created the server, not the graphql spec. I’ve used graphql over a web socket on production apps for almost a decade now. I don’t use http for graphql if I can avoid it and I always have been able to.