I don't think the actual root cause is a network timeout, but rather some internal Tank timeout, or a problem with normalization or with storing the file to disk.
Other, much larger files (a 400 MB FLAC that took 19 minutes) worked fine. Only this 76 MB MP3 failed in under 10 minutes.
David Trattnig changed title from Upload to Tank failed after "normalizing" stage with "" to Upload to Tank failed after "normalizing" stage with "504 Gateway Timeout"
In all likelihood tank doesn’t respond with a 504 error (and if it does, it shouldn’t). It’s more likely that the 504 Gateway Timeout is produced by the NGINX reverse proxy. If we want to reduce the frequency of these request errors we need to adapt the NGINX configuration (probably the proxy_read_timeout setting for the tank location). But with a polling-based architecture the dashboard will always need handling for connection timeouts – and it does. That’s why I say the 504 errors aren’t really our problem. The same goes for 404 errors at the same endpoint: we have handling for both in the dashboard, so any permanent error is something else.
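For illustration, raising the proxy read timeout for the tank location would look roughly like this (a sketch assuming a standard NGINX reverse-proxy setup; the location path and upstream name are hypothetical, not copied from the actual config):

```nginx
location /tank/ {
    proxy_pass http://tank;
    # proxy_read_timeout defaults to 60s; raising it makes NGINX wait
    # longer for tank before answering a long poll with 504.
    proxy_read_timeout 300s;
}
```

Note that however high this value is set, a polling client still has to handle the 504 case, so this only reduces the frequency of the expected timeouts.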
The requests are red-ish because the dev tools network panel simply colors network requests that have a status code >= 400 red-ish. That doesn’t necessarily mean that the client app doesn’t have handling for these requests.
Thank you for the analysis. Are you saying that the problem is likely not related to Dashboard and Steering but to NGINX, and that therefore Jointech should take over this task? (cc @kay_jointech)
I’m not saying that there isn’t an issue with tank (or the dashboard) whatsoever. If you’re seeing an actual failed request in the dashboard media uploader then something went wrong.
What I’m saying is that it’s just highly unlikely that the 504 and 404 errors on the /tank/api/v1/files/:fileId/import endpoint have anything to do with that and I’m trying to save us some time because IMHO it takes us in the wrong direction of finding the actual culprit. The 504 and 404 errors are expected and handled because they are a necessary consequence of a polling-based architecture. There has to be a read-timeout in NGINX. We can prolong it, but it will always be there and so we will always have to have some logic in the dashboard that knows how to address it.
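To make the “expected and handled” point concrete, here is a minimal sketch (not the actual dashboard code; all names are hypothetical) of a polling loop that treats a 504 as a signal to simply poll again, and a 404 on the import endpoint as “import finished”:

```python
import time

RETRYABLE = {504}  # NGINX read timeout while long-polling: just poll again

def wait_for_import(poll, max_polls=100, delay=0.0):
    """Poll `poll()` (returns an HTTP-ish status code) until the import
    is done. A 504 means the reverse proxy timed out while we were
    long-polling, so it is not treated as a failure."""
    for _ in range(max_polls):
        status = poll()
        if status == 200:
            return "done"
        if status == 404:
            return "done"          # no import job found: import finished
        if status in RETRYABLE:
            time.sleep(delay)
            continue               # expected timeout, keep polling
        return f"failed ({status})"
    return "gave up"
```

With this logic, a sequence of responses like 504, 504, 200 ends in success, while any unexpected status (such as a 409) surfaces as a real failure.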
As expected the issue was entirely unrelated to the 504 errors:
There was a rather arbitrary default client timeout configured in the dashboard code: the dashboard simply aborted the process when the upload phase took more than 5 minutes. I’ve changed it and the upload now works for me. Please test it from your side, @david.
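As a sketch of why a fixed overall timeout is problematic for large files (names and structure are hypothetical, not the dashboard’s actual code): a more robust pattern is to abort only when no progress has been made for a while, so a slow but steadily progressing upload is never killed.

```python
import time

def upload_with_progress_timeout(chunks, send, idle_timeout=60.0,
                                 clock=time.monotonic):
    """Send chunks one by one; abort only if no chunk completes within
    `idle_timeout` seconds, instead of capping the total upload time."""
    last_progress = clock()
    for chunk in chunks:
        if clock() - last_progress > idle_timeout:
            raise TimeoutError("upload stalled")
        send(chunk)
        last_progress = clock()   # progress made, reset the idle timer
    return "uploaded"
```

Under this scheme a 400 MB FLAC that takes 19 minutes succeeds as long as chunks keep completing, while a genuinely stalled connection still fails fast.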
Another try, with slightly different logs (compare the GET & PATCH requests with the previous try).
Also note that before, the upload failed at 100% of the normalizing stage; after your last commits, it fails at the beginning of the normalizing stage.
My import on dashboard.aura.radio worked yesterday, so we definitely got a new/different issue.
But the 409 is a good starting point. There are basically only two types of requests that return 409 errors: the authentication mechanism and the flow upload endpoint. I didn’t touch the flow uploader recently, but I will investigate it in the evening.
@david I can’t reproduce this error. The upload seems to work fine for me. I’m using the exact same file. Can you share more information on what you’re doing, what browser you’ve tested (including version and operating system), and if there is anything else you’re doing simultaneously?
On second look at the request and error messages this part might actually be a bug in tank (or at least undefined behavior).
@eigenwijsje: In the screenshot from @david you can see that the dashboard makes a GET request to the /files/:fileId/import endpoint with a waitFor=done query. The server responds with a 404. The dashboard treats a 404 along with a waitFor=done query on this endpoint as an indication that the import job has finished, because if there is no import job for this file (hence the 404) then the import must have finished. The PATCH request that is sent afterwards modifies the file’s title, because that can’t be set when creating the file. Tank sending a 404 for the import query and then responding with a 409 while the file import is not actually done should not happen.
The only thing that comes to mind that might explain this is some kind of race condition. I don’t know how tank handles uploads, but if upload and normalization are two different import jobs and there is a short time frame between them with no import job running for a file, then a request to the endpoint might yield a 404 even though normalization has not yet finished. Basically:
dashboard uploading to tank
dashboard waiting for import to finish
tank finishes file upload job
dashboard triggers post-import request
tank starts normalization job
tank handles dashboard post-import request, sees running normalization job and rejects it with a 409
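The suspected race above can be modeled in a few lines (a sketch under the assumptions stated; all names are hypothetical and this is not tank’s actual code):

```python
# If upload and normalization are separate jobs, a poll that lands in the
# gap between them sees no job (404), so the dashboard wrongly concludes
# the import is done and fires its PATCH, which then hits a 409.
def poll_import(jobs, file_id):
    """Return 200 while a job is registered for the file, else 404."""
    return 200 if jobs.get(file_id) else 404

jobs = {}
jobs["f1"] = "upload"                    # tank starts the upload job
jobs.pop("f1")                           # upload job finishes...
gap_status = poll_import(jobs, "f1")     # ...dashboard polls in the gap: 404
jobs["f1"] = "normalize"                 # ...then tank starts normalization
patch_status = 409 if jobs.get("f1") else 200  # PATCH rejected: job running
```

If the gap never exists (one job that transitions through states, as suggested later in the thread), the 404 in the middle should be impossible, which is exactly what would need verifying in tank.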
If this description is accurate I see two things we should address:
it should be possible to set the title in the request that creates the file object (just to avoid that PATCH request, which introduces an unnecessary source of errors in the file upload handling)
tank must not return a 404 on the import endpoint if another import job is scheduled to run after one of the import jobs has finished.
The status code 409 Conflict can be returned from a total of eleven different error conditions across the Flow JS upload, the store (the database) and the importer (fetch, convert and normalize).
I never saw these errors, but from what I see above, it only gives the status code without any details about the error causing it. I’ll add the details to the responses.
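Attaching a detail message to the 409 responses could look something like this (the response shape is an assumption for illustration, not tank’s actual API):

```python
# The bare status code is ambiguous across the eleven possible conflict
# sources, so naming the cause in the body makes debugging far easier.
def conflict_response(cause):
    return {
        "status": 409,
        "error": "Conflict",
        "detail": cause,  # e.g. "normalization job still running"
    }
```

The dashboard could then surface (or at least log) the `detail` field instead of just “409”.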
it should be possible to set the title in the request that creates the file object (just to avoid that PATCH request, which introduces an unnecessary source of errors in the file upload handling)
I guess we could add a description field to the file, leave the metadata we get from the file itself after the upload untouched to avoid confusion, and add the new field to the creation request.
tank must not return a 404 on the import endpoint if another import job is scheduled to run after one of the import jobs has finished.
Do you mean any other part of the import for this one file (format conversion, normalization)?
I can look into this.
Technically from what I remember, the whole process is one job that goes through different job states.