Cannot finish fitting model
Greetings!
I have sent a request to my endpoint to fit the dataset. It starts fitting but never finishes.
I have been waiting for more than an hour, but it is still in progress. Is it normal for fitting to take this long?
It seems like there is a mistake somewhere, but I cannot figure out where to look.
Comments
I made a getinfo.txt file. Maybe it will help.
I found some info in the documentation. Following this chapter, I managed to run the fit locally. After calling:
$ om shell
I wondered whether I could do something with fit, so I called:
Then I found this chapter and did the following:
I waited for about 15 minutes but got no response.
I interrupted the process and got some tracebacks:
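For readers following along, a remote fit submitted from `om shell` typically looks something like the sketch below. This is an assumption for illustration only: the model name `mymodel`, the dataset name `sample`, and its `x`/`y` columns are placeholders, not names from this thread.

```python
def submit_fit(model_name="mymodel", dataset="sample"):
    """Sketch: submit a fit task to the omega|ml runtime worker.

    'mymodel' and 'sample' are hypothetical placeholders -- substitute
    the names of a model and dataset you have actually stored.
    """
    try:
        import omegaml as om
    except ImportError:
        # omegaml is not installed in this environment
        return "omegaml not installed"
    # submitting via om.runtime sends the task to the worker through the
    # broker; .get() then blocks until the worker replies or times out
    result = om.runtime.model(model_name).fit(f"{dataset}[x]", f"{dataset}[y]")
    return result.get(timeout=60)
```

If the broker or worker is down, the `.get()` call is exactly where such a submission would appear to hang, which matches the symptom described above.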
Greetings!
I fixed the issue.
It was caused by insufficient configuration values in the docker-compose file. I changed them, rebuilt everything, and now it works.
I have been waiting for more than an hour, but it is still in progress. Is it normal for fitting to take this long?
omega|ml does not add substantial time to the fitting process of a model - essentially the fit process takes the same amount of time on the runtime as it does locally (it runs the same code).
In general the example you give should work out of the box. One potential problem could be a mismatch between the pandas or scikit-learn versions on the client you are submitting from and the versions in the runtime worker. However, this would typically raise an exception, not cause an endless wait.
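To rule out the version mismatch mentioned above, you can print the installed versions both locally and inside the worker container and compare. A minimal sketch using only the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

def package_versions(names=("pandas", "scikit-learn")):
    """Report installed versions of the given packages.

    Run this once on the client and once in the runtime worker;
    the two reports should match.
    """
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report

print(package_versions())
```

Running it in both environments and diffing the output quickly confirms or eliminates this cause.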
got some tracebacks:
The tracebacks are from the networking stack. It looks like the runtime has not responded, which indicates a problem either with the broker (rabbitmq) or with the runtime worker itself.
From the getinfo logs I can see that you are running the open source stack in local docker-compose. This means you can peek directly into the worker logs. If the rabbitmq broker is working, they will show the reception of the omega_fit task message and any errors that result from it; if the broker is not working, they will show errors related to that.
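Checking those logs in a local docker-compose setup looks roughly like this. The service names `worker` and `rabbitmq` are assumptions; check `docker-compose ps` for the actual names in your compose file.

```shell
# list the running services to find the actual service names
docker-compose ps

# tail the worker logs -- look for receipt of the omega_fit task
docker-compose logs --tail=100 worker

# tail the broker logs -- look for connection or resource errors
docker-compose logs --tail=100 rabbitmq
```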
To debug, a few more things to try:
Thank you for the response!
Glad you were able to fix it!
It was caused by insufficient configuration values in the docker-compose file. I changed them, rebuilt everything, and now it works.
Could you share an example of the config here or perhaps file an issue at https://github.com/omegaml/omegaml/issues, please? We are very interested in improving resilience to such problems and providing better guidance to users. Thank you!
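For readers who hit similar symptoms: the thread does not say which values were changed, but per-service resource limits in docker-compose are the usual suspects. The fragment below is illustrative only (compose file format 2 syntax, made-up values), not the poster's actual fix:

```yaml
# Illustrative sketch, NOT the poster's actual configuration:
# raising per-service memory limits in a version "2" compose file.
services:
  rabbitmq:
    image: rabbitmq:3
    mem_limit: 1g     # broker can stall silently when memory-starved
  worker:
    mem_limit: 2g     # model fitting needs headroom for the dataset
```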