Test for the project using OmegaML

edited March 2021 in Ask anything

Greetings!


I am preparing some tests for endpoints in our project, which uses OmegaML.

To test the flow of creating or getting datasets from the DB, it would be nice to fake the MongoDB connection so that the tests do not touch our real local DB.

All such requests are made through OmegaML, like:

om.datasets.put(...)
om.datasets.get(dataset_store_name)

I was wondering what the best approach to handle this would be.

The main question is:

How can I pass settings such as OMEGA_MONGO_URL to Omega(), so that it uses my fake MongoDB connection instead of the original OmegaML connection?

The goal is to direct a call like:

om.datasets.get(dataset_store_name)

to my fake DB.


Best regards,

Daria

Comments

  • How can I provide configs to Omega(), like OMEGA_MONGO_URL,

    Most settings can be updated using the corresponding environment variable as per the documentation.
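
As a minimal sketch of the environment-variable route for tests: the helper below temporarily overrides OMEGA_MONGO_URL and restores the previous value afterwards. The helper name `omega_mongo_url` and the test URL are hypothetical, and the sketch assumes that omegaml reads the variable when its defaults are first loaded, so the import of omegaml would have to happen inside the block.

```python
import os
from contextlib import contextmanager
from unittest import mock

# Hypothetical URL of a throwaway MongoDB used only by the test suite.
TEST_MONGO_URL = "mongodb://localhost:27017/omega_test"


@contextmanager
def omega_mongo_url(url=TEST_MONGO_URL):
    """Temporarily point OMEGA_MONGO_URL at a test database."""
    with mock.patch.dict(os.environ, {"OMEGA_MONGO_URL": url}):
        # Assumption: omegaml picks the variable up when its defaults are
        # first loaded, so `import omegaml as om` belongs inside this block,
        # before any other module has imported it.
        yield url


original = os.environ.get("OMEGA_MONGO_URL")
with omega_mongo_url() as url:
    assert os.environ["OMEGA_MONGO_URL"] == url
# On exit the previous value (or its absence) is restored.
assert os.environ.get("OMEGA_MONGO_URL") == original
```

Using `mock.patch.dict` keeps the override scoped to the test, so a developer's own environment is left as it was.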

    An as yet undocumented feature is the config.yml file, where you can override any setting after it has been initialised from other sources, using the OMEGA_USER_EXTENSIONS key:

    # $CWD/config.yml
    OMEGA_USER_EXTENSIONS:
      OMEGA_MONGO_URL: 'mongodb://...'
    

    Note that any change to the MongoDB connection must be consistent between a client program and the runtime: if you use, say, a local MongoDB on your client while the runtime uses a different MongoDB, the runtime will not be able to read the objects stored in your local MongoDB.

    Thus a better approach to tracing requests to the mongodb may be to use the pymongo built-in monitoring. You can attach a monitor to the omegaml mongodb instance as follows:

    # before accessing om.models/datasets/jobs for the first time
    from pymongo.event_loggers import CommandLogger
    om.defaults.OMEGA_MONGO_SSL_KWARGS['event_listeners'] = [CommandLogger()]
    

    This will attach the CommandLogger to the mongodb connection used by omegaml (note this is always on the client, not in the runtime). For example:

    [1] om.datasets.list('xyz')
    Command saslStart with request id 1681692777 started on server ('mymongodb', 27017)
    Command saslStart with request id 1681692777 on server ('mymongodb', 27017) succeeded in 13964 microseconds
    Command saslContinue with request id 1714636915 started on server ('mymongodb', 27017)
    ....
    Command find with request id 596516649 started on server ('mymongodb', 27017)
    Command find with request id 596516649 on server ('mymongodb', 27017) succeeded in 13999 microseconds
    
  • edited April 2021

    Hello!

    Thank you for the reply!

    Actually, I do not need to track the requests to MongoDB. For now, I just need to write pytest tests for the endpoints in our project.

    Imagine that we have an endpoint where the user can upload a dataset. Inside this endpoint, we check the user in our PostgreSQL DB (the main DB for the Django project), create metadata for the dataset, and store both the metadata and the dataset:

    om.datasets.put(metadata_store, metadata_store_name, append=False)  # metadata
    om.datasets.put(data_frame, dataset_store_name, append=False)  # original dataset
    


    I am currently writing a pytest test for this upload flow. It looks like this:

    pytestmark = pytest.mark.django_db
    
    ...
    
    class TestPutDatasetView:
        def test_put_dataset(self, api_client_authenticated):
            url = reverse('models:put_dataset')
            # open the file in a context manager so the handle is closed after the request
            with open(TEST_ORIGINAL_DATASET_PATH, 'rb') as dataset:
                response = api_client_authenticated.put(url, {'dataset': dataset})
    
            assert response.status_code == 200
    

    The problem is that it stores the dataset in the real MongoDB, which was connected during project initialization (locally, while building the Docker containers).

    As I do not want to use the real DB, it would be great to fake the connection to Mongo. But this connection is established by OmegaML while initializing the project.

    You said:

    Note that any change to the MongoDB connection must be consistent between a client program and the runtime: if you use, say, a local MongoDB on your client while the runtime uses a different MongoDB, the runtime will not be able to read the objects stored in your local MongoDB.


    It is not quite clear to me how to fake the connection to MongoDB and run the runtime with this connection just for test purposes, in order to call operations like:

    om.datasets.list() | .get() | .put()

    om.models.fit() | .predict()

    inside tests, so that the main flow stays unchanged.


    Could you please give some advice on this point?


    P.S. For now, I just delete all the created datasets in tearDown after all the tests have run, but that does not seem to be the best option.
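
A minimal sketch of that cleanup idea, not the project's actual code: a small wrapper remembers every name a test stores and drops exactly those names afterwards. `InMemoryStore` is a hypothetical stand-in for om.datasets that only keeps the sketch runnable without MongoDB; with the real store, the sketch assumes that `om.datasets.drop(name, force=True)` removes a dataset.

```python
class InMemoryStore:
    """Hypothetical stand-in for om.datasets; not part of omegaml."""

    def __init__(self):
        self._objects = {}

    def put(self, obj, name, append=True):
        self._objects[name] = obj

    def list(self):
        return sorted(self._objects)

    def drop(self, name, force=False):
        existed = name in self._objects
        if not existed and not force:
            raise KeyError(name)
        self._objects.pop(name, None)
        return existed


class DatasetCleanup:
    """Remember the names a test stores and drop them afterwards."""

    def __init__(self, store):
        self.store = store
        self.created = []

    def put(self, obj, name, **kwargs):
        self.created.append(name)
        return self.store.put(obj, name, **kwargs)

    def teardown(self):
        for name in self.created:
            # force=True so a name that is already gone does not fail the cleanup
            self.store.drop(name, force=True)
        self.created.clear()


store = InMemoryStore()
store.put([0], "existing")  # data that was present before the test
cleanup = DatasetCleanup(store)
cleanup.put([1, 2, 3], "test_dataset", append=False)
cleanup.put({"rows": 3}, "test_metadata", append=False)
cleanup.teardown()
remaining = store.list()  # only the pre-existing dataset is left
```

Dropping only the names recorded during the test leaves other data in the shared DB untouched, which is the main weakness of this approach compared with a separate test database.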

    Best Regards,

    Daria
