DB Housekeeping

Hello,

for sizing of our systems we use very often a load tests and we noticed, that hydra tables become very large if we don’t remove data. We speak mostly about these tables:

hydra_oauth2_access
hydra_oauth2_authentication_request
hydra_oauth2_authentication_request_handled
hydra_oauth2_authentication_session
hydra_oauth2_code
hydra_oauth2_consent_request
hydra_oauth2_consent_request_handled
hydra_oauth2_refresh

Known way to do some housekeeping is by using of /oauth2/flush. But it handles only hydra_oauth2_access table.

In the issue https://github.com/ory/hydra/issues/1574 aeneasr wrote on 13 August

Clean up should now be more or less automatic for most flows. We also introduced better foreign key checks since then. There might still be stale data but it definitely shouldn’t be too much. If you do find leaky tables in the latest version, please do report here :slight_smile:

This is why we thought after upgrade to 1.7.4 the cleaning will work automatically. But I see with this version we have the same problem as before. /oauth2/flush handles only hydra_oauth2_access. Did I misunderstand that?

I can imagine cleaning e.g. hydra_oauth2_refresh after 30 days because our refresh token are valid for 30 days but what is about other tables?

Is there any official proper way to do some housekeeping considering all/most tables?

Thank you very much!

Hi there, thank you for the question - looks like we need to do some improvement here! What would be perfect is if you could create a reproducible GitHub repo with all the ways these tables get accumulated as we could use that as a basis for testing the implementation!

Hello,

just for your information:

in our load tests we used only the authorization code flow. That was the reason why so many tables were filled up with a lot (thousend and thousend of rows) of data. After using refresh token flow we can see that every table was filled up with 2 - 3 hundred of rows - not more. Only hydra_oauth2_access gets a lot of data but there is a flush concept for it. So we don’t have the problem any more. Of cause you could run in the same problem in 2 - 3 years in case you don’t clean the database. But I would say if you set the TTL of your refresh token to certain time (e.g. 90 days) you can clean your data by considering this period without any problem. The problem you could run in is the setup with refresh token without ttl but I don’t know if such concept of auth is used very often.

Thank you