The engineering team at Tata 1mg was facing unusual behavior on the background server where we use Sidekiq to process jobs asynchronously. Memory usage on the server climbed to nearly 100% within a few days, and the pattern repeated in cycles of roughly 10–12 days. This required constant monitoring and a manual restart of the Sidekiq process every time memory usage exceeded 80%. We investigated the problem and did a root cause analysis, which revealed a specific pattern that kept repeating and was causing the issue.
What we observed:
After checking multiple options and memory graphs, we observed an identical pattern repeating: memory increased abnormally at a specific time, again and again. This coincided with a periodic job scheduled in the Sidekiq scheduler to run every 3 days.

We looked into the Sidekiq worker responsible for that periodic job and found one piece of code that was memory intensive, as it had to perform operations on a large number of records. This worker consumed around 1 GB of memory, which was well within our server's capacity of 8 GB. Still, the whole memory eventually got used up.

We expected the memory to be used only while the Sidekiq worker was executing and to be released afterwards. Instead, memory increased every time a different thread executed this job, and it was never released.
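To make the shape of the problem concrete, here is a minimal sketch of the kind of worker involved. The worker and model names (LargeRecordsSyncWorker, Order) are ours for illustration only, not the actual code; the point is that the job materializes a large result set in memory inside a single Sidekiq thread.

```ruby
# Hypothetical sketch of a memory-intensive periodic worker.
# Worker and model names are illustrative, not the actual 1mg code.
class LargeRecordsSyncWorker
  include Sidekiq::Worker
  sidekiq_options queue: :default, retry: false

  def perform
    # Loading every matching record (plus associations) at once builds the
    # whole result set as Ruby objects inside this worker thread.
    records = Order.where(status: "pending").includes(:items).to_a

    records.each do |record|
      process(record) # expensive per-record work
    end
  end

  private

  def process(record)
    # ...heavy computation / external calls per record...
  end
end
```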
Steps to reproduce:
We tried to execute that particular job and observed the memory graph. We had 15 Sidekiq threads running different jobs concurrently. To confirm the issue, we ran this job with several levels of concurrency. Below are the observations for each scenario; a sketch of how the per-run memory was sampled follows after the scenarios.

With 15 concurrent threads, memory increased by almost 80–90% of a single execution (~1 GB) on every run. Sharing some statistics after running the worker:

Scenario 1: With 15 concurrent threads
First run: 1051 MB
Second run: 2000 MB
Third run: 2874 MB
Fourth run: 3692 MB
Fifth run: 4412 MB
Sixth run: 5204 MB
and so on…

Our server hung after the 8th–9th run, and Sidekiq was killed because of the memory pressure.

Scenario 2: With two concurrent threads
First run: 1059 MB
Second run: 1793 MB
Third run: 1801 MB
Fourth run: 1873 MB
Fifth run: 1906 MB

The maximum memory consumed was about 180–200% of a single execution (~1 GB), indicating that the increase corresponds to the number of threads executing the job at once. To confirm, we repeated the exercise with three threads and later with five threads, which yielded similar observations.

Scenario 3: With one concurrent thread
First run: 1085 MB
Second run: 1179 MB
Third run: 1181 MB
Fourth run: 1204 MB
Fifth run: 1216 MB
Sixth run: 1216 MB

The memory was almost constant and did not increase after each run.

This suggests that each thread of the Sidekiq process holds its memory separately and does not share it with other threads. Because we had 15 concurrent threads, the workers together would require almost 12–15 GB of memory to perform this task. Hence, we were receiving memory alerts at regular intervals after running this task around 6–7 times.
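The per-run numbers above can be sampled directly from the Sidekiq process. Here is a minimal sketch, assuming a Linux host where the resident set size (RSS) is readable from /proc; the helper name is ours:

```ruby
# Log the resident set size (RSS) of the current Sidekiq process.
# Linux-specific: reads the kernel's per-process status file.
def log_rss(label)
  rss_kb = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+)/, 1].to_i
  Sidekiq.logger.info("#{label}: RSS = #{rss_kb / 1024} MB")
end

# Called around each run, e.g. from the worker's perform method:
#   log_rss("Before run")
#   ... job body ...
#   log_rss("After run")
```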
Solution:
After considering multiple approaches for this use case, we finally decided to run this job via crontab by converting it into a Rake task, which executes as a standalone Ruby process. Because that process exits as soon as the Rake task finishes, its memory is released immediately. This resolved the issue, and we stopped getting memory alerts regardless of when or how often the task ran.
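A minimal sketch of that change, reusing the hypothetical worker from earlier and an illustrative task name (reports:large_records_sync):

```ruby
# lib/tasks/large_records_sync.rake
# Runs the same logic as a standalone Ruby process; when the task ends,
# the process exits and the operating system reclaims all of its memory.
namespace :reports do
  desc "Run the memory-intensive sync outside of Sidekiq"
  task large_records_sync: :environment do
    LargeRecordsSyncWorker.new.perform
  end
end
```

A crontab entry along the lines of `0 2 */3 * * cd /path/to/app && RAILS_ENV=production bundle exec rake reports:large_records_sync` (the path and schedule are placeholders) then replaces the Sidekiq periodic schedule while preserving the every-3-days cadence.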
Conclusion:
While executing jobs via Sidekiq, we should keep in mind how many concurrent threads are configured and whether any job is memory-intensive. We should prefer smaller memory operations wherever possible (see the batching sketch below). This is a common problem with Sidekiq; some teams even restart Sidekiq at regular intervals to release the memory its workers hold. For reference, GitLab has also implemented a Sidekiq memory killer to overcome such memory issues in their application.
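As one example of the "smaller memory operations" advice, a hedged sketch using ActiveRecord's built-in batching (the model name is again hypothetical):

```ruby
# Process records in fixed-size batches so only one batch of ActiveRecord
# objects is alive at a time, letting earlier batches be garbage collected.
Order.where(status: "pending").find_each(batch_size: 1000) do |record|
  # ...per-record processing...
end
```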