
How to collect aggregated statistics in a Ruby on Rails application?


It is important to collect aggregated statistics so that management can analyze the data and make well-informed decisions.  Sphere was retained by a client in the recruiting industry who, among other things, needed to collect the following data:

  • Total shifts posted
  • Total hours posted
  • Total shifts worked
  • Total hours worked
  • Average length of shifts
  • Average shifts per job

In addition, Sphere had to provide a way to “spoof” the statistics up to a certain point in time while the production database was being tested. Up to that point, the statistics were to be based not on the actual values from the database but on custom, manually entered data.

Since our client was using a Ruby on Rails application, we decided to write a statistics module in Ruby as well in order to leverage existing code and to simplify maintenance.  We considered three implementation options:


Option 1: Collect statistics on the fly. If, say, total shifts posted must be calculated, a query is issued against the corresponding table with the appropriate constraints.

  Advantages:
  • Statistics are always up to date.

  Disadvantages:
  • Model code is polluted, as scopes and calculations for the statistics must be added.
  • Implementing the “spoofing” requirement is difficult.
  • Calculating the data requires SQL clauses like GROUP BY, JOIN, etc., which could lead to performance issues.

Option 2: Keep the statistics in separate tables which are updated on the fly. The data, aggregated by day/employer, is recorded and calculated in a separate table. If the data underlying the statistics change, the corresponding row in the statistics table is recalculated.

  Advantages:
  • Statistics are always up to date.
  • Implementing the “spoofing” requirement is easy.

  Disadvantages:
  • Model code is polluted, as callbacks must be added to invoke the statistics-calculation code.
  • If one model changes, a query must be made across all models for that day and employer. Moreover, in this application the models can change quite often during the day.
  • The minimum statistics granularity is one day.

Option 3: Keep the statistics in separate tables which are updated periodically. The data, aggregated by day/employer, is recorded and calculated in a separate table. Application data vary throughout the day; then a special background task at the end of the day collects the changes and updates the statistics table.

  Advantages:
  • Clean model code.
  • All the logic is encapsulated in the statistics collection module.
  • Implementing the “spoofing” requirement is easy.

  Disadvantages:
  • Statistics can be stale, as they do not change throughout the day.
  • The minimum statistics granularity is one day.


After presenting these three options to our client, we agreed to proceed with the third option.

Calculating & Storing Statistics

The Statistics::Employer model is used for calculating and storing statistical data. In its table, we store the date, employer’s foreign key, and all other values needed to calculate the statistics (total hours posted, total hours worked, number of applications, and average number of applications).

class CreateStatisticsEmployers < ActiveRecord::Migration
  def change
    create_table :statistics_employers do |t|
      t.date :date, index: true, null: false
      t.references :employer_profile, index: true, foreign_key: true, null: false

      t.integer :jobs_count, default: 0, null: false
      t.integer :shifts_posted_count, default: 0, null: false
      t.decimal :hours_posted_count, default: 0, null: false
      t.integer :shifts_worked_count, default: 0, null: false
      t.decimal :hours_worked_count, default: 0, null: false
    end
  end
end

All formulas are contained in the model code:

module Statistics
  class Employer < ActiveRecord::Base
    belongs_to :employer_profile

    class << self
      def total_jobs
        sum :jobs_count
      end

      def total_shifts_posted
        sum :shifts_posted_count
      end

      def total_hours_posted
        sum :hours_posted_count
      end

      def average_length_of_shift_posted
        total_shifts_posted.zero? ? 0 : total_hours_posted / total_shifts_posted
      end

      def average_shifts_per_job
        total_jobs.zero? ? 0 : total_shifts_posted.to_f / total_jobs
      end

      def total_shifts_worked
        sum :shifts_worked_count
      end

      def total_hours_worked
        sum :hours_worked_count
      end

      def average_length_of_shift_worked
        total_shifts_worked.zero? ? 0 : total_hours_worked / total_shifts_worked
      end
    end
  end
end

These methods use ActiveRecord::Calculations, so they can be called on any scope, which is handy for filtering by date or employer.
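As a sanity check on the derived averages, here is a plain-Ruby sketch of the same zero-guarded formulas (the total_* arguments stand in for the ActiveRecord sums; nothing here touches the database):

```ruby
# Plain-Ruby sketch of the model's derived-average formulas.
# The total_* arguments stand in for the ActiveRecord sums above.
def average_length_of_shift(total_hours, total_shifts)
  total_shifts.zero? ? 0 : total_hours / total_shifts
end

def average_shifts_per_job(total_shifts, total_jobs)
  total_jobs.zero? ? 0 : total_shifts.to_f / total_jobs
end

average_length_of_shift(37.5, 15) # => 2.5 hours per shift
average_shifts_per_job(15, 6)     # => 2.5 shifts per job
average_length_of_shift(0, 0)     # => 0 (guard avoids ZeroDivisionError)
```

The zero guards matter because a freshly created employer has no rows yet, and an integer 0 / 0 in Ruby raises rather than returning 0.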

Collecting Statistics

The collection of statistics can be divided into three sub-tasks:

  • When to start the daily statistics collection.
  • Which dates to collect statistics for.
  • How to collect the statistics.

We already answered the first question by choosing the third implementation option. After analyzing activity in the application, we found that the majority of shifts end before 2 a.m., so statistics collection is scheduled for 3 a.m.

Cron could be used for this task, but we decided to use the clockwork gem:

# clock.rb
require 'clockwork'
require './config/boot'
require './config/environment'

module Clockwork
  every(1.day, 'statistics.collect', at: '3:00') { Statistics::CollectJob.perform_later }
end

Statistics::CollectJob is a background job that resolves the remaining two sub-tasks in sequence:

# app/jobs/statistics/collect_job.rb
module Statistics
  class CollectJob < ::BaseJob
    def perform
      Statistics::UntrackedDatesService.new.execute
      Statistics::UpdateUntrackedService.new.execute
    end
  end
end
Statistics::UntrackedDatesService detects which dates are untracked and creates an UntrackedDate record for each. It always treats yesterday as untracked, along with the dates of any models whose updated_at falls after midnight of the previous day.

UntrackedDate is a very simple ActiveRecord model that contains only a date attribute with a unique index.
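That unique index is what makes marking the same date twice harmless. A minimal in-memory stand-in (a Set in place of the table; the class name is ours) shows the idempotent semantics that find_or_create_by gets from the index:

```ruby
require 'set'
require 'date'

# In-memory stand-in for Statistics::UntrackedDate: `mark` plays the
# role of `find_or_create_by date: date`, and the Set mirrors the
# unique index -- marking the same date twice stores it only once.
class UntrackedDateStub
  @dates = Set.new

  class << self
    attr_reader :dates

    def mark(date)
      dates.add(date)
    end
  end
end

UntrackedDateStub.mark Date.new(2023, 5, 1)
UntrackedDateStub.mark Date.new(2023, 5, 1) # duplicate, no-op
UntrackedDateStub.dates.size # => 1
```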

Since we collect statistics for jobs and shifts, we need to track updates to the Job and Shift models. And because jobs and posted shifts are counted on each job’s creation date, while worked shifts are counted at each shift’s end time, we treat the dates of Job#created_at and JobShift#end_time as untracked whenever those jobs or shifts have changed since the last statistics update.

So the full code of UntrackedDatesService is:

# app/services/statistics/untracked_dates_service.rb
module Statistics
  class UntrackedDatesService
    attr_reader :working_date

    def initialize(current_date = Date.current)
      @working_date = current_date - 1.day
    end

    def execute
      untracked_dates.each do |date|
        Rails.logger.info "Marked #{date} as untracked"
        Statistics::UntrackedDate.find_or_create_by date: date
      end
    end

    private

    def untracked_dates
      ([working_date] +
        untracked_past_jobs_dates +
        untracked_past_shifts_posted_dates +
        untracked_past_shifts_worked_dates).uniq
    end

    def untracked_past_jobs_dates
      Job.where('updated_at >= ?', working_date.beginning_of_day)
         .where('created_at < ?', working_date.beginning_of_day)
         .pluck(:created_at).map(&:to_date)
    end

    def untracked_past_shifts_posted_dates
      # similar logic
    end

    def untracked_past_shifts_worked_dates
      # similar logic
    end
  end
end

The last sub-task is performed by Statistics::UpdateUntrackedService. It takes each untracked date, deletes all statistics for that day, and calculates new statistics. (The calculation is encapsulated in yet another service, UpdateService.) Deleting all previous statistics keeps the process simple: UpdateService does not know why a date was marked as untracked; it just does what it is supposed to do.

In UpdateService, we create groupings by employer and calculate aggregated stats. Then we bulk insert all stats into the Statistics::Employer model:

module Statistics
  class UpdateService
    attr_reader :date

    def initialize(date)
      @date = date
    end

    def execute
      return if date < KEEP_LIVE_STATISTICS_FROM
      Rails.logger.info "Updating statistics for #{date}"

      Statistics::Employer.where(date: date).delete_all
      Statistics::Employer.create employers_statistics
    end

    private

    def employers_statistics
      # Here we have a lot of Ruby/Rails/SQL magic
      # and return an array of hashes, one per statistics entry
      # (i.e. grouped by date/employer_profile_id)
    end
  end
end
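The actual SQL inside employers_statistics is application-specific, but the shape of its result can be sketched in plain Ruby: group the day’s rows by employer and aggregate. All field names below are illustrative, not the real schema:

```ruby
require 'date'

# Illustrative plain-Ruby version of what employers_statistics does in
# SQL: one aggregated hash per employer for the given date.
def employers_statistics(shifts, date)
  shifts.group_by { |s| s[:employer_profile_id] }.map do |employer_id, rows|
    {
      date: date,
      employer_profile_id: employer_id,
      shifts_posted_count: rows.size,
      hours_posted_count: rows.sum { |r| r[:hours] }
    }
  end
end

shifts = [
  { employer_profile_id: 1, hours: 8.0 },
  { employer_profile_id: 1, hours: 4.5 },
  { employer_profile_id: 2, hours: 6.0 }
]
employers_statistics(shifts, Date.new(2023, 5, 1))
# => one hash for employer 1 (2 shifts, 12.5 hours) and one for employer 2
```

An array of attribute hashes like this is what Statistics::Employer.create receives, building one row per hash.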



This is all we need to collect and calculate statistics, but we have one more step to cover.


Sometimes a model’s time attributes change. In that case we can only detect that the statistics changed on the new date, not on the old one (because the previous time is no longer known). So we have to use callbacks to record the dates of the previous timestamps.

Here is a Tracking module that can be included in any tracked model:

module Statistics
  module Tracking
    extend ActiveSupport::Concern

    included do
      cattr_accessor(:statistics_tracked_attributes) { Set.new }
      after_update :check_statistics_tracked_attributes_have_changed
    end

    class_methods do
      def track_attributes_for_statistics(*attributes)
        statistics_tracked_attributes.merge attributes.map(&:to_s)
      end
    end

    private

    def check_statistics_tracked_attributes_have_changed
      (statistics_tracked_attributes & changed).each do |attr|
        before, after = changes[attr]
        next unless date_changed?(before, after)
        Statistics::UntrackedDate.mark before.to_date
      end
    end

    def date_changed?(before, after)
      before && (!after || before.to_date != after.to_date)
    end
  end
end
It is then included in Job:

  include Statistics::Tracking
  track_attributes_for_statistics :created_at

and in JobShift:

  include Statistics::Tracking
  track_attributes_for_statistics :clocked_out_at
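The date_changed? guard is the heart of this module: an update only matters to the statistics when it crosses a date boundary (or clears the timestamp). Its behavior is easy to verify in plain Ruby:

```ruby
require 'date'
require 'time'

# Same predicate as in the Tracking module: true only when the old
# timestamp exists and the new one is gone or lands on another date.
def date_changed?(before, after)
  before && (!after || before.to_date != after.to_date)
end

date_changed?(Time.parse('2023-05-01 09:00'), Time.parse('2023-05-01 17:00')) # => false
date_changed?(Time.parse('2023-05-01 23:30'), Time.parse('2023-05-02 00:30')) # => true
date_changed?(Time.parse('2023-05-01 09:00'), nil)                            # => true
```

Note the second case: a half-hour shift in time still fires the callback, because the timestamp moved from May 1 to May 2 and the May 1 statistics must be rebuilt.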


We now have complete, easily extensible business logic for collecting application statistics!


Finally, we need to output all the collected data. Since we use ActiveAdmin, I will show Arbre (ARB) code snippets along with screenshots of their output.

First, we need a filter form:

form_for search, url: admin_statistics_employer_path, method: 'post' do |f|
  f.text_field :employer_profile_id
  f.text_field :from, class: 'datepicker', 'data-datepicker-options' => '{"maxDate": "-1d"}'
  span '-'
  f.text_field :to, class: 'datepicker'
  f.submit 'Filter'
end

Here, search is a form object that takes params[:search] and returns the scoped Statistics::Employer.where(date: from..to).
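The search form object itself is not shown in the article; a hypothetical plain-Ruby core of it might simply turn the submitted strings into the Date range for the :date filter (the class name and the fallback dates are our assumptions):

```ruby
require 'date'

# Hypothetical core of the `search` form object: parse the from/to
# params into the Date range used in Statistics::Employer.where(date: ...).
class EmployerSearch
  def initialize(params = {})
    @from = params[:from]
    @to   = params[:to]
  end

  def date_range
    from = @from ? Date.parse(@from) : Date.new(2000, 1, 1) # assumed floor
    to   = @to   ? Date.parse(@to)   : Date.today - 1       # matches maxDate: -1d
    from..to
  end
end

EmployerSearch.new(from: '2023-05-01', to: '2023-05-31')
  .date_range.cover?(Date.new(2023, 5, 15)) # => true
```

The real object would also scope by employer_profile_id when that field is filled in.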
We can output the total statistics for the period:
table do
  thead do
    tr do
      th :stat
      th :value, style: 'text-align: right'
    end
  end
  tbody do
    %w(total_jobs total_shifts_posted total_hours_posted
       average_length_of_shift_posted average_shifts_per_job
       total_shifts_worked total_hours_worked
       average_length_of_shift_worked).each do |stat_name|
      tr do
        td stat_name.titleize
        td number_with_delimiter(stats.public_send(stat_name).round(1)), style: 'text-align: right'
      end
    end
  end
end

Here is the form together with the resulting table:


We can also output a monthly breakdown of all these stats using the chartkick gem:

h3 'Shifts'
div line_chart([
  { name: 'posted', data: stats.group_by_month(:date).sum(:shifts_posted_count) },
  { name: 'worked', data: stats.group_by_month(:date).sum(:shifts_worked_count) }
])

h3 'Hours'
div line_chart([
  { name: 'posted', data: stats.group_by_month(:date).sum(:hours_posted_count) },
  { name: 'worked', data: stats.group_by_month(:date).sum(:hours_worked_count) }
])

Here is a chart view:



We would like to emphasize the following:

  • The “spoofing” requirement is implemented using the constant Statistics::KEEP_LIVE_STATISTICS_FROM. (Did you notice it in the code above?) The process of creating and loading made-up statistics prior to this date is beyond the scope of this article.
  • Prepopulating the statistics from existing data is done with a straightforward rake task – simply take each date the application has been live and pass it to UpdateService.
  • The real statistics include some more complex metrics, such as a breakdown by job role. We used PostgreSQL hstore columns to store them, but this topic is also beyond the scope of this article.
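The prepopulation task mentioned above boils down to enumerating past dates and handing each one to UpdateService. A sketch of that enumeration (the launch date and helper name are assumptions; the real task would wrap this in rake with the :environment dependency):

```ruby
require 'date'

# Sketch of the backfill: every date from the (assumed) launch date up
# to and including yesterday would be passed to Statistics::UpdateService.
def dates_to_backfill(launch_date, today = Date.today)
  (launch_date...today).to_a # end-exclusive, so yesterday is the last entry
end

dates_to_backfill(Date.new(2023, 5, 1), Date.new(2023, 5, 4))
# => [2023-05-01, 2023-05-02, 2023-05-03]
```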