
Explanation of config option to simplify the scheduler

August 27, 2009
commit d61f9761e3037b135e1b82e77a354f10794393e6

2.6.19.4-rt19, modified in linux-2.6.29.y-BRANCH_SS-RT


The goal of the scheduler simplification config option is to reduce the
latency and overhead of the real-time scheduler and the fair scheduler.

The simplification is controlled by the config option CONFIG_EJ_SIMPLIFY_SCHED.


================================================================================
Terminology

A "fair task" is a task whose scheduling policy is one of:

   SCHED_OTHER
   SCHED_BATCH
   SCHED_IDLE

A "real-time task" or "RT task" is a task whose scheduling policy is one of:

   SCHED_FIFO
   SCHED_RR

The "fair scheduler" is the portion of the scheduler that is controlling
fair tasks.  It is implemented mostly in kernel/sched_fair.c and partly in
kernel/sched.c.

The "real-time scheduler" or "RT scheduler" is the portion of the scheduler
that is controlling real-time tasks.  It is implemented mostly in
kernel/sched_rt.c and partly in kernel/sched.c.

The "load" of a group of tasks is the number of tasks in the group, weighted
by the priority of each of the tasks.


================================================================================
Overview of results of enabling CONFIG_EJ_SIMPLIFY_SCHED


Load Balancing
--------------


--- Migration of real-time tasks is reduced.  This applies to both pulling
    tasks and pushing tasks.


--- In idle_balance(), do not directly pull tasks from other cpus.  Instead
    set this_rq->next_balance = next_balance so that balancing will occur by
    calling run_rebalance_domains():

      #ifdef CONFIG_EJ_LOAD_BALANCE_IN_TIMER_SOFTIRQ
         in scheduler_tick_in_timer()
      #else
         by SCHED_SOFTIRQ


--- No attempt is made to maximize the power savings requested by
    CONFIG_SCHED_MC or CONFIG_SCHED_SMT.

    CONFIG_SCHED_MC is architecture dependent, currently available for
    s390, sparc, and x86.

    CONFIG_SCHED_SMT is architecture dependent, currently available for
    ia64, mips, powerpc, x86, and sparc.


--- switched_to_rt() may be invoked by check_class_changed() which is invoked
    by task_setprio() (among other callers).  The code that pushes the task to
    another cpu when the current cpu is overloaded is removed.


--- The measurement of busyness of cpus for the purpose of load balancing
    of fair tasks is simpler.  The priority of tasks is not used to
    weight the load imposed by tasks.  The busyness is a measure of how much
    cpu time was available for non-real-time tasks in the previous jiffy.
    The amount of cpu used by real-time tasks is exaggerated by 25%.  The
    exact formula is:

      tsk_power = left_power / number of fair tasks

      where

         "left_power" of a cpu is:
            duration of jiffy - (1.25 * RT task execution time in last jiffy)

         "RT task execution time in last jiffy" is actually an array of
         possible values, rq->rt.load_per_jiffy[].

         Each time update_cpu_load() is called, the array is updated with the
         current value of new_load, which is rq->rt.this_jiffy_runtime:

            load_per_jiffy[0] =                              new_load
            load_per_jiffy[1] = (( 1 * load_per_jiffy[1]) + new_load) /  2
            load_per_jiffy[2] = (( 3 * load_per_jiffy[2]) + new_load) /  4
            load_per_jiffy[3] = (( 7 * load_per_jiffy[3]) + new_load) /  8
            load_per_jiffy[4] = ((15 * load_per_jiffy[4]) + new_load) / 16


         The element of load_per_jiffy[] to use is based on
         this_rq->idle_at_tick.

         If the cpu was idle at the last tick, then the array index is
         sd->idle_idx, otherwise it is sd->busy_idx.  These are initialized
         to the values in SD_CPU_INIT from include/linux/topology.h:

            .busy_idx = 2
            .idle_idx = 1

         and can be modified at run time via:

            /proc/sys/kernel/sched_domain/cpuX/domainX/busy_idx
            /proc/sys/kernel/sched_domain/cpuX/domainX/idle_idx

            for example:

               /proc/sys/kernel/sched_domain/cpu0/domain0/busy_idx
               /proc/sys/kernel/sched_domain/cpu0/domain0/idle_idx
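
         The decaying-average update above can be sketched as follows
         (a Python sketch with illustrative names; the kernel code is C
         and operates on rq->rt fields):

         ```python
         def update_load_per_jiffy(load_per_jiffy, new_load):
             # Fold this jiffy's RT runtime (new_load) into the decaying
             # averages.  Index 0 is the raw value; index i keeps
             # (2^i - 1)/2^i of its old value, so higher indices react
             # more slowly to changes in RT load.
             load_per_jiffy[0] = new_load
             for i in range(1, len(load_per_jiffy)):
                 scale = 1 << i                      # 2, 4, 8, 16
                 load_per_jiffy[i] = ((scale - 1) * load_per_jiffy[i]
                                      + new_load) // scale
             return load_per_jiffy
         ```

         For example, starting from all zeroes with a new_load of 1600, one
         call yields [1600, 800, 400, 200, 100]; each later call moves the
         higher indices toward 1600, more slowly for larger indices.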


--- Each occurrence of fair scheduler load balancing is constrained to stop
    after a certain amount of work is completed.  The amount of work is
    defined as the number of tasks moved instead of the sum of the load of
    the tasks moved.  If the current cpu is not the busiest cpu, then the
    tasks are pulled to the current cpu.  The formula to calculate the maximum
    amount of work to complete is:

                            ( b_lp * ( b_nr + t_nr ) )
      nr_to_move  =  b_nr - ( ---------------------- )
                            (      b_lp + t_lp       )

      where:

         b_lp is the "left_power" of the busiest cpu
         b_nr is the number of fair tasks on the busiest cpu

         t_lp is the "left_power" of this cpu
         t_nr is the number of fair tasks on this cpu

         "left_power" of a cpu is:
            duration of jiffy - (1.25 * RT task execution time in last jiffy)

         "RT task execution time in last jiffy" is actually an array of
         possible values, rq->rt.load_per_jiffy[].  (See earlier description
         for more details.)
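
    For illustration, the nr_to_move formula can be computed as follows
    (a Python sketch; a result of zero or less means no tasks are moved):

    ```python
    def nr_to_move(b_lp, b_nr, t_lp, t_nr):
        # The busiest cpu keeps a share of the combined fair tasks in
        # proportion to its share of the combined left_power; anything
        # beyond that share is a candidate to be moved to this cpu.
        return b_nr - (b_lp * (b_nr + t_nr)) // (b_lp + t_lp)
    ```

    With equal left_power and 6 vs 2 tasks, each cpu should end up with 4
    tasks, so 2 are moved: nr_to_move(10, 6, 10, 2) == 2.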


--- sysctl_sched_nr_migrate does not limit the maximum value of nr_to_move
    that is calculated for the fair scheduler load balancing.


--- If (duration of jiffy - (1.25 * RT task execution time in last jiffy)) <= 0
    for the current processor then load balancing will not attempt to pull any
    tasks to the current processor.


--- sched_exec() and sched_fork() choose the target cpu by calling
    sched_balance_self() which calls find_idlest_cpu().  find_idlest_cpu()
    uses the same data and calculations as the load balancing code, but always
    uses rq->rt.load_per_jiffy[1] as "RT task execution time in last jiffy".


--- try_to_wake_up() of a fair task chooses the target cpu by calling
    select_task_rq_fair() which calls find_idlest_cpu().  find_idlest_cpu()
    uses the same data and calculations as the load balancing code, but always
    uses rq->rt.load_per_jiffy[1] as "RT task execution time in last jiffy".

    The simplified select_task_rq_fair() does not check for load imbalance
    and does not do any load balancing.


--- active_load_balance() which is called by the migration thread does
    nothing (does not push running tasks off the busiest cpu onto idle
    cpus).  But the migration thread is still needed because there are
    other mechanisms that still put tasks on the rq->migration_queue.


--- migration_thread is not woken by load_balance_newidle() because
    load_balance_newidle() does not exist.


--- migration_thread is not woken by load_balance().


--- The migration thread is woken only by set_cpus_allowed_ptr().
    migration_thread() in this case will not do active load balancing, but
    will migrate the task that was enqueued by set_cpus_allowed_ptr().


--- idle_balance() is called from __schedule() when !rq->nr_running.
    idle_balance() simply sets rq->next_balance = jiffies instead of actually
    balancing.  So we are sacrificing an opportunity to balance on the newly
    idle cpu.

    The runq is locked while idle_balance() is called, so we have shortened
    the path while lock is held, but on an __idle__ cpu.


Real-Time Group Scheduling
--------------------------


--- Real-Time group scheduling is disabled.  (See
    Documentation/scheduler/sched-rt-group.txt for a description of group
    scheduling.)


--- The following control files do not exist:

      /proc/sys/kernel/sched_rt_period_us
      /proc/sys/kernel/sched_rt_runtime_us


--- Removed def_rt_bandwidth.rt_runtime_lock, and thus the cross-cpu
    contention for the lock.


--- The timer def_rt_bandwidth->rt_period_timer is initialized, and the timer
    function sched_rt_period_timer is invoked once.  After this single timer
    expiration it is not re-enabled.  The initialization could be removed with
    some ugly #ifdef's to prevent compile warnings.


Miscellaneous
-------------


--- Removed statistic rt.rt_nr_uninterruptible, and thus it is not reported in
    /proc/sched_debug.


--- Some work is deferred to scheduler_tick_in_timer() if curr is a fair task
    and is not done at all if curr is an RT task or the idle task:

      update_rq_clock()
      update_cpu_load()

    scheduler_tick_in_timer() will be called by run_timer_softirq().  If curr
    is no longer the highest priority fair task when scheduler_tick_in_timer()
    executes then the fair task tick, task_tick_fair(), will be lost.

    If curr is an RT task with a sched policy of SCHED_RR then try to lock rq.
    If the attempt to lock rq succeeded then execute the RT task tick,
    task_tick_rt().  This means that the RT task tick might be lost.  If the
    tick is lost and the current task is SCHED_RR then the time slice
    accounting for that tick will not occur.


--- The sched_rr_get_interval() syscall will return an interval of zero for
    non-RT tasks.


--- In sched_slice(), the slice is not adjusted for load weight (nice level).
    The impact on callers of sched_slice() is:

      check_preempt_tick() will not adjust the ideal_runtime of curr when
      deciding whether to preempt curr.

      place_entity() will not adjust the slice added to the vruntime of a
      new task (initial == 1).


--- calc_delta_fair() does not scale delta by priority (since load weight
    (nice level) is not being used).

      wakeup_gran() does not scale the wakeup granularity by priority since
      calc_delta_fair() does not scale delta by priority.

      The result of wakeup_gran() is used to determine whether to preempt the
      currently running task when a new fair task is woken, so the priorities
      of curr and the woken task will not modify the wakeup granularity.
      (The wakeup granularity defaults to
      /proc/sys/kernel/sched_wakeup_granularity_ns, but can be reduced by
      adaptive_gran().)


--- __update_curr() will update curr->vruntime only when

      curr->delta_exec > sysctl_sched_min_granularity

    instead of every time it is called.

    The value added to curr->vruntime is

       (delta_exec * (fair_nr_all / NR_CPUS)) + us2ns( nice value )

    instead of

       a priority weighted delta execution time.

    cfs_rq->min_vruntime is not moved forward.
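
    As a sketch of the simplified rule (Python; the granularity threshold,
    the integer scaling of fair_nr_all / NR_CPUS, and the unit helper are
    assumptions for illustration, not taken from the kernel source):

    ```python
    MIN_GRAN_NS = 4_000_000        # assumed sysctl_sched_min_granularity

    def us2ns(us):
        return us * 1000

    def update_curr(vruntime, pending_delta_exec, fair_nr_all, nr_cpus, nice):
        # Only advance vruntime once enough execution time has accumulated;
        # otherwise keep accumulating.  The increment is weighted by the
        # system-wide average number of fair tasks per cpu, plus a small
        # nice-value bias, instead of a per-task priority weight.
        if pending_delta_exec <= MIN_GRAN_NS:
            return vruntime, pending_delta_exec
        vruntime += pending_delta_exec * (fair_nr_all // nr_cpus) + us2ns(nice)
        return vruntime, 0
    ```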


--- Various scheduler statistics are not maintained.


--- print_cfs_stats() ignores leaf cfs_rq's.


================================================================================
Group Scheduling

   CONFIG_EJ_SIMPLIFY_SCHED cannot be selected if any of the following
   config options are selected:

      CONFIG_GROUP_SCHED
      CONFIG_RT_GROUP_SCHED
      CONFIG_FAIR_GROUP_SCHED

   This removes most of the group scheduler overhead and latency.

   Even with all of those config options disabled, the scheduler default is
   to throttle real-time tasks to 95% of the cpu (this is described in
   Documentation/scheduler/sched-rt-group.txt).  Even if RT throttling is
   set to unlimited (echo -1 > /proc/sys/kernel/sched_rt_runtime_us), extra
   overhead normally still exists.

   CONFIG_EJ_SIMPLIFY_SCHED:

      - removes /proc/sys/kernel/sched_rt_period_us
      - removes /proc/sys/kernel/sched_rt_runtime_us
      - removes the overhead of sched_rt_runtime_exceeded()
      - sets the value of sched_rt_runtime_us to -1 (unlimited)


================================================================================
Modified files

kernel/sched.c
kernel/sched_debug.c
kernel/sched_fair.c
kernel/sched_rt.c
kernel/sysctl.c
include/linux/sched.h


================================================================================
In the following lists of functions, the functions are ordered in the same
order that they occur in the source files.


================================================================================
#ifdef CONFIG_EJ_SIMPLIFY_SCHED
Inline functions changed to have an empty body of "{}"

   rt_set_overload()
   rt_clear_overload()
   inc_rt_migration()
   dec_rt_migration()
   enqueue_pushable_task()
   dequeue_pushable_task()
   init_sched_rt_class()
   set_load_weight()
      nice values are ignored
   scheduler_tick_in_timer()
   sched_exec()


================================================================================
#ifdef CONFIG_EJ_SIMPLIFY_SCHED
Functions changed that are not easily classified as simplified or made
more complicated.

   set_cpus_allowed_rt()
      Do not update pushable task queue.
      Do not call update_rt_migration(), which manages rt_rq->overloaded.

   prio_changed_rt()
      Do not pull real-time tasks.

   update_cpu_load()
      Use
         rq->rt.load_per_jiffy[] and rq->rt.this_jiffy_runtime
      instead of
         rq->cpu_load[]          and rq->load.weight
      (rq->rt.this_jiffy_runtime is updated in put_prev_task_rt())
      Do not call calc_load_account_active()

   find_idlest_cpu()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         tsk_power of a cpu is:
            (time unused by RT tasks in the last jiffy) / number of fair tasks
         Return cpu with the largest tsk_power (cpu must be in
            p->cpus_allowed).
         this_cpu is chosen in case of a tie of tsk_power.
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         Return cpu with the lowest load (weighted by nice value).
         this_cpu is chosen in case of a tie of load.

   sched_balance_self()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Ignore parameter "flag".
         Ignore sched domains.
         Function becomes:
            return find_idlest_cpu(this_cpu, current);

   try_to_wake_up()
      Do not search for another cpu if p->rt.nr_cpus_allowed == 1

   load_balance_fair()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Limit movement of tasks based on number of tasks moved instead of
         load of tasks moved:
            Input max_nr_move instead of max_load_move.
            Return nr_moved instead of load_moved.
         Do not use the weight of an individual task to determine whether to
            pull it.
         Ignore sysctl_sched_nr_migrate limit on number of tasks to move
            during a balance.


================================================================================
#ifdef CONFIG_EJ_SIMPLIFY_SCHED
Functions that do not exist

   rt_overloaded()
   update_rt_migration()
   next_prio()
   inc_rt_prio_smp()
   dec_rt_prio_smp()
   inc_rt_prio()
   dec_rt_prio()
   inc_rt_group()
   dec_rt_group()
   incr_rt_nr_uninterruptible()
   decr_rt_nr_uninterruptible()
   pick_rt_task()
   pick_next_highest_task_rt()
   pick_optimal_cpu()
   find_lowest_rq()
   find_lock_lowest_rq()
   has_pushable_tasks()
   pick_next_pushable_task()
   push_rt_task()
   push_rt_tasks()
   pull_rt_task()
   pre_schedule_rt()         rt_sched_class.pre_schedule
   needs_post_schedule_rt()  rt_sched_class.needs_post_schedule
   post_schedule_rt()        rt_sched_class.post_schedule
   task_wake_up_rt()         rt_sched_class.task_wakeup
   switched_from_rt()        rt_sched_class.switched_from
   start_rt_bandwidth()
   calc_delta_mine()
   update_load_add()
   update_load_sub()
   inc_cpu_load()
   dec_cpu_load()
   cpu_avg_load_per_task()
   double_lock_balance()
   weighted_cpuload()
   source_load()
   target_load()
   find_idlest_group()
   sched_migrate_task()
   balance_tasks()
   move_one_task()
      Not needed because it was only called by active_load_balance(), from
      code that is #ifndef CONFIG_EJ_SIMPLIFY_SCHED

   ---------------------------------------------------------------------
   This group is not needed because find_sd_busiest_rq() replaces the
   find_busiest_group() call chain

   group_first_cpu()
   get_sd_load_idx()
   init_sd_power_savings_stats()
   update_sd_power_savings_stats()
   check_power_save_busiest_group()
   update_sg_lb_stats()
   update_sd_lb_stats()
   fix_small_imbalance()
   calculate_imbalance()
   find_busiest_group()
   find_busiest_queue()
   ---------------------------------------------------------------------

   load_balance_newidle()
      not needed, was called by idle_balance()
   sched_rt_global_constraints()
      Not needed, /proc/sys/kernel/{sched_rt_period,sched_rt_runtime} removed
   sched_rt_handler()
      Not needed, /proc/sys/kernel/{sched_rt_period,sched_rt_runtime} removed
   update_stats_enqueue()
   update_stats_wait_end()
   update_stats_dequeue()
   enqueue_sleeper()
      updates schedstats
   wake_affine()
      Called by the original (non-existent) select_task_rq_fair()
   __load_balance_fair()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         Called by load_balance_fair()


================================================================================
#ifdef CONFIG_EJ_SIMPLIFY_SCHED
Functions simplified

   sched_rt_runtime_exceeded()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         return 0

   inc_rt_tasks()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         inc_rt_prio()
         inc_rt_migration()
         inc_rt_group()
            start_rt_bandwidth()
               // code to handle bandwidth limits on RT tasks.  This is
               // related to CONFIG_RT_GROUP_SCHED

   dec_rt_tasks()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         dec_rt_prio()
         dec_rt_migration()
         dec_rt_group()

   enqueue_task_rt()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         decr_rt_nr_uninterruptible()
         inc_cpu_load()

   dequeue_task_rt()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         incr_rt_nr_uninterruptible()
         dec_cpu_load()

   select_task_rq_rt()
      Lower cost calculation:
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         If task policy is not SCHED_FIFO or SCHED_RR, leave task on its cpu
         else put task on cpu with lowest rq->rt.load_per_jiffy[0]
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         #ifdef CONFIG_SNSC_RT_NO_PUSH_IF_SLEEP
            If current task is about to sleep, select this cpu
         else if task's cpu is not overloaded, leave task on that cpu
         else if current task is an RT task, put task on cpu from
         find_lowest_rq().
         else, leave task on its cpu.
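
      The simplified selection can be sketched as (Python, illustrative
      names; load_per_jiffy0 stands in for the per-cpu
      rq->rt.load_per_jiffy[0] values):

      ```python
      def select_task_rq_rt(policy, task_cpu, load_per_jiffy0):
          # Non-RT policies stay on their current cpu; RT tasks go to
          # the cpu with the least recent RT runtime.
          if policy not in ("SCHED_FIFO", "SCHED_RR"):
              return task_cpu
          return min(load_per_jiffy0, key=load_per_jiffy0.get)
      ```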

   rq_online_rt()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
          if (rq->rt.overloaded)
            rt_set_overload(rq);
          cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio.curr);

   rq_offline_rt()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
          if (rq->rt.overloaded)
            rt_clear_overload(rq);
          cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);

   switched_to_rt()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         if (rq->rt.overloaded && push_rt_task(rq) && rq != task_rq(p))
            check_resched = 0;

   set_task_cpu()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         Adjust fields for CONFIG_SCHEDSTATS
         Update migration counters

   move_tasks()
      Only do load_balance_fair() [do not do load_balance of other classes]

   find_sd_busiest_rq()
      No attempt is made to maximize the power savings requested by
      CONFIG_SCHED_MC or CONFIG_SCHED_SMT.

      Ignores sched groups and sched domains.

      Sets *imbalanced = nr_to_move instead of load (weight) to move.
      (This will be used as input to load_balance_fair(), which is modified
      to use input of max_nr_move instead of max_load_move.)

      Busiest cpu has smallest:
         (amount of cpu not used by RT tasks) / number of cfs tasks running

   load_balance()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Do not lock the run queues around move_tasks().  The queues will
         instead be locked in load_balance_fair(), which is called by
         move_tasks().

      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         if (active_balance)
            wake_up_process(busiest->migration_thread);

   idle_balance()
      Do not directly pull tasks from other cpus.
      Set rq->next_balance to jiffies so that run_rebalance_domains() will
      be called in scheduler_tick_in_timer() or by raising SCHED_SOFTIRQ.

   active_load_balance()
      Do not do anything (do not push running tasks off the busiest CPU
      onto idle CPUs).

   scheduler_tick()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED

      Some work is deferred to scheduler_tick_in_timer() if curr is a fair
      task (this work is not done at all if curr is an RT task or the idle
      task):
         update_rq_clock()
         update_cpu_load()
      scheduler_tick_in_timer() will be called by run_timer_softirq().  If curr
         is no longer the highest priority fair task when
         scheduler_tick_in_timer() executes then the fair task tick,
         task_tick_fair(), will be lost.

      if curr is a RT task
         if current sched policy is SCHED_RR then try to lock rq
         if attempt to lock rq succeeded
            curr->sched_class->task_tick()
            // This means that RT task tick, task_tick_rt(), might not occur
      else if curr is not the idle task
         save curr in per cpu variable "currs", to be used by
            scheduler_tick_in_timer()
         rq->idle_at_tick = 0
      else
         rq->idle_at_tick = 1

   pick_next_task()
      remove a fair sched class optimization

   preempt_schedule_irq()
      __schedule();
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         local_irq_disable();
         TODO: why is it ok to #ifdef out this local_irq_disable()?

   sched_rr_get_interval()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         This system call will return an interval of zero for non-RT tasks.

   calc_delta_fair()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Do not scale delta by the priority (since load weights are not
         being used).

   sched_slice()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Do not adjust slice for nice level and cfs_rq->load

   __update_curr()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Update curr->vruntime only when
            curr->delta_exec > sysctl_sched_min_granularity
         curr->vruntime is increased by:
            delta_exec * (fair_nr_all / NR_CPUS)
            us2ns( nice value )

   update_stats_wait_start()
      do nothing

   account_entity_enqueue()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         cfs_rq->rq->load    += se->load
         cfs_rq->task_weight += se->load
         add task to se->group_node

   account_entity_dequeue()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         cfs_rq->rq->load    -= se->load
         cfs_rq->task_weight -= se->load
         remove task from se->group_node

   enqueue_entity()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         Remove minor amount of statistics gathering.

   dequeue_entity()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         update some stats

   set_next_entity()
      #ifndef CONFIG_EJ_SIMPLIFY_SCHED
         update some stats

   dequeue_task_fair()
      for_each_sched_entity() is
         for (; se; se = NULL)
      so the code in #ifndef CONFIG_EJ_SIMPLIFY_SCHED is not needed.
      This only saves a few instructions.

   select_task_rq_fair()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         find_idlest_cpu(task_cpu(p))
      #else
         Check for imbalance and possibly do load balancing.
         Return either the current cpu or the previous cpu.

   print_cfs_stats()
      ignore leaf cfs_rq's

   print_cfs_rq()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         cfs_rq->exec_clock does not exist
         cfs_rq->load.weight does not exist

   print_cpu()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         rt.rt_nr_uninterruptible does not exist

   struct ctl_table kern_table[]
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         sched_rt_period_us does not exist
         sched_rt_runtime_us does not exist


================================================================================
#ifdef CONFIG_EJ_SIMPLIFY_SCHED
Functions with added complexity

   pick_next_task_rt()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         if (!rq->rt.rt_exec_start)
            rq->rt.rt_exec_start = rq->clock;

   put_prev_task_rt()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         if (rq->rt.rt_exec_start) {
            delta = rq->clock - rq->rt.rt_exec_start;
            rq->rt.rt_exec_start = 0;
            rq->rt.this_jiffy_runtime += delta;
         }
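
      Together these two hooks implement a simple start/stop timer for RT
      execution within a jiffy, roughly as follows (a Python sketch of the
      bookkeeping, not the kernel source):

      ```python
      class RTRunqueueAccounting:
          def __init__(self):
              self.rt_exec_start = 0         # rq->rt.rt_exec_start
              self.this_jiffy_runtime = 0    # rq->rt.this_jiffy_runtime

          def pick_next_task_rt(self, clock):
              # Start timing when an RT task begins running.
              if not self.rt_exec_start:
                  self.rt_exec_start = clock

          def put_prev_task_rt(self, clock):
              # Close the interval and accumulate RT runtime this jiffy.
              if self.rt_exec_start:
                  self.this_jiffy_runtime += clock - self.rt_exec_start
                  self.rt_exec_start = 0
      ```

      update_cpu_load() later consumes this_jiffy_runtime as the new_load
      input to the load_per_jiffy[] averages.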

   wake_up_new_task()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         fair_nr_all++

   finish_task_switch()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         fair_nr_all--

   __sched_setscheduler()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
          if changing sched class
            fair_nr_all-- or fair_nr_all++

   __enqueue_entity()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         clamp se->vruntime to range of
            rq_of(cfs_rq)->clock - ms2ns(100)
            rq_of(cfs_rq)->clock + ms2ns(100)
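
      The clamp amounts to (Python sketch):

      ```python
      def ms2ns(ms):
          return ms * 1_000_000

      def clamp_vruntime(vruntime, rq_clock):
          # Keep se->vruntime within 100 ms of the runqueue clock so a
          # task cannot be scheduled arbitrarily early or late relative
          # to the current clock.
          lo = rq_clock - ms2ns(100)
          hi = rq_clock + ms2ns(100)
          return max(lo, min(vruntime, hi))
      ```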

   entity_tick()
      #ifdef CONFIG_EJ_SIMPLIFY_SCHED
         Do not check whether curr should be preempted if curr is not the
         current fair class task.

         Get here via:
            scheduler_tick_in_timer()
               if (task_of(se) == curr)
                  fair_sched_class.task_tick(this_rq, curr, 0)
                  task_tick_fair[rq, curr, queued]
                     se = &curr->se;
                     cfs_rq = cfs_rq_of(se);
                     entity_tick(cfs_rq, se, queued)
                     entity_tick[cfs_rq, curr, queued]
                        if (rq_of(cfs_rq)->curr == task_of(curr))
         (Other possible way to get here is from hrtick())


================================================================================
#ifdef CONFIG_EJ_SIMPLIFY_SCHED
New functions

   cpu_left_power()
      Time unused by RT tasks in the last jiffy.  "Time used by RT tasks" is
         actually multiplied by 1.25 to give RT tasks more impact.

   per_tsk_power()
      Divide power by number of fair tasks running.
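
   A sketch of the two helpers (Python; the jiffy length assumes HZ=100
   and the integer 5/4 scaling is an assumption for illustration):

   ```python
   JIFFY_NS = 10_000_000           # assumed 100 HZ tick, in nanoseconds

   def cpu_left_power(rt_runtime_last_jiffy):
       # left_power: jiffy time not consumed by RT tasks, with RT time
       # inflated by 25% (times 5/4) to give RT load extra weight.
       return JIFFY_NS - (rt_runtime_last_jiffy * 5) // 4

   def per_tsk_power(left_power, fair_nr_running):
       # Divide the remaining power among the runnable fair tasks.
       if fair_nr_running == 0:
           return left_power
       return left_power // fair_nr_running
   ```

   A cpu whose cpu_left_power() is <= 0 is treated as having no capacity
   for fair tasks, matching the "will not attempt to pull" rule above.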

