Use Q-Learning to Assign Workers, Balance Machine Use, and Boost Factory Productivity
Mikey Tabak, PhD • 21 Aug 2025
In this blog post, we demonstrate how reinforcement learning can be used to optimize workforce scheduling in a factory setting. Workers have different skills, instruments require certain skillsets, raw materials are limited, and downtime (like calibration) can disrupt operations. We'll simulate this environment and train a Q-learning agent to assign workers to instruments hour by hour, aiming to maximize productivity and minimize inefficiencies like idle time, skill mismatches, or resource shortages. Learn more about Reinforcement Learning.
from scheduling.scheduling_utils import WorkforceScheduler
import random
We define the number of workers, the total skills needed across instruments, and a daily work schedule. Each instrument will later be given skill requirements.
instrument_names = ["NMR", "GC", "HPLC", "FTIR", "Mass Spec"]
scheduler = WorkforceScheduler(
    num_workers=8,
    num_machines=len(instrument_names),
    num_skills=2,   # total number of distinct skills in the workforce; each instrument requires one skill
    day_length=10,  # number of work hours in the day
    max_workers=4,  # maximum number of workers allowed on an instrument
)
Each worker's compatibility depends on their skill set and each instrument's requirements. Depending on simulation settings, some workers may not have the skills needed to operate any instrument.
compatibility = scheduler.get_worker_machine_compatibility()
for worker_id, instruments in compatibility.items():
    names = [instrument_names[mid] for mid in instruments]
    print(f"Worker {worker_id} can work on: {names}")
Worker 0 can work on: ['NMR', 'GC', 'HPLC', 'FTIR']
Worker 1 can work on: ['NMR', 'GC', 'HPLC', 'FTIR']
Worker 2 can work on: []
Worker 3 can work on: ['NMR', 'GC', 'HPLC', 'FTIR', 'Mass Spec']
Worker 4 can work on: []
Worker 5 can work on: ['NMR', 'GC', 'HPLC', 'FTIR', 'Mass Spec']
Worker 6 can work on: ['Mass Spec']
Worker 7 can work on: []
Q-learning is a value-based reinforcement learning algorithm. The agent learns to map states (the current assignment and factory state) to actions (worker-to-instrument mappings) in a way that maximizes long-term reward. Rewards are computed from productivity, idle time, skill mismatch penalties, raw material depletion, QA backlog accumulation, and calibration downtime.
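Under the hood, Q-learning repeatedly nudges each state-action value toward the observed reward plus the discounted value of the best next action. Below is a minimal sketch of that standard update rule, assuming a Q-table laid out as state -> {action: value}; the names alpha and gamma and the table layout are illustrative assumptions, not WorkforceScheduler's internals.
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    # Value of the best action available from the next state (0 if unseen).
    best_next = max(q_table.get(next_state, {}).values(), default=0.0)
    old = q_table.get(state, {}).get(action, 0.0)
    # Move the old estimate toward reward + discounted future value.
    q_table.setdefault(state, {})[action] = old + alpha * (reward + gamma * best_next - old)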
This training loop runs over multiple episodes (simulated days), allowing the agent to explore and improve its scheduling policy.
training_rewards, best_q_table = scheduler.train_q_learning(episodes=1000)
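Before rolling out the learned policy, it is worth checking that the reward curve actually trends upward. Assuming training_rewards holds one total reward per episode (an assumption about the return value), a quick plot is enough:
import matplotlib.pyplot as plt

plt.plot(training_rewards)  # assumed: one total reward per episode
plt.xlabel("Episode")
plt.ylabel("Total episode reward")
plt.title("Q-learning training progress")
plt.show()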
The agent assigns workers to instruments step by step using learned policies.
At each time step (1 hour), the agent chooses the best action (assignment) based on learned Q-values.
scheduler.q_table = best_q_table  # replace live Q-table with best snapshot
scheduler.reset()

done = False
while not done:
    state = scheduler._get_state()
    q_vals = scheduler.q_table[state]
    if q_vals:
        # Greedy: pick the assignment with the highest learned Q-value.
        action, _ = max(q_vals.items(), key=lambda x: x[1])
        worker_machine_map = dict(action)
    else:
        # Unseen state: fall back to a random assignment.
        worker_machine_map = {
            wid: random.choice(list(scheduler.machines.keys()))
            for wid in scheduler.workers
        }
    _, _, done, _ = scheduler.step_all(worker_machine_map)
This Gantt-style chart shows which workers were assigned to which instruments over time, including maintenance time. Some workers may have no assignment at a given hour; this depends on each instrument's needs and the skills available.
scheduler.visualize_schedule(machine_names=instrument_names)
Some workers are not assigned to any machines throughout the day. These workers did not meet the skill requirements for any machine, as can be seen in the compatibility listing above.
The numbers inside each hour block are the worker IDs of the workers on that instrument.
scheduler.visualize_machine_activity(machine_names=instrument_names)
In this simulation, raw materials randomly become available throughout the day. In a real system, we would also have a model calibrated to optimize material ordering (example here).
scheduler.visualize_resources()
As instruments complete tasks throughout the day, many of those tasks require follow-up work by a quality assurance (QA) team. In this simulation, we track the accumulation of unfinished QA work in the form of a QA backlog.
Each machine contributes a small amount of QA time (called qa_delay) whenever it completes a task. This reflects the time QA personnel must spend checking or validating completed work. Meanwhile, the QA team can process a fixed amount of work per hour (0.5 hours of QA time per time step). If machine output exceeds QA capacity, the backlog grows. If the QA team keeps up, the backlog shrinks.
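The backlog bookkeeping described above amounts to a few lines. This is a simplified sketch of the dynamics, not the simulator's actual code; the qa_delay value of 0.2 hours is a hypothetical example.
QA_CAPACITY = 0.5  # hours of QA work the team can process per time step

def update_backlog(backlog, tasks_completed, qa_delay=0.2):
    # Each completed task adds qa_delay hours of QA work to the queue...
    backlog += tasks_completed * qa_delay
    # ...and the QA team burns down up to QA_CAPACITY hours per step.
    return max(0.0, backlog - QA_CAPACITY)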
This visualization helps highlight how workforce scheduling decisions impact not only direct task completion, but also downstream bottlenecks like quality control.
scheduler.visualize_qa_backlog()
This simulation demonstrates how reinforcement learning, and Q-learning specifically, can be applied to complex scheduling problems involving multiple interacting constraints. We modeled realistic production dynamics like skill compatibility, resource constraints, instrument downtime, and QA bottlenecks.
The Q-learning agent was able to discover a scheduling policy that improves over time, minimizing penalties and increasing throughput.
This approach is extensible to real-world production scheduling, predictive maintenance, and labor optimization in factories, labs, and other resource-constrained environments.
From simulating complex factory dynamics to building AI agents that adapt in real time, QSC helps businesses improve productivity, reduce inefficiencies, and make smarter use of people and machines. Develop AI-driven solutions to workforce management with QSC.
Contact Us