Techniques & Models

Predictive modeling, retention analysis, talent segmentation, and workforce planning.

By D-LIT Team

The analytical techniques covered in this article represent the full range of methods used in mature people analytics programs - from statistical fundamentals that every HR analyst should understand, to predictive models that require data science capability, to two analytical domains that most HR analytics resources overlook entirely: manager effectiveness measurement and HR data quality management.

Each section describes what the technique is, what business question it answers, how to implement it, and the common pitfalls that produce misleading results.


Predictive Attrition Modeling

Predictive attrition modeling uses historical employee data to identify current employees who are at elevated risk of voluntary separation in the next 3-12 months. The business case is compelling: if a model correctly identifies 60% of employees who will leave, and targeted interventions retain half of them, the organization saves the replacement cost for each retained employee - typically 50-200% of annual salary per person.
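The arithmetic behind that business case can be sketched as follows. Only the 60% identification rate and the 50% retention rate come from the example above; the headcount, attrition rate, salary, and replacement-cost multiplier are illustrative assumptions, and the function name is hypothetical.

```python
def expected_retention_savings(headcount, annual_attrition_rate, recall,
                               intervention_success_rate, avg_salary,
                               replacement_cost_multiplier):
    """Estimate replacement cost avoided by retaining correctly flagged leavers."""
    expected_leavers = headcount * annual_attrition_rate
    identified = expected_leavers * recall            # leavers the model flags
    retained = identified * intervention_success_rate # flagged leavers who stay
    return retained * avg_salary * replacement_cost_multiplier

# Assumed inputs: 1,000 employees, 12% voluntary attrition, $80,000 average
# salary, replacement cost equal to 100% of salary (within the 50-200% range).
savings = expected_retention_savings(
    headcount=1000, annual_attrition_rate=0.12, recall=0.60,
    intervention_success_rate=0.50, avg_salary=80_000,
    replacement_cost_multiplier=1.0)
print(f"Estimated savings: ${savings:,.0f}")
```

This estimate is gross of the cost of the interventions themselves, including those delivered to employees who were flagged but would have stayed anyway.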

How the model works:

A classification model is trained on historical employee records. Each record is labeled with whether that employee voluntarily separated within the prediction window (typically 12 months). The features used as predictors include:

  • Demographic and tenure factors: Tenure at current company, tenure in current role, age (where legally permissible), level in the organization.
  • Compensation factors: Time since last pay increase, ratio of current pay to midpoint of pay band, recent bonus as a percentage of target, equity cliff timing.
  • Career development signals: Time since last promotion, number of job applications to internal postings, goal achievement rate, manager assessment scores.
  • Engagement signals: eNPS score from most recent survey, survey participation rate, participation in optional company events or learning.
  • Manager and team signals: Manager tenure, manager’s team turnover rate, team size changes (downsizing as a signal of organizational instability).
  • Behavioral signals (where available): Building badge swipe patterns, calendar density changes, email response time changes - though these require careful legal review and employee communication before use.

Model architecture:

Logistic regression is the appropriate starting point. It is interpretable - the sign of each coefficient indicates the direction of a feature’s effect, and its magnitude (on the log-odds scale) indicates the strength of that effect on attrition probability - and it performs well when training data is limited (most organizations have relatively small historical employee populations). Gradient boosting methods (XGBoost, LightGBM) can improve predictive accuracy for organizations with sufficient training data (typically 2,000+ historical observations) but sacrifice interpretability.
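To illustrate what interpretability means in practice, the sketch below scores one employee from a set of logistic regression coefficients. The coefficients, intercept, and feature names are hypothetical values chosen for illustration, not estimates from real data.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale (assumed values).
INTERCEPT = -2.0
COEFFICIENTS = {
    "years_since_last_promotion": 0.30,
    "pay_to_band_midpoint_ratio": -1.50,
    "manager_team_turnover_rate": 2.00,
}

def attrition_probability(features):
    """Apply the logistic function to the weighted sum of the features."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds_ratios():
    """exp(coefficient) is the multiplicative change in odds per one-unit increase."""
    return {name: math.exp(coef) for name, coef in COEFFICIENTS.items()}

employee = {
    "years_since_last_promotion": 3.0,
    "pay_to_band_midpoint_ratio": 0.9,
    "manager_team_turnover_rate": 0.2,
}
probability = attrition_probability(employee)
print(round(probability, 3), odds_ratios())
```

An odds ratio above 1 means the feature raises the odds of attrition; below 1 means it lowers them. This is the kind of statement a manager or HR business partner can act on, which is much harder to extract from a boosted tree ensemble.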

Evaluation criteria: Area Under the ROC Curve (AUC) measures overall discrimination. Precision at the top decile (the fraction of the model’s highest-risk predictions that actually attrited) is the practically important metric - intervention programs are typically delivered to the top 10-20% of risk-scored employees, so accuracy at the tail matters more than overall accuracy.
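Both metrics can be computed directly from risk scores and outcomes. The following is a minimal pure-Python sketch with made-up labels and scores; the function names are hypothetical, and in practice a library implementation would normally be used. The pairwise AUC calculation is quadratic in the number of employees, which is acceptable only for small samples.

```python
def auc(labels, scores):
    """Probability that a random leaver is scored above a random stayer (ties count 0.5)."""
    leavers = [s for y, s in zip(labels, scores) if y == 1]
    stayers = [s for y, s in zip(labels, scores) if y == 0]
    if not leavers or not stayers:
        raise ValueError("AUC needs at least one leaver and one stayer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in leavers for n in stayers)
    return wins / (len(leavers) * len(stayers))

def precision_at_top_fraction(labels, scores, fraction=0.10):
    """Share of actual leavers among the highest-scored `fraction` of employees."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

# Made-up example: 1 = voluntarily separated within the window, 0 = stayed.
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.6, 0.1, 0.05]

print(round(auc(labels, scores), 3))
print(precision_at_top_fraction(labels, scores, fraction=0.20))
```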

The intervention design matters as much as the model: A model with an AUC of 0.70 paired with a well-designed retention intervention program outperforms a model with an AUC of 0.85 paired with no structured response. The model is a triage tool. The human conversation between manager and employee is where retention actually happens.

Critical pitfall: Selection bias in labeled data. Employees who left are labeled as positive examples. But some employees who stayed may have been just as likely to leave - the difference is that someone intervened, or they received an outside offer and turned it down, or they were planning to leave but life circumstances changed. This means the model may be learning which employees got retained rather than which employees wanted to leave. The labels cannot be fully corrected, which makes honest validation essential: evaluate on a held-out test set that preserves temporal order (train on years 1-3, test on year 4), so that reported accuracy reflects how the model will perform on data it has not yet seen.
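A temporal split of this kind can be sketched as below. The record layout (one labeled snapshot per employee per year, with `snapshot_year` and `left_within_12m` fields) is an assumption made for illustration.

```python
def temporal_split(records, test_year):
    """Train on snapshots strictly before `test_year`; test on `test_year` only."""
    train = [r for r in records if r["snapshot_year"] < test_year]
    test = [r for r in records if r["snapshot_year"] == test_year]
    if not train or not test:
        raise ValueError("Both the training and test periods need records")
    return train, test

# Hypothetical labeled snapshots: one row per employee per year.
records = [
    {"employee_id": 1, "snapshot_year": 1, "left_within_12m": 0},
    {"employee_id": 2, "snapshot_year": 2, "left_within_12m": 1},
    {"employee_id": 1, "snapshot_year": 3, "left_within_12m": 0},
    {"employee_id": 3, "snapshot_year": 4, "left_within_12m": 1},
]
train, test = temporal_split(records, test_year=4)
print(len(train), len(test))
```

Unlike a random split, this never lets the model train on records from the period it is evaluated on.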


Workforce Planning and Headcount Forecasting

Workforce planning connects business strategy to talent supply and demand. It answers the question: given where the business is going, what workforce do we need - in terms of size, skills, and structure - over the next 1-3 years, and what is the gap between that future-state requirement and the current workforce trajectory?

The demand forecast:

Demand-side forecasting starts with the business plan. Revenue projections, product roadmap commitments, market expansion plans, and operational targets each imply headcount requirements. The analytical work is translating business metrics into workforce requirements:

  • Revenue per employee benchmarks from the current year, adjusted for anticipated productivity improvements, produce a headcount estimate: Required Headcount = Projected Revenue / Target Revenue per Employee.
  • Ratio analysis maps operational metrics to headcount: a support organization that handles 500 tickets per agent per month requires headcount proportional to projected ticket volume.
  • Driver-based models identify the organizational drivers (customers, products, geographic markets, store count) that historically predict department-level headcount, then project those drivers forward.
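The first two translations above reduce to simple division. A minimal sketch, assuming illustrative revenue, productivity, and ticket figures (only the 500 tickets per agent per month comes from the text; the function names are hypothetical):

```python
import math

def headcount_from_revenue(projected_revenue, current_revenue_per_employee,
                           productivity_gain=0.0):
    """Required Headcount = Projected Revenue / Target Revenue per Employee."""
    target_rpe = current_revenue_per_employee * (1 + productivity_gain)
    return math.ceil(projected_revenue / target_rpe)

def headcount_from_ratio(projected_monthly_volume, volume_per_person_per_month):
    """Ratio analysis: headcount proportional to an operational volume."""
    return math.ceil(projected_monthly_volume / volume_per_person_per_month)

# Assumed plan: $60M revenue, $250k current revenue per employee, 5% productivity gain.
company_headcount = headcount_from_revenue(60_000_000, 250_000, productivity_gain=0.05)

# Assumed 42,000 tickets per month at 500 tickets per agent per month.
support_headcount = headcount_from_ratio(42_000, 500)

print(company_headcount, support_headcount)
```

Rounding up reflects that a fractional requirement still needs a whole person; an organization that staffs with part-time or contingent workers may prefer to keep the fraction.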

The supply forecast:

Supply-side forecasting projects the future workforce size and composition assuming no additional hiring. Starting from current headcount, subtract projected attrition (using the attrition model) and planned exits (known retirement-eligible employees, fixed-term contract expirations), add expected internal transfers, and account for planned promotions that change level composition but not headcount.

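Projected Headcount at a level (t+1) = Current Headcount - Projected Attrition - Planned Exits + Promotions In (from the level below) - Promotions Out (to the level above) + Inbound Transfers - Outbound Transfers

Applied to the organization as a whole, promotions and internal transfers net to zero, so total projected headcount is simply Current Headcount - Projected Attrition - Planned Exits.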
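A level-by-level version of the supply projection can be sketched as follows. It tracks promotions both into and out of each level, so that promotions change level composition but not total headcount, as described above. All figures and field names are illustrative assumptions.

```python
def project_supply(levels):
    """Project next-period headcount per level, assuming no external hiring."""
    projected = {}
    for name, flows in levels.items():
        projected[name] = (
            flows["current"]
            - flows["attrition"]       # from the attrition model
            - flows["planned_exits"]   # retirements, contract expirations
            + flows["promotions_in"]   # arrivals from the level below
            - flows["promotions_out"]  # departures to the level above
            + flows["transfers_in"]
            - flows["transfers_out"]
        )
    return projected

# Hypothetical business unit with three levels.
levels = {
    "IC":       {"current": 400, "attrition": 48, "planned_exits": 5,
                 "promotions_in": 0,  "promotions_out": 20,
                 "transfers_in": 6, "transfers_out": 4},
    "Manager":  {"current": 80,  "attrition": 6,  "planned_exits": 2,
                 "promotions_in": 20, "promotions_out": 3,
                 "transfers_in": 1, "transfers_out": 2},
    "Director": {"current": 20,  "attrition": 1,  "planned_exits": 1,
                 "promotions_in": 3,  "promotions_out": 0,
                 "transfers_in": 0, "transfers_out": 0},
}
supply = project_supply(levels)
print(supply, sum(supply.values()))
```

In this example the unit receives one more transfer than it sends out, so its total changes by attrition, planned exits, and that net transfer; promotions cancel out across levels.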

The gap and the action plan:

The gap between demand forecast and supply forecast defines the hiring, development, and restructuring agenda. A gap that consists primarily of skills the organization does not currently have is a different problem than a gap caused by projected growth in headcount for roles the organization hires regularly. The former requires build (skills development), buy (external hiring), or borrow (contingent workers, partnerships) decisions; the latter requires scaling existing recruiting pipelines.

Scenario planning: Workforce plans should be developed at multiple demand scenarios (base case, upside, downside) and tested against supply shocks (higher-than-expected attrition, talent market tightening). The value of scenario planning is not in predicting which scenario will occur but in pre-thinking the organizational response to each, so that when conditions change, the response is faster.
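One simple way to lay out the scenarios is a grid of demand cases against attrition assumptions, with the hiring gap computed in each cell. The demand figures and attrition rates below are illustrative assumptions, and supply is simplified to current headcount less attrition.

```python
def hiring_gaps(demand_by_scenario, current_headcount, attrition_rates):
    """Gap = demand forecast minus projected supply, for each scenario pair."""
    gaps = {}
    for scenario, demand in demand_by_scenario.items():
        for label, rate in attrition_rates.items():
            supply = round(current_headcount * (1 - rate))
            gaps[(scenario, label)] = demand - supply
    return gaps

# Hypothetical demand scenarios and attrition assumptions.
demand = {"downside": 480, "base": 540, "upside": 600}
attrition = {"expected": 0.10, "supply_shock": 0.18}

gaps = hiring_gaps(demand, current_headcount=500, attrition_rates=attrition)
for (scenario, label), gap in gaps.items():
    print(f"{scenario:>8} / {label:<12} hiring gap: {gap}")
```

The spread between the smallest and largest gap indicates how much recruiting capacity needs to be flexible rather than fixed.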


Pay Equity Analysis

Pay equity analysis examines whether employees in comparable roles are compensated equitably, with equity defined as the absence of pay differences that cannot be explained by legitimate business factors.

The two-level analysis:

Unadjusted analysis calculates median pay by demographic group without controlling for any factors. This surfaces structural issues: if women are paid less on average than men, is that because they are concentrated in lower-paying job families, lower levels, or lower-paying geographic markets? Or is it because they are paid less within the same jobs at the same levels?

Adjusted analysis answers the within-job question. It controls for legitimate pay determinants and measures the residual gap.

Implementing the regression:

Ordinary least squares regression with pay (log-transformed for better statistical behavior) as the dependent variable:

ln(Pay) = β₀ + β₁(DemographicGroup) + β₂(JobLevel) + β₃(JobFamily) + β₄(Tenure) + β₅(Performance) + β₆(Geography) + ε

The coefficient β₁ on the demographic group variable is the adjusted pay gap estimate. Because the dependent variable is log pay, β₁ can be read approximately as a percentage difference when it is small (the exact figure is e^β₁ - 1). A value of -0.03 indicates an unexplained pay gap of roughly 3% for the group coded 1 relative to the reference group.
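The conversion from a log-pay coefficient to a percentage gap is a one-line calculation. This sketch shows how the approximation holds for small coefficients and drifts for larger ones; the function name is hypothetical.

```python
import math

def log_coef_to_percent_gap(beta):
    """Exact percentage pay difference implied by a coefficient on ln(Pay)."""
    return (math.exp(beta) - 1.0) * 100.0

for beta in (-0.03, -0.10, -0.20):
    print(f"beta = {beta:+.2f} -> gap = {log_coef_to_percent_gap(beta):+.2f}%")
```

For a coefficient of -0.03 the exact gap is within a few hundredths of a percentage point of 3%, while a coefficient of -0.20 corresponds to a gap closer to 18% than 20%.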

Which factors are legitimate controls: This is the most consequential analytical decision in pay equity analysis. Job family and level are always included. Geographic location is standard where pay bands differ by geography. Performance ratings are typically included but are themselves subject to potential bias - controlling for a biased performance rating may mask a combined evaluation and compensation inequity. Seniority (years at company) is included. Prior compensation history is increasingly excluded as a control, both because of legal restrictions on using salary history in hiring negotiations in many jurisdictions and because using prior pay as a predictor can perpetuate historical inequities.

The remediation process: Unexplained pay gaps above a threshold (typically 2-3%) should trigger case-by-case review of the employees contributing to the gap, not blanket adjustments. Some gaps reflect legitimate individual factors that were not captured in the model. Others reflect decisions that cannot be justified and should be corrected through targeted pay adjustments in the next compensation cycle.

Legal coordination: Pay equity analyses conducted internally as part of a remediation process may be protected by attorney-client privilege when structured correctly. Coordinate with employment law counsel before beginning an analysis that may produce findings requiring remediation.


DEI Analytics

DEI analytics - diversity, equity, and inclusion - encompasses three distinct measurement domains that are often conflated.

Diversity measurement tracks representation. The analytical framework most useful for strategic decision-making is the representation pipeline: how does demographic representation change at each level of the organizational hierarchy?

Level 1 (Individual Contributor): 52% women, 48% men
Level 2 (Senior IC): 45% women, 55% men
Level 3 (Manager): 36% women, 64% men
Level 4 (Director): 28% women, 72% men
Level 5 (VP+): 19% women, 81% men

The representation pipeline immediately identifies where representation breaks down and focuses the intervention. In the example above, the share of women falls at every transition; the largest percentage-point drops (9 points each) occur at the Senior IC-to-Manager and Director-to-VP+ transitions. The drop into management focuses attention on promotion equity, the manager selection process, and whether the manager pipeline development programs are equitably accessible.
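Quantifying each transition in both percentage points and relative terms makes the comparison explicit, since the two views can rank transitions differently. A minimal sketch using the figures from the example pipeline (the function name is hypothetical):

```python
# Share of women at each level, taken from the example pipeline above.
PIPELINE = [
    ("Individual Contributor", 52),
    ("Senior IC", 45),
    ("Manager", 36),
    ("Director", 28),
    ("VP+", 19),
]

def transition_drops(pipeline):
    """Percentage-point and relative change in representation at each transition."""
    rows = []
    for (lower, lower_pct), (upper, upper_pct) in zip(pipeline, pipeline[1:]):
        point_change = upper_pct - lower_pct
        relative_change = round(point_change / lower_pct * 100, 1)
        rows.append((f"{lower} -> {upper}", point_change, relative_change))
    return rows

drops = transition_drops(PIPELINE)
for transition, points, relative in drops:
    print(f"{transition:<38} {points:+d} pts  {relative:+.1f}%")
```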

Flow rate analysis complements stock representation. Hiring rate by demographic group (what share of new hires are from each group), promotion rate by demographic group, and attrition rate by demographic group together explain why the representation numbers look as they do and what levers are available to change them.

Representation Change = Hire Rate Effect + Promotion Rate Effect - Attrition Rate Effect

If women are promoted at lower rates than men at the Manager-to-Director transition, that drives the representation drop at Director - even if women are hired at equitable rates and retained at comparable rates.
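A flow rate comparison of this kind can be sketched as below. The eligible populations and promotion counts are made-up numbers for a hypothetical Manager-to-Director transition.

```python
def rate_by_group(events, eligible):
    """Rate of an event (hire, promotion, exit) within each demographic group."""
    return {group: events[group] / eligible[group] for group in eligible}

# Hypothetical counts: managers eligible for promotion, and those promoted.
eligible_managers = {"women": 90, "men": 160}
promoted_to_director = {"women": 9, "men": 24}

promotion_rates = rate_by_group(promoted_to_director, eligible_managers)
rate_ratio = promotion_rates["women"] / promotion_rates["men"]

print(promotion_rates)
print(f"Promotion rate ratio (women / men): {rate_ratio:.2f}")
```

The same function applies to hiring and attrition by swapping the event counts and the relevant base population. With small groups, the rates are noisy and should be read alongside the underlying counts.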

Equity measurement examines process fairness: are advancement, compensation, and development opportunities distributed equitably? Pay equity analysis (above) is one dimension. Performance rating distributions by demographic group, promotion nomination rates, access to high-visibility projects, and inclusion in leadership development programs are additional equity measures.

Inclusion measurement captures the experience of belonging and psychological safety. This is the domain of engagement surveys, with specific question batteries designed to measure inclusion: “I feel comfortable bringing my full self to work,” “My ideas are valued in team meetings,” “I experience a sense of belonging here.” Inclusion scores are frequently segmented by demographic group to identify whether the overall organizational experience differs significantly for different employee populations.


Recruitment Funnel Optimization

Recruitment funnel analysis treats the hiring pipeline as a conversion funnel and applies the same analytical framework used in marketing analytics: measure conversion at each stage, identify where candidates drop off disproportionately, and test interventions to improve conversion at the bottleneck stages.

Building the funnel:

Applications → Screening Conversations → Technical/Skills Assessments → Panel Interviews → Final Round → Offer Extended → Offer Accepted

For each transition, calculate the conversion rate:

Conversion Rate (Stage n to Stage n+1) = (Candidates Advancing to Stage n+1 / Candidates at Stage n) × 100
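Applied across the whole funnel, the calculation looks like this. The stage names follow the funnel above; the candidate counts are invented for illustration.

```python
STAGES = [
    "Applications", "Screening Conversations", "Technical/Skills Assessments",
    "Panel Interviews", "Final Round", "Offer Extended", "Offer Accepted",
]

def conversion_rates(counts):
    """Percentage of candidates at each stage who advance to the next stage."""
    rates = []
    for i in range(len(counts) - 1):
        rate = round(counts[i + 1] * 100 / counts[i], 1)
        rates.append((STAGES[i], STAGES[i + 1], rate))
    return rates

# Hypothetical candidate counts at each stage for one requisition family.
counts = [2000, 300, 150, 60, 30, 15, 12]

funnel = conversion_rates(counts)
for from_stage, to_stage, rate in funnel:
    print(f"{from_stage} -> {to_stage}: {rate}%")
```

Running the same function on counts filtered by candidate source gives the source-level view.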

Source-level funnel analysis: The funnel should be segmented by candidate source. LinkedIn applications may convert from application to screening at 8%, while employee referrals convert at 35%. This difference does not necessarily mean LinkedIn is a bad source - the yield may still be acceptable given volume and cost - but it informs how to structure sourcing investment.

Stage-level time analysis: In addition to conversion rates, measure average days elapsed at each stage. Long stage durations indicate process bottlenecks: slow background check vendors, hiring managers who are not scheduling interviews promptly, decision delays in the debrief stage. For technical roles in competitive markets, days spent waiting for a hiring decision are days during which a candidate is receiving and evaluating competing offers.

Quality dimension: Funnel analysis that tracks only conversion rates does not capture whether the candidates advancing are high quality. Incorporating interviewer scorecard ratings into the funnel analysis identifies whether screening is effectively filtering for relevant skills or whether low-quality candidates are advancing to expensive panel interview stages.

A/B testing in recruiting: Job description language, application requirements, and screening process design can be A/B tested to improve early-funnel conversion rates. This requires careful experimental design - randomizing which candidates receive each treatment - and is most practical for high-volume roles where statistical power is achievable.
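Whether an observed difference between two job description variants is more than noise can be checked with a two-proportion z-test. This is one common choice rather than the only valid one. The visitor and application counts are made up, and the function name is hypothetical.

```python
import math

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Hypothetical test: 1,000 job-page visitors randomized to each description.
z, p_value = two_proportion_z_test(conversions_a=80, n_a=1000,
                                   conversions_b=110, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The normal approximation behind this test needs reasonably large counts in each arm, which is another reason the approach suits high-volume roles.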


Manager Effectiveness Analytics

Manager effectiveness is one of the highest-leverage analytical problems in HR - and one of the least systematically addressed. The quality of direct management is among the most frequently cited factors in employee engagement, attrition, and productivity research. Despite this, most organizations measure manager effectiveness only through subjective methods (upward feedback, 360-degree reviews) and do not connect manager behavior to measurable workforce outcomes.

The behavioral outcome framework:

Effective managers produce measurable outcomes in their teams. The analytical approach maps specific manager behaviors to team-level outcomes, establishing which management behaviors are statistically associated with better retention, higher engagement, and stronger performance.

Team-level outcome metrics:

  • Team voluntary turnover rate: Calculated as voluntary separations from the team over a period divided by average team headcount. Manager IDs from the HRIS enable this calculation.
  • Team engagement score: The average engagement survey score for direct reports, adjusted for company-wide effects (so that a manager is evaluated relative to how their team’s experience compares to the company baseline, not just the absolute score).
  • Team performance distribution: The distribution of performance ratings on the manager’s team, segmented to understand whether the manager is developing high performers, managing out low performers, or producing a compressed rating distribution that reflects calibration avoidance.
  • Promotion rate: The proportion of direct reports promoted over the past 24 months. Managers who develop talent produce higher promotion rates.
  • Internal transfer rate: Employees voluntarily transferring away from a manager’s team is a behavioral signal distinct from external attrition.
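As an example of the first metric in the list above, team voluntary turnover can be computed from separation records and periodic headcount snapshots keyed by manager ID. The record layout and the figures are assumptions made for illustration.

```python
from collections import defaultdict

def voluntary_turnover_by_manager(separations, headcount_snapshots):
    """Voluntary separations divided by average team headcount, per manager."""
    leavers = defaultdict(int)
    for separation in separations:
        if separation["type"] == "voluntary":
            leavers[separation["manager_id"]] += 1
    rates = {}
    for manager_id, snapshots in headcount_snapshots.items():
        average_headcount = sum(snapshots) / len(snapshots)
        rates[manager_id] = leavers[manager_id] / average_headcount
    return rates

# Hypothetical HRIS extracts for one year.
separations = [
    {"manager_id": "M1", "type": "voluntary"},
    {"manager_id": "M1", "type": "voluntary"},
    {"manager_id": "M1", "type": "involuntary"},
    {"manager_id": "M2", "type": "voluntary"},
]
quarterly_headcount = {"M1": [10, 10, 9, 11], "M2": [8, 8, 8, 8], "M3": [5, 5, 5, 5]}

turnover = voluntary_turnover_by_manager(separations, quarterly_headcount)
print(turnover)
```

Rates for small teams swing widely on a single departure, so they are better compared over longer periods or with a minimum team size.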

Behavioral input metrics:

  • One-on-one meeting frequency: Calendar or check-in tool data (from Lattice, 15Five, or similar) showing whether managers hold regular one-on-one meetings and the consistency of that practice.
  • Feedback frequency: The rate at which managers provide documented feedback to direct reports through performance tools.
  • Recognition activity: How frequently managers recognize employee contributions through recognition tools.
  • Response to team survey results: Whether managers complete action planning after engagement survey results and follow through on commitments.

Manager effectiveness score construction:

Manager Effectiveness Score = w₁(Team Retention Rate) + w₂(Team Engagement Score) + w₃(Feedback Frequency) + w₄(1:1 Consistency) + w₅(Team Promotion Rate)

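Weights should reflect the relative importance of each dimension in your organizational context. Because the components are measured on different scales (rates, survey scores, counts), each should be standardized, for example as a z-score across managers, before weighting; otherwise the component with the largest numeric range dominates the score. The score should be validated against business outcomes (team productivity, quality metrics) before being used for manager performance evaluation.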
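A minimal sketch of the score construction follows. It standardizes each component as a z-score across managers before applying the weights, so that components on different scales contribute comparably. The weights, metric names, and manager data are illustrative assumptions, not recommended values.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1 (0.0 if no variation)."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (value - mu) / sigma for value in values]

def manager_effectiveness_scores(managers, weights):
    """Weighted sum of standardized components for each manager."""
    names = list(managers)
    standardized = {
        metric: z_scores([managers[name][metric] for name in names])
        for metric in weights
    }
    return {
        name: sum(weights[metric] * standardized[metric][i] for metric in weights)
        for i, name in enumerate(names)
    }

# Hypothetical weights (w1..w5) and per-manager inputs.
weights = {"retention_rate": 0.30, "engagement_score": 0.25,
           "feedback_per_report": 0.15, "one_on_one_consistency": 0.15,
           "promotion_rate": 0.15}
managers = {
    "A": {"retention_rate": 0.95, "engagement_score": 4.2, "feedback_per_report": 3.0,
          "one_on_one_consistency": 0.9, "promotion_rate": 0.15},
    "B": {"retention_rate": 0.80, "engagement_score": 3.6, "feedback_per_report": 1.0,
          "one_on_one_consistency": 0.5, "promotion_rate": 0.05},
    "C": {"retention_rate": 0.88, "engagement_score": 3.9, "feedback_per_report": 2.0,
          "one_on_one_consistency": 0.7, "promotion_rate": 0.10},
}
scores = manager_effectiveness_scores(managers, weights)
print({name: round(score, 2) for name, score in scores.items()})
```

A score built this way is relative: it ranks managers against their peers in the same dataset and says nothing about whether the whole group is performing well.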

Using the analysis: The primary use case for manager effectiveness analytics is not performance management of managers but identification of coaching needs and development opportunities. A manager with high turnover and low engagement but strong behavioral indicators (frequent one-on-ones, active feedback) may be dealing with team-structural or role-definition problems outside their control. A manager with low behavioral activity and poor team outcomes is a development priority. The analytics enable targeted coaching conversations grounded in evidence.


HR Data Quality Management

Data quality is the silent failure mode of HR analytics. Organizations invest in dashboards and models that produce numbers that are wrong - not obviously wrong, but systematically biased in ways that lead to incorrect conclusions - because the underlying data was not validated, cleaned, or maintained with analytical use cases in mind.

The four dimensions of HR data quality:

Completeness: What proportion of records have values for each critical field? An HRIS where 35% of employee records are missing the manager field cannot support manager-level analysis. A completeness audit - systematically measuring null rates for every analytically important field - should be conducted before any analysis program is launched, and repeated quarterly.
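A completeness audit reduces to counting missing values per field. The sketch below treats `None` and empty strings as missing; the field names and records are hypothetical.

```python
def null_rates(records, fields):
    """Fraction of records with a missing value (None or empty string) per field."""
    total = len(records)
    if total == 0:
        raise ValueError("No records to audit")
    return {
        field: sum(1 for record in records if record.get(field) in (None, "")) / total
        for field in fields
    }

# Hypothetical HRIS extract.
employees = [
    {"employee_id": 1, "manager_id": "M1", "job_level": "L2"},
    {"employee_id": 2, "manager_id": None, "job_level": "L3"},
    {"employee_id": 3, "manager_id": "",   "job_level": "L1"},
    {"employee_id": 4, "manager_id": "M2"},
]
audit = null_rates(employees, fields=["manager_id", "job_level"])
for field, rate in audit.items():
    print(f"{field}: {rate:.0%} missing")
```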

Accuracy: Do the recorded values reflect reality? A performance rating of “Exceeds” that was entered as the default because the manager did not complete the review is an accurate recording of the data entry but an inaccurate representation of employee performance. Accuracy is harder to audit than completeness because it requires validation against an external source of truth.

Consistency: Do the same concepts have the same representation across systems? If the HRIS records department as “Engineering” and the ATS records it as “Engineering & Product” and the payroll system records it as “Eng,” joining these systems on department produces mismatches. Consistency auditing maps field values across systems and identifies where standardization is required.
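One way to run a consistency audit is to maintain a mapping from each system's raw values to a canonical value and report anything the mapping does not cover. The mapping below is an assumption for illustration; in particular, whether "Engineering & Product" should roll up to "Engineering" is an organizational decision, not something the data can settle.

```python
# Hypothetical canonical mapping, keyed by lower-cased raw value.
CANONICAL_DEPARTMENTS = {
    "engineering": "Engineering",
    "engineering & product": "Engineering",
    "eng": "Engineering",
}

def normalize_department(raw_value):
    """Return the canonical department name, or None if the value is unmapped."""
    return CANONICAL_DEPARTMENTS.get(raw_value.strip().lower())

def unmapped_values(raw_values):
    """Distinct raw values that still need a standardization decision."""
    return sorted({value for value in raw_values
                   if normalize_department(value) is None})

hris_departments = ["Engineering", "Eng", "Engineering & Product", "R&D"]
print(unmapped_values(hris_departments))
```

Joining systems on the normalized value rather than the raw one avoids the mismatches described above.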

Timeliness: Are records updated promptly when reality changes? A manager hierarchy that is updated on a 45-day lag because HR processes require form submission and manual entry will produce manager-level analytics that reflect the organizational structure of six weeks ago.

Building a data quality program:

The most effective approach to HR data quality is not retrospective auditing of existing data but upstream prevention: designing HRIS workflows that enforce data quality at the point of entry. Required fields, validated dropdown menus instead of free text, integration-driven automatic population of fields that can be derived from other systems, and regular data quality reviews as part of the HR operational calendar all reduce the audit-and-correct burden.

For analytical purposes, every data pipeline that feeds an HR dashboard or model should include explicit data quality checks: assertions that critical fields are not null above a threshold, that date fields are within expected ranges, that foreign key joins produce expected match rates. When quality checks fail, the pipeline should surface the failure rather than silently producing a report based on incomplete data.
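The three kinds of assertion named above can be sketched as a single check that raises when any of them fails. The thresholds, field names, and exception class are assumptions made for illustration.

```python
from datetime import date

class DataQualityError(Exception):
    """Raised when a pipeline input fails a data quality assertion."""

def run_quality_checks(records, required_fields, max_null_rate,
                       known_manager_ids, min_match_rate,
                       earliest_hire, latest_hire):
    """Raise DataQualityError listing every failed check; return None if all pass."""
    if not records:
        raise DataQualityError("no records received")
    failures, total = [], len(records)
    # 1. Critical fields must not be null above the threshold.
    for field in required_fields:
        null_rate = sum(1 for r in records if r.get(field) is None) / total
        if null_rate > max_null_rate:
            failures.append(f"{field}: null rate {null_rate:.0%}")
    # 2. Date fields must fall within the expected range.
    out_of_range = [r for r in records if r.get("hire_date") is not None
                    and not (earliest_hire <= r["hire_date"] <= latest_hire)]
    if out_of_range:
        failures.append(f"hire_date: {len(out_of_range)} value(s) out of range")
    # 3. Foreign key joins must reach the expected match rate.
    match_rate = sum(1 for r in records
                     if r.get("manager_id") in known_manager_ids) / total
    if match_rate < min_match_rate:
        failures.append(f"manager_id: match rate {match_rate:.0%}")
    if failures:
        raise DataQualityError("; ".join(failures))

# Hypothetical extract that passes every check.
good_records = [
    {"employee_id": 1, "manager_id": "M1", "hire_date": date(2019, 3, 1)},
    {"employee_id": 2, "manager_id": "M2", "hire_date": date(2021, 7, 15)},
]
run_quality_checks(good_records, required_fields=["manager_id", "hire_date"],
                   max_null_rate=0.05, known_manager_ids={"M1", "M2"},
                   min_match_rate=0.95, earliest_hire=date(1980, 1, 1),
                   latest_hire=date(2030, 12, 31))
print("all checks passed")
```

Raising an exception stops the downstream report from being built, which is the behavior described above: the failure surfaces instead of being absorbed into a plausible-looking number.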

The governance structure: Data quality without accountability produces quality checks that are routinely ignored. Effective HR data quality programs assign ownership: a specific person or team is responsible for the accuracy of each source system, data quality metrics are reported to HR leadership on a regular cadence, and data quality is treated as an operational discipline alongside other HR process quality measures.


Connecting the Techniques

These techniques compound when applied together. Predictive attrition modeling is most accurate when the training data is high-quality (data quality management is the prerequisite). Workforce planning is most useful when it incorporates attrition forecasts from the predictive model. Pay equity analysis is most actionable when it is segmented by manager and department (connecting to manager effectiveness analytics). DEI analytics and recruitment funnel analysis share the same stage-by-stage pipeline logic: both measure how a population changes from one stage to the next.

For the data sources that feed these analyses, see HR Data Sources. For the KPIs that these techniques produce and contextualize, see HR KPIs. For how to present the outputs of these analyses to executive and operational audiences, see HR Dashboards.
