What’s in a 5? Ensuring ratings are fair and useful

There’s an even split between clients who use ratings, and those who only use open text fields. Ratings are easier to complete and compare, but are not as insightful, and sometimes lead to more questions than answers. The best might be a hybrid approach.

June 27, 2023

Logan Balavijendran, Head of CX, Performance Leader

In our August Best Practice Group session, two clients were interested in understanding their ratings data, and specifically how they could prevent bias and over-rating.

“How can we ensure that the ratings managers give are fair?”
“Ratings are skewed towards the higher end of the scale. While our employees are excellent, I feel this is not showing the full picture.”

These are questions we get quite often, and they are good questions to ask. Performance development needs to evolve with your organisation – your approach needs to mature as your firm scales and grows.

Everyone gets a 3. Except me.

Bias is a complex and persistent challenge. Taken broadly, it could mean anything that impacts the fairness and accuracy of an evaluation. There are entire sections of libraries devoted to bias, so we won’t go into it in detail, but it’s worth quickly summarising the four we most commonly come across:

  1. Proximity bias: those you see (or notice) are perceived to contribute more (e.g. the person who comes in early and stays back late is assumed to contribute the most)
  2. Affinity bias: those you share the most in common with (background, culture, gender, preferences) are perceived to have more impact (e.g. the person who shares the same educational journey as me is best qualified to lead others)
  3. Horns and halos: we assume capabilities based on characteristics we perceive (e.g. attractive people are thought to be intelligent, overweight people are assumed to be indolent)
  4. Self-fulfilling prophecies: we assume someone will be great (or not), and find evidence to reinforce that belief (e.g. the person I hired after a rigorous process must be doing well)

The other issue firms often experience is over-rating – giving higher ratings than an employee deserves. Managers may do this because they are avoiding a difficult conversation with the employee, because they don’t sufficiently value the process, or because they don’t really understand what the rating is supposed to convey. Over-ratings are problematic because they are unfair to other staff and don’t actually help the employee improve.

What can we do?

There were four strategies that worked.

A) Clearer ratings

When designing evaluation forms, explain what each rating means. Where possible, use examples that make sense to a manager in their role. Have a rubric, and explain it through a conversation with managers.

This is the best system-wide change you can make, and it is the basis of any other enhancement, but in and of itself it is not enough. Leaders need to ensure managers understand the importance of getting this right and are well supported to do so.

B) Training

Conduct workshops to take managers through types of bias, as well as how to detect and correct for them. The most effective workshops take managers through their evaluations and encourage them to reflect on how they can be improved.

The challenge is that training sessions take time away from already busy leaders, and managers who are not committed to performance development won’t invest in them. Some clients saved time by conducting training online or in a hybrid format, but tricky topics like bias require more engaged sessions.

C) Moderation meetings

Review the data with HR business partners and department leaders. Break ratings down by key criteria (for example role, department, experience, gender, background, manager) and analyse for discrepancies.

Is a particular manager scoring much higher than the rest? Are women rated lower on average? Is a manager scoring employees of colour lower?
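
If your platform lets you export the underlying data, these checks are straightforward to script. Below is a minimal sketch in Python, assuming a hypothetical CSV export with manager, department, gender and rating columns (real column names will vary by platform):

    import pandas as pd

    # Hypothetical export file; adjust the path and column names
    # to whatever your performance management platform produces.
    ratings = pd.read_csv("ratings_export.csv")

    # Average rating per manager, compared with the firm-wide mean:
    # a large gap flags a rater worth a closer look in the meeting.
    by_manager = ratings.groupby("manager")["rating"].agg(["mean", "count"])
    by_manager["gap"] = by_manager["mean"] - ratings["rating"].mean()
    print(by_manager.sort_values("gap", ascending=False))

    # Average rating by gender within each department: a gap here is a
    # prompt for a conversation, not proof of bias on its own.
    print(ratings.pivot_table(index="department", columns="gender",
                              values="rating", aggfunc="mean"))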

Moderation meetings take time, but because real data is being reviewed they can be a very insightful process. Clients said it took time to organise moderation meetings well, but once they took place, managers and partners usually found them incredibly valuable, and they often resulted in improvements to the process, commitment to training, or recognition that current processes were working well.

D) No ratings

Instead of ratings, use open text questions only, and design prompts that push reviewers to be comprehensive and detailed. This should produce varied and insightful responses.

Clients felt this provided a treasure trove of insight, though it made it harder to extract value or to detect bias (it didn’t mean bias disappeared!). It also meant the form had to be simpler and shorter. Some clients also felt managers preferred this approach, as it gave them the opportunity to share detailed feedback.

There’s a real opportunity here to use natural language processing and semantic analytics to extract insights – for example, extracting key themes within the comments (e.g. certain roles or departments mention certain skills) or inferring the tone of a comment from the words used. This often requires special skills, tools or features within your performance management platform, or at least the ability to extract the relevant data and process it in Excel.
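
To give a flavour of what’s possible without specialist tooling, here’s a minimal sketch using scikit-learn’s TF-IDF weighting to surface the terms that characterise each comment; the departments and comments are invented for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Invented sample data: (department, review comment) pairs.
    comments = [
        ("Tax", "Great technical depth, but should delegate more to the team"),
        ("Audit", "Strong client communication and attention to detail"),
        ("Tax", "Deep technical expertise, mentors junior staff well"),
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([text for _, text in comments])
    terms = vectorizer.get_feature_names_out()

    # The highest-scoring terms per comment hint at recurring themes; a
    # real analysis would aggregate across many comments per department.
    for (dept, _), row in zip(comments, matrix.toarray()):
        top = sorted(zip(row, terms), reverse=True)[:3]
        print(dept, [term for score, term in top if score > 0])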

Do you rate ratings?

At the end of the day, we believe ratings have their place. They’re a useful tool, as long as:

  1. Managers and employees understand what the ratings mean, and there are other opportunities to share qualitative feedback within the form. Open text along with a rating provides useful context, and forces raters to explain their rating.
  2. There are processes in the firm to evaluate and understand the ratings. Training and/or moderation meetings can not only help detect bias but also unearth insights from the review process.

Rooting out bias from performance reviews takes time and effort, but it starts with curiosity and concern. If you’re interested in learning more, check out the HBR article “How One Company Worked to Root Out Bias from Performance Reviews” (by Joan C. Williams, Denise Lewin Loyd, Mikayla Boginsky, and Frances Armas-Edwards). It discusses how a midsized law firm discovered four patterns of bias after conducting an audit. They implemented two simple changes (redesigning their evaluation form to be more objective and running a one-hour workshop) – and just a year later saw some interesting results.

The Partner Remuneration Handbook (written by Performance Leader founder and CEO Ray D’Cruz and Michael Roch) also dives into ratings and provides useful design guidance. Some top tips from the book:

  • Consider the spread. A broader spread (six- or five-point scales) will create more diversity in responses, while a denser spread (three or four points) creates more consistency.
  • Odd vs even number of points. An even number of points forces the rater to make a choice, as there is no exact middle of the scale.
  • Consider the language. Scales with positive, strength-based language are becoming more popular among professional services firms, as these may reduce stress and anxiety among both managers and employees. Strength-based ratings are perceived as developmental rather than critical.
  • Low to high. Ordering low to high (instead of high to low) might create a better spread, as this helps overcome courtesy bias. This is also useful if you feel there is a bias towards the higher end of the scale.
  • Weighting ratings. Allocating percentages to different metrics can encourage raters to spend more time qualifying some ratings, leading to more thoughtful responses. It also allows firms to better differentiate and understand results (see the short sketch after this list).
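
On that last point, the arithmetic is simple: each metric’s rating is multiplied by its weight and the results are summed. A tiny sketch with invented metrics and weights:

    # Invented metrics and weights (the weights sum to 100%).
    weights = {"client_impact": 0.40, "teamwork": 0.35, "business_development": 0.25}
    ratings = {"client_impact": 4, "teamwork": 5, "business_development": 3}

    # Weighted average: 0.40*4 + 0.35*5 + 0.25*3 = 4.10
    overall = sum(weights[m] * ratings[m] for m in weights)
    print(f"Overall rating: {overall:.2f}")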

The best approach is the approach that works for you. Pick three approaches and experiment with different groups, then run focus groups with those users to understand and improve their experience. Maybe even have them rate it.
