Hopefully, if you are reading this article, it means that you’re now convinced that relative estimation is a great way to move forward. The lights are dimmed, the sunglasses are on, and the cards are ready to be dealt for your inaugural Planning Poker session. But hang on — where do you start? What does a 1-point user story actually mean? How about a 13-pointer? What is the best way to initially calibrate so that the team has a foundation to work from? If these are the questions that are running through your mind, then please read on.
An Approach
One calibration approach that some teams like to use is to identify the smallest user story in the product backlog and designate it to be the initial 1⁄2-point story (assuming they are using the Fibonacci sequence). Once this initial baseline has been confirmed, the team works its way down the list of user stories and allocates 1 point for any story that is roughly double the 1⁄2-pointer, 2 points for any story that is roughly double a 1-pointer, and so on.
This approach can certainly work, and it seems straightforward on the surface, but in reality it can take considerably more time than you might expect. First, the team has to traverse the entire product backlog to identify the starting contenders, and second, the team needs to reach a consensus on which user story should become the initial baseline.
Bear in mind that your team is new to this process, so it helps to reduce as much ambiguity as possible. It is for this reason that I like to calibrate story points by utilizing work completed in the past.
Using Historical Work
The idea behind leveraging historical work is to help create mappings between known quantities (old completed work) and the new Fibonacci story point values (or whatever other scale you choose to use).
Using historical work offers a team two significant advantages: familiarity and consistency.
Familiarity
It is obvious that any team will be more familiar with work that they completed previously than with work they are going to do in the future. This familiarity proves to be particularly helpful when playing Planning Poker because instead of comparing future unknown work to other future unknown work (similar to the first approach described earlier), teams can compare future unknown work with past known work. Not only does this approach remove an element of ambiguity, but the comparisons also happen much more quickly because the team can more readily recollect the historical work.
Consistency
When historical work forms the set of benchmarks (for the various point values in the Planning Poker deck), these same benchmarks can be used across any and all projects that the same team works on down the track. This early work will naturally speed up future proceedings because the initial benchmarking process is required only once (as opposed to whenever a new product backlog is formulated and presented).
Creating the Mappings
Five steps are required when creating the mappings between the historical work and the new point scale. Steps 2 and 3 in the following process are inspired by James Grenning’s article “Planning Poker Party”, which describes a similar approach (using a new product backlog rather than historical work).
Step 1: Identify
Identify a recent project that the same team (or at least most of the team) was involved in. List the discrete pieces of work, and write them on index cards (if they are in digital form). If they are not already in the user story format, they should be converted to ensure comparative consistency moving forward (see Figure 1).
Step 2: Sort and Stack
For this next step, you need a nice, big table and the development team. Starting with the first index card, read the user story out loud and place it on the table (see Figure 2.1).
Next, take the second card and ask the team whether they recall it taking more, less, or the same amount of effort as the first card (see Figure 2.2). If it took less effort, place it to the left of the original; if it took more effort, place it to the right; and if it took roughly the same amount of effort, stack it on top of the first. If there is any contention or confusion, “burn” the card (not literally, please).
Then take the next card and place it either to the left of both cards (if it took less effort than both), to the right of both cards (if it took more effort than both), between the cards (if its effort was somewhere in the middle), or on top of one of the cards (if it was roughly the same effort). Repeat this process for all of the index cards (see Figure 2.3).
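For anyone who likes to see the mechanics spelled out, the sort-and-stack pass above behaves like a simple grouping-and-ordering routine. The sketch below is illustrative only; the story titles and relative-effort numbers are hypothetical stand-ins for the team's recollections:

```python
from collections import defaultdict

def sort_and_stack(stories):
    """Group stories of roughly equal effort, then order the groups
    left to right from smallest to largest effort (like the table layout)."""
    stacks = defaultdict(list)
    for title, effort in stories:
        stacks[effort].append(title)  # same effort -> same stack
    return [stacks[effort] for effort in sorted(stacks)]

# Hypothetical recollections of relative effort for four completed stories
completed = [("Login page", 2), ("Export CSV", 2), ("Search", 5), ("Fix label", 1)]
print(sort_and_stack(completed))
# [['Fix label'], ['Login page', 'Export CSV'], ['Search']]
```

Note that in the real exercise the effort values are never written down; the team compares cards pairwise, which is exactly why contentious cards get "burned" rather than forced into a stack.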
Step 3: Sizing Up
At this stage in the process, there should be a number of sequential card stacks (of varying sizes) on the table. Please note that I use the word stack loosely, as you can certainly have just one card in a stack in this exercise. The stack at the very left of the table will therefore contain the cards representing the smallest user stories, and the stack representing the largest stories will be located at the very right end of the table.
Now, it’s time to play some Planning Poker. Automatically assign all cards in the leftmost stack a 1-point value (see Figure 3). As an aside, I like to reserve the smallest 1⁄2-point value for trivial changes, such as label adjustments or textbox alignments, so unless your smallest stack is made up of these tiny requirements, consider starting with 1 point rather than 1⁄2 point.
Starting with a representative from the second-smallest stack (directly to the right of your new 1-pointers), determine the relative effort that was required to complete it compared to a representative from the smallest stack (for example, it may be three times as much effort).
As each stack gets classified, place a card representing its relative point value above it for quick recollection, so using our example, the second stack would be tagged with a 3-pointer card.
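Tagging each stack with a point value amounts to snapping its relative effort (versus the 1-point stack) to the nearest value on your chosen scale. Here is a minimal sketch, assuming a Fibonacci-style deck; `nearest_point` is a made-up helper for illustration, not part of any Planning Poker tool:

```python
def nearest_point(relative_effort, scale=(1, 2, 3, 5, 8, 13)):
    """Snap a relative-effort multiple (vs the 1-point stack) to the scale."""
    return min(scale, key=lambda value: abs(value - relative_effort))

# The leftmost stack is 1 point by definition; a stack that took roughly
# three times that effort gets tagged with the 3-point card, and so on.
for effort in (1, 3, 7):
    print(effort, "->", nearest_point(effort))
# 1 -> 1
# 3 -> 3
# 7 -> 8
```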
Step 4: Subclassify
With any luck, your Planning Poker session ran smoothly (thanks to the tips that you picked up in Planning Poker at Pace), leaving you with several stacks of user stories with corresponding point values.
In a perfect world, there will be a stack that corresponds to each value in the point system that you’re using (see Figure 4.1), but do not worry if this isn’t the case. At the end of the day, so long as you have a couple of benchmark stories, you can at least get started.
If you happen to be spoiled for choice by having stacks containing a number of stories, then you can further classify them into subcategories that relate to different areas of focus (see Figure 4.2). For example, you could end up with three different 5-point stories. Even though they are grouped together (based on similar effort), they could all have very different focal points. Story 1 could have data optimization complexities, story 2 could have more of a user-interface focus, and story 3 could require integration with a third-party product. By subclassifying in this manner, the ability to compare apples to apples (when estimating new product backlogs) becomes a reality.
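One lightweight way to record such subclassified benchmarks is a nested mapping from point value to focus area. The story titles below are invented purely to illustrate the shape of the data:

```python
# Hypothetical benchmark stories, keyed by point value and focus area
benchmarks = {
    5: {
        "data": "Optimise the nightly aggregation job",
        "ui": "Redesign the reporting dashboard",
        "integration": "Sync orders with a third-party product",
    },
}

# When estimating a new UI-heavy story, pull out the 5-point UI benchmark
# so the team compares apples to apples:
print(benchmarks[5]["ui"])
# Redesign the reporting dashboard
```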
Step 5: Final Filter
The final step in this calibration exercise is to filter out one representative from each stack (or substack if you subclassified as explained in step 4). These final champions will become the reference stories that are used to help start off future Planning Poker sessions (on new product backlogs). Considering that the stories have already been classified, the selection of the reference stories can be based on choosing a random story from the stack, or if you wish to be more discerning, the team can select stories that carry the most familiarity.
Keep Up Your Recycling
Although the initial calibration exercise may be complete, I recommend that you embrace and continue your new recycling practices. At the end of every subsequent project, add any completed stories to the benchmark collection to continuously build up a rich library of stories that are not only familiar but also easily relatable to a variety of different requirements.
There you have it. You are now equipped with a process to leverage historical work to calibrate some relative benchmarks. By utilizing work completed in the past, the team gains the added benefits of familiarity and consistency, making the transition to relative estimation smoother and less ambiguous.
Excellent article, very useful for everyone who is working with Scrum methodologies.
Keep up the good work!
I have some standard set of dilemmas, especially when I am handling Embedded Systems Projects!
1. For relative estimations, I have to consider that the “base” of one “feature development” is “similar” to the next one, with just a size differentiator! But many times it is not!! So, developing a feature using standard “C” functions is “different” from one using some “drivers” or 3rd-party plug-ins….
2. My teams are not homogeneous. One works on “C” within a constrained hardware environment, the other works on “Java” on a workstation, and the third works on the electronics hardware design! In fact, at present, I am not able to run the hardware development under the Scrum rhythm. So, relative estimation is a big issue.
3. I have always experienced that the “relative” estimations are also “person-relative”! So, the person with better competence/speed has a different estimation. I know that is what the poker should do (average things out), but still, Team A and Team B have different relative velocities.
4. Lastly, how should I give the sprint-wise and total “time estimate” to my top management and, moreover, to my customer, neither of whom understands (or wants to understand) story points! Do I convert back to time from story points every time? Then what’s the fun?
Nice questions Shirish! I’ll answer them in the order that you asked them.
1) In this case, the calibration time buckets that I talk about above could be a good fallback option until you have a story from each ‘development type’ correlated to a point value. From then on, you are able to compare apples to apples.
2) Interesting situation. In this case, given that each team member appears to be almost independent, I would create separate sub-projects/product backlogs (even with one person) so that each can independently run relative estimation over their work (perhaps with you as the counterpart in planning poker).
3) Firstly, I don’t see any problem with having two differing relative velocities, so long as the teams stay static (i.e. you don’t chop and change members around). My other recommendation is to try to balance out your junior and senior team members so that both teams have the same overall skill and experience levels.
4) When all is said and done, your story point estimation session will give you a total number of points for the project (let’s call it 300 points). Your velocity gives you the average number of points your team can complete in a sprint (let’s say it’s 30). So for your sprint-level report, you would tell the customer/management that within the next 2 weeks (assuming a 2-week sprint duration) you will be able to get User Story/PBI X, Y, and Z complete (assuming that the sum of their story points is <= 30). For the complete project, you would simply take total points / velocity × sprint duration, i.e. 300 / 30 × 2 = 20 weeks. Add maybe 3 weeks for Sprint 0 and some stabilisation buffer, and you can give a rough (yet constantly updatable) timeframe for the full duration of the project, i.e. 23 weeks.
Of course you'll have to plug in numbers that make sense to you but you can see how I've calculated this rough number. Hopefully this makes sense.
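The rough timeline arithmetic above can be captured in a few lines. This is only a sketch of the back-of-the-envelope calculation, using the example numbers from the reply (300 points, velocity of 30, 2-week sprints, 3-week buffer):

```python
def rough_timeline(total_points, velocity, sprint_weeks, buffer_weeks=0):
    """Rough project duration in weeks: (total points / velocity) sprints,
    each sprint_weeks long, plus any Sprint 0 / stabilisation buffer."""
    sprints = total_points / velocity
    return sprints * sprint_weeks + buffer_weeks

print(rough_timeline(300, 30, 2, buffer_weeks=3))  # 23.0
```

As point estimates and velocity get refreshed each sprint, re-running the same calculation gives the constantly updatable timeframe mentioned above.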
As an aside, I usually present a 5-slide summary of story point estimation to upper management, who without fail are intrigued and quite excited about it, so I recommend you don't give up trying to educate the naysayers.
Keep it going and I hope this helps.
Ilan–
Starting by comparing to old work is indeed a common technique. I’m glad you’ve written it up here. I particularly liked that you made the point of saying to throw away the calibrations once they’ve been used. That is, as you say, very important. Continued reference back to them as a form of holy grail creates unneeded confusion later. Thanks for sharing your post.
Really glad you enjoyed the post, Mike. For anyone reading this who is brand new to relative estimation, Mike’s brilliant book Agile Estimating and Planning is without doubt the seminal work in this area and is not only very helpful but also extremely well written.
Just in relation to Shirish’s third question:
I love the fact that when participating in planning sessions, my team members and I do not pre-emptively assign tasks to specific developers, or even make casual assumptions about who is going to do what (not even testing). Doing so eliminates the possibility that estimates will be biased, and it ensures that items at the top of the sprint backlog do not get passed over for a lower-priority task just because someone doesn’t think it’s their responsibility.
Obviously certain team members have specialities, and the reality is this will often determine which tasks they complete, but sometimes it doesn’t. They may end up just testing or reviewing something that someone else decided to have a go at because it was the next priority.
In my opinion, story points should always be based on the notion that anyone on the team might do the work, including newer or less experienced developers, to avoid unintentionally over-committing as a team. After all, no Product Owner would hesitate to add more stories to the sprint if the newer members discover that the tasks weren’t as scary as they originally thought!