Which technology is best for your specific needs?

IT managers making investment decisions must frequently weigh a broad set of functional tradeoffs between similar products in order to make a choice that best meets their needs. But it can be difficult to obtain accurate information about the true functional capabilities of products that are being evaluated. Product suppliers tend to highlight their strongest features in marketing information, while obscuring information about weaker features.

Introducing Open Source product evaluations

The IDEAS Collaborative Product Evaluation (CPE) process applies the open source concept to in-depth functional evaluations. Rather than relying on hidden methodologies and proprietary data of professional "experts", the CPE approach harnesses the collective knowledge of the IT community to arrive at transparent assessments of technology products.

Grade and compare products at arbitrary levels of detail

The CPE system represents product capabilities as a hierarchical taxonomy of features, allowing a set of products to be compared at an arbitrary level of detail. Any visitor can submit ratings for functional details of a product, representing their assessment of the quality of that function's implementation. Because ratings can only be submitted for specific functions rather than entire products, the system draws out deep expertise of contributing users, rather than merely measuring general allegiances towards particular products or vendors.

Ratings qualified with distributed moderation

Submitted ratings are qualified through distributed moderation, whereby a select subset of community members is automatically given the opportunity to moderate the ratings of other users. Members with moderation privileges can raise or lower the credibility of a rating based on its explanation. Other visitors can then adjust the credibility threshold at which they want to see ratings.

Customize summary ratings based on your own functional priorities

Visitors who register with the CPE system can assign weights to any level of functionality, representing the importance of that functional capability in their particular environment. These weights are used to composite detail ratings into customized summary ratings for each product, reflecting the suitability of each product for a particular user's requirements.

How Collaborative Product Evaluation Works

Overview



Introduction

Buyers considering the acquisition of complex technology products sometimes face difficult choices. Technology products typically have an extensive and diverse set of functional capabilities, which customers must try to match with their specific requirements. It can be challenging for buyers to weigh the functional tradeoffs between a set of similar products in order to make a choice that best meets their needs. In particular, it can be difficult to obtain accurate information about the true functional capabilities of products that are being evaluated. Product suppliers tend to highlight their strongest features in marketing information, while they may obscure information about weaker features.

Customers who are already using a product probably have the best understanding of how well it works. They are thus in a position to provide the most accurate information about the quality of product features. Global networks such as the Internet potentially provide a means to connect directly with users who are willing to share information about their experiences with products. Indeed, a variety of web sites exist that allow users to submit ratings for a broad range of products. However, these sites usually only allow users to assign ratings at a relatively high level. Users can provide an overall grade for a product, and at most two or three detail grades addressing specific product characteristics.

While this process may be sufficient to rate relatively simple products, it may not be suitable for complex products with more extensive feature sets, such as computer systems or software. Because each user matches these products with their specific requirements, they have a unique view of the products' value. For users to rate the quality of these products in an objective way, it is necessary to structure the evaluation process so that users can comment on specific product features.

Decomposing Products into Functional Details

The Collaborative Product Evaluation system allows product quality to be compared at arbitrary levels of detail. The CPE system represents product capabilities as a hierarchical taxonomy of features, in which products are decomposed into subsystems, each of which can be further decomposed into sub-subsystems and so on, until the lowest level of functional detail is reached.
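The decomposition described above can be pictured as a simple tree. The following is a minimal sketch, not the CPE system's actual data model: the `Feature` class and the example category names are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """One node in the feature taxonomy (class and field names are hypothetical)."""
    name: str
    children: list[Feature] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A node with no children is a lowest-level functional detail.
        return not self.children

    def leaves(self) -> list[Feature]:
        # Collect every lowest-level detail beneath this node, left to right.
        if self.is_leaf():
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Illustrative taxonomy; these categories are invented for the example.
taxonomy = Feature("Server", [
    Feature("Storage", [Feature("Hot-swap drives"), Feature("RAID support")]),
    Feature("Management", [Feature("Remote console")]),
])

leaf_names = [f.name for f in taxonomy.leaves()]
print(leaf_names)
```

The tree can be as deep as the product requires; only the leaves matter for rating submission.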

Users submit ratings for functional details at the lowest level of the hierarchy, rather than for entire products or higher-level subsystems. In this way, specialists can collaborate on grading complex products by focusing on the areas they are most familiar with, and summary product ratings will be based on the aggregate of specific functional capabilities, rather than offhand assessments based on users' biases.

Any user can assign ratings to individual features of a product, along with a written explanation of the rating, representing their assessment of the quality of that feature's implementation. Users can submit functional ratings anonymously, or they can provide varying levels of identity information. The ability to post anonymously is important, because people sometimes have valuable information to contribute but may be afraid to do so if it can be connected with them. However, anonymous ratings will initially have a lower credibility level than ratings submitted by users who have registered and logged in (see below).

Registered users can also assign weights to features at any level of the taxonomy, representing the importance of a particular functional level in their particular environment. The system uses these weights to composite moderated feature ratings into summary ratings for each product, reflecting the suitability of each product for a particular user's requirements.

Interacting with the Collaborative Product Evaluation System

When users first enter the CPE site, they are presented with a bar chart showing summary ratings for all evaluated products. The summary rating for each product is calculated with a weighted average of its sub-level ratings, each of which is averaged from its lower-level ratings and so on. Users visiting the site anonymously have equal weights applied to each functional level, while registered users can set their own weights for each functional level.
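The roll-up of ratings into a summary score can be sketched as a recursive weighted average. This is an illustration under stated assumptions, not the system's actual formula: the tuple-based tree, the function name `summary_rating`, the numeric scores, and the choice to skip unrated branches are all hypothetical.

```python
def summary_rating(node, ratings, weights=None):
    """Roll leaf ratings up the hierarchy as a weighted average.

    node:    (name, [children]) tuple; a leaf has an empty child list.
    ratings: leaf name -> list of submitted scores for one product.
    weights: name -> importance weight; a missing name defaults to 1.0,
             which mirrors the equal weighting applied for anonymous visitors.
    Returns None when nothing beneath the node has been rated.
    """
    weights = weights or {}
    name, children = node
    if not children:
        scores = ratings.get(name, [])
        return sum(scores) / len(scores) if scores else None
    total = weight_sum = 0.0
    for child in children:
        value = summary_rating(child, ratings, weights)
        if value is None:
            continue  # assumption: unrated branches are skipped, not scored as 0
        w = weights.get(child[0], 1.0)
        total += w * value
        weight_sum += w
    return total / weight_sum if weight_sum else None


# Invented example data for a single product.
tree = ("Server", [
    ("Storage", [("RAID support", []), ("Hot-swap drives", [])]),
    ("Management", [("Remote console", [])]),
])
ratings = {"RAID support": [4, 2], "Hot-swap drives": [5], "Remote console": [1]}

equal = summary_rating(tree, ratings)                     # anonymous visitor
custom = summary_rating(tree, ratings, {"Storage": 3.0})  # registered user
print(equal, custom)  # prints: 2.5 3.25
```

Raising the weight of "Storage" pulls the summary toward that category's average, which is how the same ratings can yield different standings for different users.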

Each bar is also labelled with the total number of ratings that have been submitted for that product below the current functional level, and at the current credibility threshold. The credibility threshold can be adjusted to see the summary scores that result from considering ratings at higher or lower credibility.

Below the bar chart is usually some introductory text, explaining the problem that the evaluated products are designed to solve, and providing broad descriptions for each of the first level of functional categories. Links then point to the evaluation pages for each of these categories. Next to each link is shown the total number of ratings that have been submitted for that category at the current credibility threshold.

Descending to the Next Functional Level

By clicking on a link to a functional category, users descend to the evaluation page for that level. Similar to the top-level page, the bar chart shows the ratings for all evaluated products at this level, based on the average of lower-level ratings. Each bar also shows the total number of ratings that have been submitted for that product at lower functional levels, and at the current credibility threshold.

The text describes this functional category in more detail, and provides broad descriptions for the next lower level of functional categories. Links then point to evaluation pages for each of the next-lower levels.

Descending Deeper into the Functional Hierarchy

This format is repeated for each successive functional level. A series of "bread crumb" links over the bar chart can be used to quickly navigate to any functional level above the current one.













Functional Details Rated at the Lowest Level

The ratings that have actually been submitted by users are shown at the lowest functional levels. The bar chart shows the average of all ratings that have been submitted at this functional level, and at the current credibility threshold.

For each submitted rating, the system displays the ID of the user who submitted the rating (or "Anonymous" if the user did not register); the user's rating of the current function for one of the evaluated products; the time and date when the rating was submitted; a written comment explaining why that rating was given; and the current level of credibility that this rating has achieved, based on the distributed moderation process.

Only ratings at the current credibility threshold or higher are displayed and used in calculating the bar chart values.
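The filtering rule above can be sketched in a few lines. This is an assumed illustration rather than the site's code: the `Rating` record, its field names, and the numeric score scale are hypothetical, while the default threshold of 1 comes from the text further below.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rating:
    user: str         # user ID, or "Anonymous"
    product: str
    score: int        # assumption: numeric scale; the text does not define one
    credibility: int  # current level reached through moderation


def chart_values(ratings, threshold=1):
    """Return {product: (average score, rating count)} at the given threshold."""
    buckets = defaultdict(list)
    for r in ratings:
        if r.credibility >= threshold:  # at the threshold or higher
            buckets[r.product].append(r.score)
    return {p: (sum(s) / len(s), len(s)) for p, s in buckets.items()}


# Invented sample data.
submitted = [
    Rating("carol", "Product A", 4, credibility=2),
    Rating("Anonymous", "Product A", 2, credibility=0),
    Rating("dave", "Product B", 3, credibility=1),
]
default_view = chart_values(submitted)        # threshold 1
permissive_view = chart_values(submitted, 0)  # includes the anonymous rating
strict_view = chart_values(submitted, 2)      # only the most credible rating
```

The count returned alongside each average corresponds to the number shown on each bar.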














Submitting Ratings for Product Functions

At the lowest functional levels of an evaluation, anyone can submit a new rating for a specific product function by pressing the "Submit Rating" button. The button brings up a form for selecting one of the evaluated products, and a rating for how well the current functional level is implemented in that product.

A text box enables the user to enter a comment explaining how they arrived at their rating. Users may include a limited set of HTML tags in their comment, including <b> (bold), <i> (italics), and <a href=""> for including links (ratings that link to supporting material tend to achieve higher credibility scores in the distributed moderation process). All other tags are scrubbed from the comment text before submission.
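One way to scrub comments down to an allow-list of tags is sketched below using Python's standard `html.parser` module. It is an assumption about how such scrubbing could work, not the CPE implementation: the `CommentScrubber` class and `scrub` function are hypothetical, and a production sanitizer would also need to validate link targets (for example, rejecting `javascript:` URLs), which this sketch does not do.

```python
from html import escape
from html.parser import HTMLParser

# Allowed tags mapped to the attributes each may keep.
ALLOWED = {"b": set(), "i": set(), "a": {"href"}}


class CommentScrubber(HTMLParser):
    """Keeps allowed tags, drops all others, and escapes the text in between."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its inner text is still kept
        kept = ""
        for name, value in attrs:
            if name in ALLOWED[tag]:
                safe = escape(value or "", quote=True)
                kept += f' {name}="{safe}"'
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def scrub(comment):
    parser = CommentScrubber()
    parser.feed(comment)
    parser.close()
    return "".join(parser.out)


raw = '<b>Fast</b> <u>boot</u> <a href="http://example.com" onclick="x()">docs</a>'
print(scrub(raw))
```

In this sketch, disallowed tags and attributes disappear while their enclosed text survives, so a comment stays readable after scrubbing.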

The appearance of the rating comment can be previewed, or submitted directly. Upon submission, the user is presented with a captcha to prevent automated submissions. If that test is passed, the rating is entered into the system, subject to the following constraints:

  • Users must enter more than a few words of text in the comment box for the rating to be accepted.
  • Registered users may only submit a single rating of a particular function for each product under their ID.
  • If a rating is submitted anonymously, the system tracks the IP address from which the rating was submitted, and prevents another anonymous rating of that function in a product from being submitted from that IP address for some period of time (usually 24 hours).
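The three constraints above can be sketched as a small in-memory check. This is illustrative only: the `SubmissionLog` class, its method names, and the five-word minimum are assumptions (the text says only "more than a few words"), while the 24-hour window follows the "usually 24 hours" figure given above.

```python
from datetime import datetime, timedelta

MIN_WORDS = 5                        # assumption: stands in for "more than a few words"
ANON_COOLDOWN = timedelta(hours=24)  # "usually 24 hours" in the text


class SubmissionLog:
    def __init__(self):
        self.by_user = set()  # (user_id, function, product) already rated
        self.by_ip = {}       # (ip, function, product) -> time of last anonymous rating

    def submit(self, comment, function, product, now, user_id=None, ip=None):
        """Record the rating and return None, or return a rejection reason."""
        if len(comment.split()) < MIN_WORDS:
            return "comment too short"
        if user_id is not None:
            key = (user_id, function, product)
            if key in self.by_user:
                return "already rated by this user"
            self.by_user.add(key)
        else:
            key = (ip, function, product)
            last = self.by_ip.get(key)
            if last is not None and now - last < ANON_COOLDOWN:
                return "anonymous rating from this address too recent"
            self.by_ip[key] = now
        return None


log = SubmissionLog()
t0 = datetime(2008, 1, 1, 12, 0)
text = "Failover completed in under a minute"
first = log.submit(text, "Failover", "Product A", t0, user_id="carol")
repeat = log.submit(text, "Failover", "Product A", t0, user_id="carol")
```

Here `first` is accepted and `repeat` is rejected, because the same registered user has already rated that function for that product.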

Adjusting Moderation Threshold to Vary Credibility of Ratings

Users can vary the credibility threshold at which ratings are displayed in order to increase or decrease their confidence in the results. The system will only show ratings with credibility at the level of the threshold or higher.

By default, ratings that are submitted anonymously start out with a credibility level of 0, while ratings from registered users start out with a credibility level of 1. Further, ratings from registered users with a good reputation might start at level 2, while ratings from registered users with a poor reputation might start out at 0 or lower. Over time, the distributed moderation process should segment ratings into various levels of credibility (see Qualifying Submitted Ratings with Distributed Moderation).

The default credibility threshold is 1. Setting the threshold to 2 or higher displays fewer ratings, but at higher levels of credibility.

Reducing the threshold to 0 increases the number of ratings displayed, but at lower levels of credibility.

The credibility threshold is maintained at every level of the functional hierarchy. Note how summary standings at the root level change as the threshold is varied.

Credibility threshold = 0
Credibility threshold = 2

Weighting Importance of Functional Categories

Users who register and log in under their ID are presented with additional controls for setting weights of functional levels. These weights represent the importance of particular functions for their specific requirements, and are used in calculating the average product ratings.

Weighting Multiple Functional Levels

Weights can be applied at every functional level of the evaluation.

Qualifying Submitted Ratings with Distributed Moderation

Accepting ratings over public networks such as the Internet carries certain risks. Because Internet users are by definition quasi-anonymous, their accountability is weak. At minimum, public input mechanisms may be clogged with insincere responses (e.g. "trolls") or off-topic responses (e.g. "spam"). At worst, they are at risk of being gamed with deliberate disinformation campaigns or disrupted with denial-of-service attacks. As a result, any scheme for collecting qualitative product feedback from Internet users requires some means to maintain the quality of information that is submitted.

The most straightforward method to validate information submitted over the Internet is to employ human moderators to examine every piece of data before it is used in an evaluation. However, this approach may be inefficient in an Internet environment, where the user base is very large, and users expect rapid feedback to their actions. When dedicated moderators are confronted with huge volumes of data that can potentially pour from the massive Internet user base, they quickly become a bottleneck, introducing delays in processing submitted information. Also, the moderators may themselves lack the ability to recognize the value of information when it is submitted, or they may consciously or unconsciously introduce a bias in their assessment of information quality.

To address these issues, certain methods have emerged for validating information submitted on the Internet in a way that can effectively scale with the size of its user base. Generally, these methods take advantage of the user community to validate information submitted. One such method is distributed moderation. To compensate for the behavioral problems that inevitably occur when users are given the ability to provide ratings anonymously, distributed moderation delegates the task of assessing ratings quality to the users themselves. But unlike traditional approaches of enlisting users for grading the quality of submitted information (e.g. the ubiquitous "Was this review helpful to you?" buttons on amazon.com), not all users can moderate ratings with distributed moderation. Rather, automatically selected subsets of registered users are temporarily offered the opportunity to moderate a limited number of ratings from other users.

The approach has several benefits. First, it relieves system managers of the moderation burden, so that they do not become a bottleneck. By enlisting automatically designated users in the moderation process, the system enables rapid feedback in separating high-quality ratings from chaff. Second, the moderation process becomes more democratic, in that regular users rather than designated "gatekeepers" are given the chance to evaluate the quality of information submitted by other users. This leverages the expertise of users in determining whether a rating is valuable or not. The more users are involved in the moderation process, the more effective it becomes. Because distributed moderation is optimized to scale with the number of participants, it is well suited for use on the Internet, where the user base takes on a global scope. Finally, the relative scarcity of moderation privileges increases the incentive for members to moderate ratings when given the opportunity.

Distributed Moderation in Practice

The distributed moderation system works by assigning a score ranging from -1 to +5 to each posted rating. When a rating is first submitted, it receives an initial score that can range from -1 to +2, depending on the reputation of the user who submitted it. The default score for users who identify themselves is set at +1, while posts from anonymous users start with a score of 0. Registered users can improve their reputation with productive behavior, including regularly reading the site in-depth, posting ratings that get high or low scores, and moderating the ratings of other users. Ratings from users with an especially good reputation can start with a score of +2, and ratings from users with low reputation can start at a score of 0 or -1.
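The starting scores described above amount to a small lookup. The sketch below is an assumption about how that mapping might be expressed: the reputation labels ("good", "neutral", "low", "very low") and the function name are hypothetical, since the text does not say how reputation is measured or bucketed.

```python
# Hypothetical reputation labels mapped to the starting scores named in the text.
INITIAL_SCORE = {"good": 2, "neutral": 1, "low": 0, "very low": -1}


def initial_score(registered, reputation="neutral"):
    """Starting score for a newly submitted rating."""
    if not registered:
        return 0  # anonymous ratings start at 0
    return INITIAL_SCORE[reputation]


print(initial_score(False), initial_score(True), initial_score(True, "good"))
# prints: 0 1 2
```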

Registered users who read the site regularly and maintain a good reputation establish their eligibility to become moderators, which means that the system may automatically assign them moderation privileges on occasion. A user who has been given moderation privileges is credited with a limited number of moderation points, which they can use to boost or lower the score of individual ratings from other users (moderators do not have the ability to moderate their own ratings). Users who have moderation points read an evaluation as they would normally. When they reach one of the lowest functional levels, at which users have submitted ratings, each of the other users' ratings is flagged with a moderation selector. These selectors offer a list of descriptors for the rating, including negative assessments such as "Off-topic" or "Redundant", and positive assessments, such as "Insightful". Each qualifier corresponds to a moderation coefficient of +1 if it is positive or -1 if it is negative. After the moderations are submitted, they are applied to the selected ratings, causing their credibility level to rise or fall.
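The effect of qualifiers on a rating's score can be sketched as follows. Only the three qualifiers named above are included, and the function name and the step-by-step clamping to the -1 to +5 range are assumptions made for illustration.

```python
MIN_SCORE, MAX_SCORE = -1, 5  # score range given in the text

# Qualifiers named in the text, with their moderation coefficients.
COEFFICIENT = {"Off-topic": -1, "Redundant": -1, "Insightful": +1}


def apply_moderations(score, qualifiers):
    """Apply each qualifier in turn, keeping the score within the allowed range."""
    for q in qualifiers:
        score = max(MIN_SCORE, min(MAX_SCORE, score + COEFFICIENT[q]))
    return score


print(apply_moderations(1, ["Insightful", "Insightful"]))           # prints: 3
print(apply_moderations(0, ["Off-topic", "Redundant", "Off-topic"]))  # prints: -1
```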

All readers of the site can then set a credibility threshold that is applied at every level of the evaluation hierarchy. This threshold filters the ratings that are displayed based on the moderation scores of the ratings. The threshold can be set in a range from a minimum of -1, in which case every submitted rating is displayed, to +5, in which case only the most credible ratings are displayed.

Applying Moderation to Ratings

Registered users who read the site regularly and maintain a good reputation establish moderator eligibility, which means that the system may automatically assign them moderation privileges on occasion. A user who has been given moderation privileges is credited with a limited number of moderation points (usually 10). Each of these points can be applied to a single rating to increment or decrement its credibility level by one. Users with moderation points cannot use them on their own ratings, and they can only moderate a rating once. Unused moderation points expire after some period of time (usually 3 days).
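The rules for spending moderation points can be sketched as a small class. This is a hypothetical illustration: the `Moderator` class, the dictionary-based rating record, and the decision to spend a point even when the score is already at its limit are assumptions, while the 10-point grant and 3-day lifetime follow the "usually" figures above.

```python
from datetime import datetime, timedelta

MIN_SCORE, MAX_SCORE = -1, 5
POINTS_GRANTED = 10                  # "usually 10" in the text
POINTS_LIFETIME = timedelta(days=3)  # "usually 3 days" in the text


class Moderator:
    def __init__(self, user_id, granted_at):
        self.user_id = user_id
        self.points = POINTS_GRANTED
        self.expires_at = granted_at + POINTS_LIFETIME
        self.moderated = set()  # IDs of ratings this user has already moderated

    def moderate(self, rating, delta, now):
        """Apply +1 or -1 to a rating dict with 'id', 'author', and 'score' keys.

        Returns True if the moderation was applied, False if it was refused.
        """
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        if now >= self.expires_at or self.points == 0:
            return False  # points expired or used up
        if rating["author"] == self.user_id or rating["id"] in self.moderated:
            return False  # no self-moderation, and only once per rating
        rating["score"] = max(MIN_SCORE, min(MAX_SCORE, rating["score"] + delta))
        self.points -= 1  # assumption: a point is spent even if the score was clamped
        self.moderated.add(rating["id"])
        return True


granted = datetime(2008, 1, 1)
alice = Moderator("alice", granted)
rating = {"id": 1, "author": "bob", "score": 1}
applied = alice.moderate(rating, +1, granted + timedelta(hours=1))
```

After this call `rating["score"]` is 2 and `alice` has 9 points left; a second attempt on the same rating would be refused.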

When a user with moderation points views an evaluation page at the lowest functional level, the ratings that are displayed have an additional control showing moderation qualifiers. When these qualifiers appear, the user can apply a qualifier to one or more ratings, and press the "Moderate" button at the bottom of the page.



Copyright © 2007-2008 Ideas International, Inc.
Portions Copyright © 2007-2008 Qualirate, LLC
All rights reserved.