If you are a frequent user of
INDEX + MATCH, then you will find the
LOOKUP + SEARCH formula to classify data incredibly useful.
- Excel level: Power user
- Last updated: October 2020
A word or group of words within a search phrase may need to be classified. We build out search phrase data models that contain properties of an item. For example a jacket has:
- Colour (Red, Black, Blue, Green, …)
- Materials (Leather, Duffel, …)
- Size (Small, Medium, Large, …)
- Type (Bomber, Sports, …)
Those list values can be used for variable substitution.
|blue coat||[colour] [coats]|
|black coat||[colour] [coats]|
|duffle jacket||[material] [jackets]|
|buy duffle jacket||[purchasing] [material] [jackets]|
|small brown winter coat||[size] [colour] [season] [coats]|
For a keyword analysis of 50,000+ phrases, being able to filter the data in Excel can be tricky using just the ‘contains’.
PPC Campaigns are structured as a tree node, whereby a word group belongs to a campaign. This does not allow for a keyword phrase to appear under multiple nodes of the same data set. Instead, a better solution is to classify words rather than entire search phrases so filters can be applied to important properties.
Lists are a powerful way of dynamically classifying the search phrases. Here are some list examples:
- Genders (men’s, man, women’s, woman, ladies)
- Materials (denim, leather, suede)
- Occasions (birthday, christmas, easter)
- Relationships (husband, wife, brother, sister)
- Seasons (spring, summer, autumn, winter)
- Timeframes (monday, tuesday, weekend, weekday, january, february)
Tip: If you are working at an agency, ask your clients to send you lists of data, such as categories and product properties.
Ready to work through the classification formula? Here is an example to get you started. Study the formulas in the sheets along with this tutorial.
The combination of a
LOOKUP vector with
SEARCH allows Excel users to find a group of characters within another group of characters (substring matching).
LOOKUP is not case sensitive, however it is good practice to keep the data uniform. Make sure you
TRIM the data to remove any spaces at the beginning or end.
ARRAYs are extremely difficult to use, often cause bugs and very slow, so instead we create a searchable vector.
This is done by using the
SEARCH function with the premade list as the “
find_text” attribute, and the string for the “
within_text” attribute that we are trying to lookup. It is important to use absolute cell references.
For example, assuming the keywords we wish to search are on the current tab, in cell
G2 the list is on a tab called Genders and the list has 5 items in Column A, the formula would be:
This will not work yet, so expect an
LOOKUP + SEARCH Function
Now wrap that
SEARCH inside a
LOOKUP, using a “
lookup_vector” is the
SEARCH you just performed and the “
result_vector” is the same Genders list as in the the
SEARCH function, with absolute references.
This formula now show substrings that are within the list and an
#N/A value if a match is not found. To make this more friendly, wrap it in an
IFERROR statement so that a hyphen is presented if no match is found.
Grouping Items of a List
If your list contains variations of a topic, such as gender, each item will printed individually, i.e. “women’s“, “woman“, “ladies“. To collect those variations under a single category of “Women’s“, add a column in the Genders tab, Column B with the grouping.
You can now use
VLOOKUP against the value received from
LOOKUP + SEARCH to power
VLOOKUP. See the tab Keyword List (#2) for examples.
The Final Grouping Classification Formula
You will need to scroll to see the entire code. You now have a dynamically generated classification through substring matching.
If you decide to extend any of the classifications, you will need to insert a new line. Even with absolute cell references, they can often shift from
$A$2 if you prepend or append to the list. By using a hyphen at the beginning and end, insert inbetween the hyphens to automatically extend the list, without the need for editing the absolute cell references.
Empty cells are not allowed in the list, it will break the formula.
This classification will match substrings. This means if you have a word such as “coat“, it will match “coat“, “coats“, “overcoat“, but also “coating“. This could be a problem if you have substrings with multiple meanings.
Make the List Absolute Cell References Dynamic
This could be done with the use of
CONCATENATION functions. However, this would need to be computed upon each cell unless the value is stored somewhere. This also makes the formula almost unusable and a lot more complicated than just using “The
Some listings may have more than one match. For example, a keyword term could be “black and white dresses“, so the first colour “black” would be matched, unless there was a “black and white” classification.