Clustering is the distribution of promoted queries across the site’s pages. It is done after the semantic core (SC) has been collected and cleaned of unsuitable and “empty” queries. Example of clustering:
[Figure: the query list before clustering and after clustering, with queries grouped under Page 1, Page 2, and Page 3]
The goal of clustering is to distribute queries across pages so that each page has a better chance of ranking well, following the grouping logic that both users and search engines apply.
Queries are combined based on three main criteria: query semantics, intent, and top-10 match.
By semantics, queries are divided into two categories: informational and transactional (commercial). SEO specialists usually do not assign commercial and informational queries to the same page in the SC, because a single page is unlikely to rank well for both groups.
Intent is a concept similar to semantics but more specific. It refers to the user’s more precise intention when entering a particular query into the search bar. According to this intention, the query is assigned to a specific page in the SC, either new or existing. Intent is determined in two ways. The first is a simple analysis of the meaning of the words in the query, but this method is not always sufficient. When it is not, the second way is to study the search results to see whether users are looking for a product or for information.
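The first, word-based way of determining intent can be sketched roughly as follows. The marker word lists and the function name `guess_intent` are illustrative assumptions, not a standard vocabulary; queries that match neither list, or both, are flagged for a manual look at the search results.

```python
# Hypothetical marker words; a real project would build these lists
# from its own niche and language.
TRANSACTIONAL_MARKERS = {"buy", "price", "order", "cheap", "delivery"}
INFORMATIONAL_MARKERS = {"how", "what", "why", "guide"}


def guess_intent(query):
    """Guess intent from the words of the query alone.

    Returns "transactional", "informational", or "check SERP" when the
    words do not give a clear answer.
    """
    words = set(query.lower().split())
    is_transactional = bool(words & TRANSACTIONAL_MARKERS)
    is_informational = bool(words & INFORMATIONAL_MARKERS)

    if is_transactional and not is_informational:
        return "transactional"
    if is_informational and not is_transactional:
        return "informational"
    # Ambiguous or unmarked query: fall back to studying the search results.
    return "check SERP"


for q in ["buy running shoes", "how to choose running shoes", "running shoes"]:
    print(q, "->", guess_intent(q))
```

The third query shows why word analysis alone is not always enough: nothing in "running shoes" says whether the user wants a shop or an article.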
Top-10 matching looks for overlapping URLs in the search results for different queries. The more pages overlap across queries, the more likely it is that these queries should be combined on one page within the SC. There are three methods of top-10 clustering: soft, middle, and hard. They differ in labor intensity and in the relevance of the results.
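As a minimal sketch of the overlap check, the snippet below counts the URLs that two queries share in their top results. The `serps` dictionary, its URLs, and the function name `shared_urls` are made-up examples; in practice the result lists would come from whatever SERP data source is in use.

```python
def shared_urls(serp_a, serp_b, depth=10):
    """Return the URLs present in the top `depth` results of both queries."""
    return set(serp_a[:depth]) & set(serp_b[:depth])


# Hypothetical search results, shortened to four URLs per query.
serps = {
    "buy running shoes": [
        "shop-a.example/run", "shop-b.example/run",
        "shop-c.example/shoes", "mag.example/best",
    ],
    "running shoes price": [
        "shop-a.example/run", "shop-b.example/run",
        "shop-d.example/sale", "shop-c.example/shoes",
    ],
    "how to choose running shoes": [
        "mag.example/best", "blog.example/guide",
        "wiki.example/shoes", "forum.example/t1",
    ],
}

commercial = shared_urls(serps["buy running shoes"], serps["running shoes price"])
mixed = shared_urls(serps["buy running shoes"], serps["how to choose running shoes"])
print(len(commercial))  # 3 shared URLs: strong candidates for one page
print(len(mixed))       # 1 shared URL: probably different pages
```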
In the middle method, there is no single “main” query. The first query is linked to the second, the second to the third, and so on. Each query must have at least two matches with others. However, the same number of pages does not have to overlap in the search results for each query.
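One way to read the chained linking described above is as connected components: two queries are linked when they share enough URLs, and everything reachable through such links lands in one group. The sketch below follows that reading. The threshold parameter `min_shared`, the function name, and the sample data are assumptions made for illustration.

```python
from itertools import combinations


def chain_clusters(serps, min_shared=2):
    """Group queries that are connected through a chain of overlapping SERPs."""
    queries = list(serps)
    linked = {q: set() for q in queries}
    # Link every pair of queries that shares at least `min_shared` URLs.
    for a, b in combinations(queries, 2):
        if len(set(serps[a]) & set(serps[b])) >= min_shared:
            linked[a].add(b)
            linked[b].add(a)

    clusters, seen = [], set()
    for start in queries:
        if start in seen:
            continue
        # Walk the links from `start` to collect the whole chain.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            group.append(current)
            for neighbour in linked[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(group))
    return clusters


# Hypothetical data: u1..u10 stand in for result URLs.
serps = {
    "running shoes": ["u1", "u2", "u3"],
    "buy running shoes": ["u2", "u3", "u4"],
    "running shoes price": ["u3", "u4", "u5"],
    "marathon training plan": ["u8", "u9", "u10"],
}
print(chain_clusters(serps))
```

Here "running shoes" and "running shoes price" share only one URL directly, yet they end up in the same group because both are linked to "buy running shoes".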
In the soft method, a query with a broader meaning is selected as the main one, usually a high- or medium-frequency query. Other queries are matched to it, each semantically linked to the main one. The links among the narrower queries, and between their search results, are not considered. What matters is that each of them intersects at least once with the main query.
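The main-query approach described above might look like this in code. It is a sketch under stated assumptions: the highest-frequency unassigned query is taken as the main one, `min_shared` is a hypothetical overlap threshold, and the frequencies and URLs are invented.

```python
def soft_clusters(serps, frequency, min_shared=2):
    """Attach narrower queries to a main query; narrower queries are
    never compared with each other."""
    # Most frequent queries are considered first as candidates for "main".
    unassigned = sorted(serps, key=lambda q: frequency[q], reverse=True)
    clusters = {}
    while unassigned:
        main = unassigned.pop(0)
        main_urls = set(serps[main])
        members = [
            q for q in unassigned
            if len(main_urls & set(serps[q])) >= min_shared
        ]
        clusters[main] = members
        unassigned = [q for q in unassigned if q not in members]
    return clusters


# Hypothetical frequencies and search results.
frequency = {
    "running shoes": 5000,
    "buy running shoes": 900,
    "running shoes price": 700,
    "how to choose running shoes": 400,
}
serps = {
    "running shoes": ["a", "b", "c", "d"],
    "buy running shoes": ["a", "b", "e", "f"],
    "running shoes price": ["c", "d", "g", "h"],
    "how to choose running shoes": ["x", "y", "z", "w"],
}
print(soft_clusters(serps, frequency))
```

In this sample, "buy running shoes" and "running shoes price" have no URLs in common, but both join the "running shoes" group because only the overlap with the main query is checked.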
In the hard method, every query is linked with all the others, which requires a complete intersection of the search results across all queries in a group. This approach gives the most accurate result but is the most labor-intensive. It also splits the semantic core into more separate pages than any other method.
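The all-with-all requirement can be approximated with a greedy pass that keeps, for each group, the set of URLs common to every member. This is only a sketch: the result depends on the order of the queries, and `min_shared`, the function name, and the data are illustrative assumptions.

```python
def hard_clusters(serps, min_shared=2):
    """Group queries only when the same URLs appear in the results of
    every query in the group."""
    clusters = []  # each item: {"queries": [...], "common": set of URLs}
    for query, urls in serps.items():
        for cluster in clusters:
            common = cluster["common"] & set(urls)
            if len(common) >= min_shared:
                # The query shares enough URLs with ALL current members.
                cluster["queries"].append(query)
                cluster["common"] = common
                break
        else:
            # No existing group fits, so the query starts its own group.
            clusters.append({"queries": [query], "common": set(urls)})
    return [cluster["queries"] for cluster in clusters]


# Hypothetical data: u1..u5 stand in for result URLs.
serps = {
    "running shoes": ["u1", "u2", "u3"],
    "buy running shoes": ["u2", "u3", "u4"],
    "running shoes price": ["u3", "u4", "u5"],
}
print(hard_clusters(serps))
```

With this data the third query is split off, because only one URL is common to all three result lists. That matches the observation that this method fragments the semantic core the most.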
The top-10 clustering approach helps to see the logic of queries through the eyes of the search engine. This is essential for page-by-page breakdown of the semantic core.
The previously described methods are almost never used manually anymore. Automatic clustering by intent uses complex algorithms and machine learning models that “understand” the meanings of words in queries and can logically group them.
The usual workflow with clustering consists of two stages: