Posted by gfiorelli1

In 2011 I wrote a post here on Moz. The title was "Wake Up SEOs, the New Google is Here." In that post I presented some concepts that, in my personal opinion, we SEOs needed to pay attention to in order to follow the evolution of Google. Sure, I also presented a theory which ultimately proved incorrect; I was much too confident about things like rel="author", rel="publisher", and the potential decline of the Link Graph influence. However, the premises of that theory were substantially correct, and they remain correct five years later:
Many things have changed in our industry in the past 5 years. The time has come to pause, take a few minutes, and assess what Google is and where it's headed. I'll explain how I "study" Google and what I strongly believe we, the SEOs, should pay attention to if we want not only to survive, but to anticipate Google's end game, readying ourselves for the future.

Obviously, consider that, while I believe it's backed up by data, facts, and proof, this is my opinion. As such, I kindly ask you not to take what I write for granted, but rather as an incentive for your own investigations and experiments.

Exploring the expanded universe of Google

SEO is a kingdom of uncertainty. However, one constant never changes: almost every SEO dreams of being a Jedi at least once in her life. I, too, fantasize about using the Force… Gianlu Ka Fiore Lli, Master Jedi.

Honestly, though, I think I'm more like Mon Mothma. Like her, I am a strategist by nature. I love to investigate, to see connections where nobody else seems to see them, and to dig deeper into finding answers to complex questions, then design plans based on my investigations. This way of being means that, when I look at the mysterious wormhole that is Google, I examine many sources:
Now, when examining all these sources, it's easy to create amazing conspiranoiac (conspiracy + paranoia) theories. And I confess: I helped create, believed, and defended some of them, such as AuthorRank. In my opinion, though, this methodology for finding answers about Google is the best one for understanding the future of our beloved industry of search.

If we don't dig into the "Expanded Universe of Google," what we have is a timeline composed only of updates (Panda 1.N, Penguin 1.N, Pigeon…), which is totally useless in the long term:

Instead, if we create a timeline with all the events related to Google Search (which we can discover simply by being well-informed), we begin to see where Google's heading:

The timeline above confirms what Google itself openly declared:

"Machine Learning is a core, transformative way by which we’re rethinking how we’re doing everything."

Google is becoming a “Machine Learning-First Company,” as defined by Steven Levy in this post. Machine learning is becoming so essential in the evolution of Google and search that perhaps we should go beyond listening only to official Google spokespeople like Gary Illyes or John Mueller (nothing personal, just to be clear... for instance, read this enlightening interview of Gary Illyes by Woj Kwasi). Maybe we should start paying more attention to what people like Christine Robson, Greg Corrado, Jeff Dean, and the staff of Google Brain write and say.

The second timeline tells us that, starting in 2013, Google began investing money, intellectual effort, and energy on a sustained scale in:
2013: The year when everything changed

Google rolled out Hummingbird only three years ago, but it's not just a saying: that feels like decades ago. Let’s quickly rehash: what's Hummingbird? Hummingbird is the Google algorithm as a whole. It's composed of four phases:
This last phase, Search, is where we can find the “200+ ranking factors” (RankBrain included) and filters like Panda or anti-spam algorithms like Penguin. Remember that there are as many search phases as vertical indices exist (documents, images, news, video, apps, books, maps...). We SEOs tend to fixate almost exclusively on the Search phase, forgetting that Hummingbird is more than that. This approach to Google is myopic and does not withstand a very simple logical square exercise.
If even one of the three elements of the logical square is missing, organic visibility is missing; think about non-optimized AngularJS websites, and you’ll understand the logic. How can we be SEO Jedi if we only see one facet of the Force?

Parsing and indexing: often forgotten

Over the past 18 months, we've seen a sort of technical SEO Renaissance, as defined by Mike King in this fundamental deck, despite attempts to classify technical SEOs as makeup artists. Even so, we're still struggling to fully understand the importance of the Parsing and Indexing phases. Of course, we can justify that by claiming that parsing is the most complex of the four phases. Google agrees, as it openly declared when announcing SyntaxNet. However, if we don't optimize for parsing, then we're not going to fully benefit from organic search, especially in the months and years to come.

How to optimize for parsing and indexing

As a premise to parsing and indexing optimization, we must remember an oft-forgotten aspect of search, which Hummingbird highlighted and enhanced: entity search. If you remember what Amit Singhal said when he announced Hummingbird, he declared that it had “something of Knowledge Graph.” That part was (and I'm simplifying here for clarity's sake) entity search, which is based on two kinds of entities:
Why does entity search matter?

It matters because entity search is the reason Google better understands the personal and almost unique context of a query. Moreover, thanks to entity search, Google better understands the meaning of the documents it parses. This means it's able to index them better and, finally, to achieve its main purpose: serving the best answers to the users' queries. This is why semantics is important: semantic search is optimizing for meaning. It's not a ranking factor, it's not needed to improve crawling, but it is fundamental for Parsing and Indexing, the big forgotten-by-SEOs algorithm phases.

Semantics and SEO

First of all, we must consider that there are different kinds of semantics and that, sometimes, people tend to get them confused.
Logical semantics

Structured data is the big guy right now in logical semantics, and Google (both directly and indirectly) is investing a lot in it. A couple of months ago, when the mainstream marketing gurusphere was discussing the 50 shades of the new Instagram logo or the average SEO was (justifiably) shaking his fists against the green “ads” button in the SERPs, Google released the new version of Schema.org. This new version, as Aaron Bradley finely commented here, improves the ability to disambiguate between entities and/or better explain their meaning. For instance, now:
At the same time, we shouldn't forget to always use the most important property of all: “sameAs”, one of the few properties that's present in every Schema.org type. Finally, as Mike Arnesen recently explained quite well here on the Moz blog, take advantage of the semantic HTML attributes itemref and itemid.

How do we implement Schema.org in 2016?

It is clear that Google is pushing JSON-LD as the preferred method for implementing Schema.org. The best way to implement JSON-LD Schema.org is to use the Knowledge Graph Search API, which uses the standard Schema.org types and is compliant with JSON-LD specifications. As an alternative, you can use the recently rolled out JSON-LD Schema Generator for SEO tool by Hall Analysis.

To solve a common complaint about JSON-LD (its volume and how it may affect the performance of a site), we can:
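As an illustration of the markup discussed in this section, here is a minimal Python sketch that builds a Schema.org description with a sameAs property and wraps it in a JSON-LD script tag. The function name `build_json_ld`, the organization name, and all URLs are hypothetical placeholders, and the choice of the Organization type is only an assumption for the example.

```python
import json


def build_json_ld(name, url, same_as):
    """Return a JSON-LD <script> tag describing a Schema.org Organization.

    `same_as` lists URLs of pages that unambiguously identify the same
    entity (for example, official social profiles).
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical example values, used only for illustration.
snippet = build_json_ld(
    "Example Lightsaber Shop",
    "https://www.example.com/",
    ["https://twitter.com/example", "https://www.facebook.com/example"],
)
print(snippet)
```

The generated tag can be pasted into the page template and then checked with the Structured Data Testing Tool before going live.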
The importance Google gives to Schema.org and structured data is confirmed by the new and radically improved version of the Structured Data Testing Tool, which is now more actionable for identifying mistakes and testing solutions thanks to its JSON-LD (again!) and Schema.org contextual autocomplete suggestions.

Semantics is more than structured data #FTW!

One mistake I foresee is thinking that semantic search is only about structured data. It's the same kind of mistake people make in international SEO, when reducing it to hreflang alone. The reality is that semantics is present from the very foundations of a website, found in:
HTML

Since its beginnings, HTML has included semantic markup (e.g.: title, H1, H2...). Its latest version, HTML5, added new semantic elements, the purpose of which is to semantically organize the structure of a web document and, as W3C says, to allow “data to be shared and reused across applications, enterprises, and communities.”

A clear example of how Google is using the semantic elements of HTML is its Featured Snippets or answer boxes. As declared by Google itself (“We do not use structured data for creating Featured Snippets”) and explained well by Dr. Pete, Richard Baxter, and very recently Simon Penson, the documents that tend to be used for answer boxes usually display these three factors:
The conclusion, then, is that semantic search starts in the code and that we should pay more attention to those "boring," time-consuming, not-a-priority W3C error reports.

Architecture

The semiotician in me (I studied semiotics and the philosophy of language in university with the likes of Umberto Eco) cannot help but consider information architecture itself as semantics. Let me explain.

Everything starts with the right ontology

Ontology is a set of concepts and categories in a subject area (or domain) that shows their properties and the relations between them. If we take the Starwars.com site as an example, we can see in the main menu the concepts in the Star Wars subject area:
Ontology leads to taxonomy (because everything can be classified)

If we look at Starwars.com, we see how every concept included in the Star Wars domain has its own taxonomy. For instance, the Databank presents several categories, like:
Ontology and taxonomy, then, lead to context

If we think of Tatooine, we tend to think about the planet where Luke Skywalker lived his youth. However, if we visit a website about deep space exploration, Tatooine would be one of the many exoplanets that astronomers have discovered in the past few years. As you can see, ontology (Star Wars vs celestial bodies) and taxonomies (Star Wars planets vs exoplanets) determine context and help disambiguate between similar entities.

Ontology, taxonomy, and context lead to meaning

The better we define the ontology of our website, structure its taxonomy, and offer better context to its elements, the better we explain the meaning of our website, both to our users and to Google. Starwars.com, again, is very good at doing this. For instance, if we examine how it structures a page like the one on TIE fighters, we see that every possible kind of content is used to help explain what a TIE fighter is:
In the case of characters like Darth Vader, the information can be even richer. The effectiveness of the information architecture of the Star Wars website (plus its authority) is such that its Databank is one of the very few non-Wikidata/Wikipedia sources that Google is using as a Knowledge Graph source.

What tool can we use to semantically optimize the structure of a website?

There are, in fact, several tools we can use to semantically optimize the information architecture of a website.

Knowledge Graph Search API

The first one is the Knowledge Graph Search API, because in using it we can get a ranked list of the entities that match given criteria. This can help us better define the subjects related to a domain (ontology) and can offer ideas about how to structure a website or any kind of web document.

RelFinder

A second tool we can use is RelFinder, which is one of the very few free tools for entity research. As you can see in the screencast below, RelFinder is based on Wikipedia. Its use is quite simple:
RelFinder will detect entities related to both (e.g.: George Lucas or Marcia Lucas), their disambiguating properties (e.g.: George Lucas as director, producer, and writer) and factual ones (e.g.: lightsabers as an entity related to Star Wars and first seen in Episode IV). RelFinder is very useful if we must do entity research on a small scale, such as when preparing a content piece or a small website. However, if we need to do entity research on a bigger scale, it's much better to rely on the following tools:

AlchemyAPI and other tools

AlchemyAPI, which was acquired by IBM last year, uses machine and deep learning in order to do natural language processing, semantic text analysis, and computer vision. AlchemyAPI, which offers a 30-day trial API Key, is based on the Watson technology; it allows us to extract a huge amount of information from text, with concepts, entities, keywords, and taxonomy offered by default.

Resources about AlchemyAPI
Other tools that allow us to do entity extraction and semantic analysis on a big scale are:

Lexical semantics

As said before, lexical semantics is the branch of semantics that studies the meaning of words and their relations. In the context of semantic search, this area is usually defined as keyword and topical research. Here on Moz you can find several Whiteboard Friday videos on this topic:
How do we conduct semantically focused keyword and topical research?

Despite its recent update, Keyword Planner still can be useful for performing semantically focused keyword and topical research. In fact, that update could even be deemed a logical choice from a semantic search point of view. Terms like "PPC" and "pay-per-click" are synonyms, and even though each one surely has a different search volume, it's evident how Google presents two very similar SERPs if we search for one or the other, especially if our search history already exhibits a pattern of searches related to SEM.

Yet this dimming of keyword data is less helpful for SEOs in that it makes for harder forecasting and prioritization of which keywords to target. This is especially true when we search for head terms, because it exacerbates a problem that Keyword Planner already had: combining stemmed keywords that, albeit having "our keyword" as a base, have nothing in common because they mean completely different things and target very different topics.

However (and this is a pro tip), there is a way to discover the most useful keyword, even when they all have the same search volume: how much advertisers bid for it. Trust the market ;-).

(If you want to learn more about the recent changes to Keyword Planner, go read this post by Bill Slawski.)

Keyword Planner for semantic search

Let's say we want to create a site about Star Wars lightsabers (yes, I am a Star Wars geek). What we could do is this:
Google will offer us these Ad Groups as results: The Ad Groups are a collection of semantically related keywords. They're very useful for:
Remember, then, that Keyword Planner allows us to do other kinds of analysis too, such as breaking down how the discovered keywords/Ad Groups are used by device or by location. This information is useful for understanding the context of our audience.

If you have one or a few entities for which you want to discover topics and grouped keywords, working directly in Keyword Planner and exporting everything to Google Sheets or an Excel file can be enough. However, when you have tens or hundreds of entities to analyze, it's much better to use the Adwords API or a tool like SEO Powersuite, which allows you to do keyword research following the method I described above.

Google Suggest, Related Searches, and Moz Keyword Explorer

Alongside Keyword Planner, we can use Google Suggest and Related Searches. Not simply for identifying topics that people search for and then writing an instant blog post or a landing page about them, but for reaffirming and perfecting our site's architecture.

Continuing with the example of a site or section specializing in lightsabers, if we look at Google Suggest we can see how "lightsaber replica" is one of the suggestions. Moreover, amongst the Related Searches for "lightsaber," we see "lightsaber replica" again, which is a clear signal of its relevance to "lightsaber." Finally, we can click on and discover "lightsaber replica"-related searches, thus creating what I define as the "search landscape" about a topic.

The model above is not scalable if we have many entities to analyze. In that case, a tool like Moz Keyword Explorer can be helpful thanks to the options it offers, as you can see in the snapshot below:

Other keywords and topical research sources

Recently, Powerreviews.com presented survey results that state how Internet users tend to prefer Amazon over Google for searching information about a product (38% vs 35%).
So, why not use Amazon for doing keyword and topical research, especially if we are doing it for ecommerce websites or for the MOFU and BOFU phases of our customers' journey? We can use the Amazon Suggest:

Or we can use a free tool like the Amazon Keyword Tool by SISTRIX.

The Suggest function, though, is present in (almost) every website that has a search box (your own site, even, if you have it well-implemented!). This means that if we're searching for more mainstream and top-of-the-funnel topics, we can use the suggestions of social networks like Pinterest (i.e.: explore the voluptuous universe of the "lightsaber cakes" and related topics):

Pinterest, then, is a real topical research goldmine thanks to its tagging system:

On-page

Once we've defined the architecture, the topics, and prepared our keyword dictionaries, we can finally work on the on-page facet of our work. The details of on-page SEO are another post for another time, so I'll simply recommend you read this evergreen post by Cyrus Shepard.

The best way to grade the semantic search optimization of a written text is to use TF-IDF analysis, offered by sites like OnPage.org (which also offers a clear guide about the advantages and disadvantages of TF-IDF analysis). Remember that TF-IDF can also be used for doing competitive semantic search analysis and to discover the keyword dictionaries used by our competitors.

User behavior / Semiotics and context

In the beginning of this post, we saw how Google is heavily investing in better understanding the meaning of the documents it crawls, so as to better answer the queries users perform. Semantics (and semantic search) is only one of the pillars on which Google is basing this tremendous effort. The other pillar consists of understanding user search behaviors and the context of the users performing a search.
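Going back for a moment to the TF-IDF analysis recommended for on-page work, the idea can be sketched in a few lines of Python. This is only the textbook formulation (term frequency multiplied by the logarithm of the inverse document frequency); how OnPage.org or any other tool actually computes its scores is not something this post documents, and the function names and sample texts below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical page texts: ours first, then two competitors.
pages = [
    "lightsaber replica with blue blade and metal hilt",
    "lightsaber toy for kids with sound effects",
    "lightsaber history in the star wars films",
]
scores = tf_idf(pages)

# Terms used on every page score zero; distinctive terms score highest.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)
print(top_terms[:3])
```

Running the same function over competitors' pages is one way to approximate the keyword dictionaries they rely on: the terms with the highest scores are the ones that distinguish a page from the rest of the set.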
User search behavior

Recently, Larry Kim shared two posts based on experiments he did, demonstrating his theory about how RankBrain is about factors like CTR and dwell time. While these posts are super actionable, present interesting information with original data, and confirm other tests conducted in the past, these so-called user signals (CTR and dwell time) may not be directly related to RankBrain but, instead, to user search behaviors and personalized search.

Be aware, however, that my statement here above should be taken as a personal theory, because Google itself doesn't really know how RankBrain works. AJ Kohn, Danny Sullivan, and David Harry wrote additional interesting posts about RankBrain, if you want to dig into it (for the record, I wrote about it too here on Moz).

Even if RankBrain may be included in the semantic search landscape due to its use of Word2Vec technology, I find it better to concentrate on how Google may use user search behaviors to better understand the relevance of the parsed and indexed documents.

Click-through rate

Since Rand Fishkin presented his theory (backed up with tests) more than two years ago that Google may use CTR as a ranking factor, a lot has been written about the importance of click-through rate. Common sense suggests that if people click more often on one search snippet than on another that perhaps ranks in a higher position, then Google should take that user signal into consideration, and eventually lift the ranking of the page that consistently receives higher CTR.

Common sense, though, is not so easy to apply when it comes to search engines, and Googlers have repeatedly declared that they do not use CTR as a ranking factor (see here and here). And although Google has long since developed a click fraud detection system for Adwords, it's still not clear if it would be able to scale it for organic search.
On the other hand (let me be a little bit conspiranoiac), if CTR is not important at all, then why has Google changed the pixels of the title tag and meta description? Just for "better design?"

But as Eric Enge wrote in this post, one of the few things we know is that Google filed a patent (Modifying search result ranking based on a temporal element of user feedback, May 2015) about CTR. It's surely using CTR in testing environments to better calculate the value and grade of other ranking factors and (this is more speculative) it may give a stronger importance to click-through rate in those subsets of keywords that clearly express a QDF (Query Deserves Freshness) need.

What's less discussed is the importance CTR has in personalized search, as we know that Google tends to paint a custom SERP for each of us depending on both our search history and our personal click-through rate history. Both are key in helping Google determine which SERPs will be the most useful for us. For instance:
Finally, even if Google does not use CTR as a ranking factor, this doesn't mean it's not an important metric and signal for SEOs. We have years of experience and hundreds of tests proving how important it is to optimize our search snippets (and now Rich Cards) with the appropriate use of structured data in order to earn more organic traffic, even if we rank worse than our competitors.

Watch time

Having good CTR metrics is totally useless if the pages our visitors land on don't fulfill the expectation the search snippet created. This is similar to the difference between a clickbait and a persuasive headline. The first will probably cause a click back to the search results page and the second, instead, will trap and engage the visitors.

The ability of a site to retain its users is what we usually call dwell time, but that Google defines as watch time in this patent: Watch Time-Based Ranking (March 2013). This patent is usually cited in relation to video because the patent itself uses video as content example, but Google doesn't restrict its definition to videos alone:

"In general, "watch time" refers to the total time that a user spends watching a video. However, watch times can also be calculated for and used to rank other types of content based on an amount of time a user spends watching the content."

Watch time is indeed a more useful user signal than CTR for understanding the quality of a web document and its content. Are you skeptical and don't trust me? Trust Facebook, then, because it also uses watch time in its news feed algorithm:

"We’re learning that the time people choose to spend reading or watching content they clicked on from News Feed is an important signal that the story was interesting to them."

Context and the importance of personalized search

I usually joke and say that the biggest mistake a gang of bank robbers could make is to bring along their smartphones.
It'd be quite easy to do PreCrime investigations simply by checking their activity board, which includes their location history on Google Maps.

In order to fulfill its mission of offering the best answers to its users, Google must not only understand the web documents it crawls so as to index them properly, and not only improve its own ranking factors (taking into consideration the signals users give during their search sessions), but it also needs to understand the context in which users perform a search. Here's what Google knows about us:

It's because of this compelling need to understand our context that Google hired the entire Behav.io team back in 2013. Behav.io, if you don't know already, was a company that developed an alpha test software based on its open source framework Funf (still alive), the purpose of which was to record and analyze the data that smartphones keep track of: location, speed, nearby devices and networks, phone activity, noise levels, et al.

All this information is required in order to better understand the implicit aspects of a query, especially if done from a smartphone and/or via voice search, and to better process what Tom Anthony and Will Critchlow define as compound queries.

However, personalized search is also determined by (again) entity search, specifically by search entities. The relation between search entities creates a "probability score," which may determine if a web document is shown in a given SERP or not.

For instance, let's say that someone performs a search about a topic (e.g.: Wookiees) for which she never clicked on a search snippet of our site, but on another that had content about that same topic (e.g.: Wookieepedia) and which linked to the page about it on our site (e.g.: "How to distinguish one wookiee from another?").
Those links, and specifically their anchor texts, would help our site and page to earn a higher probability score than a competitor site that isn't linked to by those sites present in the user's search history. This means that our page will have a better probability of appearing in that user's personalized SERP than our competitors'.

You're probably asking: what's the actionable point of this patent?

Link building/earning is not dead at all, because it's relevant not only to the Link Graph, but also to entity search. In other words, link building is semantic search, too.

The importance of branding and offline marketing for SEO

One of the classic complaints SEOs have about Google is how it favors brands. The real question, though, should be this: "Why aren't you working to become a brand?"

Be aware! I am not talking about "vision," "mission," and "values" here. I'm talking about plain and simple semantics. Throughout this post I spoke of entities (named and search ones), cited Word2Vec (vectors are "vast amounts of written language embedded into mathematical entities"), talked about lexical semantics, meaning, ontology, personalized search, and implied topics like co-occurrences and knowledge base. Branding has a lot to do with all of these things.

I'll try to explain it with a very personal example. Last May in Valencia I debuted as conference organizer with The Inbounder. One of the problems I faced when promoting the event was that "inbounder," which I thought was a cool name for an event targeting inbound marketers, is also a basketball term. The problem was obvious: how do I make Google understand that The Inbounder was not about basketball, but digital marketing?

The strategy we followed from the very beginning was to work on the branding of the event (I explain more about The Inbounder story here on Inbound.org). We did this:
As a result, right now The Inbounder occupies all the first page of Google for its brand name and, more importantly in semantic terms, Google presents The Inbounder events as suggested and related searches. It associates it with all the searches I could ever want:

Another example is Trivago and its global TV advertising campaigns:

Trivago was very smart in constantly showing "Trivago" and "hotel" in the same phrase, even making their motto "Hotel? Trivago." This is a simple psychological trick for creating word associations. As a result, people searched on Google for "hotel Trivago" (or "Trivago hotel"), especially just after the ads were broadcast:

One of the results is that now, Google suggests "hotel Trivago" when we start typing "hotel" and, as in the case of The Inbounder, it presents "hotel Trivago" as a related search:

Wake up SEOs, the new new Google is here

Yes, it is. And it's all about better understanding web documents and queries in order to provide the best answers to its users (and make money in the meantime). To achieve this objective, ideally becoming the long-desired "Star Trek computer," Google is investing money, people, and efforts into machine/deep learning, neural networks, semantics, search behavior, context analysis, and personalized search.

Remember, SEO is no longer just about "200 ranking factors." SEO is about making our websites become the sources Google cannot help but use for answering queries. This is exactly why semantic search is of utmost importance and not just something worth the attention of a few geeks passionate about linguistics, computer science, and patents.

Work on parsing and indexing optimization now, seriously implement semantic search in your SEO strategy, take advantage of the opportunities personalized search offers you, and always put users at the center of everything you do. In doing so you'll build a solid foundation for your success in the years to come, both via classic search and with Google Assistant/Now.
via The Moz Blog http://tracking.feedpress.it/link/9375/4039617