Finding and removing user-generated spam is one of the most important routine tasks for any website owner or SEO expert.
One of the common types of such spam is hidden text and links, which may be placed on your site by malicious users or code.
This may happen in various ways, including:
- The website accepts guest posts.
- The site comment system doesn’t properly filter submitted comments.
- The website has been hacked and the attackers have injected hidden links for their own benefit.
- Someone accidentally copies and pastes text with CSS styling from another webpage into the backend editor.
According to Google, the following types of text and links violate Google’s Webmaster Guidelines:
- Text in the same color as the page background (e.g., white text on a white background).
- Text or links hidden with CSS so that they are invisible to users but visible to search engines.
- A link placed on a single character in a paragraph (e.g., a hyphen in the middle of a paragraph).
But not all hidden elements are considered deceptive.
For example, you may have a responsive website design and a hidden, mobile-specific menu or other elements for mobile users only. These are completely fine since users see the same content as search engines see on a specific device.
What you want to avoid are black hat SEO techniques which aim to manipulate Google using content that is invisible to website users but visible to Google.
In this post, we will explore the tactics you can use to manually spot hidden links or text on a webpage.
We will be using a sample test page with hidden spam text and links to illustrate how the tools work. The sample page contains an image and text, just like any ordinary webpage.
1. Use a Browser Add-On Called Web Developer
When you install this add-on, you will see a gray gear icon in the upper right corner of your browser. Clicking on it reveals many options.
These include options to disable inline styles or all CSS styles. (The add-on is also available for Firefox, with the same UI as on Google Chrome.)
I will begin by disabling inline styles, because users who insert hidden content usually apply inline styles such as style="display:none" to elements, since they don’t have access to the stylesheet files on your server. They can only access a content editor like TinyMCE, which lets them apply inline styles.
In the example, disabling inline styles reveals two text blocks that were initially invisible because they had the inline styles style="font-size:0px;" and style="display:none" applied.
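The same check can be sketched in code. The snippet below is only an illustration, not how the Web Developer add-on works: `parseInlineStyle` and `findHidingDeclarations` are hypothetical helper names, and the list of hiding declarations is an assumption based on the two inline styles found on the test page plus a couple of common variants.

```javascript
// Hypothetical helper: turn an inline style string into a { property: value } map.
// Assumes simple declarations; values that contain ";" (e.g. data URLs) are not handled.
function parseInlineStyle(styleText) {
  const declarations = {};
  for (const part of styleText.split(';')) {
    const idx = part.indexOf(':');
    if (idx === -1) continue;
    const name = part.slice(0, idx).trim().toLowerCase();
    const value = part.slice(idx + 1).trim().toLowerCase().replace(/\s*!important$/, '');
    if (name) declarations[name] = value;
  }
  return declarations;
}

// Hypothetical helper: list the declarations in an inline style that hide content.
function findHidingDeclarations(styleText) {
  const d = parseInlineStyle(styleText);
  const hits = [];
  if (d['display'] === 'none') hits.push('display:none');
  if (d['visibility'] === 'hidden') hits.push('visibility:hidden');
  if ('font-size' in d && parseFloat(d['font-size']) === 0) hits.push('font-size:' + d['font-size']);
  if ('opacity' in d && parseFloat(d['opacity']) === 0) hits.push('opacity:' + d['opacity']);
  return hits;
}

// In a browser console, log every element whose inline style hides it.
if (typeof document !== 'undefined') {
  document.querySelectorAll('[style]').forEach(function (el) {
    const hits = findHidingDeclarations(el.getAttribute('style'));
    if (hits.length) console.log(hits.join(', '), el);
  });
}
```

This only looks at the style attribute, so content hidden through a stylesheet rule will not show up here.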
If disabling inline styles doesn’t give you any results, you may try disabling all styles.
By applying it you will see the following:
The difference is that this also removes all of the website’s styles, so you see a text-only version of the page. Here, no hidden element can escape notice, even one hidden via the website’s own CSS.
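If you prefer the console to the add-on, a rough equivalent of disabling all styles can be sketched as follows. `disableAllStyles` is a hypothetical helper name; the sketch assumes that switching off every entry in `document.styleSheets` and removing every style attribute is close enough to what the add-on does, which the add-on itself may implement differently.

```javascript
// Hypothetical helper: switch off every stylesheet and strip inline styles.
// `doc` is expected to behave like the browser's `document` object.
function disableAllStyles(doc) {
  const sheets = Array.from(doc.styleSheets);
  sheets.forEach(function (sheet) {
    sheet.disabled = true; // applies to both <link> and <style> sheets
  });
  const styled = Array.from(doc.querySelectorAll('[style]'));
  styled.forEach(function (el) {
    el.removeAttribute('style');
  });
  // Report how much was switched off.
  return { sheets: sheets.length, inlineStyles: styled.length };
}

// Only runs where a DOM is available, e.g. the browser console.
if (typeof document !== 'undefined') {
  console.log(disableAllStyles(document));
}
```

As with the add-on, the change only affects your own browser tab, and refreshing the page restores the original styling.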
You will need to examine the unstyled webpage carefully in order to spot small, unusual text or links. Pay attention to the anchor text “the”. It’s easy to miss, isn’t it?
This only reveals hidden content that exists in the HTML DOM at the moment you disable the styles.
So it is good practice to disable styles several times, at intervals.
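Links with very short anchor text, like the “the” link on the test page, are easy to overlook by eye. Here is a minimal sketch for listing them; `findShortAnchors` is a hypothetical helper, and the three-character cutoff is an assumption chosen only because it catches “the”.

```javascript
// Hypothetical helper: return the links whose visible anchor text is very short.
// `links` is an array of { text, href } objects.
function findShortAnchors(links, maxLength) {
  return links.filter(function (link) {
    return link.text.trim().length <= maxLength;
  });
}

// In a browser console, collect every link on the page and list the short ones.
if (typeof document !== 'undefined') {
  const pageLinks = Array.from(document.links).map(function (a) {
    return { text: a.textContent, href: a.href };
  });
  console.table(findShortAnchors(pageLinks, 3));
}
```

Image-only links have no anchor text and will be listed too, so expect some harmless entries such as logos and icons.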
Many of you might have heard of the online tool called “Search engine SPAM detector.” I don’t recommend using it to find hidden content. When I ran my test page using this tool, it couldn’t spot any hidden text issues. Perhaps it is better used for spotting keyword stuffing issues instead.
2. Use a Browser Add-On That Extracts Links
A link-extraction add-on will pull all links on the webpage into a report so you can spot any unusual links you are linking to. (On Firefox you can use the extension Link Gopher.)
Running this add-on on the test page revealed odd links, which suggested that something was wrong. This might be the quickest way to spot spam links.
If you find weird links using this tool, you can then start an in-depth audit using the first or third method.
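The idea behind such a link report can be sketched in a few lines. This is not how any particular add-on is implemented: `groupExternalLinks` is a hypothetical helper, and treating every hostname other than the page’s own as external is an assumption (so a www and non-www variant of your own site would count as external).

```javascript
// Hypothetical helper: group a page's outgoing links by external hostname.
// `hrefs` is an array of link targets; `pageUrl` is the address of the page itself.
function groupExternalLinks(hrefs, pageUrl) {
  const pageHost = new URL(pageUrl).hostname;
  const byHost = {};
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, pageUrl); // resolves relative links against the page
    } catch (e) {
      continue; // skip values that cannot be parsed as a URL
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    if (url.hostname === pageHost) continue; // internal link
    if (!byHost[url.hostname]) byHost[url.hostname] = [];
    byHost[url.hostname].push(url.href);
  }
  return byHost;
}

// In a browser console, report the external hosts the current page links to.
if (typeof document !== 'undefined') {
  const hrefs = Array.from(document.links).map(function (a) {
    return a.getAttribute('href') || '';
  });
  console.log(groupExternalLinks(hrefs, window.location.href));
}
```

Any hostname in the output that you don’t recognize is a candidate for the closer inspection described in the other methods.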
3. Using Your Browser’s ‘Inspect’ Dev Tool
The first method works well, but if you don’t pay close attention to details you may miss problems. I wrote a small JavaScript snippet that highlights suspicious HTML elements.
If you have some technical skills, you can use your browser console to find hidden text and links with the help of the code snippet below.
On Google Chrome, right-click on the page > Inspect > Console (you may refer to this guide to find it for your browser).
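// Collect every element inside <body>.
var htmldoc = document.querySelectorAll('body *');

// CSS properties commonly used to hide text. Numeric entries are thresholds:
// any computed value equal to or less than them (font size 3px, height 3px,
// and so on) is considered suspicious.
var disallowed_cssproperties = [
  ["display", "none"], ["visibility", "hidden"], ["font-size", "3"],
  ["position", "absolute"], ["opacity", "0"], ["height", "3"], ["width", "3"],
  ["max-height", "3"], ["max-width", "3"]
];
// Values applied to reveal a flagged element, in the same order as above.
var reset_cssproperties = [
  ["display", "block"], ["visibility", "visible"], ["font-size", "40px"],
  ["position", "relative"], ["opacity", "1"], ["height", "auto"], ["width", "auto"],
  ["max-height", "none"], ["max-width", "none"]
];
var reported_html = '';
var skip_elements = ['script', 'style'];
var bool_reset_all = false; // set to true to also un-position every element
var flagged = [];

// First pass: only read computed styles, so resets cannot skew later checks.
for (var i = 0; i < htmldoc.length; i++) {
  if (skip_elements.includes(htmldoc[i].tagName.toLowerCase())) continue;
  var style = window.getComputedStyle(htmldoc[i]);
  var hits = [];
  for (var k = 0; k < disallowed_cssproperties.length; k++) {
    var value = style.getPropertyValue(disallowed_cssproperties[k][0]);
    var limit = disallowed_cssproperties[k][1];
    // Keywords must match exactly; numbers are compared against the threshold.
    if (isNaN(limit) ? value == limit : parseFloat(value) <= parseFloat(limit)) hits.push(k);
  }
  // Text in the same color as the element's own (non-transparent) background.
  var txt_color = RGBToHex(style.color);
  var bg_color = RGBToHex(style.backgroundColor);
  var same_color = style.backgroundColor != 'rgba(0, 0, 0, 0)' && txt_color == bg_color;
  if (hits.length || same_color) {
    flagged.push([htmldoc[i], hits]);
    reported_html += '<pre>' + encodeHTML(htmldoc[i].outerHTML) + '</pre><hr>';
  }
}

// Second pass: reveal each flagged element and outline it in red.
for (var i = 0; i < flagged.length; i++) {
  for (var k = 0; k < flagged[i][1].length; k++) {
    var reset = reset_cssproperties[flagged[i][1][k]];
    flagged[i][0].style.setProperty(reset[0], reset[1]);
  }
  flagged[i][0].style.border = '2px solid red';
}

// Reset the positioning of all elements.
if (bool_reset_all) {
  for (var i = 0; i < htmldoc.length; i++) {
    if (skip_elements.includes(htmldoc[i].tagName.toLowerCase())) continue;
    htmldoc[i].style.position = 'relative';
    htmldoc[i].style.top = 'auto';
    htmldoc[i].style.left = 'auto';
    htmldoc[i].style.bottom = 'auto';
    htmldoc[i].style.right = 'auto';
    htmldoc[i].style.display = 'block';
  }
}

// Write the report to a new tab (the browser must allow pop-ups).
var tab = window.open('', '_blank');
tab.document.write(reported_html);
tab.document.close(); // finish loading the page
console.log('Finished');

function RGBToHex(rgb) {
  // Choose correct separator
  let sep = rgb.indexOf(",") > -1 ? "," : " ";
  // Turn "rgb(r,g,b)" or "rgba(r,g,b,a)" into [r,g,b]
  rgb = rgb.substring(rgb.indexOf("(") + 1).split(")")[0].split(sep);
  let r = (+rgb[0]).toString(16),
      g = (+rgb[1]).toString(16),
      b = (+rgb[2]).toString(16);
  if (r.length == 1) r = "0" + r;
  if (g.length == 1) g = "0" + g;
  if (b.length == 1) b = "0" + b;
  return "#" + r + g + b;
}

function encodeHTML(str) {
  return str.replace(/[\u00A0-\u9999<>&](?!#)/gim, function (i) {
    return '&#' + i.charCodeAt(0) + ';';
  });
}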
You will see a report in a new tab listing suspicious HTML elements that are hidden from users by known CSS techniques.
Not every hidden element is bad. Again, there might be responsive hidden HTML elements, such as menus and social icons, designed to display on different screen sizes. They are totally fine to have.
Besides this report, you will see those elements highlighted with a red border on the webpage itself.
If you want to try this one more time, you will need to refresh the webpage and run the code again. Because the script changes the styles of the elements it flags, running it without refreshing will give you wrong results on your second try.
Those who are not technically savvy might mistakenly think that this code affects how site visitors see the website. It does not: the webpage is only changed on the client side, in the browser of whoever runs the code in the console.
The techniques described above will help you manually audit any hidden text and links.
While there is software that can scan webpages and produce reports on the same issue, manually checking your pages is always a good idea, because software follows fixed algorithms and may not detect every type of hidden text and links.
All screenshots taken by author, January 2019