3 Ways to Find Hidden Spam Links & Text on a Webpage



Finding and removing user-generated spam on any site is one of the most important daily tasks of any website owner or SEO expert.

One common type of such spam is hidden text and links, which may be placed on your site by malicious users or code.

This may happen in various ways, including:

  • The website accepts guest posts.
  • The site comment system doesn’t properly filter submitted comments.
  • The website was hacked and the hackers injected hidden links for their own benefit.
  • By accident, when a person copies text with some CSS styling from another webpage and pastes it into the backend editor.

According to Google, the following types of text and links violate Google’s Webmaster Guidelines:

  • Using the same text color as the background color of the page (e.g., white text on white background).
  • Text or links hidden using CSS techniques that are not visible to users but visible to search engines.
  • Using links on one character in a paragraph (e.g., hyphen in the middle of a paragraph).
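The third item in the list above lends itself to a quick automated check. The sketch below is only an illustration: the function names `isSuspiciousAnchorText` and `findSuspiciousAnchors` are hypothetical, and the rule (a single visible character, or punctuation only) is an assumption about what counts as suspicious, not Google's definition.

```javascript
// Hypothetical helper: flags anchor text that is a single visible character
// or consists of punctuation only (e.g. "-" or "...").
function isSuspiciousAnchorText(text) {
  const visible = text.replace(/\s+/g, '');
  // Empty text usually means an image link, so it is not flagged here.
  if (visible.length === 0) return false;
  // Array.from counts code points, so one emoji still counts as one character.
  if (Array.from(visible).length === 1) return true;
  // No letters and no digits at all: punctuation or symbols only.
  return /^[^\p{L}\p{N}]+$/u.test(visible);
}

// Takes plain objects of the form { href, text } and returns the flagged ones.
function findSuspiciousAnchors(anchors) {
  return anchors.filter(a => isSuspiciousAnchorText(a.text));
}

// In a browser console the input could be collected like this:
// const anchors = Array.from(document.links, a => ({ href: a.href, text: a.textContent }));
// console.table(findSuspiciousAnchors(anchors));
```

Legitimate one-character links (pagination numbers, for example) will also be flagged, so treat the result as a list to review by hand rather than a verdict.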

But not all hidden elements are considered deceptive.

For example, you may have a responsive website design and a hidden, mobile-specific menu or other elements for mobile users only. These are completely fine since users see the same content as search engines see on a specific device.

What you want to avoid are black hat SEO techniques that aim to manipulate Google with content that is invisible to website users but visible to Google.

In this post, we will explore the tactics you can use to manually spot hidden links or text on a webpage.

We will be using a sample test page with hidden spam text and links to illustrate how the tools work. The sample page below contains an image and text, just like any normal webpage.

Sample Page

1. Use a Browser Add-On Called Web Developer


When you install this add-on, you will see a gray gear icon in the upper-right corner of your browser. Clicking it opens a menu with many options.

Among them are options to disable inline styles or all CSS styles. (The add-on is also available for Firefox, with the same UI as on Google Chrome.)

Apply disable inline styles of Web Developer addon

I will begin by disabling inline styles. Users who insert hidden content usually apply inline styles such as style="display:none", because they don’t have access to the stylesheet files on your server. They can only reach a content editor like TinyMCE, which lets them apply inline styles.

In the example above, two text blocks appear that were initially invisible because they had the inline styles style="font-size:0px;" and style="display:none" applied.
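The same inline-style check can be approximated in code. The following is a minimal sketch, not part of the add-on: `parseInlineStyle` and `inlineStyleHidesContent` are hypothetical names, and the list of hiding declarations is an assumption limited to a few common cases.

```javascript
// Parses the text of a style="..." attribute into { property: value }.
// Simplification: splitting on ";" breaks on values that contain semicolons
// (such as data: URLs), which is acceptable for a quick audit.
function parseInlineStyle(styleText) {
  const declarations = {};
  for (const part of styleText.split(';')) {
    const idx = part.indexOf(':');
    if (idx === -1) continue;
    const prop = part.slice(0, idx).trim().toLowerCase();
    const value = part.slice(idx + 1).trim().toLowerCase()
      .replace(/\s*!important$/, '');
    if (prop) declarations[prop] = value;
  }
  return declarations;
}

// Returns true when the inline style uses one of a few common hiding tricks.
function inlineStyleHidesContent(styleText) {
  const s = parseInlineStyle(styleText);
  if (s['display'] === 'none') return true;
  if (s['visibility'] === 'hidden') return true;
  if ('font-size' in s && parseFloat(s['font-size']) === 0) return true;
  if ('opacity' in s && parseFloat(s['opacity']) === 0) return true;
  return false;
}

// In a browser console, every element carrying a style attribute can be checked:
// document.querySelectorAll('[style]').forEach(el => {
//   if (inlineStyleHidesContent(el.getAttribute('style'))) console.log(el);
// });
```

This only looks at inline styles, so it shares the limitation described next: content hidden through the site's stylesheets is not detected.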

If disabling inline styles doesn’t reveal anything, try disabling all styles.

The difference is that this also removes the site’s own stylesheets, so you see a text-only version of the page. Nothing can stay hidden through CSS in this view, even if it was hidden by the website’s stylesheet rather than by an inline style.

You will need to examine the unstyled page carefully to spot small, unusual text or links. Pay attention to the anchor text “the”. It’s easy to miss, isn’t it?

This only finds hidden content that exists in the HTML DOM at the moment you disable the styles.

If, for instance, malicious JavaScript runs after one minute and loads hidden content or links via AJAX, those elements won’t show up yet. You will need to disable all styles once more to see them.

Let’s try it on our sample page and click the disable styles button again after a while. The screenshot below shows one more hidden text block, which was not visible before because JavaScript injected it into the page after a delay.

Apply disable all styles of Web Developer addon

So it is good practice to disable styles several times, a little while apart.
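One way to make the repeated check less manual is to snapshot the page's links at two points in time and compare them. The sketch below assumes that approach; `newItems` is a hypothetical helper, and the 90-second delay in the usage comment is an arbitrary choice.

```javascript
// Returns the entries of `after` that were not present in `before`.
function newItems(before, after) {
  const seen = new Set(before);
  return after.filter(item => !seen.has(item));
}

// In a browser console, document.links lists every link in the DOM,
// including links hidden with CSS:
// const snapshot = () => Array.from(document.links, a => a.href);
// const first = snapshot();
// setTimeout(() => console.log(newItems(first, snapshot())), 90 * 1000);
```

Any URL printed by the delayed comparison was added to the page after it loaded, which is worth a closer look even when the link itself is visible.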

Yes, the same JavaScript can also remove content (i.e., inject it for a short time and then remove it). To counter that, you also have the option to disable JavaScript. With both CSS and JavaScript disabled, malicious code cannot undo its changes, and any malicious content injected into a hacked website will be visible to you.

Many of you might have heard of the online tool called “Search engine SPAM detector.” I don’t recommend using it to find hidden content. When I ran my test page using this tool, it couldn’t spot any hidden text issues. Perhaps it is better used for spotting keyword stuffing issues instead.

2. Use a Browser Add-On Called Link Grabber

The Link Grabber add-on extracts all links on the webpage and produces a report, so you can spot any unusual links on your page. (On Firefox you can use the extension Link Gopher.)

Link Grabber addon report

Running this add-on revealed odd links, which told me that something was wrong. This might be the quickest way to spot spam links.

If you find weird links using this tool, you can then start an in-depth audit using the first or third method.
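A similar link report can be produced without an add-on. This is a rough sketch of the idea rather than how Link Grabber works: `externalLinksByHost` is a hypothetical function that groups absolute URLs by host name and drops links pointing at your own host.

```javascript
// Groups links by host name, skipping links to `siteHost` itself.
// Returns a Map of host -> array of URLs.
function externalLinksByHost(hrefs, siteHost) {
  const byHost = new Map();
  for (const href of hrefs) {
    let host;
    try {
      host = new URL(href).hostname;
    } catch (e) {
      continue; // not an absolute URL, nothing to group
    }
    // mailto:, javascript: and similar schemes have an empty host name.
    if (host === '' || host === siteHost) continue;
    if (!byHost.has(host)) byHost.set(host, []);
    byHost.get(host).push(href);
  }
  return byHost;
}

// In a browser console:
// const hrefs = Array.from(document.links, a => a.href);
// console.log(externalLinksByHost(hrefs, location.hostname));
```

Host names are compared exactly, so `www.example.com` and `example.com` count as different hosts. A host you don't recognize in the output is a good starting point for the in-depth audit.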

3. Using Your Browser’s ‘Inspect’ Dev Tool

The first method works well, but if you are not attentive to detail you may miss problems. To help with that, I wrote a small JavaScript snippet that highlights suspicious HTML elements.

If you have some technical skills, you can use your browser console to find hidden text and links with the help of the code snippet below.

On Google Chrome, right-click on the page > Inspect > Console (you may refer to this guide to find it for your browser).

Google Chrome Console

Copy the JavaScript code below and paste it into the console. Press Enter to run the code.

 
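var htmldoc = document.querySelectorAll('body *');

// CSS properties commonly used to hide text. For numeric entries, any computed
// value (in px) equal to or less than the number listed is treated as
// suspicious, e.g. a font size of 3px or less, or a height of 3px or less.
var disallowed_cssproperties = [ [ "display", "none" ],
		[ "visibility", "hidden" ], [ "font-size", "3" ],
		[ "position", "absolute" ], [ "opacity", "0" ], [ "height", "3" ],
		[ "width", "3" ], [ "max-height", "3" ], [ "max-width", "3" ] ];

// Values that make a flagged element visible again, in the same order as above.
var reset_cssproperties = [ [ "display", "block" ],
		[ "visibility", "visible" ], [ "font-size", "40px" ],
		[ "position", "relative" ], [ "opacity", "1" ], [ "height", "auto" ],
		[ "width", "auto" ], [ "max-height", "none" ], [ "max-width", "none" ] ];

var reported_html = '';
var skip_elements = [ 'script', 'style' ];
var bool_reset_all = false; // set to true to also un-position every element

// NOTE: the bodies of the two loops below were missing from this listing.
// They are reconstructed from the surrounding description (flag the element,
// un-hide it, outline it in red, add it to the report) and may differ from
// the code the author originally ran.
for (var i = 0; i < htmldoc.length; i++) {
	var el = htmldoc[i];
	if (skip_elements.includes(el.tagName.toLowerCase()))
		continue;
	// Only text and links matter here, so ignore elements that carry neither.
	if (el.textContent.trim() == '' && !el.closest('a') && !el.querySelector('a'))
		continue;

	var style = window.getComputedStyle(el);
	var original_html = el.outerHTML; // captured before any style is changed
	var reasons = [];
	var resets = [];
	for (var k = 0; k < disallowed_cssproperties.length; k++) {
		var prop = disallowed_cssproperties[k][0];
		var limit = disallowed_cssproperties[k][1];
		var value = style.getPropertyValue(prop);
		// Keyword rules must match exactly; numeric rules are thresholds.
		if (isNaN(limit) ? value == limit : parseFloat(value) <= parseFloat(limit)) {
			reasons.push(prop + ': ' + value);
			resets.push(reset_cssproperties[k]);
		}
	}

	// Text in the same color as the background behind it.
	var has_own_text = Array.prototype.some.call(el.childNodes, function(n) {
		return n.nodeType == 3 && n.nodeValue.trim() != '';
	});
	var txt_color = RGBToHex(style.color);
	var bg_color = RGBToHex(effectiveBackground(el));
	if (has_own_text && txt_color == bg_color)
		reasons.push('text color ' + txt_color + ' matches the background');

	if (reasons.length == 0)
		continue;

	// Un-hide the element and highlight it on the page itself.
	for (var r = 0; r < resets.length; r++)
		el.style.setProperty(resets[r][0], resets[r][1], 'important');
	if (has_own_text && txt_color == bg_color)
		el.style.setProperty('color', 'red', 'important');
	el.style.setProperty('border', '2px solid red', 'important');

	reported_html += '<p>' + encodeHTML(reasons.join(', ')) + '</p><pre>'
			+ encodeHTML(original_html) + '</pre><hr>';
}

// reset all elements css
if (bool_reset_all) {
	for (var i = 0; i < htmldoc.length; i++) {
		if (skip_elements.includes(htmldoc[i].tagName.toLowerCase()))
			continue;
		htmldoc[i].style.position = 'relative';
		htmldoc[i].style.top = 'auto';
		htmldoc[i].style.left = 'auto';
		htmldoc[i].style.bottom = 'auto';
		htmldoc[i].style.right = 'auto';
		htmldoc[i].style.display = 'block';
	}
}

var tab = window.open('', '_blank');
if (tab) {
	tab.document.write(reported_html || '<p>No suspicious elements found.</p>');
	tab.document.close(); // to finish loading the page
} else {
	console.log('The report tab was blocked. Allow pop-ups for this page and run the code again.');
}
console.log('Finished');

// Nearest non-transparent background color behind an element (white if none is set).
function effectiveBackground(el) {
	for (var node = el; node; node = node.parentElement) {
		var bg = window.getComputedStyle(node).backgroundColor;
		if (bg != 'transparent' && !/^rgba\(.*,\s*0\)$/.test(bg))
			return bg;
	}
	return 'rgb(255, 255, 255)';
}

function RGBToHex(rgb) {
  // Choose correct separator
  let sep = rgb.indexOf(",") > -1 ? "," : " ";
  // Turn "rgb(r,g,b)" or "rgba(r,g,b,a)" into [r,g,b,...]
  rgb = rgb.replace(/^rgba?\(/, "").split(")")[0].split(sep);

  let r = (+rgb[0]).toString(16),
      g = (+rgb[1]).toString(16),
      b = (+rgb[2]).toString(16);

  if (r.length == 1)
    r = "0" + r;
  if (g.length == 1)
    g = "0" + g;
  if (b.length == 1)
    b = "0" + b;

  return "#" + r + g + b;
}

// Escape markup so the report shows the HTML source instead of rendering it.
function encodeHTML(str) {
    return str.replace(/[\u00A0-\u9999<>&](?!#)/gim, function(i) {
      return '&#' + i.charCodeAt(0) + ';';
    });
}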


A new tab opens with a report of suspicious HTML elements that are hidden from users by known CSS techniques.

Not every hidden element is bad. Again, there might be responsive hidden HTML elements, such as menus or social icons, designed to display on different screen sizes. They are totally fine to have.

Report

Besides this report, you will see those elements highlighted with a red border on the webpage itself.

If you want to run the check again, refresh the webpage first. Running it a second time without refreshing will give you wrong results, because the first run has already altered the page’s styles.

If you are not technically savvy, you might worry that this code affects how site visitors see the website. It doesn’t: the page changes only on the client side, in the browser of whoever runs the code in the console.

Conclusion

The techniques described above will help you manually audit any hidden text and links.

While there is software that can scan webpages and produce reports on the same issue, manually checking your pages is always a good idea, because software follows fixed algorithms and may not detect every type of hidden text and links.

Image Credits

All screenshots taken by author, January 2019
