… without the menu, sidebar and footer links
Let’s say you want to know which articles about Donald Trump are listed here
https://www.blick.ch/dossiers/donald-trump/ and on all paginated pages like https://www.blick.ch/dossiers/donald-trump/page2/ …
You are just looking for these links:
Step 1: Find the related elements in the source code
Open Chrome’s Inspect tool to look at the code:
Use the inspect tool and hover over the element you want to inspect.
I want to check the links, so I hover over them with the tool. The corresponding HTML is then highlighted in the tool.
Now I’m looking for a link to the highlighted article. It must be somewhere nearby. The red arrow looks good. If you can’t find anything like a link, it’s probably still nested: expand it with the little arrows (marked with the blue arrow).
If you select the HTML code, the inspect tool highlights the related parts of the page:
Step 2: Get Xpath
More about XPath:
XPath on Wikipedia (en.wikipedia.org)
Guide to using Screaming Frog’s Custom Extraction feature to scrape schema markup, HTML… (uproer.com)
You can try it like this:
In this case it’s:
For me, having a number in square brackets (a positional index) in the Xpath is, most of the time, an indicator that it is not useful.
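To illustrate why a positional index can be fragile, here is a minimal sketch in Java using the JDK’s built-in javax.xml.xpath API. It is not the original page: the markup, the class names and the path /html/body/div[2]/a/@href are hypothetical, and it assumes well-formed XHTML (a real HTML page would first need an HTML parser). When one extra div appears above the teaser, the same positional path quietly returns a different link.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PositionalXPath {
    // Parses well-formed XHTML and returns the string value of every node the XPath selects.
    static List<String> select(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent()); // for an @href node this is the URL
        }
        return values;
    }

    // Hypothetical page: the article teaser is the 2nd div in the body.
    static final String PAGE = "<html><body>"
            + "<div class=\"header\"><a href=\"/menu/\">Menu</a></div>"
            + "<div class=\"layout-item\"><a href=\"/article-1/\">Article 1</a></div>"
            + "</body></html>";

    // Same hypothetical page after a banner div was added above the teaser.
    static final String PAGE_WITH_BANNER = "<html><body>"
            + "<div class=\"header\"><a href=\"/menu/\">Menu</a></div>"
            + "<div class=\"banner\"><a href=\"/ad/\">Ad</a></div>"
            + "<div class=\"layout-item\"><a href=\"/article-1/\">Article 1</a></div>"
            + "</body></html>";

    // Hypothetical position-based path: "the a-tag in the 2nd div of the body".
    static final String POSITIONAL = "/html/body/div[2]/a/@href";

    public static void main(String[] args) throws Exception {
        System.out.println(select(PAGE, POSITIONAL));             // [/article-1/]
        System.out.println(select(PAGE_WITH_BANNER, POSITIONAL)); // [/ad/] (the wrong link)
    }
}
```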
So let’s create the Xpath manually…
If you are looking for LinkURL in
<a href="LinkURL">Link Text</a>
The Xpath is
It gets all a-tags and, from each of them, the text of the @href attribute.
But we are looking for specific links, not all of them.
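To check an expression like this outside Screaming Frog, here is a minimal sketch with Java’s built-in javax.xml.xpath API. The expression //a/@href is my reading of the description above, not copied from the original, and the markup and URLs are hypothetical, well-formed XHTML. It shows the problem: menu and footer links come back together with the article links.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllLinksXPath {
    // Parses well-formed XHTML and returns the string value of every node the XPath selects.
    static List<String> select(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent()); // attribute node -> attribute value
        }
        return values;
    }

    // Hypothetical page with a menu link, two article teasers and a footer link.
    static final String PAGE = "<html><body>"
            + "<nav><a href=\"/news/\">News</a></nav>"
            + "<div class=\"layout-item teaser\"><a href=\"/article-1/\">Article 1</a></div>"
            + "<div class=\"layout-item teaser\"><a href=\"/article-2/\">Article 2</a></div>"
            + "<footer><a href=\"/impressum/\">Impressum</a></footer>"
            + "</body></html>";

    // Reconstructed from the description (all a-tags, then @href); hypothetical.
    static final String ALL_LINKS = "//a/@href";

    public static void main(String[] args) throws Exception {
        // Prints every href in document order, menu and footer included:
        // [/news/, /article-1/, /article-2/, /impressum/]
        System.out.println(select(PAGE, ALL_LINKS));
    }
}
```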
could be an indicator to identify the links in the list.
The Xpath to address this is
which looks for all a-tags with
Another option could be to use the parent div-tag (yellow arrow) with
and then its child a-tag (blue arrow). You can use contains() if you don’t want to check for all of the class names listed there.
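Here is a minimal sketch of the difference between an exact class comparison and contains(), again in Java with javax.xml.xpath on hypothetical XHTML markup. The expressions are assumptions built from the description, not the ones from the original screenshots. Note that contains() is a plain substring test, so it also matches a class like layout-item-header; the third expression is a common XPath 1.0 idiom that only matches the whole class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassMatchXPath {
    // Parses well-formed XHTML and returns the string value of every node the XPath selects.
    static List<String> select(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    // Hypothetical markup: two teaser divs with several class names, plus a look-alike class.
    static final String PAGE = "<html><body>"
            + "<div class=\"layout-item-header\"><a href=\"/dossiers/\">All dossiers</a></div>"
            + "<div class=\"layout-item teaser\"><a href=\"/article-1/\">Article 1</a></div>"
            + "<div class=\"teaser layout-item\"><a href=\"/article-2/\">Article 2</a></div>"
            + "</body></html>";

    // Exact comparison: only matches class="layout-item" with nothing else in it.
    static final String EXACT = "//div[@class='layout-item']/a/@href";
    // Substring test: ignores the other class names, but also hits "layout-item-header".
    static final String CONTAINS = "//div[contains(@class, 'layout-item')]/a/@href";
    // Whole-token test: pads the class list with spaces and looks for " layout-item ".
    static final String TOKEN =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' layout-item ')]/a/@href";

    public static void main(String[] args) throws Exception {
        System.out.println(select(PAGE, EXACT));    // []
        System.out.println(select(PAGE, CONTAINS)); // [/dossiers/, /article-1/, /article-2/]
        System.out.println(select(PAGE, TOKEN));    // [/article-1/, /article-2/]
    }
}
```

The whole-token version is more verbose, but it avoids surprises on sites that use class names sharing a prefix.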
So, summed up:
gets the div whose class attribute contains “layout-item”,
gets its child a-tag,
gets the content of the href attribute.
The Xpath starts with // (2 slashes), which finds matching elements anywhere in the document, and uses / (1 slash) to go one level down the hierarchy to a direct child.
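A minimal sketch of the difference between / and // after the div, using Java’s javax.xml.xpath on hypothetical markup. The expressions are assembled from the parts described above and are assumptions, not the original Xpath. With / only direct children are found; with // the a-tag nested inside an h2 is found as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildVsDescendantXPath {
    // Parses well-formed XHTML and returns the string value of every node the XPath selects.
    static List<String> select(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    // Hypothetical markup: the second teaser wraps its link in an h2.
    static final String PAGE = "<html><body>"
            + "<div class=\"layout-item\"><a href=\"/article-1/\">Article 1</a></div>"
            + "<div class=\"layout-item\"><h2><a href=\"/article-2/\">Article 2</a></h2></div>"
            + "</body></html>";

    // "/" before a: only a-tags that are direct children of the div.
    static final String CHILD = "//div[contains(@class, 'layout-item')]/a/@href";
    // "//" before a: a-tags at any depth below the div.
    static final String DESCENDANT = "//div[contains(@class, 'layout-item')]//a/@href";

    public static void main(String[] args) throws Exception {
        System.out.println(select(PAGE, CHILD));      // [/article-1/]
        System.out.println(select(PAGE, DESCENDANT)); // [/article-1/, /article-2/]
    }
}
```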
Step 3: Use the Xpath in Screaming Frog SEO Spider
Go to Configuration > Custom > Extraction
and add the 2 Xpath ideas, e.g. like this:
Now run the crawl with an include filter, so that it only checks the needed folder plus all paginated pages.
is the placeholder for
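Screaming Frog’s include filter takes a regular expression, so a quick way to sanity-check a pattern before crawling is a small Java sketch with java.util.regex. The pattern below is hypothetical (it assumes .* as the wildcard after the folder), and the sport URL is made up. How Screaming Frog itself applies the pattern is described in its own documentation; this only tests the regex, using matches(), which requires the whole URL to match.

```java
import java.util.List;
import java.util.regex.Pattern;

public class IncludeFilter {
    // Hypothetical include pattern: the dossier folder followed by anything,
    // so the folder itself and page2/, page3/, ... are all covered.
    static final Pattern INCLUDE =
            Pattern.compile("https://www\\.blick\\.ch/dossiers/donald-trump/.*");

    // True if the whole URL matches the include pattern.
    static boolean isIncluded(String url) {
        return INCLUDE.matcher(url).matches();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "https://www.blick.ch/dossiers/donald-trump/",
                "https://www.blick.ch/dossiers/donald-trump/page2/",
                "https://www.blick.ch/sport/"); // hypothetical URL outside the folder
        for (String url : urls) {
            System.out.println(url + " -> " + isIncluded(url)); // true, true, false
        }
    }
}
```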
Now check the results in the Custom tab of Screaming Frog SEO Spider.
The second try, with “a clickable”, seems to be wrong because it lists menu items too:
The “div layout-item a” version looks good.
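To reproduce this result offline, here is a minimal sketch with Java’s javax.xml.xpath that runs both ideas on hypothetical markup in which the menu links share the clickable class with the article links. Both expressions, the class names on the a-tags and the URLs are assumptions based on the description, not the original page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CompareExtractions {
    // Parses well-formed XHTML and returns the string value of every node the XPath selects.
    static List<String> select(String xhtml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    // Hypothetical page: menu links and article links both carry class="clickable".
    static final String PAGE = "<html><body>"
            + "<nav>"
            + "<a class=\"clickable\" href=\"/news/\">News</a>"
            + "<a class=\"clickable\" href=\"/sport/\">Sport</a>"
            + "</nav>"
            + "<div class=\"layout-item teaser\"><a class=\"clickable\" href=\"/article-1/\">Article 1</a></div>"
            + "<div class=\"layout-item teaser\"><a class=\"clickable\" href=\"/article-2/\">Article 2</a></div>"
            + "</body></html>";

    // Idea 1 (hypothetical): every a-tag whose class contains "clickable".
    static final String A_CLICKABLE = "//a[contains(@class, 'clickable')]/@href";
    // Idea 2 (hypothetical): the a-tag directly inside a div whose class contains "layout-item".
    static final String DIV_LAYOUT_ITEM_A = "//div[contains(@class, 'layout-item')]/a/@href";

    public static void main(String[] args) throws Exception {
        System.out.println(select(PAGE, A_CLICKABLE));       // [/news/, /sport/, /article-1/, /article-2/]
        System.out.println(select(PAGE, DIV_LAYOUT_ITEM_A)); // [/article-1/, /article-2/]
    }
}
```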
So this is the Xpath to work with:
Just run the crawl and collect the links 🙂
Share this post if you enjoyed it! 🙂