This week's question is about website pages, PDFs and duplicate content:
I run an eCommerce construction site. Each product page has the full specification, plus a PDF download with the same content.
Will the PDFs cause an issue with internal duplicate content?
This is an interesting question…
PDFs can be crawled by search engines and can rank well; see the example below for the search phrase ‘SEO Guide’:
Because PDFs can be crawled and ranked, a PDF with the same content as the page will cause internal duplication issues unless the PDF content is embedded (or flattened).
I can appreciate why the PDFs might be important. A way around this would be to put them inside ZIP or RAR files.
The contents of compressed file formats, such as ZIP or tar files, cannot be indexed.
More can be read here about crawlable and non-crawlable files.
So if you want to avoid duplication between your pages and PDFs, add the PDFs to a compressed file.
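If you have a lot of product PDFs, zipping them by hand gets tedious. Below is a minimal Python sketch of one way to do it in bulk, using only the standard library. The function name `zip_pdfs` and the folder names in the usage note are made up for illustration, so adjust them to your own setup.

```python
from pathlib import Path
import zipfile


def zip_pdfs(pdf_dir, out_dir):
    """Wrap each PDF in pdf_dir in its own ZIP archive inside out_dir.

    Returns the list of archive paths that were written.
    """
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # e.g. spec-sheet.pdf -> spec-sheet.zip
        target = out_dir / (pdf.stem + ".zip")
        with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            # arcname keeps only the file name, not the full folder path
            zf.write(pdf, arcname=pdf.name)
        archives.append(target)
    return archives
```

Calling something like `zip_pdfs("downloads/pdf", "downloads/zip")` (hypothetical folders) would leave one ZIP per PDF, and the download links on each product page would then point at the ZIP rather than the PDF.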