
CodeIgniter 404 Routing - 404 broken links and Google unable to crawl the site, BUT everything looks OK


I'm totally confused.

I run the site http://citylightstours.com

It is built on the CodeIgniter platform.

I noticed in Google Search Console that only 1 page of my site is indexed on Google. All other pages had 404 errors, and hence Google didn't list them.

I therefore thought it was a faulty sitemap, so I went to https://www.xml-sitemaps.com/ to generate a new one. I put in the root URL and, to my surprise, only blog entries were contained in the generated XML sitemap - NONE of the main pages of my site were there!

I then went to another site, http://www.brokenlinkcheck.com/, to check for broken links and, to my further surprise, every page on my site had a status of 404 broken link. HOWEVER, clicking on those links displays a valid page. They are therefore not broken links, and I can navigate the site fine.

I therefore don't understand why automated robots come back with a list of 404s and won't index the site when all links appear to work!?

Any ideas?

Thanks

UPDATE: I tried doing a Fetch and Render from Search Console too, and a valid page that is displayed in browsers gives a Not Found error!

UPDATE 2: After doing site:citylightstours.com in Google, I notice that the ONLY pages indexed are the blog pages. All other pages have dropped out of the index - any ideas why?

UPDATE 3: One of the comments suggested it may be an issue with the .htaccess, so I am posting it below in the hope that someone spots something. Thanks

UPDATE 4: After reading this post, I think it may be that the server returns a 404 status with the actual page content as the custom, human-readable 404 message! As I said, I use CodeIgniter, so it must have something to do with the custom 404 page and routing. I don't know how to debug this, though, or even what to look at. Can anyone help? Thanks!

<IfModule mod_rewrite.c>
# Development
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond $1 !^(index\.php|images|scripts|styles|vendor|robots\.txt)
    RewriteRule ^(.*)$ index.php/$1 [L]
</IfModule>

DirectoryIndex index.php
RewriteEngine on
RewriteCond $1 !^(index\.php|images|css|js|robots\.txt|favicon\.ico)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ ./index.php/$1 [L,QSA]


# ----------------------------------------------------------------------
# Better website experience for IE users
# ----------------------------------------------------------------------

<IfModule mod_setenvif.c>
  <IfModule mod_headers.c>
    BrowserMatch MSIE ie
    Header set X-UA-Compatible "IE=Edge,chrome=1" env=ie
  </IfModule>
</IfModule>

<IfModule mod_headers.c>
  Header append Vary User-Agent
</IfModule>


# ----------------------------------------------------------------------
# Webfont access
# ----------------------------------------------------------------------

<FilesMatch "\.(ttf|otf|eot|woff|font.css)$">
  <IfModule mod_headers.c>
    Header set Access-Control-Allow-Origin "*"
  </IfModule>
</FilesMatch>

# ----------------------------------------------------------------------
# Proper MIME type for all files
# ----------------------------------------------------------------------

# audio
AddType audio/ogg                      oga ogg

# video
AddType video/ogg                      .ogv
AddType video/mp4                      .mp4
AddType video/webm                     .webm

# Proper svg serving. Required for svg webfonts on iPad
#   twitter.com/FontSquirrel/status/14855840545
AddType     image/svg+xml              svg svgz 
AddEncoding gzip                       svgz

# webfonts                             
AddType application/vnd.ms-fontobject  eot
AddType font/truetype                  ttf
AddType font/opentype                  otf
AddType application/x-font-woff        woff

# assorted types                                      
AddType image/x-icon                   ico
AddType image/webp                     webp
AddType text/cache-manifest            appcache manifest
AddType text/x-component               htc
AddType application/x-chrome-extension crx
AddType application/x-xpinstall        xpi
AddType application/octet-stream       safariextz

# ----------------------------------------------------------------------
# gzip compression
# ----------------------------------------------------------------------

<IfModule mod_deflate.c>

<IfModule mod_setenvif.c>
  <IfModule mod_headers.c>
    SetEnvIfNoCase ^(Accept-EncodXng|X-cept-Encoding|X{15}|~{15}|-{15})$ ^((gzip|deflate)\s,?\s(gzip|deflate)?|X{4,13}|~{4,13}|-{4,13})$ HAVE_Accept-Encoding
    RequestHeader append Accept-Encoding "gzip,deflate" env=HAVE_Accept-Encoding
  </IfModule>
</IfModule>

<FilesMatch "^(?!.*\.ogg$|.*\.ogv$|.*\.mp4$).+" >

# html, txt, css, js, json, xml, htc:
<IfModule filter_module>
  FilterDeclare   COMPRESS
  FilterProvider  COMPRESS  DEFLATE resp=Content-Type /text/(html|css|javascript|plain|x(ml|-component))/
  FilterProvider  COMPRESS  DEFLATE resp=Content-Type /application/(javascript|json|xml|x-javascript)/
  FilterChain     COMPRESS
  FilterProtocol  COMPRESS  change=yes;byteranges=no
</IfModule>
</FilesMatch>

# webfonts and svg:
  <FilesMatch "\.(ttf|otf|eot|svg)$" >
    SetOutputFilter DEFLATE
  </FilesMatch>
</IfModule>

# ----------------------------------------------------------------------
# Expires headers (for better cache control)
# ----------------------------------------------------------------------

<IfModule mod_expires.c>
  ExpiresActive on

# Perhaps better to whitelist expires rules? Perhaps.
  ExpiresDefault                          "access plus 1 month"

# cache.appcache needs re-requests in FF 3.6 (thx Remy ~Introducing HTML5)
  ExpiresByType text/cache-manifest       "access plus 0 seconds"

# your document html 
  ExpiresByType text/html                 "access plus 0 seconds"

# data
  ExpiresByType text/xml                  "access plus 0 seconds"
  ExpiresByType application/xml           "access plus 0 seconds"
  ExpiresByType application/json          "access plus 0 seconds"

# rss feed
  ExpiresByType application/rss+xml       "access plus 1 hour"

# favicon (cannot be renamed)
  ExpiresByType image/x-icon              "access plus 1 week" 

# media: images, video, audio
  ExpiresByType image/gif                 "access plus 1 month"
  ExpiresByType image/png                 "access plus 1 month"
  ExpiresByType image/jpg                 "access plus 1 month"
  ExpiresByType image/jpeg                "access plus 1 month"
  ExpiresByType video/ogg                 "access plus 1 month"
  ExpiresByType audio/ogg                 "access plus 1 month"
  ExpiresByType video/mp4                 "access plus 1 month"
  ExpiresByType video/webm                "access plus 1 month"

# htc files  (css3pie)
  ExpiresByType text/x-component          "access plus 1 month"

# webfonts
  ExpiresByType font/truetype             "access plus 1 month"
  ExpiresByType font/opentype             "access plus 1 month"
  ExpiresByType application/x-font-woff   "access plus 1 month"
  ExpiresByType image/svg+xml             "access plus 1 month"
  ExpiresByType application/vnd.ms-fontobject "access plus 1 month"

# css and javascript
  ExpiresByType text/css                  "access plus 2 months"
  ExpiresByType application/javascript    "access plus 2 months"
  ExpiresByType text/javascript           "access plus 2 months"

  <IfModule mod_headers.c>
    Header append Cache-Control "public"
  </IfModule>

</IfModule>

# ----------------------------------------------------------------------
# ETag removal
# ----------------------------------------------------------------------

FileETag None

# ----------------------------------------------------------------------
# Stop screen flicker in IE on CSS rollovers
# ----------------------------------------------------------------------

# The following directives stop screen flicker in IE on CSS rollovers - in
# combination with the "ExpiresByType" rules for images (see above). If
# needed, un-comment the following rules.

# BrowserMatch "MSIE" brokenvary=1
# BrowserMatch "Mozilla/4.[0-9]{2}" brokenvary=1
# BrowserMatch "Opera" !brokenvary
# SetEnvIf brokenvary 1 force-no-vary

RewriteEngine On
RewriteCond %{HTTP_HOST} !^citylightstours\.com$ [NC]
RewriteRule ^(.*)$ http://citylightstours.com/$1 [R=301,L]
RewriteCond %{HTTP_USER_AGENT} libwww-perl.* 
RewriteRule .* ? [F,L]

Solution

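  • Solved - the WordPress blog integrated into the site was setting the 404 status for all non-WordPress pages, i.e. the CodeIgniter pages. The pages still rendered normally in a browser, but crawlers act on the status code and so treated them as missing.

    The index.php of CI had the following code, which needed to be commented out: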

    /*
     *---------------------------------------------------------------
     * WORDPRESS INTEGRATION
     *---------------------------------------------------------------
     * The ci_site_url function helps to avoid collision between WP & CI.
     */
    
     //header("HTTP/1.0 200 OK");
    
     define('WP_USE_THEMES', false);
     require_once './blog/wp-blog-header.php';
    
     add_filter('site_url', 'ci_site_url', 1);
    
     function ci_site_url()
     {
         include(APPPATH.'/config/config.php');
         return $config['base_url'];
     }
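
A browser shows the page regardless of its status code, so the fix is easier to confirm by looking at the status the server actually sends. The following is only a minimal sketch, assuming Python 3 and its standard library; the function names, the user-agent string and the 512-byte threshold are illustrative choices, not anything from the site's code. It requests each URL and flags responses that report 404 but still carry a page-sized body, which is the symptom described in the question.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (status_code, body_length) for a GET request to url.

    urllib raises HTTPError for 4xx/5xx responses, but the error object
    still carries the status code and the body, so both are read here.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check/0.1"}  # illustrative UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        return error.code, len(error.read())


def is_404_with_page_body(status, body_length, min_body=512):
    """Flag a response that says 404 yet delivers a page-sized body.

    min_body is an assumed threshold; tune it to the size of your real
    404 page versus your normal pages.
    """
    return status == 404 and body_length >= min_body


def report(urls):
    """Return one summary line per URL, marking suspicious 404 responses."""
    lines = []
    for url in urls:
        status, size = fetch_status(url)
        flag = "  <-- 404 with a page-sized body" if is_404_with_page_body(status, size) else ""
        lines.append(f"{status} {size:>8} bytes  {url}{flag}")
    return lines
```

Calling `print("\n".join(report(["http://citylightstours.com/"])))` with a few CodeIgniter page URLs added to the list would be one way to use it: if the WordPress include is the cause, those pages should be flagged before the change and report 200 afterwards.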