Search for Broken Images

12 posts by 2 authors in: Forums > CMS Builder
Last Post: August 5, 2010

Hey,

Can anyone think of a way to search for broken images, i.e.:

A table has an upload field, and each record has an image "uploaded" (it has a urlPath etc. in SQL).

However, in some instances, the physical image is missing on the server.

So the record has a file path for the image, but nothing is displayed because the file physically isn't there.

Is there any way I can write a script that checks whether each image URL (urlPath) works or returns a 404?

I just need to be able to identify the records in question (with the ultimate aim of deleting the urlPath from the SQL, but I can do that once I have identified the broken paths).

Is that even possible?

Cheers!

Re: [chris] Search for Broken Images

By rjbathgate - August 2, 2010 - edited: August 2, 2010

Hey,

Thanks for the reply...

The only issue is that I have 110,000 records, so ideally I was after a way to display only those which are broken...

The code below will try to display the images, so at least I can go through and identify the broken ones, but going through a page of 110,000 results will be quite time-consuming...

EDIT: I can't run the above; the memory limit is exhausted. I could break it down with limit/offset, but again that's even more time-consuming... :(
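For illustration, the limit/offset idea can be written so that only one page of rows is in memory at a time, however large the table is. A minimal sketch, assuming a hypothetical `$fetchPage` callback that returns one page of rows (in CMS Builder it could wrap `getRecords()` with the `perPage` and `pageNum` options):

```php
<?php
// Sketch: yield every row of a large table, fetching one fixed-size page at a time.
// $fetchPage is a hypothetical callback: (int $pageNum, int $perPage) => array of rows.
// An empty page signals the end of the table.
function eachRecord(callable $fetchPage, int $perPage = 100): Generator {
    for ($page = 1; ; $page++) {
        $rows = $fetchPage($page, $perPage);
        if (empty($rows)) {
            return; // no more rows
        }
        foreach ($rows as $row) {
            yield $row;
        }
    }
}
```

Because offset paging re-reads the table for every page, deleting broken records while iterating would shift the later rows and skip some of them; collecting the record numbers first and deleting afterwards avoids that.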

But logically I can't see a way, as PHP can't know whether the URL returns a 404 or not... can it?
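When the uploads live on the same server as the script, PHP doesn't have to go through HTTP at all: it can test the file on disk with `is_file()`. A minimal sketch, assuming each row's `urlPath` is relative to the web root and stored unencoded (spaces as spaces), and that `$docRoot` (a hypothetical parameter) is that web root on disk:

```php
<?php
// Sketch: return the upload rows whose file is missing on disk (no HTTP involved).
// Assumptions: each row has a 'urlPath' relative to the web root, stored
// unencoded, and $docRoot is the directory that web root maps to.
function findMissingUploads(array $uploads, string $docRoot): array {
    $missing = [];
    foreach ($uploads as $upload) {
        $diskPath = rtrim($docRoot, '/\\') . $upload['urlPath'];
        if (!is_file($diskPath)) {
            $missing[] = $upload; // file referenced in SQL but absent on disk
        }
    }
    return $missing;
}
```

When run from a web page, `$_SERVER['DOCUMENT_ROOT']` could serve as `$docRoot`, provided the stored `urlPath` values are relative to that same root.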

Cheers

Re: [rjbathgate] Search for Broken Images

By Chris - August 3, 2010

Hi rjbathgate,

How about this?

<?php header('Content-type: text/html; charset=utf-8'); ?>
<?php
require_once "C:/wamp/www/sb/CMS Builder/cmsAdmin/lib/viewer_functions.php";

// Request $url from the local web server and return its HTTP status code
// (0 if no status line could be read, null if the connection failed).
function check404($url) {
  $handle = @fsockopen("tcp://localhost", 80, $errno, $errstr, 5);
  if (!$handle) { return; }

  // spaces in file names must be encoded or the request line is malformed
  $url = str_replace(' ', '%20', $url);

  $request = "GET " . $url . " HTTP/1.0\r\n\r\n";
  fwrite($handle, $request);

  $response = '';
  while (!feof($handle)) {
    $buffer = fgets($handle, 128);
    if ($buffer === false) { break; } // prevent infinite loops on fgets errors
    $response .= $buffer;
  }

  $httpStatusCode = null;
  if ($response) {
    // the headers end at the first blank line; only they are needed here
    $parts  = preg_split("/(\r?\n){2}/", $response, 2);
    $header = $parts[0];
    if (preg_match("/^HTTP\S+ (\d+) /", $header, $matches)) { $httpStatusCode = $matches[1]; }
  }

  fclose($handle);

  return intval($httpStatusCode);
}

// walk the uploads table 100 records at a time to stay under the memory limit
$page = 0;
while (true) {
  $page++;
  list($uploads,) = getRecords(array(
    'tableName' => 'uploads',
    'perPage'   => 100,
    'pageNum'   => $page,
    'orderBy'   => 'tableName, recordNum+0',
  ));
  if (empty($uploads)) { break; }

  foreach ($uploads as $upload) {
    $httpStatusCode = check404($upload['urlPath']);
    if ($httpStatusCode == 404) {
      // link to the CMS edit page of the record that owns the broken upload
      echo "<a href=\"/cmsAdmin/admin.php?menu={$upload['tableName']}&action=edit&num={$upload['recordNum']}\">";
      echo "{$upload['tableName']} {$upload['recordNum']}";
      echo "</a><br>";
    }
  }
}
?>
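The status-line parsing inside check404() can also be pulled out into its own function, so it can be tested without a web server. A sketch (the name parseStatusCode() is hypothetical) that reads only the first line of the response and returns null rather than 0 when no status line is found, so "no response" can be told apart from a real status code:

```php
<?php
// Sketch: extract the status code from a raw HTTP response such as
// "HTTP/1.1 404 Not Found\r\n...". Returns null if there is no status line.
function parseStatusCode(string $response): ?int {
    $statusLine = strtok($response, "\r\n"); // first line only
    if ($statusLine !== false && preg_match('#^HTTP/\S+\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}
```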


Does that help?
All the best,
Chris

Re: [chris] Search for Broken Images

Wow, thanks heaps!

It's almost working, although it's returning 404 on everything (i.e. including those which are valid).

I've checked the code through, and it's building the right file paths throughout, i.e. checking the right path for each image, which is a valid URL, but 404 is still returned in $httpStatusCode.

Are there likely to be some server-specific limitations/settings preventing it from working?

Many thanks Chris

Re: [rjbathgate] Search for Broken Images

By Chris - August 4, 2010

Hi rjbathgate,

Glad to be of help! :)

A couple things to check:

1. What do you get when you do a check404() for a URL you know should work? e.g. check404('/')

2. What are the URLs it's trying to check? Can you post one?

I had the same problem until I added the str_replace for spaces.
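The `str_replace(' ', '%20', ...)` fix covers spaces only. A slightly more general sketch (the helper name is hypothetical) encodes each path segment with PHP's `rawurlencode()`, which turns spaces into `%20` and also escapes characters such as `#`, `?` or `&`, while leaving the `/` separators and characters like hyphens and underscores untouched:

```php
<?php
// Sketch: percent-encode every segment of a root-relative path such as
// '/cmsAdmin/uploads/my photo.jpg', keeping the '/' separators intact.
function encodeUrlPath(string $path): string {
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}
```

This assumes the stored paths are not already encoded; an existing `%20` would be encoded again as `%2520`.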
All the best,
Chris

Re: [chris] Search for Broken Images

Hey Chris,

On checking '/', $httpStatusCode = 403 (Forbidden).

On ../index.php I get 400 (Bad Request), so I presume the path needs to start from the root.

On http://www.domain.com/index.php I get 404

and on the full path to root I get 404 too.

So it doesn't seem to successfully get anything.

Re which URLs I'm checking: above, I've checked basic ones (/, index.php, etc.).

And for the images, the URLs being checked are in the format:

/cmsAdmin/uploads/ist2_11299731-young-serious-man.jpg

for example.

Thanks heaps!
Rob

Re: [rjbathgate] Search for Broken Images

By Chris - August 4, 2010

Hyphens and underscores are "safe" characters, so those URLs look fine.

Hmm, getting a 403 for / is troubling...

I wonder if you should be sending the Host header?

Try changing this:

$request = "GET " . $url . " HTTP/1.0\r\n\r\n";

...to this:

$request = "GET " . $url . " HTTP/1.0\r\nHost: www.mywebsite.com\r\n\r\n";
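With the Host header added, the request can be built by a small helper. A sketch under two stated assumptions: `www.mywebsite.com` stands in for your own domain, and it sends HEAD rather than the GET used above (a deliberate swap) so the server returns only the status and headers instead of the whole image, which adds up over 110,000 files. `Connection: close` makes explicit that the server should close the socket after replying, so the `feof()` loop ends:

```php
<?php
// Sketch: build the raw request for check404(), including the Host header.
// $host is a placeholder such as 'www.mywebsite.com'; HEAD asks for the
// status line and headers only, not the image bytes.
function buildHeadRequest(string $path, string $host): string {
    $path = str_replace(' ', '%20', $path);
    return "HEAD {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}
```

Inside check404() it would replace the two request lines: `fwrite($handle, buildHeadRequest($url, 'www.mywebsite.com'));`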

If that doesn't help, can you tell me what your host is and I can try some things from here?
All the best,
Chris

Re: [chris] Search for Broken Images

Hey

Same problem, I'm afraid.

Will send you the host details by email.

Thanks heaps,
Rob

Re: [rjbathgate] Search for Broken Images

Just thought the fopen approach could maybe help here...

$handle = @fopen($url, "r"); // @ suppresses the warning when the file can't be opened
if ($handle)
{
  fclose($handle);
  echo "This image is here";
}
else
{
  echo "This image isn't here";
}


Will give this a go later on... might be barking up the wrong tree, but I've just used it successfully in a different instance, same principle though...
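One caveat with fopen(): it only makes an HTTP request when given a full `http://` URL and PHP's `allow_url_fopen` setting is on; a root-relative urlPath such as `/cmsAdmin/uploads/...` is treated as a path on the server's filesystem. A sketch of the HTTP variant using PHP's `get_headers()`, which returns the response headers with the first status line at index 0 (if the server redirects, later status lines follow in the same array). `$host` is a placeholder for your own domain and both function names are hypothetical:

```php
<?php
// Sketch: check an upload over HTTP with get_headers() (needs allow_url_fopen).
// $host is a placeholder such as 'www.mywebsite.com'.
function absoluteUrl(string $host, string $urlPath): string {
    return 'http://' . $host . str_replace(' ', '%20', $urlPath);
}

// Returns the first HTTP status code the server sent, or null on failure.
function statusViaGetHeaders(string $url): ?int {
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return null; // no response at all
    }
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return null;
}
```

Usage: `$code = statusViaGetHeaders(absoluteUrl('www.mywebsite.com', $upload['urlPath']));`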

Cheers