Thursday, June 6, 2019

Using Python to Check List of URLs

Recently at my job I needed to check a long list of URLs to see whether they were valid.  It was a simple list of links to images, to make sure all of the images a client had uploaded were going to work.  I didn't want to check them by hand, so I wrote this little Python script to do it for me.

Basically, all it does is open a list of URLs and use requests to fetch each one.  It then looks at the response.  If it gets anything other than a 200 (for example a 404, which means the file is not there), it puts that URL in a list and moves on.

Once the script is finished, it writes the list of bad URLs to a file so the bad links can be shared with the client.

I'm not sure if anyone else will find this useful, but I've used it several times lately and wanted to share it.

The code is as follows:

import requests
import pandas

# Read the URLs from a one-column CSV file.
print('importing file')
urllist = pandas.read_csv('urls.csv')

# values.tolist() gives a list of single-item lists, so flatten it.
print('flattening file')
flaturl = []
for sublist in urllist.values.tolist():
    for item in sublist:
        flaturl.append(item)

print('sorting list')
flaturl.sort(reverse=True)

# Request each URL and record any that don't come back with a 200.
print('checking urls')
brokenlist = []
for url in flaturl:
    try:
        code = requests.get(url, timeout=10).status_code
    except requests.RequestException:
        # Treat connection errors and timeouts as broken links too.
        code = 'error'
    print(url, ' ', code)
    if code != 200:
        brokenlist.append(url)

# Write the broken URLs out so they can be shared.
df = pandas.DataFrame(brokenlist)
df.to_csv('BrokenURLs.csv', index=False)
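
If you want to try the checking logic without a real list of client URLs, here is a small self-contained sketch of the same idea.  It spins up a throwaway local HTTP server (the `/ok` and `/missing` paths are made up just for this demo) and runs the same status-code check against it:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny test server: /ok returns 200, everything else 404."""

    def do_GET(self):
        if self.path == '/ok':
            self.send_response(200)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Bind to port 0 so the OS picks a free port, and serve in the background.
server = HTTPServer(('127.0.0.1', 0), DemoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

urls = [
    f'http://127.0.0.1:{port}/ok',
    f'http://127.0.0.1:{port}/missing',
]

# Same check as the script above: collect anything that isn't a 200.
broken = []
for url in urls:
    try:
        code = requests.get(url, timeout=5).status_code
    except requests.RequestException:
        code = 'error'
    if code != 200:
        broken.append(url)

server.shutdown()
print(broken)
```

Only the `/missing` URL ends up in `broken`, which is exactly the behavior the full script relies on.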

Let me know if you find this useful. 
