All the script does is open a list of URLs and request each one with the requests library. If a URL comes back with a 404 (the file is not there) or any other non-200 status code, it adds that URL to a list and moves on.
Once the script is finished, it writes the list of bad URLs to a file so the broken links can be shared with the client.
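For reference, the script looks for a file called urls.csv in the working directory. Since pandas.read_csv treats the first row as a header, the input would look something like this (the column name and URLs here are just placeholders):

url
https://www.example.com/page-one
https://www.example.com/products/widget
https://www.example.com/old-landing-page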
I'm not sure if anyone else will find this useful, but I've used it several times lately and wanted to share it.
The code is as follows:
import requests
import pandas

flaturl = []

print('importing file')
# read_csv treats the first row of urls.csv as a header row;
# pass header=None if the file is just a bare list of URLs
urllist = pandas.read_csv('urls.csv')
urllist2 = urllist.values.tolist()

print('flattening file')
# values.tolist() returns a list of rows, so flatten it into a single list of URLs
for sublist in urllist2:
    for item in sublist:
        flaturl.append(item)

print('sorting list')
flaturl.sort(reverse=True)

brokenlist = []

print('checking urls')
for url in flaturl:
    r = requests.get(url)
    c = r.status_code
    print(url, ' ', c)
    if c != 200:
        brokenlist.append(url)

# write the bad URLs out so they can be shared
df = pandas.DataFrame(brokenlist)
df.to_csv('BrokenURLs.csv', index=False)
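One thing to watch out for: requests.get with no timeout can hang on a dead host, and an unreachable domain raises an exception that would stop the script partway through the list. Here is a minimal sketch of how the checking loop could be hardened; the ten-second timeout and the check_url helper name are just my own choices, not part of the original script:

import requests

def check_url(url, timeout=10):
    # Return the status code, or None if the request failed entirely
    # (DNS failure, connection refused, timeout, etc.).
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.exceptions.RequestException:
        return None

# drop-in replacement for the checking loop above (placeholder URLs)
brokenlist = []
for url in ['https://www.example.com/', 'https://www.example.com/missing']:
    code = check_url(url)
    print(url, ' ', code)
    if code != 200:
        brokenlist.append(url)

With that change, a URL whose host is completely unreachable still ends up in brokenlist instead of crashing the run.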
Let me know if you find this useful.