Continuing from my previous post: I could now reach my AltaVista server, but from a web browser I still couldn't actually view any of the documents remotely.
The result pages did, however, give me the MS-DOS path to the usenet article in question:
Now how do I turn that into a URL?
Well, as it turns out, Apache's mod_substitute (the module behind the Substitute directive) does support regex, which in turn can capture pieces of the match and re-order them!
After a bit of googling I found this page on Stack Overflow, on how to convert a date between UK/US formats:
s/(\d{4})-(\d{2})-(\d{2})/$1-$3-$2/
Simple, right? So what is going on here? Each pair of parentheses defines a capture group, and in the substitution part you can recall what each group matched with $1, $2, $3 etc. So using this recipe I could take something like this:
u:\b227\comp\sys\laptops\3080
and convert it into the following:
http://debian7/usenet/b227/comp/sys/laptops/3080
The code for this would look something like this:
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3\">Click for article|"
Although for some reason it’s embedding the URLs even though I specified code formatting.
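To make the capture-group idea concrete outside of Apache, here is a minimal Python sketch of both substitutions. It is only an illustration: Python writes the back-references as \1, \2 and so on rather than $1, $2, the function names are made up, and the five-group pattern is borrowed from the matching rule in the config further down so that it fits the example path above.

```python
import re

# Three capture groups: year, month, day.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Five capture groups, one per path component. The "." after "u:" matches
# the first backslash, and each "\\" matches one literal backslash.
PATH = re.compile(
    r"u:.([a-z]{1,}[0-9]{3,})"   # group 1, e.g. b227
    r"\\([0-9a-z]{1,})"          # group 2, e.g. comp
    r"\\([0-9a-z]{1,})"          # group 3, e.g. sys
    r"\\([0-9a-z]{1,})"          # group 4, e.g. laptops
    r"\\([0-9]{1,})"             # group 5, the article number
)


def swap_day_month(text: str) -> str:
    """Recall the groups in a different order: 1, then 3, then 2."""
    return DATE.sub(r"\1-\3-\2", text)


def path_to_url(path: str) -> str:
    """Recall the groups in order, joined by forward slashes."""
    return PATH.sub(r"http://debian7/usenet/\1/\2/\3/\4/\5", path)


print(swap_day_month("2014-03-25"))
print(path_to_url(r"u:\b227\comp\sys\laptops\3080"))
```

The second call prints the same URL shown earlier, http://debian7/usenet/b227/comp/sys/laptops/3080.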
Now all I had to do was install IIS 4.0 off the Option Pack CD-ROM onto my Windows NT 4.0 workstation, and create a virtual directory of /usenet which then pointed to the U: drive where AltaVista did its indexing.
So to this point that gives me a config file much like this:
ServerAdmin webmaster@localhost
DocumentRoot /var/www
SSLProxyEngine On
ProxyPass "/altavista/" "https://10.12.0.16"
ProxyPassReverse "/altavista/" "https://10.12.0.16/"
ProxyRequests Off
RewriteEngine On
SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html

#clean up urls
Substitute "s|127.0.0.1:6688|debian7/altavista|n"
Substitute "s|file:///C:\Program Files\DIGITAL\AltaVista Search\My Computer\images\|http://debian7/images/|n"

#protect the page
Substitute "s|launch=app||n"
Substitute "s|?pg=config&what=init|?pg=h|n"

#fix title
Substitute "s|<IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \" BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0>|<a href=\"http://debian7/altavista\"><IMG src=\"http://debian7/images/av_personal.gif\" alt=\"[AltaVista] \" BORDER=0 ALIGN=middle HEIGHT=72 VSPACE=0 HSPACE=0>|--->|n"

Substitute "s|u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4\">Click for article|"
Substitute "s|>u:.([a-z]{1,}[0-9]{3,})\\\([0-9a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3\">Click for article|"

# Need links for the u:\news097f1\b120\comp\society\futures22
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7/$8\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6/$7\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5/$6\">Click for article|"
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z]{1,})\\\([a-z]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4/$5\">Click for article|"

# Need links for u:\news002f1\b1\fa.poli-sci
Substitute "s|>u:.(news[0-9]{3,}f[0-9])\\\([b0-9]{1,})\\\([a-z\.\-]{1,})\\\([0-9]{1,})|---><a href=\"http://debian7/usenet/$1/$2/$3/$4\">Click for article|"

<Location /usenet/>
ProxyPass http://10.12.0.16/usenet/
RewriteEngine On
SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
AddOutputFilterByType SUBSTITUTE text/html
</Location>

bla bla rest of the 000-default crap....
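Since there is one Substitute rule per path depth, it can help to try the b-directory rules against a sample line before reloading Apache. The following is a rough Python sketch of that idea, not something taken from the server itself: make_rule and rewrite are hypothetical helper names, the back-references are written Python-style (\1 instead of $1), and the sketch assumes every depth starts with the leading ">" even though the seven-group rule above is written without it.

```python
import re

FIRST = r"([a-z]{1,}[0-9]{3,})"   # first component, e.g. b227
MIDDLE = r"\\([0-9a-z]{1,})"      # one newsgroup component, e.g. comp
LAST = r"\\([0-9]{1,})"           # article number, e.g. 3080


def make_rule(groups):
    """Build a (pattern, replacement) pair for a path with `groups` components."""
    pattern = ">u:." + FIRST + MIDDLE * (groups - 2) + LAST
    # Python spells $1/$2/... as \1/\2/...
    refs = "/".join("\\" + str(n) for n in range(1, groups + 1))
    replacement = '---><a href="http://debian7/usenet/' + refs + '">Click for article'
    return re.compile(pattern), replacement


# Same order as the config: seven components first, three last.
RULES = [make_rule(n) for n in range(7, 2, -1)]


def rewrite(html):
    """Apply every rule in turn, the way the Substitute lines are chained."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


print(rewrite(r">u:\b227\comp\sys\laptops\3080"))
```

Only the five-group rule matches the sample path here; the deeper rules run out of components and the shallower ones find nothing left to match once the path has been replaced.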
Simple, right?
So now I get a nicely formatted page. I can click the mountain icon to jump back to home, and I can click on the articles, but because the files have no extensions or MIME types for anything to intercept, the browser will just download them to my PC. I guess I need to go through them all, convert them from UNIX format to MS-DOS, and stick a .txt extension on every single one of them.
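The bulk conversion could be sketched roughly like this in Python. This is an untested-against-the-real-archive outline under a few assumptions: the function name to_dos_txt is made up, every file that does not already end in .txt is treated as an article, and the files are small enough to read into memory one at a time.

```python
from pathlib import Path


def to_dos_txt(root):
    """Convert each file under `root` to CRLF line endings and append .txt.

    Files that already end in .txt are skipped, so a second run is harmless.
    Returns the number of files converted.
    """
    converted = 0
    # sorted() takes a snapshot first, so renaming files mid-walk is safe.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix == ".txt":
            continue
        data = path.read_bytes()
        # Collapse any existing CRLF pairs first so they are not doubled.
        data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
        path.with_name(path.name + ".txt").write_bytes(data)
        path.unlink()
        converted += 1
    return converted
```

Calling something like to_dos_txt("U:/") would walk the whole drive, so it is worth trying on a copy of one small directory first.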
I’m still thinking this thing is far too rickety to put on the internet, but we’ll see.
Couldn’t you change the locale settings on Windows?
I also wonder if someone has the workgroup edition…
I tried adding a MIME type for ‘*’ to map to text/plain but that didn’t help. From what I can see, the easiest way out is to simply rename every file. I think I can do it via Perl, maybe even convert them all to DOS text format too. And with the regex I can just append .txt to everything that is a link, and it should magically work.