The previous blog entry in this series [Best Practices for Log File Management], entitled ‘Remote Hosted Sites and ISP Policies’, discussed the challenges posed by ISP policies and how they can affect your ability to get good data. The simple answer is to ensure that the ISP gives you access to your log files and that you warehouse them as part of your IT processes. Log files can provide an abundance of information, and as we discussed earlier they can be a source of ‘intellectual property’, so a simple script can save you a lot of data problems.
In order to get your logs on a regular basis, you’ll need 3 things:
- An FTP address (e.g. ftp.yourwebsite.com)
- A username for the FTP account
- A password for the FTP account
Once you have this information, a few lines in a DOS-based batch file and a scheduler on a server are all you need to download the log from ‘yesterday’ on a nightly basis. In short form, the following roughly represents what a simple batch file would include to get the log file for today’s date minus 1 day.
File named: Log-download.bat
open ftp.yourwebsite.com
This simple batch file can be run automatically by a task scheduler or cron job on a nightly basis.
Now, depending on how your ISP names these log files, matters can become complicated, because the date is often part of the file name. To fetch the log for “today’s date minus 1 day” dynamically, additional scripting is required. I consider this to require slightly more advanced knowledge of scripting and DOS, but there is a program called ‘doff’ that I found online which lets you calculate this date offset in DOS; a good IT person can manage this part of the process for you.
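The date calculation can also be done outside DOS. The following is a minimal sketch in Python, offered as an alternative to ‘doff’ rather than as the method used in this post. It assumes the ISP names its files in the form exYYMMDD.log, as in the sample batch file in this post; the function names are hypothetical.

```python
from datetime import date, timedelta


def log_name(day):
    """Build a log file name for one date, assuming an exYYMMDD.log pattern."""
    return "ex" + day.strftime("%y%m%d") + ".log"


def yesterday_log_name(today=None):
    """Return the name of the log for 'today minus 1 day'."""
    today = today or date.today()
    return log_name(today - timedelta(days=1))


# Date arithmetic handles month ends and leap years for us.
print(yesterday_log_name(date(2024, 3, 1)))  # prints ex240229.log
```

If your ISP uses a different naming scheme, only the format string inside log_name would need to change.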
In order to accomplish this, I find the simplest solution is to create a batch file which outputs a secondary batch file. The primary batch file can be automatically run at 1:00am for example and the secondary file (which is actually the output from the primary batch file) can be run at 1:05am. The secondary batch file has a dynamically modified reference in the ‘get’ command for the name of the log file.
Sample code for the primary batch file, which generates the secondary file outlined above, might look something like the following.
File named: Log-download-script.bat
echo open [ftp.site.ca] > logs.txt
echo [username] >> logs.txt
echo [password] >> logs.txt
echo binary >> logs.txt
echo cd [root folder] >> logs.txt
echo lcd "[destination folder]" >> logs.txt
echo prompt >> logs.txt
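rem doff prints mm/dd/yy, so the three tokens are month, day, year (today's log)
for /f "tokens=1-3 delims=/ " %%a in ('doff mm/dd/yy') do (
echo get ex%%c%%a%%b.log >> logs.txt
)
rem Same again with an offset of -1 day (yesterday's log)
for /f "tokens=1-3 delims=/ " %%d in ('doff mm/dd/yy -1') do (
echo get ex%%f%%d%%e.log >> logs.txt
)
echo bye >> logs.txt
echo exit >> logs.txt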
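For readers who would rather not depend on ‘doff’, the same command file could be produced with a short Python script. This is only a sketch under assumptions: the function name and its parameters are hypothetical, the exYYMMDD.log naming is taken from the batch file above, and the host, credentials and folders remain placeholders you would supply yourself.

```python
from datetime import date, timedelta


def build_ftp_script(host, user, password, remote_dir, local_dir,
                     today=None, days_back=1):
    """Return the text of an FTP command file that fetches recent logs.

    Fetches today's log plus the previous `days_back` days, assuming
    files are named exYYMMDD.log.
    """
    today = today or date.today()
    lines = [
        f"open {host}",
        user,
        password,
        "binary",
        f"cd {remote_dir}",
        f'lcd "{local_dir}"',
        "prompt",
    ]
    for offset in range(days_back + 1):
        day = today - timedelta(days=offset)
        lines.append(f"get ex{day:%y%m%d}.log")
    lines.append("bye")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; replace with your own FTP details.
    script = build_ftp_script("[ftp.site.ca]", "[username]", "[password]",
                              "[root folder]", "[destination folder]")
    with open("logs.txt", "w") as handle:
        handle.write(script)
```

On Windows, the built-in ftp client can read a command file like this with its -s: option (for example, ftp -s:logs.txt), so the scheduled job only needs to run the script and then that one command.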
Finally, as there are many components and factors involved in this simple task, I recommend that the nightly process download the last 3 days, 5 days, or even 10 days of logs, depending on their size. This is a failsafe way to avoid losing data to internet failure, server failure, or numerous other factors.
In short, it is critical to download and warehouse your logs on a regular basis. This can very easily be automated, so it does not become a burden among your list of many things to do each day or week.
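The multi-day safety net can be sketched as a small Python helper. It is a variation on the recommendation above rather than the author's script: it lists the logs for the last few days and skips any already in your local folder. The function name is hypothetical and the exYYMMDD.log naming is an assumption carried over from the sample batch file.

```python
from datetime import date, timedelta


def missing_logs(already_downloaded, days=5, today=None):
    """List logs from the last `days` days that are not yet stored locally.

    `already_downloaded` is any collection of file names, for example
    the result of os.listdir() on the destination folder.
    """
    today = today or date.today()
    have = set(already_downloaded)
    wanted = []
    for offset in range(1, days + 1):
        day = today - timedelta(days=offset)
        wanted.append(f"ex{day:%y%m%d}.log")
    return [name for name in wanted if name not in have]


print(missing_logs({"ex240229.log"}, days=3, today=date(2024, 3, 1)))
# prints ['ex240228.log', 'ex240227.log']
```

Note that a file left incomplete by an interrupted transfer would still count as present here. Downloading the whole window every night, as recommended above, avoids that case at the cost of extra bandwidth.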
PublicInsite Web Analytics Inc.
[Editor's note: For more information on log file management, be sure to read Tyler's ongoing series of blog posts on the topic starting with Best Practices for Log File Management.]