Re: web file caching question

Author: David Schwartz via PLUG-discuss
Date:  
To: Main PLUG discussion list
CC: David Schwartz
Subject: Re: web file caching question
I asked ChatGPT to give me a solution in PHP 8 for this. Here’s what it wrote:

<?php

$log_file = '/var/log/download.log'; // change this to your desired log file location
$download_folder = '/var/www/downloads'; // change this to your desired download folder location

function log_message($message) {
    global $log_file;
    $timestamp = date('Y-m-d H:i:s');
    $log_message = $timestamp . ' - ' . $message . PHP_EOL;
    file_put_contents($log_file, $log_message, FILE_APPEND);
}


if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    $urls = $_POST['urls'] ?? null; // avoids an "undefined array key" warning when the field is missing
    if (is_array($urls) && count($urls) > 0) {
        foreach ($urls as $url) {
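            // Use only the URL's path so a query string does not leak into the filename
            $filename = basename((string) parse_url($url, PHP_URL_PATH));
            $download_path = $download_folder . '/' . $filename;
            log_message('Downloading ' . $url . ' to ' . $download_path);
            // fopen() returns false on failure; handing that to file_put_contents()
            // would write an empty file and look like a success
            $source = ($filename === '') ? false : @fopen($url, 'rb');
            $result = ($source === false) ? false : file_put_contents($download_path, $source);
            if ($source !== false) {
                fclose($source);
            }
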
            if ($result === false) {
                log_message('Error downloading ' . $url);
            } else {
                log_message('Successfully downloaded ' . $url . ' to ' . $download_path);
            }
        }
        log_message('Processed ' . count($urls) . ' URLs'); // includes any that failed
    } else {
        log_message('No valid URLs provided');
    }
} else {
    log_message('Invalid request method');
}


?>

Does this look reasonable?
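
One thing the script above does not do is restrict what it will fetch: any URL with any filename is accepted and written into the download folder. Below is a minimal sketch of a check that could run before each download. The function name is_allowed_url and the host tts.example.com are hypothetical, since the thread never names the TTS provider, and limiting files to .mp3 is an assumption based on the files described in the original post.

```php
<?php
// Hypothetical helper: decide whether a submitted URL may be fetched.
// $allowed_host is an assumption; substitute the TTS provider's real hostname.
function is_allowed_url(mixed $url, string $allowed_host): bool {
    if (!is_string($url)) {
        return false;
    }
    $parts = parse_url($url);
    // Require a scheme, a host, and a path; this also rejects file:// style URLs.
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['path'])) {
        return false;
    }
    if (strtolower($parts['scheme']) !== 'https') {
        return false;
    }
    if (strtolower($parts['host']) !== strtolower($allowed_host)) {
        return false;
    }
    // Accept only names ending in .mp3 so nothing executable lands in the folder.
    return strtolower(pathinfo($parts['path'], PATHINFO_EXTENSION)) === 'mp3';
}
```

Inside the foreach loop, a URL that fails this check could be logged and skipped with continue before anything is opened.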

I’m not sure if this can be run on a shared hosting environment, however, b/c it would get triggered on ALL incoming requests, which is not needed.

It would need to listen on a specific port#. Is that possible without making it a daemon?

Can it be run only by requests coming into the CGI folder on a specific domain? ChatGPT is sort of going in circles at this point.
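
For what it's worth, in a typical Apache/cPanel setup a PHP file runs only when a request targets that file's own URL, so no separate port or daemon should be needed. To narrow it further, the script could reject anything that is not a POST to the expected domain carrying a shared secret. This is only a sketch: the function name request_is_authorized, the form field 'token', and the host cache.example.com are all hypothetical and not part of the script above.

```php
<?php
// Hypothetical gate for the top of the download script. The host name and the
// 'token' field are assumptions for illustration only.
function request_is_authorized(array $server, array $post, string $expected_host, string $secret): bool {
    if (($server['REQUEST_METHOD'] ?? '') !== 'POST') {
        return false;
    }
    // HTTP_HOST carries the domain the client asked for (it may include a port).
    if (strtolower((string) ($server['HTTP_HOST'] ?? '')) !== strtolower($expected_host)) {
        return false;
    }
    $token = $post['token'] ?? '';
    // hash_equals() compares the two strings in constant time.
    return is_string($token) && hash_equals($secret, $token);
}
```

The script would call it as request_is_authorized($_SERVER, $_POST, 'cache.example.com', $secret) and respond with http_response_code(403) and exit when it returns false.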

-David Schwartz




> On May 15, 2023, at 9:40 AM, David Schwartz via PLUG-discuss <> wrote:
>
> Hmmm, kind of like a remote wget …
>
> Actually, they’d tend to be done in batches, so I’d send a list of names to be copied.
>
> Is there a super-simple way for a php script to handle one single POST request that only does one thing, without a ton of overhead needed for an entire REST-based service with multiple endpoints?
>
> Wget would probably be overkill assuming the php script can just issue an HTTP file download request.
>
> -David Schwartz
>
>
>
>> On May 15, 2023, at 9:28 AM, Bob Elzer via PLUG-discuss <plug-discuss@lists.phxlinux.org> wrote:
>>
>> What about setting up a CGI script on the Linux server that you pass the URL to? It could do a wget to retrieve the file to the directory you specify.
>>
>>
>> On Sun, May 14, 2023, 8:50 PM David Schwartz via PLUG-discuss <plug-discuss@lists.phxlinux.org> wrote:
>> I’m building a web app that uses a 3rd-party text-to-speech (TTS) service; it's one of many things supported by a REST service I’ve created that runs on a Windows host somewhere. This service sends requests to the TTS service and gets back a URL to an MP3 file on their server. These files are only there for about an hour before they get deleted.
>>
>> My service sends back those URLs to the client, which is typically running on a mobile device. They can be consumed without any problem at the moment, telling me the TTS provider has disabled CORS restrictions.
>>
>> Many of the requests that will be made are unique and will never be duplicated, so the fact that their vocalizations (the MP3 files) get deleted after an hour is not a problem.
>>
>> However, some of them (20-30%) are very likely to be duplicated, and it’s worth saving them somewhere so they can be re-used in the future. (The TTS service charges based on characters sent to them, and by reusing the MP3 files over time, a lot of cost savings can accrue.)
>>
>> In my mind, I need to set up a way to cache these files somewhere.
>>
>> I don’t want to save them on the server that’s hosting the REST service because of the bandwidth costs.
>>
>> I’ve tried a few different things and it turns out this brings up CORS issues.
>>
>> I have my own web host and found out I can add a line to an htaccess file that will allow the files to be accessed. I can’t do that with hosted services like FileStack (which has other limitations as well).
>>
>> It looks like there’s a way to do it with Dropbox by changing the URL from this form:
>>
>> 
>> https://www.dropbox.com/s/x12nrtdi08ipo352/sample-abc.mp3?dl=0
>>
>> 
>> to this form:
>>
>>
>> https://dl.dropboxusercontent.com/s/x12nrtdi08ipo352/sample-abc.mp3
>>
>> So, this brings up the question of HOW TO MOVE THE FILES INTO THE CACHE?
>>
>> Here’s my biggest constraint: I can access the files for up to an hour by using the TTS server’s URLs. Within that time-frame, they need to have been moved over to the host that’s doing the caching. After that, the code will quickly check to see if the requests have already been processed and are in the cache; if so, it will return a URL to the cached file, saving a needless encoding request.
>>
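
The cache check described above could look something like the following. It is shown in PHP only to match the script earlier in this thread; the REST service may well use another language. The key scheme (a SHA-256 hash of the text), the function names, and the folder and URL layout are all assumptions. The sketch also assumes the check runs where the cached files are visible on disk.

```php
<?php
// Hypothetical cache key: the same text always maps to the same filename.
// Any other input that changes the audio would need to be part of the key too.
function cache_filename(string $text): string {
    return hash('sha256', $text) . '.mp3';
}

// Return the public URL of the cached MP3, or null when the text
// has not been vocalized and stored yet.
function cached_url(string $text, string $cache_dir, string $base_url): ?string {
    $name = cache_filename($text);
    if (!is_file($cache_dir . '/' . $name)) {
        return null;
    }
    return rtrim($base_url, '/') . '/' . $name;
}
```

A null result would mean sending the text to the TTS service as usual and then copying the returned MP3 into the cache under cache_filename($text) before the hour is up.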
>> If I use Dropbox, I can simply set up the Dropbox app on the server hosting my REST service, and save the files to the Dropbox file tree. They’ll be copied into Dropbox automatically. But this means I’ll have a modest cost associated with maintaining a Dropbox account for this specific purpose ($130/yr).
>>
>>
>> Alternatively, I can copy the files from the REST server over to my own host.
>>
>> What I’d like to ask this group is … what’s the best way to accomplish that?
>>
>> My host is currently on a shared reseller hosting plan, but as this scales up, I’ll move it to a dedicated host.
>>
>> It’s running cPanel on a Linux server, probably running CentOS.
>>
>> I can set up cron jobs, and I’m told I can get some limited access to a shell (rsh probably) if needed.
>>
>> This is the same host I tested with the htaccess file to allow the files to be accessed without CORS issues.
>>
>> I’m wondering if it’s best to have the REST server copy the files from the TTS site to the other host somehow (e.g., with FTP)?
>>
>> Or use something like rsync on the host to sweep files from the TTS site into the host, driven by a list provided by the REST service?
>>
>> Can I run rsync on a Windows host that copies files from server-A to server-B?
>>
>> Or maybe you guys have some better ideas? I’d love to hear some pros and cons about any solutions that might work.
>>
>> Note that I have not considered a cloud service other than FileStack (which I’ve ruled out using). Regardless, the files will still need to be copied from the TTS provider’s site to the cache host before they get deleted. THIS process is what I’m wanting to resolve.
>>
>> -David Schwartz
>>
>>
>>
>>
>> ---------------------------------------------------
>> PLUG-discuss mailing list: PLUG-discuss@lists.phxlinux.org
>> To subscribe, unsubscribe, or to change your mail settings:
>> https://lists.phxlinux.org/mailman/listinfo/plug-discuss

