Re: web file caching question

Author: Andrew McRobb via PLUG-discuss
Date:  
To: Main PLUG discussion list
CC: Andrew McRobb
Subject: Re: web file caching question
My company has been using AWS S3 very successfully for different types of
data and it's been extremely cheap.


Without knowing much about the data and transfer volumes involved, I put
together a rough estimate of $10 a month for 100GB of storage and transfer.
I'm overestimating (I hope). You can also tell S3 to auto-delete files after
a certain amount of time, which would remove the need to run a cron job. Just
upload the file to a bucket from your server code, then get the URL from the
AWS API and play back the audio file; that's how I imagine it working. Hope
that helps!
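
Here is a rough sketch of what I mean, assuming Python with boto3 on the server. The bucket name, the "tts/" prefix, the 30-day expiry and every function name below are made up for illustration, and hashing the request text to get the object key is just one way to name the files. Treat it as a starting point rather than a tested setup.

```python
import hashlib
import urllib.request

BUCKET = "example-tts-cache"  # hypothetical bucket name
PREFIX = "tts/"               # hypothetical key prefix for cached audio


def cache_key(text: str) -> str:
    """Derive a stable object key from the text sent to the TTS service."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{PREFIX}{digest}.mp3"


def lifecycle_rules(days: int) -> dict:
    """Build a lifecycle configuration that expires cached objects after `days`."""
    return {
        "Rules": [
            {
                "ID": "expire-tts-cache",
                "Filter": {"Prefix": PREFIX},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(days: int = 30) -> None:
    """One-time setup: let S3 delete old files instead of running a cron job."""
    import boto3  # imported lazily so the helpers above work without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=BUCKET, LifecycleConfiguration=lifecycle_rules(days)
    )


def copy_to_s3(tts_url: str, text: str) -> str:
    """Fetch the MP3 from the TTS provider's URL and store it in the bucket."""
    import boto3

    with urllib.request.urlopen(tts_url) as resp:
        audio = resp.read()
    key = cache_key(text)
    boto3.client("s3").put_object(
        Bucket=BUCKET, Key=key, Body=audio, ContentType="audio/mpeg"
    )
    return key


def playback_url(key: str, expires: int = 3600) -> str:
    """Return a time-limited URL the client can use to play the file."""
    import boto3

    return boto3.client("s3").generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=expires
    )
```

A presigned URL is only one option; a public bucket or a CDN in front of it would also work. If the client fetches the audio from script rather than a plain audio element, the bucket will probably need its own CORS configuration as well.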

On Sun, May 14, 2023 at 8:50 PM David Schwartz via PLUG-discuss wrote:

> I’m building a web app that uses a 3rd-party text-to-speech (TTS) service;
> it's one of many things supported by a REST service I’ve created that runs
> on a Windows host somewhere. This service sends requests to the TTS service
> and gets back a URL to an MP3 file on their server. These files are only
> there for about an hour before they get deleted.
>
> My service sends back those URLs to the client, which is typically running
> on a mobile device. They can be consumed without any problem at the moment,
> telling me the TTS provider has disabled CORS restrictions.
>
> Many of the requests that will be made are unique and will never be
> duplicated, so the fact that their vocalizations (the MP3 files) get
> deleted after an hour is not a problem.
>
> However, some of them (20-30%) are very likely to be duplicated, and it’s
> worth saving them somewhere so they can be re-used in the future. (The TTS
> service charges based on characters sent to them, and by reusing the MP3
> files over time, a lot of cost savings can accrue.)
>
> In my mind, I need to set up a way to cache these files somewhere.
>
> I don’t want to save them on the server that’s hosting the REST service
> because of the bandwidth costs.
>
> I’ve tried a few different things and it turns out this brings up CORS
> issues.
>
> I have my own web host and found out I can add a line to an htaccess file
> that will allow the files to be accessed. I can’t do that with hosted
> services like FileStack (which has other limitations as well).
>
> It looks like there’s a way to do it with Dropbox by changing the URL from
> this form:
>
> https://www.dropbox.com/s/x12nrtdi08ipo352/sample-abc.mp3?dl=0
>
>
> to this form:
>
> https://dl.dropboxusercontent.com/s/x12nrtdi08ipo352/sample-abc.mp3
>
>
> So, this brings up the question of HOW TO MOVE THE FILES INTO THE CACHE?
>
> Here’s my biggest constraint: I can access the files for up to an hour by
> using the TTS server’s URLs. Within that time-frame, they need to have been
> moved over to the host that’s doing the caching. After that, the code will
> quickly check to see if the requests have already been processed and are in
> the cache; if so, it will return a URL to the cached file, saving a
> needless encoding request.
>
> If I use Dropbox, I can simply set up the Dropbox app on the server
> hosting my REST service, and save the files to the Dropbox file tree.
> They’ll be copied into Dropbox automatically. But this means I’ll have a
> modest cost associated with maintaining a Dropbox account for this specific
> purpose ($130/yr).
>
>
> Alternatively, I can copy the files from the REST server over to my own
> host.
>
> What I’d like to ask this group is … what’s the best way to accomplish
> that?
>
> My host is currently on a shared reseller hosting plan, but as this
> scales up, I’ll move it to a dedicated host.
>
> It’s running cPanel on a Linux server, probably running CentOS.
>
> I can set up cron jobs, and I’m told I can get some limited access to a
> shell (rsh probably) if needed.
>
> This is the same host I tested with the htaccess file to allow the files
> to be accessed without CORS issues.
>
> I’m wondering if it’s best to have the REST server copy the files from the
> TTS site to the other host somehow (e.g., with FTP)?
>
> Or use something like rsync on the host to sweep files from the TTS site
> into the host, driven by a list provided by the REST service?
>
> Can I run rsync on a Windows host that copies files from server-A to
> server-B?
>
> Or maybe you guys have some better ideas? I’d love to hear some pros and
> cons about any solutions that might work.
>
> Note that I have not considered a cloud service other than FileStack
> (which I’ve ruled out using). Regardless, the files will still need to be
> copied from the TTS provider’s site to the cache host before they get
> deleted. THIS process is what I want to resolve.
>
> -David Schwartz
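
The check-the-cache-first flow described in the quoted message could look something like this. It is a minimal sketch in Python: the function names are hypothetical, the cache is a plain dict standing in for whatever index the REST service would really keep, and `synthesize` and `store` are placeholders for the TTS call and for the copy to the cache host (S3, FTP, or anything else).

```python
import hashlib
from typing import Callable, Dict


def request_key(text: str) -> str:
    """Identify a request by a hash of its text, so duplicates map to one entry."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_audio_url(
    text: str,
    cache: Dict[str, str],
    synthesize: Callable[[str], str],
    store: Callable[[str, str], str],
) -> str:
    """Return a cached URL when one exists; otherwise call TTS and cache the result.

    synthesize(text) -> short-lived URL on the TTS provider's server
    store(key, tts_url) -> long-lived URL of the copy on the cache host
    """
    key = request_key(text)
    cached = cache.get(key)
    if cached is not None:
        # Cache hit: no characters are sent to the TTS service.
        return cached

    # Cache miss: pay for one encoding, then copy the file right away,
    # well inside the window before the provider deletes it.
    tts_url = synthesize(text)
    cache[key] = store(key, tts_url)

    # The provider's URL is still valid for this first response.
    return tts_url
```

Copying at request time, rather than sweeping later with cron or rsync, keeps the one-hour limit from ever mattering, at the cost of a slightly slower first response. The copy could also be handed to a background worker if that delay is a problem.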
>
>
>
>
> ---------------------------------------------------
> PLUG-discuss mailing list:
> To subscribe, unsubscribe, or to change your mail settings:
> https://lists.phxlinux.org/mailman/listinfo/plug-discuss
>