web file caching question

Author: David Schwartz via PLUG-discuss
Date:  
To: plug-discuss
CC: David Schwartz
Subject: web file caching question
I’m building a web app that uses a 3rd-party text-to-speech (TTS) service; it's one of many things supported by a REST service I’ve created that runs on a Windows host somewhere. This service sends requests to the TTS service and gets back a URL to an MP3 file on their server. These files are only there for about an hour before they get deleted.

My service sends those URLs back to the client, which is typically running on a mobile device. They can be consumed without any problem at the moment, which tells me the TTS provider has disabled CORS restrictions.

Many of the requests that will be made are unique and will never be duplicated, so the fact that their vocalizations (the MP3 files) get deleted after an hour is not a problem.

However, some of them (20-30%) are very likely to be duplicated, and it’s worth saving them somewhere so they can be re-used in the future. (The TTS service charges based on characters sent to them, and by reusing the MP3 files over time, a lot of cost savings can accrue.)

In my mind, I need to set up a way to cache these files somewhere.

I don’t want to save them on the server that’s hosting the REST service because of the bandwidth costs.

I’ve tried a few different places to host the files, and it turns out that serving them from somewhere else brings up CORS issues.

I have my own web host and found out I can add a line to an .htaccess file that will allow the files to be accessed cross-origin. I can’t do that with hosted services like FileStack (which has other limitations as well).

It looks like there’s a way to do it with Dropbox by changing the URL from this form:

https://www.dropbox.com/s/x12nrtdi08ipo352/sample-abc.mp3?dl=0


to this form:


https://dl.dropboxusercontent.com/s/x12nrtdi08ipo352/sample-abc.mp3
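
As a rough sketch of the URL change described above: the function below only performs the string transformation between the two forms shown. The function name is made up for illustration, and whether Dropbox keeps serving files from the second form is an assumption that would need testing.

```python
from urllib.parse import urlsplit, urlunsplit


def to_direct_dropbox_url(share_url: str) -> str:
    """Rewrite a www.dropbox.com share link into the dl.dropboxusercontent.com form."""
    parts = urlsplit(share_url)
    if parts.netloc != "www.dropbox.com":
        raise ValueError(f"not a Dropbox share link: {share_url}")
    # Swap the host and drop the ?dl=0 query string; the path stays the same.
    return urlunsplit((parts.scheme, "dl.dropboxusercontent.com", parts.path, "", ""))
```

The REST service could apply this once when a file is first shared and store only the rewritten URL in its cache index.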

So, this brings up the question of HOW TO MOVE THE FILES INTO THE CACHE?

Here’s my biggest constraint: I can access the files for up to an hour by using the TTS server’s URLs. Within that time-frame, they need to have been moved over to the host that’s doing the caching. After that, the code will quickly check to see if the requests have already been processed and are in the cache; if so, it will return a URL to the cached file, saving a needless encoding request.
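
The cache check described above could look something like the following. This is a minimal sketch, not the actual service code: it assumes the cache is keyed by a hash of the request text, and `cache_key`, `resolve`, `cached_names`, `cache_base_url`, and `request_tts` are all hypothetical names.

```python
import hashlib


def cache_key(text: str) -> str:
    """Derive a stable file name from the request text so duplicates map to the same MP3."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() + ".mp3"


def resolve(text, cached_names, cache_base_url, request_tts):
    """Return (url, was_cached).

    cached_names is the set of file names already on the cache host, and
    request_tts is a hypothetical callable that sends the text to the TTS
    service and returns the provider's temporary URL.
    """
    name = cache_key(text)
    if name in cached_names:
        # Cache hit: no characters are sent to the TTS service.
        return f"{cache_base_url}/{name}", True
    # Cache miss: this file still has to be copied to the cache host within the hour.
    return request_tts(text), False
```

On a miss, the client gets the provider's URL right away, and the copy into the cache can happen in the background as long as it finishes before the provider deletes the file.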

If I use Dropbox, I can simply set up the Dropbox app on the server hosting my REST service, and save the files to the Dropbox file tree. They’ll be copied into Dropbox automatically. But this means I’ll have a modest cost associated with maintaining a Dropbox account for this specific purpose ($130/yr).


Alternatively, I can copy the files from the REST server over to my own host.

What I’d like to ask this group is … what’s the best way to accomplish that?

My host is currently on a shared reseller hosting plan, but as this scales up, I’ll move it to a dedicated host.

It’s running cPanel on a Linux server, probably running CentOS.

I can set up cron jobs, and I’m told I can get some limited access to a shell (rsh probably) if needed.

This is the same host I tested with the htaccess file to allow the files to be accessed without CORS issues.

I’m wondering if it’s best to have the REST server copy the files from the TTS site to the other host somehow (e.g., with FTP)?
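
For illustration, the push approach could be sketched as below. It assumes the cache host accepts FTP uploads and that the REST service can run Python; `relay_to_ftp` and its parameters are hypothetical names, and the `ftp` argument is expected to be an already logged-in `ftplib.FTP` or `ftplib.FTP_TLS` connection.

```python
import urllib.request


def relay_to_ftp(source_url, ftp, remote_name, opener=urllib.request.urlopen):
    """Stream one MP3 from the TTS provider's URL straight to the cache host over FTP."""
    with opener(source_url) as response:
        # storbinary reads the response block by block, so the MP3 is never
        # held fully in memory or written to the REST server's disk.
        ftp.storbinary(f"STOR {remote_name}", response)
    return remote_name
```

The downside of this design is that every cached file crosses the REST server's network connection twice (once in, once out), which matters if bandwidth on that host is what costs money.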

Or use something like rsync on the host to sweep files from the TTS site into the host, driven by a list provided by the REST service?
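
One caveat on the pull approach: rsync talks only to an rsync daemon or over a remote shell such as SSH, so it cannot fetch from the TTS provider's plain HTTP URLs. A cron-driven script on the cache host could do the sweep instead. The sketch below assumes Python is available on that host and that the REST service publishes a manifest with one "name url" pair per line; the manifest format and the `sweep` function are made up for illustration.

```python
import os
import shutil
import urllib.request


def sweep(manifest_lines, cache_dir, opener=urllib.request.urlopen):
    """Download every listed file not already in cache_dir; return the names fetched."""
    fetched = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        name, url = line.split(maxsplit=1)
        name = os.path.basename(name)  # never write outside cache_dir
        target = os.path.join(cache_dir, name)
        if os.path.exists(target):
            continue  # already cached on an earlier run
        partial = target + ".part"
        with opener(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(partial, target)  # publish only fully downloaded files
        fetched.append(name)
    return fetched
```

Since the files disappear after about an hour, the cron interval would have to be comfortably shorter than that, for example every 10 to 15 minutes.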

Can I run rsync on a Windows host that copies files from server-A to server-B?

Or maybe you guys have some better ideas? I’d love to hear some pros and cons about any solutions that might work.

Note that I have not considered any cloud service other than FileStack (which I’ve ruled out). Regardless, the files will still need to be copied from the TTS provider’s site to the cache host before they get deleted. THIS process is what I’m trying to resolve.

-David Schwartz




---------------------------------------------------
PLUG-discuss mailing list:
To subscribe, unsubscribe, or to change your mail settings:
https://lists.phxlinux.org/mailman/listinfo/plug-discuss