How to capture the text contents on a webpage "inside" frame?
Kurt Granroth
plug-discuss at granroth.org
Fri Jul 8 23:16:33 MST 2005
On Jul 8, 2005, at 3:20 PM, Josef Lowder wrote:
> I have a web page open in Konqueror that has a large text file open within
> a "frame" window that is inside the host frame. (Not sure if I'm using the
> right terms, here.)
>
> I want to capture the contents of this file, but haven't been able to do so.
> I have plenty of memory and have captured other files of a similar size with
> no problem, but there is something about this one that I can't get.
[snip]
> I also tried view document source thinking that perhaps I could copy that
> and clean up all the superfluous html code, but that also doesn't work.
Actually, I'm surprised that method didn't work. In 99% of cases, all
you need to do is look for "frame src='whatever.html'" in the source and
then fetch whatever.html directly. I ran into one case a while back
where I absolutely could not directly access the contents of a frame
no matter what I tried... but that's extremely rare.
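Here is a rough sketch of that lookup as a shell function, assuming you have saved the outer page's source to a local file. The name list_frame_srcs is made up for illustration, and the pattern only handles quoted src attributes. It also relies on grep's -o option, which GNU and BSD grep have but POSIX does not require.

```shell
#!/bin/sh
# list_frame_srcs is a hypothetical helper name, not part of any existing tool.
# It prints the src value of every <frame> or <iframe> tag in a saved HTML file.
# Assumption: the src attribute is quoted with " or '.
list_frame_srcs() {
    # grep -o emits each match on its own line, ending just before the
    # closing quote; sed then strips everything up to the opening quote.
    grep -oiE "<i?frame[^>]*src=[\"'][^\"']+" "$1" |
        sed "s/^.*[\"']//"
}
```

Each line printed is the frame's address as written in the page. A relative one such as whatever.html has to be resolved against the address of the page it came from before you can open it directly in Konqueror or fetch it with something like wget.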
> Any suggestions? Perhaps there is some way to capture whatever text is
> suspended in memory?
If all else fails, you could try getting the file directly out of
Konqueror's cache. I don't know whether the cache lives in the same
place on every system. On my SuSE 9.3 installation, the cache is in
$HOME/.kde/cache-<HOST>/http.
Try something like:
find $HOME/.kde/cache-<HOST>/http -name "*THE_SITE_HOSTNAME*" |
    xargs fgrep "SOME TEXT FROM THE FILE"
That should do it.
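If you end up doing this more than once, the same search can be wrapped in a small function. This is only a sketch: search_cache and its three arguments are names made up for illustration, and it assumes the cached files have the site's hostname in their file names, as on my system. It uses find's -exec ... + form instead of xargs so that odd characters in file names don't cause trouble, and grep -l so that only the matching file names are printed.

```shell
#!/bin/sh
# search_cache is a hypothetical helper name, not part of KDE or any other tool.
# Usage: search_cache CACHE_DIR HOSTNAME_FRAGMENT "SOME TEXT FROM THE FILE"
search_cache() {
    cache_dir=$1
    host=$2
    text=$3
    # -type f restricts the search to regular files; grep -F treats the
    # text as a fixed string and -l prints only the names of matching files.
    find "$cache_dir" -type f -name "*${host}*" \
        -exec grep -lF -- "$text" {} +
}
```

Call it with the cache directory for your own system, a piece of the site's hostname, and a phrase you remember from the text. Any file it prints can then be copied out of the cache and opened in an editor.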
Kurt