How to capture the text contents on a webpage "inside" frame?

Kurt Granroth plug-discuss at granroth.org
Fri Jul 8 23:16:33 MST 2005


On Jul 8, 2005, at 3:20 PM, Josef Lowder wrote:
> I have a web page open in Konqueror that has a large text file open within
> a "frame" window that is inside the host frame. (Not sure if I'm using the
> right terms, here.)
>
> I want to capture the contents of this file, but haven't been able to do so.
> I have plenty of memory and have captured other files of a similar size with
> no problem, but there is something about this one that I can't get.
[snip]
> I also tried view document source thinking that perhaps I could copy that
> and clean up all the superfluous html code, but that also doesn't work.

Actually, I'm surprised that method didn't work.  In 99% of cases, all
you need to do is look for "frame src='whatever.html'" in the page
source and fetch whatever.html directly.  I ran into one case a while
back where I absolutely could not access the contents of a frame
directly no matter what I tried... but that's extremely rare.
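
A rough sketch of that approach, assuming the page source has already
been saved to a local file. The function name frame_srcs and the file
name page.html are made up for illustration, and the pattern only
handles quoted src attributes that sit on the same line as their tag:

```shell
# Hypothetical helper: print the src value of every <frame> or <iframe>
# tag found in a saved copy of the page source.
frame_srcs() {
    # grep -o prints only the matched part, from the tag start up to
    # (but not including) the quote that closes the src value.
    grep -Eio "<i?frame[^>]*src=[\"'][^\"']*" "$1" |
        # Strip everything up to and including the opening quote.
        sed -E "s/^.*[sS][rR][cC]=[\"']//"
}

# Example use (page.html is a placeholder for the saved source):
#   frame_srcs page.html
```

Each line of output is a URL, usually relative to the page that
contained the frameset, so it has to be resolved against that page's
address before fetching it with something like wget.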

> Any suggestions?  Perhaps there is some way to capture whatever text is
> suspended in memory?

If all else fails, you could try getting the file directly out of  
Konqueror's cache.  I don't know if the cache is found in different  
places on different systems or not.  On my SuSE 9.3 installation, the  
cache is in $HOME/.kde/cache-<HOST>/http.

Try something like:

find $HOME/.kde/cache-<HOST>/http -name "*THE_SITE_HOSTNAME*" |
    xargs fgrep "SOME TEXT FROM THE FILE"

That should do it.
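
If the cache file names contain spaces or other odd characters, a
slightly more defensive variant may help. This is only a sketch: the
function name search_cache is hypothetical, it swaps xargs for find's
own -exec ... {} + form, and it prints just the names of the matching
files rather than the matching lines:

```shell
# Hypothetical helper: list cache files whose name contains a hostname
# fragment and whose contents contain a literal string.
#   $1 = cache directory (e.g. the http directory mentioned above)
#   $2 = part of the site's hostname
#   $3 = literal text to look for
search_cache() {
    # -exec ... {} + hands the file names to grep directly, so unusual
    # characters in the names are not split or reinterpreted.
    # grep -F treats the text as a fixed string; -l prints file names only.
    find "$1" -type f -name "*$2*" -exec grep -lF -- "$3" {} +
}

# Example use:
#   search_cache "$HOME/.kde/cache-<HOST>/http" THE_SITE_HOSTNAME "SOME TEXT FROM THE FILE"
```

The files it lists can then be opened in an editor to copy the text out.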

Kurt

