MySQL reporting question

R P Herrold herrold at owlriver.com
Tue Jun 22 09:05:22 MST 2010


On Mon, 21 Jun 2010, der.hans wrote:

> I need to match the most recently entered status from t2 with stuff from
> t1 in MySQL.

> t1.stuff, t2.status == $some_val where max(t2.timestamp) and t1.varchar1
> == t2.varchar1 and t2.varchar2 == t2.varchar2

> I can also add columns to either table.

ugghhh -- you mention MySQL in the subject line, but just 
because one has a hammer does not mean everything is a nail. 
The MySQL database server kernel engine can use a journal and 
serialization of transactions to attain ACID.  This can slow 
things down a lot if used

There is the 'new' interest in 'eventually consistent' 
databases.  Can you use 'almost the latest' data?  [some 
problem domains do not admit a use case permitting this, but 
some do: consider: DNS which might update detail all the way 
through only every couple of weeks in some cases]

I would take a hint and amend my code to emit both the insert 
to t1 AND a destructive update write to 'last seen' location 
outside of the database.  A strict SQL approach is gonna kill 
you on retrieval time and lock serialization

This avoids that killing JOIN

I would also generate a hash of t1.varchar1, t2.varchar1, 
t2.varchar2 and t2.varchar2 as each is written and as you are 
sorting (selecting, but ... ) on max(t2.timestamp) and 
generate a series of the first four directory permuted values, 
each in their own tree, and spread that across sufficient 
spindles that the write and read load performance can keep up

As this point we need to look at the structure for the problem 
domain
 	t1.stuff, t2.status == $some_val
and we lack enough to design further the structures needed ...

I had to automate precompution of 'cribs' of subsets of data 
and pre-populate a cache for a call center application I wote, 
to keep up with the database read load on one project.  If I 
were doing it again, I would drill in even more 'out of 
database' cacheing

Financial markets data is often like this -- one needs very 
fast access to arbitrary data, and usually just the latest 
trade or the lastest day's trade data; after that it is 
merely of less time critical access needs.  We recently spent 
several months with the trading-shim working in this area

-- Russ herrold
 	614 488 6954


More information about the PLUG-discuss mailing list