MySQL reporting question

Bryan O'Neal Bryan.ONeal at TheONealAndAssociates.com
Tue Jun 22 09:30:30 MST 2010


If your going to drop acid drop mysql and use Casandra. I hate
Casandra because it makes no attempt at being acid complaint but it is
the best of your high performance "eventually/mostly consistent"
databases.  Mind you I think anyone who thinks not using an ACID
complaint database is a good idea did one to many hits of acid and is
now locked in a permenait halucination / delision - OR - just does not
give a $#*! about thier data. But your also talking to an accountant
who would have a kaniption is his $1.5Millon books were off by a
nickle and other may not care quite so much about their business.

That being said with MySQL you can set up replication very easily and
just run your reporting off of the replication box. Thus you can
devoting all of your resources to a single query without worry.  In
addition if you are using an INnoDB engine vs a MyIsam engine your
joins are far less expensive. In addition using inner joins vs
subquerys is much faster. And if you have many tables you can increase
performance by creating temp tables and views but again, you need to
understand your dealing with cached data so design as appropriate.

On Tue, Jun 22, 2010 at 9:05 AM, R P Herrold <herrold at owlriver.com> wrote:
> On Mon, 21 Jun 2010, der.hans wrote:
>
>> I need to match the most recently entered status from t2 with stuff from
>> t1 in MySQL.
>
>> t1.stuff, t2.status == $some_val where max(t2.timestamp) and t1.varchar1
>> == t2.varchar1 and t2.varchar2 == t2.varchar2
>
>> I can also add columns to either table.
>
> ugghhh -- you mention MySQL in the subject line, but just because one has a
> hammer does not mean everything is a nail. The MySQL database server kernel
> engine can use a journal and serialization of transactions to attain ACID.
>  This can slow things down a lot if used
>
> There is the 'new' interest in 'eventually consistent' databases.  Can you
> use 'almost the latest' data?  [some problem domains do not admit a use case
> permitting this, but some do: consider: DNS which might update detail all
> the way through only every couple of weeks in some cases]
>
> I would take a hint and amend my code to emit both the insert to t1 AND a
> destructive update write to 'last seen' location outside of the database.  A
> strict SQL approach is gonna kill you on retrieval time and lock
> serialization
>
> This avoids that killing JOIN
>
> I would also generate a hash of t1.varchar1, t2.varchar1, t2.varchar2 and
> t2.varchar2 as each is written and as you are sorting (selecting, but ... )
> on max(t2.timestamp) and generate a series of the first four directory
> permuted values, each in their own tree, and spread that across sufficient
> spindles that the write and read load performance can keep up
>
> As this point we need to look at the structure for the problem domain
>        t1.stuff, t2.status == $some_val
> and we lack enough to design further the structures needed ...
>
> I had to automate precompution of 'cribs' of subsets of data and
> pre-populate a cache for a call center application I wote, to keep up with
> the database read load on one project.  If I were doing it again, I would
> drill in even more 'out of database' cacheing
>
> Financial markets data is often like this -- one needs very fast access to
> arbitrary data, and usually just the latest trade or the lastest day's trade
> data; after that it is merely of less time critical access needs.  We
> recently spent several months with the trading-shim working in this area
>
> -- Russ herrold
>        614 488 6954
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>


More information about the PLUG-discuss mailing list