Got a text formatting/database question ("bash" it to hell?)

Jim March 1.jim.march at gmail.com
Tue Apr 14 18:34:30 MST 2009


Guys,

I have an interesting database problem that I think can be solved on
the command line in one shot.  But I don't know how :(.

I have a comma-separated values text file.  Each line shows a voter ID
number and the ID number of an election they voted in.  NOT who they
voted for, and not their names, just that they voted in that election
(cast a ballot at all, even if blank).

A given voter has likely voted in multiple elections.  So here's the
section for two voter IDs (first column), the elections they voted in
(second column), and the method used to vote (third column) if it was
early or mail-in (which I can ignore).  Pasted into email (from an
OpenOffice spreadsheet used as a quick viewer) the columns are
separated by whitespace, but in the original data they're separated
by commas.

---
233	2	
233	3	
233	4	
233	5	
233	6	
233	7	
233	31	
233	32	
233	38	
233	41	
233	45	
233	55	
233	57	
233	95	
233	96	
235	2	
235	3	
235	4	
235	5	
235	6	
235	7	
235	31	Early Ballot
235	32	Early Ballot
235	38	
235	45	
235	55	
235	57	Early Ballot
235	95	Early Ballot
235	96	Early Ballot
235	125	
235	126	Early Ballot
235	143	
235	147	Early Ballot
235	148	Early Ballot
235	170	Early Ballot
---

So what I want to do is, strip out every line that does NOT have a
"170" in the second column, and then produce a line count.  I need to
know (like ASAP) how many people voted in election 170 as that's the
2006 RTA special election in Pima County now subject to a recount.
And then I can do a second pass using the same technique and find out
how many people filed an early ballot by stripping out those and
counting lines again (and doing basic subtraction).
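
To make the idea concrete, here is a rough sketch of the kind of
filter-and-count I mean, using awk.  It's untested against the real
data and assumes plain unquoted fields with exactly the three columns
described above.  The temp file and the sample rows are made up for
illustration (only the 233 and 235 rows come from the excerpt), and it
counts the early ballots directly rather than stripping them out
first, which should come to the same subtraction.

```shell
#!/bin/sh
# Hypothetical stand-in for the real CSV export (assumption: unquoted
# fields, columns are voter ID, election ID, voting method).
f=$(mktemp)
cat > "$f" <<'EOF'
233,96,
235,148,Early Ballot
235,170,Early Ballot
170,2,
301,170,
302,1700,Early Ballot
EOF

# Count lines whose SECOND field is exactly 170. Comparing the field,
# rather than grepping for "170", skips voter ID 170 and election 1700.
total=$(awk -F',' '$2 == 170 { n++ } END { print n + 0 }' "$f")

# Same filter, restricted to lines whose third field mentions an early
# ballot (a pattern match, so a trailing carriage return is tolerated).
early=$(awk -F',' '$2 == 170 && $3 ~ /Early Ballot/ { n++ } END { print n + 0 }' "$f")

echo "voted in election 170: $total"
echo "of those, early ballots: $early"
echo "not early: $((total - early))"

rm -f "$f"
```

On the real file, the two awk lines would be pointed at the actual CSV
instead of the temp file.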

Help?  This is about a criminal investigation going on right now
regarding this election...

Thanks!

Jim March


More information about the PLUG-discuss mailing list