SQL approach to Showing Every Nth Record
Earlier today, Ray Camden posted a quick ColdFusion-centric solution to showing only the Nth records of a query. His solution is fine, but I wonder how it would perform for large queries on a site under high load. Here's a database-centric solution that pulls back every Nth record, allowing CF to just display the final record set.
This example uses SQL Server 2005.
Step 1
There's a table in the database named "Districts". Let's get all the Districts for Texas.
Name
FROM
Districts
WHERE
state = 'TX'
ORDER BY
Name
This brings back 1,063 rows.

Step 2
Let's use the ROW_NUMBER() function to get a record count within the results.
Name,
ROW_NUMBER() OVER ( ORDER BY Name ) AS 'RowNumber'
FROM
Districts
WHERE
state = 'TX'
Notice that the ORDER BY clause is no longer needed at the end of the query. It's now called from within ROW_NUMBER(). We still have 1,063 records, but now we can start manipulating the results.

Step 3
We'll go ahead and select * from the results we just returned (now aliased as EXPR1), but we'll also get the MOD of each RowNumber using 5 as the divisor.
Name,
RowNumber,
(EXPR1.RowNumber % 5) AS ROW_MOD
FROM
(
SELECT
Name,
ROW_NUMBER() OVER ( ORDER BY Name ) AS 'RowNumber'
FROM
Districts
WHERE
state = 'TX'
) EXPR1
Still 1,063 rows, but we're getting close.

Step 4
Now we just select * from the last record set (aliased as EXPR2) WHERE ROW_MOD = 0 and we're done.
EXPR2.*
FROM
(
SELECT
Name,
RowNumber,
(EXPR1.RowNumber % 5) AS ROW_MOD
FROM
(
SELECT
Name,
ROW_NUMBER() OVER ( ORDER BY Name ) AS 'RowNumber'
FROM
Districts
WHERE
state = 'TX'
) EXPR1
) EXPR2
WHERE
EXPR2.ROW_MOD = 0

Conclusion
The final result? 212 records where RowNumber MOD 5 = 0. This is every 5th row of the original query.
Traditional ColdFusion approach: Bring back 1,063 records from the database, then either (a, cfoutput) iterate over the entire record set, checking if the currentrow MOD 5 = 0 or (b, cfloop) iterate over the entire record set, stepping by 5.
ColdFusion with smart SQL approach: Bring back the 212 records we need and display them.
I know which way I prefer. :)
Related Links
Jules Gravinese has posted how to Select Every Nth Record with MySQL.








WHERE (EXPR1.RowNumber % 5) = 0
Good catch.
select a.name,
a.rownum,
a.row_mod
from
(select name,
rownum,
mod(rownum,5) as row_mod
from districts
where state = 'TX') a
where a.row_mod = 0
I was curious to see just how ColdFusion would respond to a large query as you posited in your post. I built a table with 438,272 employee records and tried Ray's method and then yours. Interesting results running against Oracle...
First I ran the following CF code to extract every 1000th record from Oracle and then populate a new query object in CF.
The code executed in about 1 second which is about what I expected to see. I traced the SQL as Oracle executed it and it looked pretty clean: a hard parse of the SQL which took about 80,000 microseconds, execution which took a mere 97 microseconds, a single fetch of 438 records which took about 507,000 microseconds and a single network round trip to return the data to the client which took about 43,000 microseconds. So, the database spent what looks like roughly 3/4 of a second doing its thing and then coldfusion spent whatever was left of the second or so the total page took to run to fill the query object and dump the results.
Next I went through the same exercise except I did it Ray's way - retrieve all 438,272 records from Oracle and then have CF loop through the result set and use a mod function to load every 1000th row into a query object.
As expected, it ran slower. About 5 seconds total vs 1 second for your method. What is interesting though is why it ran slower. Looking at the SQL trace files this time showed two things: Oracle retrieved the data in chunks of 1454 rows at a time (this is determined by the default settings of the JDBC driver and will vary based on how wide your table rows are - retrieving more fields will reduce the number) and made 1 network roundtrip for each chunk of records fetched and returned to the client. Each fetch/roundtrip took about 8000 microseconds (give or take) which translated to roughly 2.5 seconds of elapsed time that Oracle spend executing the SQL and returning it to the client. The remainder of the page execution was spent by CF looping through the result set to load every 1000th record.
The question is, can you do anything to tune this query? The page already executes in like 5 seconds so it's not going to be much but... we can try to reduce the number of network roundtrips (which, for each chunk of records, is accounting for 65-75% of the total elapsed time) by overriding the default driver setting and telling Oracle to retrieve more data in each fetch.
ColdFusion documentation advises that the "blockfactor" attribute in CFQUERY can be specified and has an allowable range of 1-100. This isn't exactly correct, at least when it comes to Oracle. I'm not sure if the following holds true in other DBs but when connecting to Oracle you can specify pretty much whatever your driver that you are using will allow. In my case, I was able to specify a blockfactor of 32,767. Instead of making over 300 network roundtrips, Oracle made only 14 (438,272 / 32,767 = 13.375). 13 fetches in chunks of 32,767 rows and 1 to grab the remaining 12,301 rows. Total execution time was about 2-3 seconds - so, about half the time it took to run using the default driver setting. Not a huge difference in terms of actual elapsed time but effectively the page executed twice as fast.
So my initial hunch that the reason Ray's method ran 5-6 times slower than yours solely because of CF was in fact wrong - the DB shouldered more than half the additional work at first. Work that turned out to be excessive and unnecessary once we took a closer look. Again, this only speaks to Oracle which obviously leaves the legions of mysql and sql server users out of the equation. I'd be curious to know if you can take a similar approach for other DBs.
ADRIAN's method:
<cfscript>
// dsn
variables.dsn = "xxxxxx";
// variable to store the start time - start the clock now
startTime = timeformat(now(),"HH:mm:ss");
// query to store the results
variables.myQuery = querynew("employee_id");
</cfscript>
<!--- start SQL_TRACE --->
<cfstoredproc datasource="#variables.dsn#" procedure="dbms_monitor.session_trace_enable" />
<!--- query the EMPLOYEES table and grab every 1000th row using a modulo function --->
<cfquery name="test" datasource="#variables.dsn#" result="myResult" blockfactor="20000">
select *
from
(select employee_id,
rownum,
mod(rownum,1000) as row_mod
from skunkworks.employees) a
where row_mod = 0
</cfquery>
<!--- end SQL_TRACE --->
<cfstoredproc datasource="#variables.dsn#" procedure="dbms_monitor.session_trace_disable" />
<!--- now, load the data into a query --->
<cfoutput query="test">
<cfscript>
queryaddrow(myQuery);
querysetcell(myQuery, "employee_id", "#test.employee_id#");
</cfscript>
</cfoutput>
<cfscript>
// end time to stop the clock
endTime = timeformat(now(),"HH:mm:ss");
</cfscript>
<cfdump var="#myQuery#" top="10">
<cfdump var="#startTime#"><br />
<cfdump var="#endTime#">
RAY's method (with blockfactor specified):
<cfscript>
// dsn
variables.dsn = "xxxxxx";
// variable to store the start time - start the clock now
startTime = timeformat(now(),"HH:mm:ss");
// query to store the results
variables.myQuery = querynew("employee_id");
// mod divisor
variables.nth = 1000;
</cfscript>
<!--- start SQL_TRACE --->
<cfstoredproc datasource="#variables.dsn#" procedure="dbms_monitor.session_trace_enable" />
<!--- query the EMPLOYEES table and grab every 1000th row using a modulo function --->
<cfquery name="test" datasource="#variables.dsn#" result="myResult" blockfactor="32767" >
select employee_id,
rownum
from employees
</cfquery>
<!--- end SQL_TRACE --->
<cfstoredproc datasource="#variables.dsn#" procedure="dbms_monitor.session_trace_disable" />
<!--- now, load the data into a query --->
<cfoutput query="test">
<cfscript>
if(test.rownum mod 1000 eq 0)
{
queryaddrow(myQuery);
querysetcell(myQuery, "employee_id", "#test.employee_id#");
}
</cfscript>
</cfoutput>
<cfscript>
// end time to stop the clock
endTime = timeformat(now(),"HH:mm:ss");
</cfscript>
<cfdump var="#myQuery#" top="10">
<cfdump var="#startTime#"><br />
<cfdump var="#endTime#">