i Know Kung Foo Consulting

Please stop using SELECT MAX(id)

I know there are a lot of tutorials that use this technique. I also know there's a lot of inherited code (in any language) that contains this SQL. I also also know that - years ago - I too was a slave to SELECT MAX(id). But this is the 21st century and you need to stop using it.

The SQL examples shown here comes from part 7.3 of my OOP with ColdFusion Primer.

The Scenario

1. Insert record into database table.

<cfquery name="qCreateContact" datasource="#variables.DSN#">
INSERT INTO CONTACTS
(
   FIRST_NAME,
   LAST_NAME
)
VALUES
(
   'Adrian',
   'Moreno'
)
</cfquery>

2. Retrieve the ID of the record we just inserted

<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT MAX( CONTACT_ID ) AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

3. Insert record into a related table using the ID we just retrieved to relate it

<cfloop index="x" from="1"
   to="#arrayLen( categoryIDArray )#">


   <cfquery name="qCreateContactCategory" datasource="#variables.DSN#">
      INSERT INTO CONTACT_CATEGORIES
      (
         CONTACT_ID,
         CATEGORY_ID
         )
      VALUES
      (
         #newContactID#,
         #categoryIDArray[x]#
      )
   </cfquery>

</cfloop>

The Problem

Odds are, if you're using SELECT MAX(id), you're also NOT running these related queries within a transaction. It's also very possible you're not even using a database that is capable of using transactions.

When you're running this code on your local development box, you won't ever see a problem with this code. If your application is only ever used by a few users, most likely you'll never have an issue. It's when your user base increases and the number of concurrent requests to the database start to stack up, the MAX(id) you expect will not be the MAX(id) you get back.

Multiple threads and concurrent requests

Picture the gaps between the cfquery tags in each step. Now picture that the queries are cars driving in the same lane down the highway. That lane is your request for a new record in CONTACTS, the new Contact's ID and the association of that Contact with its related Categories in CONTACT_CATEGORIES.

Now picture there's a lane next to you with another set of cars. That's someone else's Contact creation request. Each lane can only handle so many cars and each individual car runs at a different speed. What happens when all the lanes are full and there are cars ahead and behind you?

You know some jerk is going to swerve in between the cars in the next lane just to try and get where he's going just a little faster.

At some point, a car from another lane is going to jump ahead of or behind the car containing the "SELECT MAX(id)" that should be in your request. The car could be the other lane's INSERT (car1) or "SELECT MAX(id)" (Car2). It's even possible that a car on your left and a car on your right jump into your lane at the same time. Now you're really in trouble.

You have to write your queries so that not only do your cars follow each other in a certain order, you also have to make sure that you use a process that retrieves your new CONTACT_ID and blocks cars in another lane from jumping into yours.

The Solutions

Most modern databases have built-in functionality to allow you to retrieve the most recently created primary key in any table. Depending on your needs, you can even safely create your own primary key.

Here are three common approaches to getting the most recently created Primary Key of a table:

1. Auto-incrementing Primary Key functions

Tables in these databases can be defined to automatically create a new Primary Key as each record is created.

MS-Access

First off, here's Nine Reasons NOT To Use MS Access To Power A DB-Driven Website. But if you absolutely MUST use it, Bruce Johnson has some code here that will safely get the last inserted ID.

SQL Server through version 7 *

Pretty much the only thing you can use here is @@IDENTITY

Step 2 for SQL Server through version 7
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT @@IDENTITY AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

SQL Server 2000 through current *

You have three choices here.

1. @@IDENTITY: This is an intrinsic variable in SQL Server that contains the Id of the record that was last created on this connection, regardless of the table. (See code above.)

The problem with @@IDENTITY is that if you do an INSERT to TableA and you also have a Trigger that then causes an INSERT to TableB, then the value of @@IDENTITY shifts from the newest ID in TableA to the newest ID in TableB.

2. IDENT_CURRENT('table'): This function returns the ID of the record that was last created in the specified table.

Step 2 for SQL Server 2000 or greater
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT IDENT_CURRENT('CONTACTS') AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

The issue here is that it's possible by the time you call IDENT_CURRENT('CONTACTS') from your request, another request has inserted a record in that table.

3. SCOPE_IDENTITY(): This variable contains the Id of the last record that was created within the current scope, regardless of the table.

Step 2 for SQL Server 2000 or greater
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT SCOPE_IDENTITY() AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

This seems to be the safest as it disregards IDs created by Triggers or some other process and sticks to the ID created by your connection's INSERT.

* Some info on MS-Access and SQL Server quoted or paraphrased from this post by Bruce Johnson.

MySQL

LAST_INSERT_ID(): From the documentation - "For LAST_INSERT_ID(), the most recently generated ID is maintained in the server on a per-connection basis. It is not changed by another client. It is not even changed if you update another AUTO_INCREMENT column with a non-magic value (that is, a value that is not NULL and not 0). Using LAST_INSERT_ID() and AUTO_INCREMENT columns simultaneously from multiple clients is perfectly valid. Each client will receive the last inserted ID for the last statement that client executed."

Step 2 for MySQL
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT LAST_INSERT_ID() AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

As a matter of fact, you could even eliminate Step 2 and just use Steps 1 and 3 in this manner:

Removing Step 2 for MySQL
<cfquery name="qCreateContact" datasource="#variables.DSN#">
INSERT INTO CONTACTS
(
   FIRST_NAME,
   LAST_NAME
)
VALUES
(
   'Adrian',
   'Moreno'
)
</cfquery>

<cfloop index="x" from="1"
   to="#arrayLen( categoryIDArray )#">


   <cfquery name="qCreateContactCategory" datasource="#variables.DSN#">
      INSERT INTO CONTACT_CATEGORIES
      (
         CONTACT_ID,
         CATEGORY_ID
         )
      VALUES
      (
         LAST_INSERT_ID(),
         #categoryIDArray[x]#
      )
   </cfquery>

</cfloop>

You could do that same for SQL Server by replacing LAST_INSERT_ID() with SCOPE_IDENTITY().

DB2

IDENTITY_VAL_LOCAL(): If your table has been created using an IDENTITY column for its Primary Key, then you can retrieve a new ID using this function.

Step 2 for DB2 when using IDENTITY column
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT IDENTITY_VAL_LOCAL() AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

2. Sequences

Tables in these databases use associated objects called SEQUENCES to manage ID values for their Primary Keys. Sequences don't necessarily increment by 1 like auto-generated keys. You can also manipulate a sequence on the fly, but that's another topic. The point is that there are functions which allow you to access the value of a new record's ID when that ID is based on an associated sequence.

Generic Sequence CREATE statement
CREATE SEQUENCE CONTACT_ID_SEQ
START WITH 1
INCREMENT BY 1

DB2

Some versions of DB2 can use IDENTITY, some can't. Those that can't, use SEQUENCES. Using the generic Sequence, here's how we'd change the INSERT Statement in Step 1:

Step 1 for DB2 using a Sequence
<cfquery name="qCreateContact" datasource="#variables.DSN#">
INSERT INTO CONTACTS
(
   CONTACT_ID,
   FIRST_NAME,
   LAST_NAME
)
VALUES
(
   NEXT VALUE FOR CONTACT_ID_SEQ,
   'Adrian',
   'Moreno'
)
</cfquery>

This would create a record with CONTACT_ID = 1 since this would be the first record in the table and we started the sequence at 1. So now we need to get the current value of the sequence to associate this new Contact to its Categories.

Step 2 for DB2 using a Sequence
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT PREVIOUS VALUE FOR CONTACT_ID_SEQ AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

"The PREVIOUS VALUE expression returns the most recently generated value for the specified sequence for a previous statement within the current application process."

Documentation for Sequences on DB2.

Oracle

Oracle uses the same generic SQL to create a Sequence, but it has a few differences on its attributes. Check the documentation for more info. Sequences work the same with Oracle as they do with DB2, the difference is that Oracle uses a dot syntax to perform operations on them.

Step 1 for Oracle using a Sequence
<cfquery name="qCreateContact" datasource="#variables.DSN#">
INSERT INTO CONTACTS
(
   CONTACT_ID,
   FIRST_NAME,
   LAST_NAME
)
VALUES
(
   CONTACT_ID_SEQ.NEXTVAL,
   'Adrian',
   'Moreno'
)
</cfquery>

Both accomplish the same thing: they generate the next value of the sequence and pass it to some SQL operation. Getting the current value of the sequence makes more sense in Oracle than it does in DB2.

Step 2 for Oracle using a Sequence
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT CONTACT_ID_SEQ.CURRVAL AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

Again, we could remove Step 2 and jump straight to Step 3

Bypassing Step 2 in Oracle using a Sequence
<cfloop index="x" from="1"
   to="#arrayLen( categoryIDArray )#">


   <cfquery name="qCreateContactCategory" datasource="#variables.DSN#">
      INSERT INTO CONTACT_CATEGORIES
      (
         CONTACT_ID,
         CATEGORY_ID
         )
      VALUES
      (
         CONTACT_ID_SEQ.CURRVAL,
         #categoryIDArray[x]#
      )
   </cfquery>

</cfloop>

Documentation for Sequences on Oracle.

PostgreSQL

PostgreSQL uses the same generic SQL to create a Sequence, differences in attributes are in the documentation. We can use Sequences in PostgreSQL just as we do with DB2 or Oracle, but PostgreSQL opted to use a function-based syntax. You can also run into case sensitivity issues, so check the documentation for those.

Step 1 for PostgreSQL using a Sequence
<cfquery name="qCreateContact" datasource="#variables.DSN#">
INSERT INTO CONTACTS
(
   CONTACT_ID,
   FIRST_NAME,
   LAST_NAME
)
VALUES
(
   NEXTVAL('CONTACT_ID_SEQ'),
   'Adrian',
   'Moreno'
)
</cfquery>

But at least they go with Oracle's more sensible approach to getting a sequence's current value:

Step 2 for PostgreSQL using Sequences
<cfquery name="qNewContact" datasource="#variables.DSN#">
SELECT CURRVAL('CONTACT_ID_SEQ') AS NEW_ID
</cfquery>

<cfset newContactID = qNewContact.NEW_ID />

Documentation for Sequences on PostgreSQL.

Summary

Get the next (new) value of a sequence (Primary Key):

DB2:

NEXT VALUE FOR CONTACT_ID_SEQ

Oracle:

CONTACT_ID_SEQ.NEXTVAL

PostgreSQL:

NEXTVAL('CONTACT_ID_SEQ')

Get the current value of a sequence (Primary Key):

DB2:

PREVIOUS VALUE FOR CONTACT_ID_SEQ

Oracle:

CONTACT_ID_SEQ.CURRVAL

PostgreSQL:

CURRVAL('CONTACT_ID_SEQ')

3. User Defined Value

The final option can be used with any database. The downside is that these keys are usually going to be of a string data type and that could have some impact on indexing and data look ups as your data set grows.

Rather than create some arbitrary string of characters on your own. You would most often opt to use what's called a UUID, a universally Unique Identifier.

ColdFusion, for example, has a function called CreateUUID() as of version 4.01 that creates a unique 35 character ID. Many other programming languages have similar functions.

So here are our SQL queries using our own User Defined Value for a CONTACT_ID:

Step 1 and 3 using a UUID
<cfset newContactID = createUUID() />

<cfquery name="qCreateContact" datasource="#variables.DSN#">
INSERT INTO CONTACTS
(
   CONTACT_ID,
   FIRST_NAME,
   LAST_NAME
)
VALUES
(
   #newContactID#,
   'Adrian',
   'Moreno'
)
</cfquery>

<cfloop index="x" from="1"
   to="#arrayLen( categoryIDArray )#">


   <cfquery name="qCreateContactCategory" datasource="#variables.DSN#">
      INSERT INTO CONTACT_CATEGORIES
      (
         CONTACT_ID,
         CATEGORY_ID
         )
      VALUES
      (
         #newContactID#,
         #categoryIDArray[x]#
      )
   </cfquery>

</cfloop>

Since we already know the value of the new Contact ID, there's no need for Step 2.

Summary

The examples shown here use separate queries for each step to walk you through the required process for each database. Certainly you can combine these queries under a single connection (single CFQUERY) by separating them with a semi-colon ( ; ).

Even with separate connections for each step, you'll want to wrap the entire process in a single database transaction to ensure you're getting the correct new ID and to further insulate this series of related queries from those of another request. You'll also want to add in a try/catch block to roll back the transaction in case anything goes wrong. If you're unfamiliar with CFTRANSACTION, CFTRY and CFCATCH, you can read about them here.

Hopefully, this has given you insight on parts of a database that may have gone unseen until now. Trust me, if you poke around a bit, you'll find more and more reasons to code directly in your database instead of in your application. Getting the newest Primary Key value is only the tip of the iceberg.


Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
radekg's Gravatar in MSSQL Server 2005+ you can also do:

DECLARE @MyTableVar table( identColumn int [or whatever data type you need] NOT NULL );
INSERT INTO ... ( ... )
OUTPUT INSERTED.yourIdent INTO @MyTableVar
VALUES ( ... )
select identColumn AS lastId FROM @MyTableVar
# Posted By radekg | 6/2/08 8:26 AM
Ben Nadel's Gravatar AMEN! Way to have a really thorough, comprehensive post!
# Posted By Ben Nadel | 6/2/08 10:08 AM
Jason Dean's Gravatar Adrian - Awesome post. Thank you. I have spent a lot of time querying for these types on tricks for the DBMSs that I use. It will be nice to have examples for the others in my del.icio.us now :).

One thing that I think many have missed, is that in ColdFusion 8 cfquery they have added the row id to the results variable. This is also an option for getting back that primary key. I have not played with it much, as we have not yet fully upgraded to CF8, and I have not tested it on more than a couple of the DBMSs, but it looks like it might be a great solution (although not a backward compatible one). Have you looks at this?
# Posted By Jason Dean | 6/2/08 11:44 AM
William Bowen's Gravatar In Oracle I always thought it was better to get the value from the sequence first and then insert it, like so:

<cftransaction>
   <!--- Get the next value in sequence --->
   <cfquery name="qGetNewContactId" datasource="#Variables.dsn#">
      SELECT contact_id_seq.nextVal AS new_id
   </cfquery>
   <!--- Insert the ID with the new record --->
   <cfquery name="qCreateContact" datasource="#Variables.dsn#">
   INSERT INTO contacts (
    contact_id,
    first_name,
    last_name
   ) VALUES (
    <cfqueryparam cfsqltype="cf_sql_integer" value="#qGetNewContactId.new_id#" />,
    'Adrian',
    'Moreno'
   )
   </cfquery>
   <!--- Transaction will automatically be committed. We can return the ID of the record we just created without having to SELECT again from the database --->
<cftransaction>

When you run the first SQL statement the sequence will automatically increment, so any further SELECT statements against that sequence will return a higher value.

Not sure about this, but couldn't currVal run in to the same issue as MAX(id) if run outside the scope of a transaction and after the INSERT statement? What if another value had been inserted (thus incrementing the sequence) before you got the currVal?

Thanks for your article.
# Posted By William Bowen | 6/2/08 11:58 AM
Terry Schmitt's Gravatar Excellent write-up.
Don't forget the result structure of cfquery, which can return the inserted ID.
# Posted By Terry Schmitt | 6/2/08 12:28 PM
Matt Newby's Gravatar Better yet, for Oracle, use a combination of a sequence and a trigger:

CREATE OR REPLACE TRIGGER "MYSCHEMA".APPUSERS_BI_01
before insert on APPUSERS
for each row
begin
select app_users_seq.nextval
into :new.app_user_id
from sys.dual;
:new.CREATE_DATE := sysdate;
end;
/

Then just ignore the app_user_id and create_date fields in your insert query. No transaction needed at that point.

-matt
# Posted By Matt Newby | 6/2/08 12:33 PM
Jeff Coughlin's Gravatar Great article (and thorough). I'll have to bookmark this one :)
# Posted By Jeff Coughlin | 6/2/08 1:16 PM
Adrian J. Moreno's Gravatar Thanks for the feedback everyone.

@radekg: Until last week, I hadn't used SQL Server since 2000. Thanks for that.

@Jason Dean: The CF8 function is very nice. My only issue with it is that the variable name for the new ID changes depending on your database. If they were going to abstract that functionality, I'd think they'd stick with a single variable name.

@William Bowen: I think that within a transaction, currVal won't run into the same problem as MAX(id), but Matt has a better approach to this by using a Trigger along with a Sequence.

@Matt Newby: Thanks for the code. That process should work for any DB that uses Sequences.
# Posted By Adrian J. Moreno | 6/2/08 2:40 PM
William Bowen's Gravatar @Matt - I use the Sequence/trigger combo a lot as well as long as I don't need the ID. But I think it should be noted that without CF 8's query structure you won't be able to get the ID of the record inserted with that method. Which may be needed to insert related records in other tables.

A lot of people are still forced to use CF7 or below at their jobs (including me) - I couldn't find statistical evidence but my guess is the majority of CF developers are not working on version 8 for their clients, though their personal projects may be a different story.

Hence the two query solution makes sense (query the sequence, insert the record) or vice versa within a transaction as Adrian mentioned.
# Posted By William Bowen | 6/2/08 5:20 PM
Troy Allen's Gravatar Great detailed post! For MS-SQL 2000+, both of the first two options are sketchy at best. Like you said, @@IDENTITY can get incremented by TRIGGERS etc, so that is no good. Even if you have no triggers now, some could add them later and really mess up your code. The second option, IDENT_CURRENT('table'), is unnecessarily convoluted in this scenario. If you want to simply get the last key for "other purposes", then it works great. But not for a string of "Insert into main table, then get key for other child table inserts" scenario.

@radekg: Your solution will work just fine, but again I think it is unnecessarily complex and creating in-memory tables is an additional performance consideration that is just not needed in this scenario.

That leaves us with the one true solution in MS-SQL: SCOPE_IDENTITY(). The only problem with your example is that the entire INSERT & SCOPE_IDENTITY() should be within ONE CFQUERY block. Furthermore, the subsequent inserts should be contained within one CFTRANSACTION block that contains the original INSERT as well. Here is a complete sample block for using SCOPE_IDENTITY() with ColdFusion:

<!--- Insert Data into Tables - Use Transaction to Ensure All Tables Inserted --->
   <cftransaction isolation="READ_COMMITTED">
      <!--- Insert Into Main Table --->
      <cfquery name="qryInsertMainData" datasource="myDS">
         SET NOCOUNT ON
         INSERT INTO db_sample.dbo.MAIN_DATA
            (column1, column2)
         VALUES
            (#value1#, #value2#)

         SELECT SCOPE_IDENTITY() AS myNewIDKey
         SET NOCOUNT OFF
      </cfquery>
      
      <!--- Insert Into Child Table --->
      <cfquery name="qryInsertChildData" datasource="myDS">
         INSERT INTO db_sample.dbo.CHILD_DATA
            (identityKey, column1, column2)
         VALUES
            (#qryInsertMainData.myNewIDKey#, #value1#, #value2#)
      </cfquery>

   </cftransaction>

By using the method above, you are guaranteed to get the exact key that was generated by the associated insert.
# Posted By Troy Allen | 6/3/08 12:03 PM
Kevin Parker's Gravatar Great post! I remember you talking about this when I sat by you at the May DFWCFUG meeting, so I looked it up. I'm changing the cfqueries in my project to do this right now. :)
# Posted By Kevin Parker | 6/10/08 2:36 PM
jfish's Gravatar This is precisely why I have long had the habit of using UUIDs as IDs throughout my apps. It gives the added benefit of allowing use of the ID across object calls as well, since I can have a manager define the UUID as it's collecting the other data ("if ID is blank, then ID = createUUID()") and pass the whole thing on to however many beans/services it may need to access to complete a request. Plus, using createUUID() gives me confidence that even in an environment with multiple clustered applicaion servers hitting a single DB server, I am not going to get those "cut me off in traffic" moments where user A sees user B's shopping cart (or worse) because of an Identity was captured out of order.

As you pointed out, Adrian, it is also a coding practice that is completely safe across all RDBMS platforms: any ID can be an nvarchar(35) or a varchar2(35) or whatever the datatype is called in your DB of choice, with no difference in how the INSERT / UPDATE calls are made.

Great series, thanks!
# Posted By jfish | 7/2/08 5:36 PM
Paul's Gravatar Doesn't work for me. Just gives me the "Variable LAST_INSERT_ID is undefined." deal.
# Posted By Paul | 7/7/08 8:04 AM

Copyright © 2001 - 2008 Adrian J. Moreno and i Know Kung Foo Consulting
BlogCFC was created by Raymond Camden. This blog is running version 5.9.001.