SOQL's Seven Deadly Sins
This article is not about how to be a better SOQL programmer. It
is about pitfalls in SOQL that are obscure and often difficult to
diagnose and understand when encountered.
- The SOQL WHERE clause cannot be trusted.
- The SOQL LIMIT parameter is not always optional.
- Timestamps returned by SOQL believe in time travel.
- SOQL can return field data that is invalid based on the field
definition.
- SOQL has a love/hate relationship with tables containing lots
of custom fields.
- SOQL does not treat all table equally – not knowing the
exceptions can take down your org.
- SOQL chokes on tables with lots of functional columns.
Sin 1: The SOQL WHERE clause cannot be trusted.
Here is a trick question. Will the following SOQL statement return
any records that were modified today?
SELECT id, name FROM Account WHERE
systemModStamp < LAST_YEAR
The answer is......usually not. However it will return a record
modified today when:
- The record fit the WHERE criteria when the record set to
return was determined.
- The record is modified before it was fetched from Salesforce.
This “sin” is easy to reproduce with a Python script and a web
browser.
- In the web browser open the Salesforce GUI on the record that
will be the most recent one to match the LAST_YEAR criteria.
- Start a Python script that fetches records using:
SELECT
id,name,systemModStamp
FROM Account
WHERE systemModStamp <
LAST_YEAR
ORDER BY systemModStamp ASC
- Modify the record in the Salesforce GUI after the Python
script starts fetching records.
- One of the last records you read from Salesforce will have a
modification date of today
Sin 2: The SOQL LIMIT parameter is not always
optional.
Though the SOQL documentation says that the LIMIT parameter is
optional there are a handful of Salesforce tables where it is nearly
impossible to run a large query without the dreaded
Query Timeout
error. If you are not a believer then try running these queries on a
mature Salesforce instance.
- SELECT id, accountId, description FROM Task ORDER BY
SystemModStamp ASC
- SELECT application,browser,loginTime FROM LoginHistory ORDER
BY loginTime ASC
Short of begging Salesforce for a max timeout value longer than
the default, a solution has to “know” good limit values for
problem tables. Here are good values for a handful of the more
troublesome table.
Table Name (or name pattern) |
Good LIMIT Value |
.*History |
5000 |
.*History__c |
5000 |
.*Feed |
5000 |
.*Share |
5000 |
FeedComment |
5000 |
Task |
5000 |
EmailMessage |
5000 |
CaseComment |
50 |
FieldPermissions |
2000 |
LoginHistory |
2000 |
Sin 3: Timestamps returned by SOQL believe in
time travel.
Here's another SOQL quiz. In the following code snippet, is it
possible for the Exception to be thrown? For the quiz assume that all
Account records have unique timestamp values and no records are
deleted.
Account accts1[] = [SELECT id,
systemmodstamp FROM Account ORDER BY systemmodstamp ASC;
Account maxAcct = accts1[acct1.size() -
1]
Account accts2[] = [SELECT id,
systemmodstamp FROM Account WHERE systemmodstamp
<=:maxAcct.systemmodstamp ORDER BY systemmodstamp ASC;
if( accts2.size() > accts2.size()) {
throw new Exception('Timewarp – new
account created in the past!');
}
The answer....usually no. However, Salesforce does not guarantee
that records are committed in timestamp order.
It is possible for a
record with a SystemModStamp of 2014-12-31T00:12:12.000Z to be
committed to Salesforce after a record in the same table with a
SystemModStamp of 2014-12-31T00:13:12.000Z.
This is a hard “sin” to reproduce because it depends on the
current load of the Salesforce servers. We discovered the problem by
monitoring a high volume production instance for three weeks – the
problem occurred just a couple of times during this period. It was
sort of like searching for a Higgs particle.
The moral: Do not assume that the MAX(SystemModStamp) or
MAX(LastModifiedDate) value is a good constraint to find all values
that are older.
The solution: Do not rely on timestamps that are fresh. Give
Salesforce a little while to commit all recent transactions. In
practice this odd state seems to settle out on Salesforce servers
after 10-120 seconds.
Sin 4: SOQL can return invalid field data.
This quiz should be easy for experienced SOQL users.
- Is it possible for an invalid email address to be stored and
returned in a Salesforce Email field?
- Is is possible to store a 132 character string in a 128
character Salesforce field?
The answer to #1 is, of course, possible. Check out contact Ms.
Barbara Levy in any developer force instance. Her email is:
b.levy@expressl&t.net Try saving the record and Salesforce will
tell you that the email format is invalid (using the API will throw
the same error).
The answer to #2 is also “possible.” In fact Salesforce
creates an email template with an invalid length subject when Sites
is enabled.
There is other “invalid” data (related to currency values)
lurking in the developer force instance – you will have to find
this one yourself.
The morals:
- Just because Salesforce meta data says a field is 80
characters long does not mean it may have a longer value.
- Just because Salesforce meta data says a number is
DECIMAL(5,2) do not trust it. An API program may have inserted data
that does not match the spec.
- Do not be surprised if you read data from one Salesforce
instance and it fails to upsert into another instance – the source
data may be bad.
Sin 5: SOQL has a love/hate relationship with
custom fields.
Though Salesforce allows many custom fields to be created on a
table, using too many columns in a SOQL statement will cause a
QUERY_TOO_COMPLICATED error. At least this “sin” is one
Salesforce documents but after 10 years one would assume they would
have a work-around in the back-end.
If you really need to do the equivalent of “SELECT * FROM
Account” and SOQL will not allow the read in a single statement
what needs to be done? There are two problem to solve.
- Figure out the minimum number of SOQL queries to run. If none
of the columns are formulas then this can usually be done by using
500 columns for each query. If formula columns are involved the
problem is much harder – in this case see Sin #7.[
- Join the results of multiple queries together.
Realistically, grab the data from a Salesforce dump file or use a
tool like Capstorm's CopyStorm to extract the table into a relational
database. Doing it yourself is hard.
Sin 6: SOQL does not treat all tables equally.
The transaction cost of querying a table with SOQL varies a lot.
The first time you query a high cost table and cause your production
org to run out of transactions will probably be the last time.
Here a SOQL quiz. How many Salesforce transactions does it take to
run the following SOQL statement?
SELECT id, name FROM TableName LIMIT
2000
The answer, of course, depends on
TableName and is either
2000, 200, 40, 20, 10, 4, 2, or 1.
The “1 transaction per record” rule for Attachment records is
documented. The same rule applies to QuoteDocument and ContentVersion
but we have learned this from experience rather than documentation.
In general 2000 and 1000 are the common number of records per
transaction but the only way to know for certain is the write a bit
of code and see how many come back in a bunch.
Sin 7: SOQL chokes on tables with many functional
columns.
When Salesforce runs a SOQL query it turns it into SQL first. Once
in SQL there is a 64K character query limit. If the 64K SQL query
limit is passed then SOQL will return a QUERY_TOO_COMPLICATED
exception.
Here a SOQL quiz: Given a SOQL statement how can you accurately
(or inaccurately even) guess that it will generated a SQL statement
less than 64 characters?
The answer is......if table has a lot of formula columns you
probably cannot guess accurately.
In practice, the addition of formula columns in a SOQL statement
can cause the size of the resultant SQL to explode – sometimes. It
all depends on the complexity and number of tables referenced by the
formula.
- The complexity of the formula seems to matter. My guess is
that Saleforce is in-lining the formula calculation into the
generated SQL.
- A formula that references other tables likely causes the
table expression and predicate portion of the generated SQL to grow
rapidly.
The solution to this problem is nearly the same as
Sin #5
but the first step, determining the minimum number of SOQL query to
run can difficult. The only solution we have found that works
reliably is iterative.
- Sort column in ascending order by the length of their name or
length of their corresponding formula.
- Set maxColumn = total # of columns
- Run a query run using columns 1 thru maxColumn with a LIMIT
1.
- If the query succeeds then save the query, remove the query's
columns from the list of available columns.
- If the query fails then set maxColumn = maxColumn – 1 and
start step 3 again.
- Set maxColumn = # remaining columns
- If maxColumn > 0 goto Step #3, otherwise all queries have
been determined.
Credits
The “SOQL Sins” described in this article were accumulated
over a number years while developing:
- CopyStorm – keep a hot backup of your Salesforce
instance in SQL/Server, Oracle, MySQL, PostgreSQL, and H2
- CopyStorm/Restore – restore a backup to any
production, development, or sandbox instance. This is a restore tool
even your manager could use.
- SQLForce – a standard Python module for Salesforce.
Works with Python 2.8. 3.3, and 3.4. Use SQLForce when Apex is just
too hard.
For more information see
www.capstorm.com