Monday, April 7, 2014

SOQL's Seven Deadly Sins

SOQL's Seven Deadly Sins

This article is not about how to be a better SOQL programmer. It is about pitfalls in SOQL that are obscure and often difficult to diagnose and understand when encountered.
  1. The SOQL WHERE clause cannot be trusted.
  2. The SOQL LIMIT parameter is not always optional.
  3. Timestamps returned by SOQL believe in time travel.
  4. SOQL can return field data that is invalid based on the field definition.
  5. SOQL has a love/hate relationship with tables containing lots of custom fields.
  6. SOQL does not treat all table equally – not knowing the exceptions can take down your org.
  7. SOQL chokes on tables with lots of functional columns.

Sin 1: The SOQL WHERE clause cannot be trusted.

Here is a trick question. Will the following SOQL statement return any records that were modified today?
SELECT id, name FROM Account WHERE systemModStamp < LAST_YEAR

The answer is......usually not. However it will return a record modified today when:
  • The record fit the WHERE criteria when the record set to return was determined.
  • The record is modified before it was fetched from Salesforce.
This “sin” is easy to reproduce with a Python script and a web browser.
  1. In the web browser open the Salesforce GUI on the record that will be the most recent one to match the LAST_YEAR criteria.
  2. Start a Python script that fetches records using:
SELECT id,name,systemModStamp
FROM Account
WHERE systemModStamp < LAST_YEAR
ORDER BY systemModStamp ASC
  1. Modify the record in the Salesforce GUI after the Python script starts fetching records.
  2. One of the last records you read from Salesforce will have a modification date of today

Sin 2: The SOQL LIMIT parameter is not always optional.

Though the SOQL documentation says that the LIMIT parameter is optional there are a handful of Salesforce tables where it is nearly impossible to run a large query without the dreaded Query Timeout error. If you are not a believer then try running these queries on a mature Salesforce instance.
  1. SELECT id, accountId, description FROM Task ORDER BY SystemModStamp ASC
  2. SELECT application,browser,loginTime FROM LoginHistory ORDER BY loginTime ASC
Short of begging Salesforce for a max timeout value longer than the default, a solution has to “know” good limit values for problem tables. Here are good values for a handful of the more troublesome table.

Table Name (or name pattern) Good LIMIT Value
.*History 5000
.*History__c 5000
.*Feed 5000
.*Share 5000
FeedComment 5000
Task 5000
EmailMessage 5000
CaseComment 50
FieldPermissions 2000
LoginHistory 2000



Sin 3: Timestamps returned by SOQL believe in time travel.

Here's another SOQL quiz. In the following code snippet, is it possible for the Exception to be thrown? For the quiz assume that all Account records have unique timestamp values and no records are deleted.

Account accts1[] = [SELECT id, systemmodstamp FROM Account ORDER BY systemmodstamp ASC;
Account maxAcct = accts1[acct1.size() - 1]
Account accts2[] = [SELECT id, systemmodstamp FROM Account WHERE systemmodstamp <=:maxAcct.systemmodstamp ORDER BY systemmodstamp ASC;
if( accts2.size() > accts2.size()) {
throw new Exception('Timewarp – new account created in the past!');
}
The answer....usually no. However, Salesforce does not guarantee that records are committed in timestamp order.

It is possible for a record with a SystemModStamp of 2014-12-31T00:12:12.000Z to be committed to Salesforce after a record in the same table with a SystemModStamp of 2014-12-31T00:13:12.000Z.

This is a hard “sin” to reproduce because it depends on the current load of the Salesforce servers. We discovered the problem by monitoring a high volume production instance for three weeks – the problem occurred just a couple of times during this period. It was sort of like searching for a Higgs particle.

The moral: Do not assume that the MAX(SystemModStamp) or MAX(LastModifiedDate) value is a good constraint to find all values that are older.

The solution: Do not rely on timestamps that are fresh. Give Salesforce a little while to commit all recent transactions. In practice this odd state seems to settle out on Salesforce servers after 10-120 seconds.

Sin 4: SOQL can return invalid field data.

This quiz should be easy for experienced SOQL users.
  1. Is it possible for an invalid email address to be stored and returned in a Salesforce Email field?
  2. Is is possible to store a 132 character string in a 128 character Salesforce field?
The answer to #1 is, of course, possible. Check out contact Ms. Barbara Levy in any developer force instance. Her email is: b.levy@expressl&t.net Try saving the record and Salesforce will tell you that the email format is invalid (using the API will throw the same error).

The answer to #2 is also “possible.” In fact Salesforce creates an email template with an invalid length subject when Sites is enabled.

There is other “invalid” data (related to currency values) lurking in the developer force instance – you will have to find this one yourself.

The morals:
  1. Just because Salesforce meta data says a field is 80 characters long does not mean it may have a longer value.
  2. Just because Salesforce meta data says a number is DECIMAL(5,2) do not trust it. An API program may have inserted data that does not match the spec.
  3. Do not be surprised if you read data from one Salesforce instance and it fails to upsert into another instance – the source data may be bad.

Sin 5: SOQL has a love/hate relationship with custom fields.

Though Salesforce allows many custom fields to be created on a table, using too many columns in a SOQL statement will cause a QUERY_TOO_COMPLICATED error. At least this “sin” is one Salesforce documents but after 10 years one would assume they would have a work-around in the back-end.

If you really need to do the equivalent of “SELECT * FROM Account” and SOQL will not allow the read in a single statement what needs to be done? There are two problem to solve.
  1. Figure out the minimum number of SOQL queries to run. If none of the columns are formulas then this can usually be done by using 500 columns for each query. If formula columns are involved the problem is much harder – in this case see Sin #7.[
  2. Join the results of multiple queries together.
Realistically, grab the data from a Salesforce dump file or use a tool like Capstorm's CopyStorm to extract the table into a relational database. Doing it yourself is hard.

Sin 6: SOQL does not treat all tables equally.

The transaction cost of querying a table with SOQL varies a lot. The first time you query a high cost table and cause your production org to run out of transactions will probably be the last time.

Here a SOQL quiz. How many Salesforce transactions does it take to run the following SOQL statement?

SELECT id, name FROM TableName LIMIT 2000

The answer, of course, depends on TableName and is either 2000, 200, 40, 20, 10, 4, 2, or 1.

The “1 transaction per record” rule for Attachment records is documented. The same rule applies to QuoteDocument and ContentVersion but we have learned this from experience rather than documentation. In general 2000 and 1000 are the common number of records per transaction but the only way to know for certain is the write a bit of code and see how many come back in a bunch.

Sin 7: SOQL chokes on tables with many functional columns.

When Salesforce runs a SOQL query it turns it into SQL first. Once in SQL there is a 64K character query limit. If the 64K SQL query limit is passed then SOQL will return a QUERY_TOO_COMPLICATED exception.

Here a SOQL quiz: Given a SOQL statement how can you accurately (or inaccurately even) guess that it will generated a SQL statement less than 64 characters?

The answer is......if table has a lot of formula columns you probably cannot guess accurately.

In practice, the addition of formula columns in a SOQL statement can cause the size of the resultant SQL to explode – sometimes. It all depends on the complexity and number of tables referenced by the formula.
  • The complexity of the formula seems to matter. My guess is that Saleforce is in-lining the formula calculation into the generated SQL.
  • A formula that references other tables likely causes the table expression and predicate portion of the generated SQL to grow rapidly.
The solution to this problem is nearly the same as Sin #5 but the first step, determining the minimum number of SOQL query to run can difficult. The only solution we have found that works reliably is iterative.
  1. Sort column in ascending order by the length of their name or length of their corresponding formula.
  2. Set maxColumn = total # of columns
  3. Run a query run using columns 1 thru maxColumn with a LIMIT 1.
  • If the query succeeds then save the query, remove the query's columns from the list of available columns.
  • If the query fails then set maxColumn = maxColumn – 1 and start step 3 again.
  1. Set maxColumn = # remaining columns
  2. If maxColumn > 0 goto Step #3, otherwise all queries have been determined.

Credits

The “SOQL Sins” described in this article were accumulated over a number years while developing:
  • CopyStorm – keep a hot backup of your Salesforce instance in SQL/Server, Oracle, MySQL, PostgreSQL, and H2
  • CopyStorm/Restore – restore a backup to any production, development, or sandbox instance. This is a restore tool even your manager could use.
  • SQLForce – a standard Python module for Salesforce. Works with Python 2.8. 3.3, and 3.4. Use SQLForce when Apex is just too hard.
For more information see www.capstorm.com