Wednesday, July 8, 2015

Initialize New Fields Fast

Since it is raining today riding my motorcycle is not a good idea. Instead, I solved a simple problem for our Tokyo office.

We already keep both an English and Kanji version of contact first and last names. My Japanese friends wanted new fields for Kanji Title and Department.  They also wanted the new fields initialized based on the current Title and Department fields (but just for accounts in Japan). The new fields are called LocalTitle__c and LocalDepartment__c.

How would you solve this data initialization problem? DataLoader? My Python/SQLForce solution took me 6 minutes and that included testing the script on a sandbox.

import SQLForce 
session = SQLForce.Session("production")  
contactSQL = "SELECT id, title, department FROM Contact WHERE  account.marketCountry__c='Japan' AND ((title<>null and localTitle__c=null) or (department<>null and localDepartment__c=null)) 
updateRecs = [] 
for rec in session.selectRecords(contactSQL):
updateRecs.append( [, rec.title, rec.department ]
 if 0 != len(updateRecs):
session.update("Contact", ["localTitle__c", "localDepartment__c" ], updateRecs)
In my case the script updated around 3000 records and took 15 seconds to run. Naturally, the SQLForce library took care of buffering updates into the required packet sizes.

Friday, June 12, 2015

SQLForce Gave Me Time to Ride My Motorcycle.

This morning I was asked to change the "Type" of around 400 Salesforce Campaigns from one type to another.  The type was simply wrong.

No sweat. The following short Python did the job with just a few minutes work on my part, saving my next few hours for a bit of motorcycle riding.

import SQLForce

if __name__ == '__main__':
    session = SQLForce.Session("production")
    updateRecs = []
    campaignSOQL = "SELECT id, owner.alias FROM Campaign WHERE owner.alias='mnakano' AND type='Event-Tradeshow/Conference'"
    for rec in session.selectRecords( campaignSOQL ):
        updateRecs.append( [, 'Training'])
    if len(updateRecs) > 0:
        print("Updating " + str(len(updateRecs)) + " campaign records")
        session.update("Campaign", ["Type"], updateRecs)
        print("No records to update")

I will never understand why so many people put up with DataLoader to do mass updates. Python with SQLForce is free and is so much faster.

You can get the SQLForce Python module for free at

Thursday, September 11, 2014

How to Delete a Lot of Price Book Entries

This week a marketing person cloned a number of products to create new product records. A nifty side effect of the clone was a clone of the prices. This created price book entries for in lots of price books and in currencies in which the new product is not even sold. In other words, a lot of garbage data.

I fixed the problem with a tiny Python program using the SQLForce module (free from Capstorm).
Want to see it?

import SQLForce

session = SQLForce.Session("production")

soql = "DELETE FROM PricebookEntry WHERE productCode LIKE 'MUSI0%' AND = 'NameHidden' AND<>'Standard Price Book'



Naturally I used a SOQL statement first to make sure my WHERE clause was selecting the proper records. Since the prices had not been used in an Opporunity this was all it took to fix my data.

SQLForce for Python -- I love it!

Monday, May 26, 2014

Populate a Sandbox in 5 minutes and learn some graph theory.

Every time I populate a Developer Sandbox with a snapshot of production data I smile. Today, in tracking down an interesting bug, I populated a sandbox with all won Opportunities since 1-Dec-2013 plus their associated

  • Accounts
  • Contacts,
  • Opportunity Line Items
  • Quotes & Quote Line Items
  • Quote PDF Documents 
  • plus a handful of other tables. 
How long did this take: 4 minutes, 27 seconds.

I restored: + all the records + all the links between the records + the attachments & notes
The tool I used was CopyStorm/Restore -- this is not a product promo but a discussion of the graph theory problem faced in restoring Salesforce data.

Anyone who has use data loaders (Salesforce or Jitterbit) knows that the routine for importing Account and Contact records is:

  • Import Accounts
  • Take the generated Ids from importing the Accounts and patch the Contact import file. 
  • Import the Contacts 
Of course, if there are Account->Account relationships or Contact->ReportsTo relationships then the problem is quite a bit more difficult.

The problem is inherently difficult because Salesforce supports almost arbitrary relationships between records.

Finding and restoring all of the relationships is fundamentally a well known graph theory problem (minimal spanning subtree) and has the unfortunate property of being an NP-complete problem.

Roughly, NP-complete means that the complexity of the problem grows at a rate faster than any polynomial as more records and relationships are added.

An exhaustive search is rarely possible for this class of problem if the # of records and relationships is very large -- an algorithm has to "know" something about the problem domain and efficiently "prune" the search space.

Oddly enough the "Salesforce Restore" problem is similar to a problem in computational chemistry. The problem: Given part of a molecular structure, find all molecules that match the pattern. Ring system make this a difficult problem in chemistry -- knowledge about possible ring systems make it do-able.

Monday, April 7, 2014

SOQL's Seven Deadly Sins

SOQL's Seven Deadly Sins

This article is not about how to be a better SOQL programmer. It is about pitfalls in SOQL that are obscure and often difficult to diagnose and understand when encountered.
  1. The SOQL WHERE clause cannot be trusted.
  2. The SOQL LIMIT parameter is not always optional.
  3. Timestamps returned by SOQL believe in time travel.
  4. SOQL can return field data that is invalid based on the field definition.
  5. SOQL has a love/hate relationship with tables containing lots of custom fields.
  6. SOQL does not treat all table equally – not knowing the exceptions can take down your org.
  7. SOQL chokes on tables with lots of functional columns.

Sin 1: The SOQL WHERE clause cannot be trusted.

Here is a trick question. Will the following SOQL statement return any records that were modified today?
SELECT id, name FROM Account WHERE systemModStamp < LAST_YEAR

The answer is......usually not. However it will return a record modified today when:
  • The record fit the WHERE criteria when the record set to return was determined.
  • The record is modified before it was fetched from Salesforce.
This “sin” is easy to reproduce with a Python script and a web browser.
  1. In the web browser open the Salesforce GUI on the record that will be the most recent one to match the LAST_YEAR criteria.
  2. Start a Python script that fetches records using:
SELECT id,name,systemModStamp
FROM Account
WHERE systemModStamp < LAST_YEAR
ORDER BY systemModStamp ASC
  1. Modify the record in the Salesforce GUI after the Python script starts fetching records.
  2. One of the last records you read from Salesforce will have a modification date of today

Sin 2: The SOQL LIMIT parameter is not always optional.

Though the SOQL documentation says that the LIMIT parameter is optional there are a handful of Salesforce tables where it is nearly impossible to run a large query without the dreaded Query Timeout error. If you are not a believer then try running these queries on a mature Salesforce instance.
  1. SELECT id, accountId, description FROM Task ORDER BY SystemModStamp ASC
  2. SELECT application,browser,loginTime FROM LoginHistory ORDER BY loginTime ASC
Short of begging Salesforce for a max timeout value longer than the default, a solution has to “know” good limit values for problem tables. Here are good values for a handful of the more troublesome table.

Table Name (or name pattern) Good LIMIT Value
.*History 5000
.*History__c 5000
.*Feed 5000
.*Share 5000
FeedComment 5000
Task 5000
EmailMessage 5000
CaseComment 50
FieldPermissions 2000
LoginHistory 2000

Sin 3: Timestamps returned by SOQL believe in time travel.

Here's another SOQL quiz. In the following code snippet, is it possible for the Exception to be thrown? For the quiz assume that all Account records have unique timestamp values and no records are deleted.

Account accts1[] = [SELECT id, systemmodstamp FROM Account ORDER BY systemmodstamp ASC;
Account maxAcct = accts1[acct1.size() - 1]
Account accts2[] = [SELECT id, systemmodstamp FROM Account WHERE systemmodstamp <=:maxAcct.systemmodstamp ORDER BY systemmodstamp ASC;
if( accts2.size() > accts2.size()) {
throw new Exception('Timewarp – new account created in the past!');
The answer....usually no. However, Salesforce does not guarantee that records are committed in timestamp order.

It is possible for a record with a SystemModStamp of 2014-12-31T00:12:12.000Z to be committed to Salesforce after a record in the same table with a SystemModStamp of 2014-12-31T00:13:12.000Z.

This is a hard “sin” to reproduce because it depends on the current load of the Salesforce servers. We discovered the problem by monitoring a high volume production instance for three weeks – the problem occurred just a couple of times during this period. It was sort of like searching for a Higgs particle.

The moral: Do not assume that the MAX(SystemModStamp) or MAX(LastModifiedDate) value is a good constraint to find all values that are older.

The solution: Do not rely on timestamps that are fresh. Give Salesforce a little while to commit all recent transactions. In practice this odd state seems to settle out on Salesforce servers after 10-120 seconds.

Sin 4: SOQL can return invalid field data.

This quiz should be easy for experienced SOQL users.
  1. Is it possible for an invalid email address to be stored and returned in a Salesforce Email field?
  2. Is is possible to store a 132 character string in a 128 character Salesforce field?
The answer to #1 is, of course, possible. Check out contact Ms. Barbara Levy in any developer force instance. Her email is: b.levy@expressl& Try saving the record and Salesforce will tell you that the email format is invalid (using the API will throw the same error).

The answer to #2 is also “possible.” In fact Salesforce creates an email template with an invalid length subject when Sites is enabled.

There is other “invalid” data (related to currency values) lurking in the developer force instance – you will have to find this one yourself.

The morals:
  1. Just because Salesforce meta data says a field is 80 characters long does not mean it may have a longer value.
  2. Just because Salesforce meta data says a number is DECIMAL(5,2) do not trust it. An API program may have inserted data that does not match the spec.
  3. Do not be surprised if you read data from one Salesforce instance and it fails to upsert into another instance – the source data may be bad.

Sin 5: SOQL has a love/hate relationship with custom fields.

Though Salesforce allows many custom fields to be created on a table, using too many columns in a SOQL statement will cause a QUERY_TOO_COMPLICATED error. At least this “sin” is one Salesforce documents but after 10 years one would assume they would have a work-around in the back-end.

If you really need to do the equivalent of “SELECT * FROM Account” and SOQL will not allow the read in a single statement what needs to be done? There are two problem to solve.
  1. Figure out the minimum number of SOQL queries to run. If none of the columns are formulas then this can usually be done by using 500 columns for each query. If formula columns are involved the problem is much harder – in this case see Sin #7.[
  2. Join the results of multiple queries together.
Realistically, grab the data from a Salesforce dump file or use a tool like Capstorm's CopyStorm to extract the table into a relational database. Doing it yourself is hard.

Sin 6: SOQL does not treat all tables equally.

The transaction cost of querying a table with SOQL varies a lot. The first time you query a high cost table and cause your production org to run out of transactions will probably be the last time.

Here a SOQL quiz. How many Salesforce transactions does it take to run the following SOQL statement?

SELECT id, name FROM TableName LIMIT 2000

The answer, of course, depends on TableName and is either 2000, 200, 40, 20, 10, 4, 2, or 1.

The “1 transaction per record” rule for Attachment records is documented. The same rule applies to QuoteDocument and ContentVersion but we have learned this from experience rather than documentation. In general 2000 and 1000 are the common number of records per transaction but the only way to know for certain is the write a bit of code and see how many come back in a bunch.

Sin 7: SOQL chokes on tables with many functional columns.

When Salesforce runs a SOQL query it turns it into SQL first. Once in SQL there is a 64K character query limit. If the 64K SQL query limit is passed then SOQL will return a QUERY_TOO_COMPLICATED exception.

Here a SOQL quiz: Given a SOQL statement how can you accurately (or inaccurately even) guess that it will generated a SQL statement less than 64 characters?

The answer is......if table has a lot of formula columns you probably cannot guess accurately.

In practice, the addition of formula columns in a SOQL statement can cause the size of the resultant SQL to explode – sometimes. It all depends on the complexity and number of tables referenced by the formula.
  • The complexity of the formula seems to matter. My guess is that Saleforce is in-lining the formula calculation into the generated SQL.
  • A formula that references other tables likely causes the table expression and predicate portion of the generated SQL to grow rapidly.
The solution to this problem is nearly the same as Sin #5 but the first step, determining the minimum number of SOQL query to run can difficult. The only solution we have found that works reliably is iterative.
  1. Sort column in ascending order by the length of their name or length of their corresponding formula.
  2. Set maxColumn = total # of columns
  3. Run a query run using columns 1 thru maxColumn with a LIMIT 1.
  • If the query succeeds then save the query, remove the query's columns from the list of available columns.
  • If the query fails then set maxColumn = maxColumn – 1 and start step 3 again.
  1. Set maxColumn = # remaining columns
  2. If maxColumn > 0 goto Step #3, otherwise all queries have been determined.


The “SOQL Sins” described in this article were accumulated over a number years while developing:
  • CopyStorm – keep a hot backup of your Salesforce instance in SQL/Server, Oracle, MySQL, PostgreSQL, and H2
  • CopyStorm/Restore – restore a backup to any production, development, or sandbox instance. This is a restore tool even your manager could use.
  • SQLForce – a standard Python module for Salesforce. Works with Python 2.8. 3.3, and 3.4. Use SQLForce when Apex is just too hard.
For more information see

Friday, March 28, 2014

Load Baseball Teams into SFDC without DataLoader

It's spring and I wanted to populate a sandbox Salesforce with major league basedball teams, their rosters, and a few opportunities and cases (made up of course).  Given the hassle of managing all the links between related objects with DataLoader, I knew I want something a lot simpler.

My approach: With a short Python script and an Excel workbook (one sheet per SFDC table) I managed to import teams and rosters as quickly as I could type them into Excel.

My source data in Excel was simple:

  • the first row contained SFDC column names EXCEPT for column 1.
  • Column #1 contained a synthetic primary key that I used to refer to the row from other spreadsheets.
For example: The St. Louis Cardinals Account row looked like:
CardsSt. Louis CardinalsSt LouisBaseball

Contact records referred to their associated team using synthetic keys. For example, here is a player from the Cardinals.


The special syntax AccountId->Account tells the Python import script to look up the parent record id at run time based on the SFDC id given to the synthetic key.

Is this clever? Probably not so much. But it did make it possible to import a lot of interrelated data into Salesforce with the hassle of importing CSV's one at a time and keeping track of SFDC ids.

For the record, the Python script is in Python 3 and uses two extension modules:

  • xlrd for reading Excel workbooks
  • SQLForce for talking to Salesforce.

Friday, February 7, 2014

How to Populate a Developer Sandbox with Real Data in 5 Minutes

Anyone that has ever tried to populate a Salesforce database with a lot of tables using Dataloader knows the level of pain and patience required. I am not suffering that pain anymore when I need to build a test environment.

Take a guess: How long does it take to populate a developer sandbox with a subset of your production Salesforce data when you want:

  • The top 500 revenue Opportunities in the past year.
  • All related Account, Contact, Line Items, Quotes, Quote Line Items, and a few other records.
Would you believe 3 minutes and 38 seconds? Here's the screen shot of the stuff I just restored to a blank developer sandbox.

My formerly empty sandbox now has 518 Opportunities, 451 Accounts, 5072 Contacts, and lots of other stuff. I hardly had time to get a coffee. Perhaps the coolest thing is that all the difficult stuff of making sure the the records reference each other property with their newly assigned Salesforce Ids happened.