Thursday, September 11, 2014

How to Delete a Lot of Price Book Entries

This week a marketing person cloned a number of products to create new product records. A nifty side effect of the clone was that the prices were cloned too. This created price book entries in lots of price books and in currencies in which the new products are not even sold. In other words, a lot of garbage data.

I fixed the problem with a tiny Python program using the SQLForce module (free from Capstorm).
Want to see it?

import SQLForce

session = SQLForce.Session("production")

# NOTE: the field names in this WHERE clause are reconstructed placeholders --
# substitute the fields that identify your own garbage entries.
soql = ("DELETE FROM PricebookEntry "
        "WHERE productCode LIKE 'MUSI0%' "
        "AND Product2.Name = 'NameHidden' "
        "AND Pricebook2.Name <> 'Standard Price Book'")
session.runCommands(soql)  # runCommands executes SQLForce's extended SOQL statements


Naturally I ran the same WHERE clause in a SELECT statement first to make sure it was selecting the proper records. Since the prices had not been used in an Opportunity, this was all it took to fix my data.

SQLForce for Python -- I love it!

Monday, May 26, 2014

Populate a Sandbox in 5 minutes and learn some graph theory.

Every time I populate a Developer Sandbox with a snapshot of production data I smile. Today, in tracking down an interesting bug, I populated a sandbox with all won Opportunities since 1-Dec-2013 plus their associated

  • Accounts
  • Contacts
  • Opportunity Line Items
  • Quotes & Quote Line Items
  • Quote PDF Documents 
  • plus a handful of other tables. 
How long did this take? 4 minutes, 27 seconds.

I restored:
  • all the records
  • all the links between the records
  • the attachments & notes
The tool I used was CopyStorm/Restore -- this is not a product promo but a discussion of the graph theory problem faced in restoring Salesforce data.

Anyone who has used data loaders (Salesforce's or Jitterbit's) knows that the routine for importing Account and Contact records is:

  • Import Accounts
  • Take the generated Ids from importing the Accounts and patch the Contact import file. 
  • Import the Contacts 
Of course, if there are Account->Account relationships or Contact->ReportsTo relationships then the problem is quite a bit more difficult.
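This import-then-patch routine generalizes to any set of tables: treat each lookup field as a parent-before-child dependency and import tables in topological order, deferring self-referencing fields (Account->Account, ReportsTo) to a final patch pass. A minimal sketch of the ordering step (the table names are illustrative and not tied to any particular loader):

```python
from graphlib import TopologicalSorter

def import_order(lookups):
    """Given a map of table -> set of tables its lookup fields reference,
    return a load order in which every parent table precedes its children.
    Self-references (e.g. Account -> Account) are skipped here; those links
    must be patched in a second pass, after the parent ids exist."""
    ts = TopologicalSorter()
    for table, parents in lookups.items():
        ts.add(table, *(p for p in parents if p != table))
    return list(ts.static_order())

# Accounts load first, then Contacts, then Opportunities.
order = import_order({
    "Account": set(),
    "Contact": {"Account"},
    "Opportunity": {"Account", "Contact"},
})
```

A topological order exists only when the non-self-referencing lookups form no cycle; a cycle means some records must be inserted first and patched afterward, which is exactly where the restore problem gets hard.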

The problem is inherently difficult because Salesforce supports almost arbitrary relationships between records.

Finding and restoring all of the relationships is fundamentally a well-known graph theory problem (extracting a minimal subgraph that spans a required set of records, a relative of the Steiner tree problem) and has the unfortunate property of being NP-complete.

Roughly, NP-complete means that no known algorithm solves the problem in time polynomial in the number of records and relationships -- the work required explodes as the problem grows.

An exhaustive search is rarely practical for this class of problem when the number of records and relationships is large -- an algorithm has to "know" something about the problem domain and efficiently "prune" the search space.

Oddly enough, the "Salesforce restore" problem is similar to a problem in computational chemistry: given part of a molecular structure, find all molecules that match the pattern (substructure search, i.e. subgraph isomorphism). Ring systems make this a difficult problem in chemistry -- knowledge about the ring systems that occur in practice makes it doable.

Monday, April 7, 2014

SOQL's Seven Deadly Sins

This article is not about how to be a better SOQL programmer. It is about pitfalls in SOQL that are obscure and often difficult to diagnose and understand when encountered.
  1. The SOQL WHERE clause cannot be trusted.
  2. The SOQL LIMIT parameter is not always optional.
  3. Timestamps returned by SOQL believe in time travel.
  4. SOQL can return field data that is invalid based on the field definition.
  5. SOQL has a love/hate relationship with tables containing lots of custom fields.
  6. SOQL does not treat all tables equally – not knowing the exceptions can take down your org.
  7. SOQL chokes on tables with lots of formula columns.

Sin 1: The SOQL WHERE clause cannot be trusted.

Here is a trick question. Will the following SOQL statement return any records that were modified today?
SELECT id, name FROM Account WHERE systemModStamp < LAST_YEAR

The answer is... usually not. However, it will return a record modified today when:
  • The record fit the WHERE criteria when Salesforce determined the set of records to return.
  • The record was modified after the query started but before it was fetched from Salesforce.
This “sin” is easy to reproduce with a Python script and a web browser.
  1. In the web browser open the Salesforce GUI on the record that will be the most recent one to match the LAST_YEAR criteria.
  2. Start a Python script that fetches records using:
SELECT id,name,systemModStamp
FROM Account
WHERE systemModStamp < LAST_YEAR
ORDER BY systemModStamp ASC
  3. Modify the record in the Salesforce GUI after the Python script starts fetching records.
  4. One of the last records you read from Salesforce will have a modification date of today.
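A cheap defense is to re-apply the cutoff on the client after fetching, since Salesforce evaluated the WHERE clause only once, when the result set was determined. A sketch (the dict-shaped records are an assumption for illustration, not a SQLForce detail):

```python
def trustworthy(records, cutoff):
    """Re-check the WHERE criterion client-side: drop any record whose
    SystemModStamp moved past the cutoff while the query was being fetched.
    ISO-8601 timestamps compare correctly as plain strings."""
    return [r for r in records if r["SystemModStamp"] < cutoff]
```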

Sin 2: The SOQL LIMIT parameter is not always optional.

Though the SOQL documentation says that the LIMIT parameter is optional there are a handful of Salesforce tables where it is nearly impossible to run a large query without the dreaded Query Timeout error. If you are not a believer then try running these queries on a mature Salesforce instance.
  1. SELECT id, accountId, description FROM Task ORDER BY SystemModStamp ASC
  2. SELECT application,browser,loginTime FROM LoginHistory ORDER BY loginTime ASC
Short of begging Salesforce for a maximum timeout longer than the default, a solution has to “know” good LIMIT values for problem tables. Here are good values for a handful of the more troublesome tables.

Table Name (or name pattern)    Good LIMIT Value
.*History                       5000
.*History__c                    5000
.*Feed                          5000
.*Share                         5000
FeedComment                     5000
Task                            5000
EmailMessage                    5000
CaseComment                     50
FieldPermissions                2000
LoginHistory                    2000
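With a forced LIMIT, reading a whole table means paging, and OFFSET cannot reach past 2,000 rows; keyset pagination on an ordered column works instead. A sketch, assuming a run_soql callable that executes a SOQL string and returns (id, key) rows (SQLForce's session.selectRecords could back it):

```python
def chunked_query(run_soql, table, key="SystemModStamp", limit=5000):
    """Walk a large table in LIMIT-sized pages by advancing a keyset cursor.
    Each page asks only for rows past the last key seen, so no single query
    is big enough to hit the query timeout. For a real datetime key, 'last'
    must be rendered as a SOQL datetime literal, not interpolated raw."""
    last = None
    while True:
        where = " WHERE {0} > {1}".format(key, last) if last is not None else ""
        rows = run_soql("SELECT Id, {0} FROM {1}{2} ORDER BY {0} ASC LIMIT {3}"
                        .format(key, table, where, limit))
        if not rows:
            return
        yield from rows
        last = rows[-1][1]
```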

Sin 3: Timestamps returned by SOQL believe in time travel.

Here's another SOQL quiz. In the following code snippet, is it possible for the Exception to be thrown? For the quiz assume that all Account records have unique timestamp values and no records are deleted.

Account accts1[] = [SELECT id, systemmodstamp FROM Account ORDER BY systemmodstamp ASC];
Account maxAcct = accts1[accts1.size() - 1];
Account accts2[] = [SELECT id, systemmodstamp FROM Account WHERE systemmodstamp <= :maxAcct.systemmodstamp ORDER BY systemmodstamp ASC];
if( accts2.size() > accts1.size()) {
    throw new Exception('Timewarp – new account created in the past!');
}
The answer is... usually no. However, Salesforce does not guarantee that records are committed in timestamp order.

It is possible for a record with a SystemModStamp of 2014-12-31T00:12:12.000Z to be committed to Salesforce after a record in the same table with a SystemModStamp of 2014-12-31T00:13:12.000Z.

This is a hard “sin” to reproduce because it depends on the current load of the Salesforce servers. We discovered the problem by monitoring a high volume production instance for three weeks – the problem occurred just a couple of times during this period. It was sort of like searching for a Higgs particle.

The moral: Do not assume that the MAX(SystemModStamp) or MAX(LastModifiedDate) value is a good constraint to find all values that are older.

The solution: Do not rely on timestamps that are fresh. Give Salesforce a little while to commit all recent transactions. In practice this odd state seems to settle out on Salesforce servers after 10-120 seconds.

Sin 4: SOQL can return invalid field data.

This quiz should be easy for experienced SOQL users.
  1. Is it possible for an invalid email address to be stored and returned in a Salesforce Email field?
  2. Is it possible to store a 132 character string in a 128 character Salesforce field?
The answer to #1 is, of course, “possible.” Check out the contact Ms. Barbara Levy in any developer force instance. Her email is b.levy@expressl&. Try saving the record and Salesforce will tell you that the email format is invalid (using the API will throw the same error).

The answer to #2 is also “possible.” In fact Salesforce creates an email template with an invalid length subject when Sites is enabled.

There is other “invalid” data (related to currency values) lurking in the developer force instance – you will have to find this one yourself.

The morals:
  1. Just because Salesforce metadata says a field is 80 characters long does not mean it cannot hold a longer value.
  2. Just because Salesforce metadata says a number is DECIMAL(5,2), do not trust it. An API program may have inserted data that does not match the spec.
  3. Do not be surprised if you read data from one Salesforce instance and it fails to upsert into another instance – the source data may be bad.

Sin 5: SOQL has a love/hate relationship with custom fields.

Though Salesforce allows many custom fields to be created on a table, using too many columns in a SOQL statement will cause a QUERY_TOO_COMPLICATED error. At least this “sin” is one Salesforce documents, though after 10 years one would expect a back-end work-around by now.

If you really need to do the equivalent of “SELECT * FROM Account” and SOQL will not allow the read in a single statement, what needs to be done? There are two problems to solve.
  1. Figure out the minimum number of SOQL queries to run. If none of the columns are formulas then this can usually be done by using 500 columns for each query. If formula columns are involved the problem is much harder – in this case see Sin #7.
  2. Join the results of multiple queries together.
Realistically, grab the data from a Salesforce dump file or use a tool like Capstorm's CopyStorm to extract the table into a relational database. Doing it yourself is hard.
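For the no-formula case the two steps are mechanical: chunk the column list (with Id leading every chunk) and stitch the partial rows back together on Id. A sketch of the bookkeeping, independent of any particular query tool:

```python
def split_columns(columns, chunk=500):
    """Break a wide column list into SOQL-sized groups. Id is prepended to
    every group so the partial result sets can be joined afterwards."""
    cols = [c for c in columns if c.lower() != "id"]
    return [["Id"] + cols[i:i + chunk] for i in range(0, len(cols), chunk)]

def merge_rows(result_sets):
    """Join per-chunk result sets (lists of dicts keyed by column name)
    into one complete record per Id."""
    merged = {}
    for rows in result_sets:
        for row in rows:
            merged.setdefault(row["Id"], {}).update(row)
    return merged
```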

Sin 6: SOQL does not treat all tables equally.

The transaction cost of querying a table with SOQL varies a lot. The first time you query a high-cost table and cause your production org to run out of API transactions will probably be the last time.

Here is a SOQL quiz. How many Salesforce transactions does it take to run the following SOQL statement?

SELECT id, name FROM TableName LIMIT 2000

The answer, of course, depends on TableName and is either 2000, 200, 40, 20, 10, 4, 2, or 1.

The “1 transaction per record” rule for Attachment records is documented. The same rule applies to QuoteDocument and ContentVersion, but we learned this from experience rather than documentation. In general, 2000 and 1000 are the common numbers of records per transaction, but the only way to know for certain is to write a bit of code and see how many records come back in a batch.

Sin 7: SOQL chokes on tables with many formula columns.

When Salesforce runs a SOQL query it turns it into SQL first, and the generated SQL has a 64K character limit. If the 64K SQL query limit is exceeded then SOQL will return a QUERY_TOO_COMPLICATED exception.

Here is a SOQL quiz: given a SOQL statement, how can you guess (even roughly) whether it will generate a SQL statement of less than 64K characters?

The answer is... if the table has a lot of formula columns, you probably cannot guess accurately.

In practice, adding formula columns to a SOQL statement can cause the size of the resultant SQL to explode – sometimes. It all depends on the complexity of the formulas and the number of tables they reference.
  • The complexity of the formula seems to matter. My guess is that Salesforce in-lines the formula calculation into the generated SQL.
  • A formula that references other tables likely causes the table expression and predicate portion of the generated SQL to grow rapidly.
The solution to this problem is nearly the same as for Sin #5, but the first step, determining the minimum number of SOQL queries to run, can be difficult. The only solution we have found that works reliably is iterative.
  1. Sort the columns in ascending order by the length of their name or the length of their corresponding formula.
  2. Set maxColumn = total # of columns.
  3. Run a query using columns 1 thru maxColumn with a LIMIT 1.
  • If the query succeeds then save the query and remove the query's columns from the list of available columns.
  • If the query fails then set maxColumn = maxColumn – 1 and start step 3 again.
  4. Set maxColumn = # of remaining columns.
  5. If maxColumn > 0 goto step #3; otherwise all queries have been determined.
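The iteration above is easy to mechanize. A sketch, assuming a try_query callable that runs a LIMIT 1 probe over a candidate column list and reports whether it succeeded:

```python
def plan_queries(columns, try_query):
    """Partition columns into query-sized groups using the shrinking-window
    probe: try all remaining columns, drop one at a time until a LIMIT 1
    probe succeeds, save that group, then repeat on the leftovers."""
    remaining = sorted(columns, key=len)  # cheapest (shortest) names first
    queries = []
    while remaining:
        n = len(remaining)
        while n > 0 and not try_query(remaining[:n]):
            n -= 1
        if n == 0:
            raise RuntimeError("no query fits even one column: " + remaining[0])
        queries.append(remaining[:n])
        remaining = remaining[n:]
    return queries
```

Probing with LIMIT 1 keeps each test cheap; only the final, saved queries are run at full size.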


The “SOQL Sins” described in this article were accumulated over a number of years while developing:
  • CopyStorm – keep a hot backup of your Salesforce instance in SQL/Server, Oracle, MySQL, PostgreSQL, and H2
  • CopyStorm/Restore – restore a backup to any production, development, or sandbox instance. This is a restore tool even your manager could use.
  • SQLForce – a standard Python module for Salesforce. Works with Python 2.7, 3.3, and 3.4. Use SQLForce when Apex is just too hard.
For more information see

Friday, March 28, 2014

Load Baseball Teams into SFDC without DataLoader

It's spring and I wanted to populate a sandbox Salesforce with major league baseball teams, their rosters, and a few opportunities and cases (made up, of course).  Given the hassle of managing all the links between related objects with DataLoader, I knew I wanted something a lot simpler.

My approach: With a short Python script and an Excel workbook (one sheet per SFDC table) I managed to import teams and rosters as quickly as I could type them into Excel.

My source data in Excel was simple:

  • The first row contained SFDC column names EXCEPT for column 1.
  • Column #1 contained a synthetic primary key that I used to refer to the row from other sheets.
For example, the St. Louis Cardinals Account row looked like:
Cards | St. Louis Cardinals | St Louis | Baseball

Contact records referred to their associated team using synthetic keys. For example, here is a player from the Cardinals.


The special syntax AccountId->Account tells the Python import script to look up the parent record's id at run time, using the SFDC id that was assigned to the row with that synthetic key.

Is this clever? Probably not so much. But it did make it possible to import a lot of interrelated data into Salesforce without the hassle of importing CSV's one at a time and keeping track of SFDC ids.

For the record, the Python script is in Python 3 and uses two extension modules:

  • xlrd for reading Excel workbooks
  • SQLForce for talking to Salesforce.
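The heart of the script is the synthetic-key translation: as each sheet is imported, the new Salesforce id of every row is recorded under its synthetic key, and any "Field->Table" column in a later sheet is resolved through that map. A sketch of the translation step (the function and data shapes are illustrative, not the actual script):

```python
def resolve_row(header, row, id_map):
    """Turn one spreadsheet row into an insertable record. A column named
    'AccountId->Account' means: replace the synthetic key in this cell with
    the real Salesforce id recorded when the Account sheet was imported."""
    record = {}
    for name, value in zip(header, row):
        if "->" in name:
            field, table = name.split("->", 1)
            record[field] = id_map[table][value]  # synthetic key -> real id
        else:
            record[name] = value
    return record
```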

Friday, February 7, 2014

How to Populate a Developer Sandbox with Real Data in 5 Minutes

Anyone who has ever tried to populate a Salesforce database with a lot of tables using Dataloader knows the level of pain and patience required. I am not suffering that pain anymore when I need to build a test environment.

Take a guess: How long does it take to populate a developer sandbox with a subset of your production Salesforce data when you want:

  • The top 500 revenue Opportunities in the past year.
  • All related Account, Contact, Line Items, Quotes, Quote Line Items, and a few other records.
Would you believe 3 minutes and 38 seconds? Here's the screen shot of the stuff I just restored to a blank developer sandbox.

My formerly empty sandbox now has 518 Opportunities, 451 Accounts, 5072 Contacts, and lots of other stuff. I hardly had time to get a coffee. Perhaps the coolest thing is that all the difficult work of making sure the records reference each other properly with their newly assigned Salesforce Ids happened automatically.

Friday, January 31, 2014

Salesforce Email Templates & Attachments

This morning I needed to understand which attachments are included with each Email Template. I looked at answering the question using the Salesforce GUI -- no luck.  I thought about doing it via SOQL but knew that SOQL could not represent the query directly.

In the end I answered the question with a short SQL query against my hot backup of Salesforce in SQL/Server.

-- Field names reconstructed from the standard Salesforce schema:
-- email template attachments join through Attachment.ParentId.
SELECT ET.Name, F.Name, ATT.Name
FROM EmailTemplate ET
JOIN Folder F ON F.Id = ET.FolderId
JOIN Attachment ATT ON ATT.ParentId = ET.Id
WHERE F.Name = 'License Keys' AND ET.IsActive = 1 AND ATT.Name LIKE 'Confidential%'

It gave me exactly what I wanted.

Email Template Name         Folder        Attachment Name
Confidential API Float      License Keys  Confidential Float API Activation.pdf
Confidential JMS Float      License Keys  Confidential Float JMS Activation.pdf
Confidential NLME Float     License Keys  Confidential Float Software Activation.pdf
Legacy Confidential Float   License Keys  Confidential Float Software Activation.pdf

A backup is always a good idea... especially when it saves me time.

Friday, January 24, 2014

How to Delete 100,000 Salesforce Tasks While Watching TV

Do you subscribe to one of those pesky mass-marketing providers that clutter your Salesforce Tasks with 1000s of new entries every day? I discovered that our Salesforce org had 100,000s of obsolete tasks created by an unnamed company whose name starts with "Genius". They were taking up a lot of expensive Salesforce data space and I needed to delete them. Where did I turn? Not to the Salesforce mass delete tools -- I had 100,000s to delete. Instead I wrote the following short Python script and solved the problem in about 10 minutes.

import SQLForce

session = SQLForce.Session("production" )
session.setenv('QUERY_ALL', "true" ) # Force archived records to be read.

## SOQL that determines what records to delete. BE VERY CAREFUL HERE!
taskSOQL = """SELECT id, subject FROM Task WHERE isDeleted=false AND subject LIKE 'Genius e-mail %' order by systemmodstamp DESC"""

nDeleted = 0
maxPerBatch = 1000 # This is a good size to prevent the Task query from timing out.
maxPerRun = 50000   # quit once this number of deletes has happened.

while True:
    idsToDelete = []
    soql = taskSOQL + " LIMIT " + str(maxPerBatch)
    for rec in session.selectRecords(soql):
        idsToDelete.append( )

    if not idsToDelete:
        break   # no more matching tasks

    nThisTime = session.delete('Task', idsToDelete )
    nDeleted += nThisTime
    print("Deleted " + str(nDeleted) + " records so far")
    if nDeleted >= maxPerRun:
        break   # reached this run's quota


The only tricky part is the SOQL that selects tasks to delete. Get this one wrong and you can do major damage to your Salesforce instance quickly.

The Python module I used, SQLForce, is freely available to anyone. There's nothing but good news with SQLForce/Python.

Tuesday, January 21, 2014

Upload an Attachment to Salesforce in One Line of Code

I do a lot of Salesforce work using Python. The experimental SQLForce/Jython work I did a few years ago has been my toolkit of choice until my recent migration of the code to Python (2.7 and 3.3). In the migration I added a couple of packages to make reading/writing attachments dirt simple. Here's an example for writing an attachment:

import SQLForce
from SQLForce import AttachmentWriter

session = SQLForce.Session("MyConnectionParameters")

salesforceParentId = '001A0000014fia3'  # The unique id of the Salesforce object to own the attachment
filename = "/mydir/salesforcecast.docx"   # The file to add as an attachment.

attachmentId = AttachmentWriter( session, salesforceParentId, filename )

The code that uploads the attachment is the last line -- the only tough part is determining the attachment's parent. For a more realistic example that reads the attachment data to upload from a spreadsheet, see this longer example.

There's nothing but good news with SQLForce/Python.