Article

Connect Teradata Vantage to Salesforce Using Amazon Appflow

Many Teradata customers are interested in integrating Vantage with AWS First Party Services. This guide will help you to connect Vantage to Salesforce using Amazon Appflow.

April 15, 2021 10 min read
Connecting Vantage to Salesforce Using AWS First Party Services
Many Teradata customers are interested in integrating Teradata Vantage with Amazon Web Services (AWS) First Party Services. This guide will help you to connect Teradata Vantage to Salesforce using Amazon Appflow.

The procedure offered in this guide has been implemented and tested by Teradata. However, it is offered on an as-is basis. Amazon does not provide validation of Teradata Vantage using Amazon Appflow.

This guide includes content from both Amazon and Teradata product documentation.


Overview

This article describes how to retrieve customer information from Salesforce using Amazon Appflow and combine it with order and shipping information from Vantage to derive analytical insights.
Amazon Appflow transfers the customer data from Salesforce via Amazon S3. Teradata Vantage then uses Native Object Store (NOS) functionality to join the data on S3 with data in Vantage within a single query. 

About Amazon Appflow

AWS AppFlow is a fully managed integration service that enables users to securely transfer data between Software-as-a-Service (SaaS) applications, like Salesforce, Marketo, Slack, and ServiceNow, and AWS services, like Amazon S3 and Amazon Redshift.

AppFlow automatically encrypts data in motion and allows users to restrict data from flowing over the public internet for SaaS applications that are integrated with AWS PrivateLink, reducing exposure to security threats. 

About Teradata Vantage

Vantage is the connected multi-cloud data platfrom for enterprise analytics. 

Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.

Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.

Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.

Teradata Vantage Native Object Store (NOS) can be used to explore data in external object stores, like Amazon S3, using standard SQL. No special object storage-side compute infrastructure is required to use NOS. You can explore data located in an S3 bucket by simply creating a NOS table definition that points to your bucket. With NOS, you can quickly import data from S3 or even join it with other tables in the Vantage database. 

See the documentation for more information on Teradata Vantage.
 

Prerequisites

You should be familiar with Amazon Web Services concepts, Amazon Appflow, and Teradata Vantage.
 
You will need the following accounts and systems:
  • AWS account with appropriate permissions (you can create free account),
  • An S3 bucket,
  • Teradata Vantage with the Advanced SQL Engine 17.0 or higher, and
  • A Salesforce account that satisfies the following requirements:
    • Your Salesforce account must be enabled for API access.
      API access is enabled by default for Enterprise, Unlimited, Developer, and Performance editions.
    • Your Salesforce account must allow you to install connected apps.
      If this is disabled, contact your Salesforce administrator. After you create a Salesforce connection in Amazon AppFlow, verify that the connected app named "Amazon AppFlow Embedded Login App" is installed in your Salesforce account.
    • The refresh token policy for the "Amazon AppFlow Embedded Login App" must be set to "Refresh token is valid until revoked". Otherwise, your flows will fail when your refresh token expires.
    • You must enable Change Data Capture in Salesforce to use event-driven flow triggers. From Setup, enter "Change Data Capture" in Quick Find.
    • If your Salesforce app enforces IP address restrictions, you must whitelist the addresses used by Amazon AppFlow. For more information, see AWS IP address ranges in the Amazon Web Services General Reference.
    • If you are transferring over 1 million Salesforce records, you cannot choose any Salesforce compound field. Amazon AppFlow uses Salesforce Bulk APIs for the transfer, which does not allow transfer of compound fields.
    • To create private connections using AWS PrivateLink, you must enable both "Manager Metadata" and "Manage External Connections" user permissions in your Salesforce account. Private connections are currently available in the Northern Virginia (us-east-1) and Oregon (us-west-2) AWS Regions.
    • Some Salesforce objects can't be updated, such as history objects. For these objects, Amazon AppFlow does not support incremental export (the "Transfer new data only" option) for schedule-triggered flows. Instead, you can choose the "Transfer all data" option and then select the appropriate filter to limit the records you transfer.

Procedure

These are the steps to connect Teradata Vantage to Salesforce using Amazon Appflow:
  • Create a flow in Amazon Appflow
  • Explore the data using NOS in Vantage

Create a flow in Amazon Appflow

This step creates a flow using Amazon AppFlow. For this example, we are using a Salesforce developer account to connect to Salesforce.

Log into the AWS Console

Sign into the AppFlow console with your AWS login credentials.

Click Create flow.

Ensure that you are in the right region.

Picture1.png

Ensure that you have an S3 bucket created. We will use the bucket, “ptctsoutput”, for this guide.

Specify flow details

This step provides basic information for your flow.

Fill in Flow name (i.e. salesforce) and Flow description (optional).

Leave Customize encryption settings (advanced) unchecked.

Click Next.
 

Configure flow

This step provides information about the source and destination for your flow.

For this guide, we will be using Salesforce as the source, and AWS S3 as the destination.

For Source name, choose Salesforce, then Create new connection for Choose Salesforce connection.

Picture1-(1).png
 

Use the defaults for Salesforce environment and Data encryption.

Give your connection a name (i.e. salesforce).

Click Continue.

Picture1-(2).pngAt the Salesforce login window, enter your Username and Password.

Click Log In.

Picture1-(3).png


Click Allow to allow Appflow to access your Salesforce data and information.
Picture1-(4).png

On the Appflow Configure flow window, use Salesforce objects.

Picture1-(5).png

Choose Account to be the Salesforce object.

Picture1-(6).pngUse Amazon S3 as Destination name.

Select your S3 bucket (referenced above) where you want the data to be stored (i.e. ptctsoutput).
Picture1-(7).png
Click Run on demand for the Flow trigger.

Click Next.

Picture1-(8).png
 

 

Map data fields

This step determines how data is transferred from the source to the destination.

Select Manually map fields for the Mapping method.

For simplicity, choose Map all fields directly for the Source to destination filed mapping.

Picture1-(9).png

Once you click on Map all fields directly, all of the fields will display under Mapped fields.

You may click on the checkbox for the field(s) you want to Add formula (e.g. concatenate), Modify values (e.g. mask or truncate field values), or Remove selected mappings.

For simplicity, no checkboxes will be ticked in this guide.

For Validations, add in a condition to ignore the record that contains no “Billing Address”.

Click Next.

Picture1-(10).png

Add a filter

You can add filters to determine which records to transfer.

For this guide, add a condition to filter out the records that are Deleted.

Click Next.

Picture1-(11).png

Review and create the flow

Review all the information you just entered. Modify if necessary.

Click Create flow.

A “salesforce successfully created” message will display with the flow information once the flow is created.

Picture1-(12).png

 

Run the flow

Click Run flow in the upper right corner.

Upon successful completion of the flow, a message will display.

Picture1-(13).pngClick the link to the bucket to view the data.

The data from Salesforce will be in JSON format.
 

Change the data file properties

By default, the data from Salesforce is encrypted. We need to remove the encryption for NOS to access it.

Click on the data file in your S3 bucket, then click the Properties tab.

Picture1-(14).png

On the Properties tab, click on the Encryption tile.

Click on the None option to remove the encryption (e.g. AWS-KMS).

Click Save.

Picture1-(15).png
 

Explore the data in S3 using NOS dynamically

The Native Object Store (NOS) feature in Vantage has built in functionality to explore and analyze data in object storage, such as Amazon S3.

This section describes how you may dynamically read an object (file) in S3 without creating any permanent objects in Vantage. This is an agile method when you are investigating data.
 

Use the READ_NOS Table Operator

The READ_NOS table operator can be used to sample and explore a percent of the data without having to define a foreign table or a view on an object in S3.

The following SQL example shows the reading of an object in S3 using the full name of the object. You can use Teradata Studio or Teradata Studio Express to run SQL commands.

Screen-Shot-2021-04-16-at-12-06-20-PM.png 
The ACCESS_ID is the access key ID for your AWS account and the ACCESS_KEY is your secret access key.

This is output from the above SQL command:

Picture1-(16).png

 

Import data from a file on S3 into Vantage with CREATE TABLE

If you expect to use the data from a file on S3 many times, you can increase performance by making a persistent copy of the data within Vantage. Accessing data on S3 through a foreign table does not make a persistent copy, which requires Vantage to repeatedly retrieve data from S3 for each query.

This method uses the CREATE TABLE AS…WITH DATA statement. Using this approach allows you to choose which attributes within the foreign table payload that you want to include in the target table, and what the relational table columns will be named.

Screen-Shot-2021-04-16-at-12-08-06-PM.png
The above statement creates a table named, salesforceReadNOS.

Use the following SQL statement to query it.
Screen-Shot-2021-04-16-at-12-09-00-PM.pngThis is a partial result from the SELECT statement.

Picture1-(17).png
 

Explore the data in S3 using NOS with a foreign table

The Native Object Store (NOS) feature in Vantage has built in functionality to explore and analyze data in object storage, such as Amazon S3.

Typically, you will define a permanent object, a foreign table, in Vantage to access the data in S3 as if it were a table within Vantage.

This section describes how you may use different capabilities of NOS to explore the data from Salesforce that is in the S3 bucket using a foreign table.

Create Foreign Table

A foreign table allows the data in S3 to be easily referenced within the Vantage SQL Engine and makes the data available in a structured relational format.

Login to Vantage using Teradata Studio or Teradata Studio Express.

Create an AUTHORIZATION object with the access key ID and secret access key for S3 bucket access. The Authorization object enhances security by establishing control over who is allowed to use a foreign table to access AWS S3 data.

Screen-Shot-2021-04-16-at-12-10-25-PM.png
 
The USER is the access key ID for your AWS account and the PASSWORD is your secret access key.

Create a foreign table on the JSON file in S3 using following command.

Screen-Shot-2021-04-16-at-12-11-40-PM.pngAt a minimum, the foreign table definition must include a table name and location clause (highlighted in yellow) which points to the object store data. The Location requires a top-level single name, which is the S3 bucket.

If the file name doesn’t have standard extension (.json, .csv, .parquet) at the end, the Location and Payload columns definition is also required (highlighted in turquoise) to indicate the type of the data file.

Foreign tables are always defined as No Primary Index (NoPI) tables.

Once foreign table is created, you can query the contents of the S3 data set by using SELECT.

Screen-Shot-2021-04-16-at-12-12-21-PM.png
The foreign table only contains two columns: Location and Payload.

Location is the address in the object store system. The data itself is represented in the payload column, with the payload value within each record representing a single JSON object and all its name-value pairs.

This is a sample output from “SELECT * FROM salesforce;”.

Picture1-(18).png

This is a sample output from “SELECT payload.* FROM salesforce;”.

Picture1-(19).png
 

Use the JSON_KEYS Table Operator

JSON data may contain different attributes in different records. To determine the full list of possible attributes in a data store, you can use the JSON_KEYS table operator. (You must have first created the foreign table from the previous step.)

Screen-Shot-2021-04-16-at-12-14-07-PM.pngThis is the partial output from the JSON_KEYS table operator.

Picture1-(20).png
 

Create View

Views can simplify the names associated with the payload attributes, which makes it easier to use the foreign table definition. (You must have first created the foreign table from the previous step.)

The following view statement uses attributes discovered from the JSON_KEYS table operator above.

Screen-Shot-2021-04-16-at-12-15-34-PM.pngWith the view created, you can select the data from the file in S3 with a much simpler SQL statement.

Screen-Shot-2021-04-16-at-12-16-04-PM.pngThis is the partial output from the view.

Picture1-(21).png
 

Join data from a file on S3 to in-database tables

Once a foreign table is defined, it may be treated like most other tables in Vantage, including joining them in a single statement for detailed analysis.

For example, imagine that Vantage has tables for orders and shipping. The following are the DDL statement that define these tables.

This is the Orders table, which holds the header information (one per order).

Screen-Shot-2021-04-16-at-12-17-11-PM.pngThis is the Order Items table, which holds the individual items on each order (one or more per order).

Screen-Shot-2021-04-16-at-12-17-36-PM.pngThis is the Address table, which holds the shipping address for customers.
Screen-Shot-2021-04-16-at-12-18-03-PM.png
Now further imagine that these table have the following data.

Orders:
Picture1-(22).png

Order items:
Picture1-(23).png

Shipping addresses:
Picture1-(24).png

By joining the salesforce foreign table to the above tables, we can retrieve a customer’s order information with a customer’s shipping information.
Screen-Shot-2021-04-16-at-12-20-32-PM-(1).png

This is a sample of the results of the above query.

Picture1-(25).png
 

Import data from a file on S3 into Vantage with CREATE TABLE

If you expect to use the data from a file on S3 many times, you can increase performance by making a persistent copy of the data within Vantage. Accessing data on S3 through a foreign does not make a persistent copy, which requires Vantage to repeatedly retrieve data from S3 for each query.

This method uses the CREATE TABLE AS…WITH DATA statement. Using this approach allows you to choose which attributes within the foreign table payload that you want to include in the target table, and what the relational table columns will be named.

Screen-Shot-2021-04-16-at-12-22-36-PM.png
The above statement creates a table named, salesforceVantage.

Use the following SQL statement to query it.

Screen-Shot-2021-04-16-at-12-23-15-PM.png

This is a partial result from the SELECT statement.

Picture1-(26).png
 

Import data from a file on S3 into Vantage with INSERT SELECT Statement

If you expect to use the data from a file on S3 many times, you can increase performance by making a persistent copy of the data within Vantage. Accessing data on S3 through a foreign does not make a persistent copy, which requires Vantage to repeatedly retrieve data from S3 for each query.

This method uses the INSERT...SELECT statement. Using this approach allows you to both choose which attributes within the foreign table payload that you want to include in the target table as well as change the target table’s attributes. For example, you can specify that the target table be MULTISET or not, or you can choose a different primary index.

This approach does require the target table to be create beforehand, which the following SQL statement will do.

Screen-Shot-2021-04-16-at-12-24-50-PM.pngThe data must then be added (inserted) from the file on S3 into the target table that was just created.

Screen-Shot-2021-04-16-at-12-26-17-PM.pngThe previous statement above created a table named, salesforcePerm.

Use the following SQL statement to query it.

Screen-Shot-2021-04-16-at-12-27-46-PM.pngThis is a partial result from the SELECT statement.

Picture1-(27).png
 

Cleanup

This is an optional step to help you avoid extra charges in your AWS account.
 

Remove Amazon Appflow

Delete the Connections that you created for the flow.

Delete the flow.
 

Delete data in the S3 bucket

Delete the data from Salesforce in the S3 bucket.

If you no longer need the bucket, you may also delete the bucket.
 

Stop the Vantage instance

If you no longer need the Vantage instance, you may stop and terminate it.

Tags

About Shamira Joshua

Shamira Joshua is a part of the Product team at Teradata and leads cloud native integrations for Teradata’s flagship platform Vantage.

Shamira is a goal driven technologist and has a breadth of experience gained from working in companies like Cisco, Alcon labs. In her role in Teradata as a lead for cloud native integrations, Shamira is responsible for driving Vantage integrations with services & applications developed by hyperscalers to provide customers the best options to modernize as they journey to the cloud.

View all posts by Shamira Joshua

About Wenjie Tehan

Wenjie is a Technical Consulting Manager, currently working with the Teradata Global Alliances team. 
 
With over 20 years in the IT industry, Wenjie has worked as developer, tester, business analyst, solution designer and project manager. This breadth of roles makes her perfect for the current role, understanding how the business needs data and how this data can be managed to meet those business needs.  
 
Wenjie has a BS in computer science from University of California at San Diego, and ME in computer engineering at Cornell University. Wenjie is also certified on both Teradata and AWS. View all posts by Wenjie Tehan

Stay in the know

Subscribe to get weekly insights delivered to your inbox.



I consent that Teradata Corporation, as provider of this website, may occasionally send me Teradata Marketing Communications emails with information regarding products, data analytics, and event and webinar invitations. I understand that I may unsubscribe at any time by following the unsubscribe link at the bottom of any email I receive.

Your privacy is important. Your personal information will be collected, stored, and processed in accordance with the Teradata Global Privacy Statement.