In data management, it is crucial to connect with external data sources. SQL Server provides a powerful feature known as CONNECTION_OPTIONS in the CREATE EXTERNAL DATA SOURCE command, which allows users to specify the client behavior when connecting to an external data source. This feature offers flexibility and control, enabling users to tailor the connection to their needs.
Connecting to external data sources is a fundamental aspect of many business operations. Whether it’s for integrating data from CRM systems, importing product data from suppliers, or merging internal data with social media feeds, the ability to connect and interact with external data sources is crucial. It allows businesses to gather insights from different data, leading to better decision-making and forecasting.
Understanding SQL CONNECTION_OPTIONS in External Data Source
The CONNECTION_OPTIONS argument is a key-value pair where the key is the keyword and the value is for a specific connection option. The key-value pair is used to define the properties of the connection. For example, to specify the server address, database name, or authentication method.
A connection option is a setting that specifies the behavior of a connection. In the context of databases, it often refers to the parameters of the connection URI that specify the client behavior. For example, in MySQL, the Option= value in the connection string is the sum of the numeric values for various flags that specify how Connector/ODBC should work.
Connection_options are separated by a semi-colon, to use multiple connection options. When the CREATE EXTERNAL DATA SOURCE command is executed, SQL Server uses these options to connect to the external data source. The connection_options are interpreted by the driver to the external data provider.
Here’s an example of how to create an external data source with multiple connection options:
CREATE EXTERNAL DATA SOURCE MyExtDataSource
WITH
(
TYPE = RDBMS,
LOCATION = 'myserver.database.windows.net',
DATABASE_NAME = 'mydb',
CREDENTIAL = MyCredential,
CONNECTION_OPTIONS = 'encrypt=true;trustServerCertificate=false'
);
In this example, MyExtDataSource is the name of the external data source. The TYPE is RDBMS, indicating that the data source is a relational database management system. The LOCATION is the server address, and DATABASE_NAME is the name of the database CREDENTIAL is the credential used to connect to the database.
The CONNECTION_OPTIONS argument is where you specify the behavior of the client. In this case, two connection options are specified: encrypt=true and trustServerCertificate=false. These connection_options are separated by a semi-colon.
Examples of Connection Options for Different Providers
Here are some examples of connection options for different providers:
SQL Server
Oracle
Teradata
MongoDB API for Cosmos DB
Generic ODBC
1. SQL Server
You can only specify the key-value pairs having an entry in DSN and Connection String Keywords and Attributes under the DSN / Connection String Keyword column. For example, the SQL_ATTR_TXN_ISOLATION keyword is not supported, because that is an attribute set using SQLSetConnectAttr, not in the connection string.
In this case, we’ll use the SQL_ATTR_TXN_ISOLATION attribute, which sets the transaction isolation level. However, it’s not supported in the connection string because it’s an attribute set using SQLSetConnectAttr, not in the connection string.
Here’s an example of how to set it:
SQLSetConnectAttr(hdbc, SQL_ATTR_TXN_ISOLATION, (SQLPOINTER)SQL_TXN_SERIALIZABLE, 0);
In this example, hdbc is the connection handle, SQL_ATTR_TXN_ISOLATION is the attribute to set, SQL_TXN_SERIALIZABLE is the isolation level to set, and 0 is the length of the string value.
This code sets the transaction of the highest isolation level to SQL_TXN_SERIALIZABLE. This means that all transactions are completely isolated; each transaction sees a snapshot of the database at the start of the transaction and does not see changes made by other transactions until they commit.
2. Oracle
Oracle connection options can be specified in the connection string. Here are some examples:
Basic: Specify the Host address, Port address, and Database Service Name.
Advanced: In the Connection String field specify the Single Client Access Name (SCAN) ID of databases running in a RAC cluster.
Here is an example of how to connect to Oracle Database from SQL*Plus:
sqlplus your_user_name/your_password@localhost:1521/service_name
In this example, your_user_name is your Oracle username, your_password is your Oracle password, localhost is the hostname of the Oracle server (replace with the actual server name or IP address), 1521 is the port number, and service_name is the name of the Oracle service you are connecting to.
3. Teradata:
You can connect using the Teradata database option from Get Data. Specify the Teradata server to connect to in the Server. If you’re connecting from Power BI Desktop, select either the Import or DirectQuery data connectivity mode.
Here is an example of how to connect to Teradata using JDBC:
String DB_URL = "jdbc:teradata://localhost";
String USER = "dbc";
String PASS = "dbc";
String QUERY = "SELECT * FROM dbc.dbcinfo";
Connection conn = DriverManager.getConnection(DB_URL, USER, PASS);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(QUERY);
while (rs.next()) {
System.out.println(String.format("setting: %s, value: %s", rs.getString(1), rs.getString(2)));
}
In this example, localhost is the hostname of the Teradata server (replace with the actual server name or IP address), dbc is both the username and password and SELECT * FROM dbc.dbcinfo is the SQL query to execute.
4. MongoDB API for Cosmos DB
Get the MongoDB connection string from the Azure portal to connect to a MongoDB API for Cosmos DB. The connection string format:
mongodb://username:password@host:port/[database]?ssl=true
Here is an example of how to connect to MongoDB API for Cosmos DB using PyMongo:
from pymongo import MongoClient
client = MongoClient("mongodb://username:password@host:port/database?ssl=true")
db = client.test
In this example, the username is your Cosmos DB username, the password is your Cosmos DB password, the host is the hostname of the Cosmos DB server (replace with the actual server name or IP address), the port is the port number, and database is the name of the database you are connecting to.
5. Generic ODBC
For a generic ODBC connection, make sure the driver you want is installed. Then, you can select the data source and provide the connection info. You can either provide a DSN (data source name) that already exists or create with the ODBC Data Source Administrator applet. Alternatively, you can provide a connection string.
Here is an example of how to connect to a generic ODBC data source:
Driver={ODBC Driver 17 for SQL Server};Server=myServerAddress;Database=myDataBase;Uid=myUsername;Pwd=myPassword;
In this example, ODBC Driver 17 for SQL Server is the name of the ODBC driver, myServerAddress is the address of the server, myDataBase is the name of the database, myUsername is the username, and myPassword is the password.
Specifying Key-Value Pair
Key-value pairs in SQL are often used to store flexible data, where the keys and values can vary for different rows. Here’s an example of how to create a table for key-value pairs:
CREATE TABLE key_value_pairs (
itemid varchar(32) NOT NULL,
itemkey varchar(32) NOT NULL,
itemvalue varchar(32) NOT NULL,
CONSTRAINT ct_primarykey PRIMARY KEY(itemid,itemkey)
)
In this example, itemid is the item ID, itemkey is the key, and itemvalue is the value. The primary key combines itemid and itemkey, ensuring that each item can have each key only once.
When specifying key-value pairs in CONNECTION_OPTIONS, the key is the keyword and the value is for a specific connection option.
Here’s an example:
CREATE EXTERNAL DATA SOURCE MyExtDataSource
WITH
(
TYPE = RDBMS,
LOCATION = 'myserver.database.windows.net',
DATABASE_NAME = 'mydb',
CREDENTIAL = MyCredential,
CONNECTION_OPTIONS = 'encrypt=true;trustServerCertificate=false'
);
In this example, two connection options are specified: encrypt=true and trustServerCertificate=false.
DSN and Connection String Keywords and Attributes
A Data Source Name (DSN) is a data structure that contains the information required to connect to a database. It is essentially a string that identifies the source database, including the driver details, the database name, authentication credentials, and other necessary connection parameters.
Connection string keywords and attributes are used to specify the details of a connection to a database. For example, in a connection string for SQL Server, you might specify the server address, database name, or authentication method.
Here’s an example of a connection string:
Server=myServerAddress;
Database=myDataBase;
User Id=myUsername;
Password=myPassword;
These DSN and connection string keywords and attributes are used in CONNECTION_OPTIONS to specify how SQL Server connects to external data sources.
Connection String Keywords for Specific Providers
Different providers have different connection string keywords. Here are some examples for specific providers:
The Microsoft OLE DB Driver for SQL Server
SQL Native Client ODBS Driver
For Microsoft OLE DB Driver for SQL Server:
Basic: The basic connection string includes the provider, server address, database name, username, and password. Here’s an example:
Provider=MSOLEDBSQL;
Server=myServerAddress;
Database=myDataBase;
User Id=myUsername;
Password=myPassword;
Trusted: The trusted connection string is similar to the basic connection string, but includes the Trusted_Connection keyword, which indicates whether to use Windows authentication. Here’s an example:
Provider=MSOLEDBSQL;
Server=myServerAddress;
Database=myDataBase;
Trusted_Connection=yes;
For the SQL Native Client ODBC driver:
Basic:
Driver={SQL Server Native Client 11.0};
Server=myServerAddress;
Database=myDataBase;
Uid=myUsername;
Pwd=myPassword;
Trusted Connection:
Driver={SQL Server Native Client 11.0};
Server=myServerAddress;
Database=myDataBase;
Trusted_Connection=yes;
In these examples, myServerAddress is the server address, myDataBase is the name of the database, myUsername is the username, and myPassword is the password. The Trusted_Connection keyword indicates whether to use Windows authentication.
How do I Choose the right connection option for my use case?
Choosing the right connection option for your use case depends on several factors:
1. Type of Data Source:
In databases, a data source refers to the location from which the data being used originates. This could be a SQL Server database, an Oracle database, a MongoDB database, or any other type of database. Each of these databases is a different type of data source.
Each type of data source (database) has different features and capabilities.
SQL Server is a relational database management system from Microsoft for the enterprise environment. It supports a range of transaction processing, business intelligence, and analytics applications in corporate IT environments.
Oracle Database is a multi-model database management system produced and marketed by Oracle Corporation. It is a database to run online transaction processing, data warehousing, and mixed database workloads.
MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.
Because of these differences in features and capabilities, each data source type supports a different set of connection options. Connection options are settings that you can use to customize the behavior of your connection to the data source.
For example, a connection option in SQL Server might allow you to specify the initial catalog (database) to connect to when you open the connection. This option might not be available or be named differently in MongoDB because MongoDB uses a different model (document-oriented) for organizing data.
2. Performance Needs:
Performance needs are all about how fast and efficient you need your connection to be. Some connection options can help improve the performance of your connection.
For instance
Caching: Caching stores data in a temporary storage area (cache) to speed up retrieval. The result is stored in the cache when a database query is executed. If the same query is executed again, the database checks the cache first. If the result is found there, it is returned immediately, bypassing the need to access the database again. This can significantly speed up data retrieval and improve performance. However, it’s important to manage the cache properly to ensure data accuracy, as the data in the cache can become outdated.
Batch Size: This refers to the number of records retrieved from the database. If you’re retrieving a large amount of data, it can be more efficient to retrieve data in smaller batches rather than all at once. This can reduce the load on the database and network making your application more responsive. However, if the batch size is too small, it can result in more round trips to the database, decreasing the performance. Therefore, choosing the right batch size can reduce load and minimize round trips.
Timeout Settings: This is the maximum amount of time that the system will wait for a response from the database before stopping the connection attempt. If your database is slow to respond or executing complex queries that take a long time to complete, you might need to increase the timeout setting. However, a longer timeout means that your application could be waiting for a long time if there’s a problem with the database. Therefore, it’s important to choose a timeout setting that is appropriate for your specific use case.
3. Security Requirements:
If you’re dealing with sensitive data, you can ensure it’s secure. Some connection options provide additional security measures.
For instance,
Encryption: Encryption is a method to convert data into a coded form (cipher text) to prevent unauthorized access. When data is sent between your application and the database, it can be encrypted to ensure that even if the data is intercepted, it cannot be read without the encryption key. Different types of encryption, such as SSL/TLS encryption, are commonly used for securing internet connections. In the context of databases, encryption can be applied to data at rest (stored data) and data ‘in transit’ (data being transmitted).
Authentication: Authentication is the process of verifying the identity of a user, process, or device, often as a prerequisite to allowing access to resources in an information system. This usually involves a username and a password but can include any other method of demonstrating identity, such as digital certificates, hardware tokens, or biometric data.
Data Masking: Data masking is a method to create a structurally similar but inauthentic version of an organization’s data that can be used for software testing and user training. This technique is used to protect the actual data while having a functional substitute for occasions when the real data is not required. For example, in a database table that stores customer information, you could mask the customer’s contact number and address while leaving other details, like their purchase history, intact for analysis.
4. Reliability Needs:
This refers to the need for your application to maintain a consistent and stable connection to the database. If your application requires constant data access, the connection to the database must be reliable.
Here are two key connection options related to reliability:
Automatic Retries: This is a feature where the system automatically attempts to re-establish the connection if it gets lost or interrupted. This can be crucial for those applications that need a persistent connection to the database. However, it is important to handle retries to avoid potential issues like infinite loops or excessive network traffic.
Transaction Support: Transactions ensure that a group of database operations are treated as a single unit of work. The changes made in a transaction are either all committed to the database (if the transaction is successful) or all rolled back (if the transaction fails), ensuring data consistency. This is crucial for operations where you wouldn’t want partial updates. For example, in a banking system, when transferring money from one account to another, you wouldn’t want the money to be withdrawn from one account and not deposited into another.
5. Compatibility Issues:
You can have compatibility issues when working with older systems or specific database versions. These issues can prevent your application from communicating properly with the database. To mitigate this, you might need to use specific connection options that are compatible with the version of the database you’re using.
For example, older versions of databases might not support certain features or syntax that newer versions do. In such cases, you would need to use connection options or write queries, that are compatible with the older version.
Conclusion
Understanding and effectively using CONNECTION_OPTIONS in SQL Server is crucial for managing connections to external data sources. Whether you’re working with different providers, specifying key-value pairs, or dealing with DSN and connection string keywords and attributes, CONNECTION_OPTIONS offers a powerful and flexible way to tailor your connections to meet your needs. As always, remember to refer to the documentation of the specific driver or data provider when configuring the CONNECTION_OPTIONS.
Comments