Tracardi Data Partitioning Configuration
Overview
Tracardi allows flexible data partitioning to enhance system performance, optimize storage, and streamline data retrieval. By configuring partitioning intervals for various data types, such as events, profiles, and sessions, users can better manage large volumes of data and ensure efficient data handling. This guide provides details on configuring data partitioning via environment variables and Helm deployment.
1. Configuring Partitioning Using Environment Variables
Tracardi supports setting data partitioning intervals through environment variables. Each environment variable defines how frequently specific data types (e.g., events, profiles, sessions) are segmented.
Available Partitioning Environment Variables
Below are the environment variables used to control partitioning frequencies in Tracardi:
EVENT_PARTITIONING
- Default: month
- Purpose: Controls partitioning for event data.
- Example: EVENT_PARTITIONING=month (for monthly partitions)
PROFILE_PARTITIONING
- Default: quarter
- Purpose: Manages partitioning intervals for user profiles.
- Example: PROFILE_PARTITIONING=quarter (for quarterly partitions)
SESSION_PARTITIONING
- Default: quarter
- Purpose: Specifies partitioning frequency for session data.
- Example: SESSION_PARTITIONING=quarter (for quarterly partitions)
ENTITY_PARTITIONING
- Default: quarter
- Purpose: Defines partitioning strategy for entity data.
- Example: ENTITY_PARTITIONING=quarter (for quarterly partitions)
LOG_PARTITIONING
- Default: month
- Purpose: Sets the partitioning schedule for log data.
- Example: LOG_PARTITIONING=month (for monthly partitions)
USER_LOG_PARTITIONING
- Default: year
- Purpose: Configures partitioning frequency for user log data.
- Example: USER_LOG_PARTITIONING=year (for yearly partitions)
Applying Environment Variables
To set these variables, add them to your environment configuration file or directly to your deployment settings. Ensure each variable aligns with your data retention policies and system performance goals.
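As a minimal sketch, the variables can be exported in a shell session or startup script before the Tracardi API process starts. The values below are the defaults listed above. How the variables reach the process (an env file, a container definition, and so on) depends on your deployment and is not shown here.

```shell
# Partitioning intervals, set to the documented defaults.
export EVENT_PARTITIONING=month       # events: one partition per month
export PROFILE_PARTITIONING=quarter   # profiles: one partition per quarter
export SESSION_PARTITIONING=quarter   # sessions: one partition per quarter
export ENTITY_PARTITIONING=quarter    # entities: one partition per quarter
export LOG_PARTITIONING=month         # logs: one partition per month
export USER_LOG_PARTITIONING=year     # user logs: one partition per year

# Print the effective settings as a quick sanity check.
env | grep '_PARTITIONING=' | sort
```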
2. Partitioning Configuration in Helm Chart
For deployments using Helm, partitioning can be configured directly in the values.yaml file. This method allows partitioning settings to apply consistently across both public and private APIs.
Example Configuration in values.yaml
Below is an example of how to configure partitioning in the Helm chart under the api section:
api:
  public:
    config:
      eventPartitioning: "month"      # Monthly partitioning for events in the public API
      profilePartitioning: "quarter"  # Quarterly partitioning for profiles in the public API
      sessionPartitioning: "quarter"  # Quarterly partitioning for sessions in the public API
  private:
    config:
      eventPartitioning: "month"      # Monthly partitioning for events in the private API
      profilePartitioning: "quarter"  # Quarterly partitioning for profiles in the private API
      sessionPartitioning: "quarter"  # Quarterly partitioning for sessions in the private API
Applying Helm Partitioning Settings
- Public API Configuration: Use this section to control partitioning for data accessible via the public API.
- Private API Configuration: Use this section to manage partitioning for data restricted to the private API.
After configuring the values.yaml file, deploy the Helm chart to apply the partitioning settings across your Tracardi installation.
3. Best Practices for Partitioning in Tracardi
To maximize the efficiency and effectiveness of partitioning in Tracardi, consider these best practices:
- Optimize Based on Data Type:
  - Use shorter intervals (e.g., monthly) for high-frequency data types, such as those controlled by EVENT_PARTITIONING and LOG_PARTITIONING.
  - For less frequently accessed data, like user logs, set a longer interval (e.g., yearly) with USER_LOG_PARTITIONING.
- Align with Retention Policies: Establish partitioning intervals that align with your organization’s data retention policies to minimize storage costs while ensuring data availability.
Understanding Data Partitioning in Tracardi
Data partitioning is a database management technique where large datasets are divided into segments, or partitions, which can be managed and accessed independently. This strategy is especially useful in Tracardi for handling large-scale data, improving performance, and simplifying management tasks.
Benefits of Data Partitioning in Tracardi
Tracardi partitions its data into time-based indices, such as monthly or quarterly partitions, which enhances system efficiency in multiple ways:
- Improved Performance: Smaller indices are faster to read from and write to, optimizing system performance.
- Easier Data Management: Time-based partitions simplify archiving, where older data can be moved to slower, more cost-effective storage.
- Efficient Deletion: Deleting old data becomes easier when it is contained within specific time-based partitions.
Tracardi applies this partitioning to several key data types:
- Events: Capturing user actions and interactions.
- Profiles: Storing user information, such as demographics and preferences.
- Sessions: Documenting individual user sessions on a platform.
- Logs: System logs for monitoring and debugging purposes.
How Tracardi Uses Aliases for Partitioned Data
To unify these partitions for seamless access, Tracardi employs Elasticsearch aliases. An alias allows the GUI to access data from all relevant indices as if it were stored in a single, consolidated index. For example, session data across multiple quarters can be accessed through an alias such as prod-09x.8504a.tracardi-session, enabling Tracardi’s GUI to display data across partitions without manual index selection.
Alias Naming Convention in Tracardi
Tracardi follows a specific naming convention for its aliases:
- Environment: Typically prod for production.
- db_version: The database version, found in the GUI by clicking on the Tracardi logo (distinct from the system version).
- Tenant: The tenant identifier, also visible in the GUI.
- Type of Data: Specifies data type, such as session, event, etc.
For instance, an alias for all production session data could be prod-09x.8504a.tracardi-session.
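The convention can be sketched as a small shell helper. This assumes the segments are joined in the order listed above, as <environment>-<db_version>.<tenant>.tracardi-<type>, a pattern inferred from the example alias rather than taken from Tracardi's code. The function name tracardi_alias is hypothetical and not part of Tracardi.

```shell
# tracardi_alias ENV DB_VERSION TENANT TYPE
# Compose an alias name following the pattern inferred from the example:
#   <environment>-<db_version>.<tenant>.tracardi-<type>
tracardi_alias() {
  printf '%s-%s.%s.tracardi-%s\n' "$1" "$2" "$3" "$4"
}

tracardi_alias prod 09x 8504a session
# prints: prod-09x.8504a.tracardi-session
```

Substitute the db_version and tenant shown in your own GUI; the values 09x and 8504a are only the ones used in this guide's examples.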
Troubleshooting: Data Not Visible in GUI After a Quarter
If data becomes inaccessible in the GUI at the beginning of a new month or quarter, it might indicate an issue with the automatic alias update. Although uncommon, this problem can arise if updates are incomplete or not configured correctly. Follow these steps to resolve the issue:
Manually Adding a New Index to an Alias
Verify the Current Alias Configuration
List all aliases and their associated indices with a curl request to Elasticsearch's _cat/aliases endpoint, and check whether the new index is included.
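A minimal sketch of this check, assuming Elasticsearch is reachable at http://localhost:9200 (the address used in the alias update command further down) and using the _cat/aliases endpoint. The helper names list_aliases and alias_has_index are hypothetical.

```shell
# Base URL of the Elasticsearch node (assumption: same host and port as the
# alias update example; override by setting ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# list_aliases: print every alias together with the index it points to.
# The "v" query parameter adds a header row to the output.
list_aliases() {
  curl -s -X GET "$ES_URL/_cat/aliases?v"
}

# alias_has_index ALIAS INDEX
# Read _cat/aliases text on stdin and succeed only if some row pairs
# ALIAS (column 1) with INDEX (column 2).
alias_has_index() {
  awk -v a="$1" -v i="$2" '$1 == a && $2 == i { found = 1 } END { exit !found }'
}
```

For example, `list_aliases | alias_has_index prod-09x.8504a.tracardi-session prod-09x.8504a.tracardi-session-2024-q4 || echo "index missing from alias"` reports when the quarterly index has not been attached.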
If the new quarterly or monthly index (e.g., prod-09x.8504a.tracardi-session-2024-q4) is missing from the alias, proceed to the next step.
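To know which index name to look for, the quarterly suffix can be derived from the date. The sketch below assumes quarterly indices are named <alias>-<year>-q<quarter>, as in the example above. The helper quarter_index is hypothetical, and other intervals (month, year) use different suffixes that are not covered here.

```shell
# quarter_index ALIAS YEAR MONTH
# Print the quarterly index name expected for the given year and month (1-12).
quarter_index() {
  month=${3#0}                     # drop a leading zero so "08" is not read as octal
  quarter=$(( (month + 2) / 3 ))   # months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
  printf '%s-%s-q%s\n' "$1" "$2" "$quarter"
}

quarter_index prod-09x.8504a.tracardi-session 2024 11
# prints: prod-09x.8504a.tracardi-session-2024-q4

# Index expected for the current date:
quarter_index prod-09x.8504a.tracardi-session "$(date +%Y)" "$(date +%m)"
```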
Add the New Index to the Alias
Manually add the missing index to the alias using the curl command below. Adjust db_version and tenant values based on your setup:
curl -X POST "http://localhost:9200/_aliases" -H "Content-Type: application/json" -d '
{
  "actions": [
    {
      "add": {
        "index": "prod-09x.8504a.tracardi-session-2024-q4",
        "alias": "prod-09x.8504a.tracardi-session"
      }
    }
  ]
}
'
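When the same fix is needed for several data types or tenants, the request body can be generated from the index and alias names instead of edited by hand. This is a minimal sketch that assumes the same _aliases request as above. The helper alias_add_payload is hypothetical, and the curl call is left commented out so that nothing is changed by accident.

```shell
# alias_add_payload INDEX ALIAS
# Print the JSON body for an _aliases request with a single "add" action.
alias_add_payload() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}\n' "$1" "$2"
}

alias_add_payload "prod-09x.8504a.tracardi-session-2024-q4" "prod-09x.8504a.tracardi-session"
# prints: {"actions":[{"add":{"index":"prod-09x.8504a.tracardi-session-2024-q4","alias":"prod-09x.8504a.tracardi-session"}}]}

# Send it to Elasticsearch (uncomment to apply):
# curl -X POST "http://localhost:9200/_aliases" \
#   -H "Content-Type: application/json" \
#   -d "$(alias_add_payload "prod-09x.8504a.tracardi-session-2024-q4" "prod-09x.8504a.tracardi-session")"
```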
Confirm the Alias Update
List the aliases again, as in the first step, to confirm that the new index has been added to the alias.
Example output
Alias | Index | Filter | Routing.Index | Routing.Search | Is_Write_Index |
---|---|---|---|---|---|
09x.8504a.tracardi-entity | 09x.8504a.tracardi-entity-2024-q4 | - | - | - | - |
09x.8504a.tracardi-session | 09x.8504a.tracardi-session-2024-q4 | - | - | - | - |
prod-09x.8504a.tracardi-event | prod-09x.8504a.tracardi-event-2024-q4 | - | - | - | - |
09x.8504a.tracardi-profile | 09x.8504a.tracardi-profile-2024-q4 | - | - | - | - |
prod-09x.8504a.tracardi-session | prod-09x.8504a.tracardi-session-2024-q4 | - | - | - | - |
09x.8504a.tracardi-log | 09x.8504a.tracardi-log-2024-11 | - | - | - | - |
09x.8504a.tracardi-field-update-log | 09x.8504a.tracardi-field-update-log-2024-11 | - | - | - | - |
prod-09x.8504a.tracardi-entity | prod-09x.8504a.tracardi-entity-2024-q4 | - | - | - | - |
prod-09x.8504a.tracardi-field-update-log | prod-09x.8504a.tracardi-field-update-log-2024-11 | - | - | - | - |
prod-09x.8504a.tracardi-profile | prod-09x.8504a.tracardi-profile-2024-q4 | - | - | - | - |
prod-09x.8504a.tracardi-log | prod-09x.8504a.tracardi-log-2024-11 | - | - | - | - |
09x.8504a.tracardi-event | 09x.8504a.tracardi-event-2024-q4 | - | - | - | - |
After verifying, refresh the Tracardi GUI to ensure the new data is visible.
Routine Maintenance Tips
- Check Aliases Regularly: After updates or the beginning of new time partitions, it’s beneficial to verify alias configurations to ensure the GUI displays all relevant data partitions.
- Automate Alias Verification: For organizations managing multiple tenants or high data volume, consider automating alias checks to minimize potential downtime in data access.