Dataset Webhook Export

Export large datasets asynchronously with webhook notifications

This feature requires an active Query Builder subscription. Contact our sales team if you need access to this product.

Export datasets asynchronously and receive the results via webhook when processing is complete. This is ideal for large datasets that would time out with synchronous requests, or when you want to process data in the background without maintaining an active connection.

The webhook export is perfect for:

  • Large datasets with thousands of rows
  • Scheduled or automated data exports
  • Background processing workflows
  • Integration with data pipelines

How It Works

  1. Submit Export Request: Send your dataset query along with a webhook URL
  2. Receive Confirmation: Get an immediate 201 response confirming the export is queued
  3. Processing: Your export is processed asynchronously in the background
  4. Webhook Notification: Your endpoint receives the results when complete

Request Format

The request requires:

  • dataset: Your query configuration (columns, filters, limits)
  • webhookURL: HTTPS endpoint where results will be sent
  • name: Descriptive name for your export
  • format: Either ‘csv’ or ‘jsonl’
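
As a rough sketch of how the four fields fit together, the Python snippet below assembles the request body and wraps it in an unsent request object. The endpoint and the x-api-key header are taken from the curl example in this guide; the helper name build_export_request and its local checks are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Endpoint and header name are taken from the curl example in this guide.
EXPORT_URL = "https://api.hellodata.ai/dataset/export"
ALLOWED_FORMATS = {"csv", "jsonl"}


def build_export_request(api_key, dataset, webhook_url, name, fmt="csv"):
    """Return an unsent urllib Request carrying the four required fields.

    Hypothetical helper: the local checks mirror the documented requirements
    but do not replace the server's own validation.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be 'csv' or 'jsonl', got {fmt!r}")
    if not webhook_url.startswith("https://"):
        raise ValueError("webhookURL must be an HTTPS endpoint")
    body = {
        "dataset": dataset,
        "webhookURL": webhook_url,
        "name": name,
        "format": fmt,
    }
    return urllib.request.Request(
        EXPORT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_export_request(
    api_key="your-api-key",
    dataset={
        "scopes": [{"column": "street_address"}],
        "filters": [{"column": "msa", "filter": {"equals": "Seattle, WA"}}],
        "limit": 10000,
    },
    webhook_url="https://yourapp.com/webhook/dataset-export",
    name="Seattle Rental Data Export",
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the field checks easy to test without network access.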

Example request:

curl --location 'https://api.hellodata.ai/dataset/export' \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: your-api-key' \
  --data '{
    "dataset": {
      "scopes": [
        {"column": "street_address"},
        {"column": "asking_rent", "aggregate": "Avg"}
      ],
      "filters": [
        {"column": "msa", "filter": {"equals": "Seattle, WA"}},
        {"column": "bed", "filter": {"in": [1, 2, 3]}}
      ],
      "limit": 10000
    },
    "webhookURL": "https://yourapp.com/webhook/dataset-export",
    "name": "Seattle Rental Data Export",
    "format": "csv"
  }'

Webhook Response

When your export completes, you’ll receive a POST request with this payload:

{
  "queryUUID": "123e4567-e89b-12d3-a456-426614174000",
  "name": "Seattle Rental Data",
  "format": "csv",
  "payload": { /* your original dataset query */ },
  "hmacSignature": "a1b2c3d4...",
  "gcsBlob": {
    "url": "https://storage.googleapis.com/...",
    "expires": "2024-02-15T10:30:00Z",
    "timeTakenMs": 45000,
    "numberRows": 12847
  },
  "requestedOn": "2024-01-15T10:00:00Z",
  "completedOn": "2024-01-15T10:00:45Z"
}
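
To make the webhook payload fields concrete, here is a minimal Python sketch that reads a body shaped like the sample payload into a typed record. The ExportResult class and the helper names are illustrative assumptions; the field names and sample values come from the payload shown in this guide, with an empty object standing in for the original dataset query.

```python
import json
from dataclasses import dataclass
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as 2024-01-15T10:00:00Z."""
    # Replacing the trailing Z keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class ExportResult:
    """Hypothetical container for the webhook fields most handlers need."""
    query_uuid: str
    name: str
    fmt: str
    url: str
    expires: datetime
    rows: int
    requested_on: datetime
    completed_on: datetime

    @property
    def elapsed_seconds(self):
        return (self.completed_on - self.requested_on).total_seconds()


def parse_export_result(body):
    blob = body["gcsBlob"]
    return ExportResult(
        query_uuid=body["queryUUID"],
        name=body["name"],
        fmt=body["format"],
        url=blob["url"],
        expires=parse_timestamp(blob["expires"]),
        rows=blob["numberRows"],
        requested_on=parse_timestamp(body["requestedOn"]),
        completed_on=parse_timestamp(body["completedOn"]),
    )


sample = json.loads("""
{
  "queryUUID": "123e4567-e89b-12d3-a456-426614174000",
  "name": "Seattle Rental Data",
  "format": "csv",
  "payload": {},
  "hmacSignature": "a1b2c3d4...",
  "gcsBlob": {
    "url": "https://storage.googleapis.com/...",
    "expires": "2024-02-15T10:30:00Z",
    "timeTakenMs": 45000,
    "numberRows": 12847
  },
  "requestedOn": "2024-01-15T10:00:00Z",
  "completedOn": "2024-01-15T10:00:45Z"
}
""")
result = parse_export_result(sample)
print(result.rows, result.elapsed_seconds)
```

In the sample, the gap between requestedOn and completedOn is 45 seconds, which lines up with the timeTakenMs value of 45000.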

Implementing Your Webhook Handler

Your webhook endpoint should handle the incoming POST request and verify the HMAC signature for security:

const express = require('express');
const crypto = require('crypto');
const app = express();

app.use(express.json());

app.post('/webhook/dataset-export', (req, res) => {
  const { hmacSignature, payload, gcsBlob, queryUUID } = req.body;

  // Verify webhook authenticity: HMAC-SHA256 of the payload, keyed with your API key
  const expectedSignature = crypto
    .createHmac('sha256', 'your-api-key')
    .update(JSON.stringify(payload))
    .digest('hex');

  // Compare in constant time; timingSafeEqual requires buffers of equal length
  const expected = Buffer.from(expectedSignature);
  const received = Buffer.from(String(hmacSignature));
  if (expected.length !== received.length || !crypto.timingSafeEqual(expected, received)) {
    return res.status(401).send('Invalid signature');
  }

  console.log(`Export ${queryUUID} completed:`);
  console.log(`- Download URL: ${gcsBlob.url}`);
  console.log(`- Rows: ${gcsBlob.numberRows}`);
  console.log(`- Processing time: ${gcsBlob.timeTakenMs}ms`);

  // Download and process your data
  // processExportedData(gcsBlob.url);

  res.status(200).send('OK');
});

// The port is an arbitrary choice for local testing
app.listen(3000);

Security & Verification

Always verify the HMAC signature to ensure the webhook request is legitimate:

  1. Compute HMAC: Compute an HMAC-SHA256 hex digest of the payload field (your original dataset query), keyed with your API key
  2. Compare Signatures: Match against the hmacSignature field
  3. Reject Invalid Requests: Return 401 for signature mismatches
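
The three steps above can be sketched in Python as follows. This assumes the scheme shown in the Express handler in this guide: a hex-encoded HMAC-SHA256 over the JSON-serialized payload field, keyed with your API key. The exact byte-level serialization the server signs is an assumption here; compact separators are used to approximate JavaScript's JSON.stringify, and the function names are illustrative.

```python
import hashlib
import hmac
import json


def sign_payload(api_key, payload):
    """Return the hex HMAC-SHA256 of the serialized payload.

    Assumption: the server signs compact JSON, as JSON.stringify would produce.
    """
    message = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hmac.new(
        api_key.encode("utf-8"), message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def is_valid_signature(api_key, body):
    """Check the hmacSignature field of a parsed webhook body."""
    received = body.get("hmacSignature")
    if not isinstance(received, str):
        return False
    expected = sign_payload(api_key, body.get("payload"))
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Demonstration with a locally signed body standing in for a real webhook.
query = {"scopes": [{"column": "street_address"}], "limit": 10000}
body = {"payload": query, "hmacSignature": sign_payload("your-api-key", query)}
print(is_valid_signature("your-api-key", body))
```

A handler would return 401 whenever is_valid_signature returns False. If signatures from real webhooks never match, the serialization assumption is the first thing to check against the API Reference.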

Important Notes

Download URLs expire after 30 days, so download your files promptly after receiving the webhook notification. The gcsBlob.expires field in the webhook payload carries the expiry timestamp.

  • Processing Time: Large exports may take several minutes to complete
  • File Format: Files are compressed with gzip for efficient transfer
  • Webhook Requirements: Your endpoint must be HTTPS and publicly accessible
  • Response Expected: Your webhook should respond with 2xx status to confirm receipt
  • Tracking: Use the queryUUID to match requests with responses
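
Since exported files arrive gzip-compressed, a handler has to decompress them before parsing. The sketch below is one way to do that in Python, assuming the whole file is a single UTF-8 gzip stream and that CSV exports begin with a header row; neither detail is confirmed by this guide. The function name read_export_rows and the sample rows are made up for illustration.

```python
import csv
import gzip
import io
import json


def read_export_rows(raw_bytes, fmt):
    """Decompress a gzip-compressed export and return its rows as dictionaries."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    if fmt == "csv":
        # Assumes the first line is a header row naming the columns.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt!r}")


# Made-up data standing in for the bytes downloaded from gcsBlob.url.
sample_csv = gzip.compress(b"street_address,asking_rent\n123 Main St,2100\n")
sample_jsonl = gzip.compress(b'{"street_address": "123 Main St", "asking_rent": 2100}\n')

csv_rows = read_export_rows(sample_csv, "csv")
jsonl_rows = read_export_rows(sample_jsonl, "jsonl")
print(csv_rows[0]["street_address"], jsonl_rows[0]["asking_rent"])
```

This version loads the whole file into memory, which keeps the example short; for very large exports, streaming with gzip.open is likely a better fit. Some HTTP clients decompress gzip transparently, in which case the gzip.decompress step would need to be skipped.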

Error Handling

Status Code   Description
400           Invalid webhook URL (must be HTTPS, no localhost/private IPs)
403           Invalid API key or insufficient permissions
201           Export successfully queued
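
A 400 response can often be avoided by checking the webhook URL before submitting. The Python sketch below applies the rules listed for status 400 (HTTPS only, no localhost, no private IPs). It is a local approximation, not the server's actual validation logic, and the function name webhook_url_problem is hypothetical. Host names are not resolved, so a DNS name that points at a private address would still pass this check.

```python
import ipaddress
from urllib.parse import urlsplit


def webhook_url_problem(url):
    """Return a reason the URL would likely be rejected, or None if it looks fine."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return "must use HTTPS"
    host = parts.hostname
    if not host:
        return "missing host"
    if host == "localhost" or host.endswith(".localhost"):
        return "localhost is not allowed"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return None  # a DNS name; this sketch does not resolve it
    if address.is_private or address.is_loopback or address.is_link_local:
        return "private or loopback IP addresses are not allowed"
    return None


for candidate in (
    "https://yourapp.com/webhook/dataset-export",
    "http://yourapp.com/webhook/dataset-export",
    "https://localhost/webhook",
    "https://10.0.0.5/webhook",
):
    print(candidate, "->", webhook_url_problem(candidate) or "ok")
```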

If the export fails during processing, your webhook will not be called. Monitor your webhook endpoint for delivery failures.
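
Because a failed export produces no webhook call, one way to notice failures is to record each submitted export and flag those that have not called back within a deadline. The Python sketch below assumes you know the export's queryUUID at submission time (for example from the 201 response, whose body is not shown in this guide) and uses an arbitrary timeout. The PendingExports class is illustrative, not part of the API.

```python
import time


class PendingExports:
    """Track submitted exports and flag those whose webhook never arrived."""

    def __init__(self, timeout_seconds=3600):
        # The one-hour default is an arbitrary choice; tune it to your exports.
        self.timeout_seconds = timeout_seconds
        self._submitted_at = {}  # queryUUID -> submission time in epoch seconds

    def submitted(self, query_uuid, now=None):
        """Record an export right after it has been queued."""
        self._submitted_at[query_uuid] = time.time() if now is None else now

    def completed(self, query_uuid):
        """Call from the webhook handler; returns False for unknown IDs."""
        return self._submitted_at.pop(query_uuid, None) is not None

    def overdue(self, now=None):
        """Return the IDs of exports still waiting past the timeout."""
        now = time.time() if now is None else now
        return sorted(
            query_uuid
            for query_uuid, started in self._submitted_at.items()
            if now - started > self.timeout_seconds
        )


# Demonstration with explicit timestamps instead of the real clock.
tracker = PendingExports(timeout_seconds=600)
tracker.submitted("export-a", now=0)
tracker.submitted("export-b", now=0)
tracker.completed("export-a")          # webhook arrived for export-a
print(tracker.overdue(now=601))        # export-b is still outstanding
```

An overdue export could then be retried or reported, depending on your workflow.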

Use Cases

Data Pipeline Integration

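// Automatically trigger analysis when new data arrives
// downloadFile and triggerDataPipeline are placeholders for your own functions
app.post('/webhook/dataset-export', async (req, res) => {
  const { gcsBlob, name } = req.body;

  try {
    // Download the data
    const csvData = await downloadFile(gcsBlob.url);

    // Trigger your data pipeline
    await triggerDataPipeline(csvData, name);

    res.status(200).send('OK');
  } catch (err) {
    // Report the failure instead of leaving the request hanging
    console.error(`Processing export "${name}" failed:`, err);
    res.status(500).send('Processing failed');
  }
});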

Scheduled Reports

# Process weekly market reports
# send_to_reporting_dashboard is a placeholder for your own function
from flask import Flask, request

app = Flask(__name__)

@app.route('/webhook/dataset-export', methods=['POST'])
def process_weekly_report():
    data = request.get_json()

    if 'weekly' in data['name'].lower():
        # Send to reporting system
        send_to_reporting_dashboard(data['gcsBlob']['url'])

    return 'OK', 200

Full Documentation

Find complete technical details in our API Reference.