EveryFileImporter

Premium Plugin for Automated File & Metadata Import into the ECM EveryDataStore

EveryFileImporter is a premium plugin for the EveryDataStore ECM platform, designed to automate the import of documents and associated metadata into structured RecordSets (also referred to as ECM DataStores). Each document requires a corresponding .json metadata file that defines its classification, metadata values, and archive target.

High-Speed Import Engine
EveryFileImporter processes up to 8,000 structured records per hour – including full metadata mapping, file attachments, and real-time logging. The engine is optimized for high-throughput environments; performance was benchmarked on a Linux Apache server with 12 CPU cores and 32 GB RAM.

Batch Size per Cron Job
EveryFileImporter processes a defined batch size per execution. This allows for controlled resource usage and predictable performance even in environments with continuous file input. The batch size can be configured to align with server capacity and workload expectations.
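
For example (illustrative figures, not plugin defaults): with a batch size of 500 files per run and the every-five-minutes cron schedule shown below (12 runs per hour), throughput is capped at 500 × 12 = 6,000 files per hour. Raising the batch size or shortening the interval raises this ceiling, at the cost of higher peak resource usage.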

This plugin requires a valid license.
To request a license, please contact: info@everydatastore.org

Key Features

  • Automates import of physical files and metadata into RecordSets
  • Supports folder tree and form field mapping
  • JSON-based configuration with flexible metadata structure
  • Logging and error tracking built in
  • Compatible with UploadField and multi-value fields
  • Can be scheduled via cron for hands-free operation

How It Works

EveryFileImporter processes *.json files placed in a monitored directory (JobPath). Each JSON file contains:

  • The target RecordSet
  • Metadata mapping (form fields)
  • Path references to documents stored on the server
  • An optional folder assignment for structured archiving

Example JSON file:
{
   "RecordSetSlug":{
      "Slug":"201898f23946e1cfb78b"
   },
   "ItemData":[
      {
         "formFieldSlug":"1cb93a7db86d198af383",
         "formFieldValue":"Invoice-2025-01"
      },
      {
         "formFieldSlug":"f088b503f474184e796c",
         "formFieldValue":"2024-08-13T00:00:00.000Z"
      }
   ],
   "Documents":[
      "/var/www/invoicing/files/invoice1.pdf"
   ],
   "FolderSlug":"87547a22b3c04a90d7c4"
}

Metadata Structure

The ItemData field maps metadata from the JSON file to the form fields of the RecordSet.
Each entry must define:

  • formFieldSlug: Unique field ID from the RecordSet Form
  • formFieldValue: The value to be assigned (string, number, date, or array)

Special case – Upload Fields:
If a form field is an UploadField, the formFieldValue must be an array of absolute file paths to the files being uploaded, as in the following example:

{
   "RecordSetSlug":{
      "Slug":"201898f23946e1cfb78b"
   },
   "ItemData":[
      {
         "formFieldSlug":"1cb93a7db86d198af383",
         "formFieldValue":"Invoice-2025-01"
      },
      {
         "formFieldSlug":"f088b503f474184e796c",
         "formFieldValue":"2024-08-13T00:00:00.000Z"
      },
      {
         "formFieldSlug":"63fc95c44ab63ff3de9d",
         "formFieldValue":[
            "/var/www/invoicing/files/invoice1.pdf",
          "/var/www/invoicing/files/attachment1.pdf"
         ]
      }
   ],
   "Documents":[
      
   ],
   "FolderSlug":""
}

Use Cases

  • Import scanned invoices into a finance RecordSet
  • Archive documents from an external file system with structured metadata
  • Sync files from shared drives or legacy systems into EveryDataStore ECM
  • Process files exported from external systems (e.g., ERPs or CRMs)
  • Automate classification and document ingestion from email importers

Job Configuration via Admin Interface

Jobs are configured in the admin panel under Frontend > Administration > FileImporter.
Each job requires:

  • Active: Whether the job is enabled
  • Title: A meaningful job name
  • Description: Purpose and logic of the job
  • JobPath: Directory path where the JSON files and documents are located (see the layout sketch below)
  • JobLastMessage: The result of the last import operation

Example messages:

- Error: $JobPath not exists or not readable
- Success: 8 JSON Files have been imported successfully
- Error: 3 JSON Files could not be imported
- Info: 2025-05-29 12:34:00 Job is finished
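
As an illustration, a JobPath might look like this (file and directory names are hypothetical; the done/, error/ and log/ subdirectories are described under Monitoring & File Handling below):

/var/www/invoicing/              <- JobPath configured for the job
├── invoice1.json                <- metadata file waiting to be imported
├── files/
│   └── invoice1.pdf             <- document referenced by invoice1.json
├── done/                        <- successfully imported JSON files
├── error/                       <- failed or invalid JSON files
└── log/                         <- log files of each run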

Automation via Cron

You can schedule the importer using a standard cron job.

*/5 * * * * cd /full/path/to/your/project; php vendor/silverstripe/framework/cli-script.php FileImporter 
This runs the importer every 5 minutes, checking for new files in the JobPath.
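
If a single run can take longer than the 5-minute interval, overlapping executions can be avoided with a standard flock wrapper (generic cron practice, not a plugin feature; the lock file path is arbitrary):

*/5 * * * * cd /full/path/to/your/project; flock -n /tmp/fileimporter.lock php vendor/silverstripe/framework/cli-script.php FileImporter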

Monitoring & File Handling

EveryFileImporter automatically organizes files into the following subdirectories inside the configured JobPath:

  • done/: Successfully imported JSON files are moved here.
  • error/: Failed or invalid JSON files are moved here.
  • log/: Detailed logs of each run, including timestamps and error messages.

This structure allows administrators to track, verify, and debug import activity without manual intervention.
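
A quick way to check the outcome of recent runs from the shell (the subdirectory names come from the list above; the log file naming inside log/ is an assumption, so adjust it to what the plugin actually writes):

cd /var/www/invoicing                      # your JobPath
ls done/ | wc -l                           # count successfully imported JSON files
ls error/ | wc -l                          # count failed or invalid JSON files
tail -n 50 "$(ls -t log/* | head -n 1)"    # show the most recent log file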

Test & Training

To support testing and training, the EveryFileImporter plugin provides a built-in utility called generateTestFiles. It generates realistic test JSON files that simulate real imports, helping you understand the correct JSON structure and verify the behavior of the FileImporter before going live.

How It Works

The file generator uses a YAML configuration file named dynamicTestFileGenerator.yml. In this file, you define how many test JSON files should be created, where they should be stored, and what type of sample data should be included. The generator automatically detects the metadata structure (FormFieldSlugs) from the selected RecordSets and fills in random dummy values for each field.

Sample dynamicTestFileGenerator.yml

dynamic_generator:
  # Number of test JSON files to generate
  fileNumber: 100

  # Output path where the JSON files should be saved
  outputPath: /var/www/invoicing/

  # Include FormFields with HasOne relation (e.g. select fields)
  allowHasOne: true

  # Include FormFields with HasMany relation (e.g. multi-select fields)
  allowHasMany: true

  # Start date range for generating random values for FormFields of type "date"
  random_date_start: '2025-01-01'

  # End date range for generating random values for FormFields of type "date"
  random_date_end: '2025-05-30'

  # Optional: Provide custom dummy values using a CSV file
  # The column names in the CSV must exactly match the FormFieldLabel
  # (a sample CSV is shown after this configuration file)
  csvpath: /full/path/to/your/dummyData.csv

  # Define one or more RecordSet Slugs to generate test files for
  RecordSets:
    - Slug: b078e3a652270600cafb

  # Define test files (images and documents) that will be attached to each generated RecordSetItem
  testFiles:
    images:
      - /full/path/to/your/testfiles/invoice1.jpg
      - /full/path/to/your/testfiles/invoice2.jpg
      - /full/path/to/your/testfiles/invoice3.jpg
    documents:
      - /full/path/to/your/testfiles/invoice1.pdf
      - /full/path/to/your/testfiles/invoice2.pdf
      - /full/path/to/your/testfiles/invoice3.pdf
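
A minimal dummyData.csv might look like this (the column names are hypothetical; they must exactly match the FormFieldLabels of your RecordSet):

InvoiceNumber,InvoiceDate,Customer
Invoice-2025-01,2025-01-15,ACME GmbH
Invoice-2025-02,2025-02-03,Globex AG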

Commands to Run

After editing and saving the dynamicTestFileGenerator.yml, execute the following commands in your terminal:

# Refresh SilverStripe cache
cd /full/path/to/your/project
php vendor/silverstripe/framework/cli-script.php dev/build/?flush=all

# Generate test JSON files
php vendor/silverstripe/framework/cli-script.php FileImporter/generateTestFiles

# Start JSON import job
php vendor/silverstripe/framework/cli-script.php FileImporter/

Speed & Performance

The EveryFileImporter plugin is engineered for high throughput and optimal efficiency. In performance tests conducted on a dedicated Linux server equipped with 12 CPU cores and 32 GB RAM, the importer achieved an average processing rate of 7,000 to 8,000 JSON files per hour.

Please note that actual performance depends on several factors, including:

  • The size and number of attached documents per import
  • The complexity of metadata structures
  • The logging level and mechanisms configured
  • CPU & RAM Size
  • Network Connection
  • PHP Configuration: memory_limit, max_execution_time, and opcache
  • Batch Size & Parallel Jobs

To maximize performance, we strongly recommend running EveryFileImporter on a dedicated Linux instance with a remote database connection to your primary EveryDataStore ECM system. This separation ensures clean resource allocation and optimal parallel processing. By isolating the importer process and reducing database latency, users can achieve significantly better performance, especially in high-volume or enterprise-scale scenarios.
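
As a sketch of this separation, the dedicated importer instance could point at the remote database via the standard SilverStripe .env settings (the host name and credentials below are placeholders):

# .env on the dedicated importer instance
SS_DATABASE_CLASS="MySQLDatabase"
SS_DATABASE_SERVER="db.example.internal"    # remote database of the primary EveryDataStore ECM
SS_DATABASE_NAME="everydatastore"
SS_DATABASE_USERNAME="importer"
SS_DATABASE_PASSWORD="change-me"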