extremely slow when exporting plain python list to sbdf #93

@lwlwlwlw

The execution time of the code below grows dramatically when n is large.

import spotfire.sbdf as sb
import random
import pandas as pd
import numpy as np

n = 100000

# Build a plain Python list of n normally distributed floats.
rand = [random.normalvariate(mu=0, sigma=1) for _ in range(n)]
sb.export_data(rand, "d:/tmp/slow.sbdf")

With n = 100,000 it took only 2 seconds.
With n = 1,000,000 it took more than 14 minutes, even though both CPU usage (5% on a machine with 20 logical processors) and memory usage (140 MB) stayed low.

The time spent in random.normalvariate() doesn't change much, so sb.export_data() accounts for most of the slowdown.
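To back up the claim that the export, not the list generation, dominates, the two steps can be timed separately. The following is only a minimal sketch: `timed`, `generate`, and `compare_timings` are hypothetical helper names introduced here, and the exporter is passed in as an argument (for example `sb.export_data`) so the helpers themselves don't depend on Spotfire.

```python
import random
import time


def timed(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def generate(n):
    """Build a plain Python list of n normally distributed floats."""
    return [random.normalvariate(mu=0, sigma=1) for _ in range(n)]


def compare_timings(export, path, sizes=(100_000, 1_000_000)):
    """Time list generation and export separately for each size.

    `export` is any callable taking (data, path), e.g. sb.export_data.
    Returns a list of (n, generate_seconds, export_seconds) tuples.
    """
    rows = []
    for n in sizes:
        data, gen_seconds = timed(generate, n)
        _, export_seconds = timed(export, data, path)
        rows.append((n, gen_seconds, export_seconds))
    return rows
```

Calling `compare_timings(sb.export_data, "d:/tmp/slow.sbdf")` should show the generation column growing roughly tenfold between the two sizes while the export column grows far more, if the behaviour described above reproduces.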

If rand is converted to a pandas DataFrame / Series or a numpy array in advance, the export with n = 1,000,000 finishes in under 2 seconds.

rand = np.array([random.normalvariate(mu=0, sigma=1) for _ in range(n)])

Above took 0.7 seconds.

rand = pd.DataFrame([random.normalvariate(mu=0, sigma=1) for _ in range(n)])

Above took 0.4 seconds. (fastest)

rand = pd.Series([random.normalvariate(mu=0, sigma=1) for _ in range(n)])

Above took 1.4 seconds. (slowest)
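Until the list path is improved, the conversion can be wrapped in a small helper so callers can keep passing plain lists. This is a minimal sketch of a workaround, not part of the spotfire package: `as_float_array` and `export_list` are hypothetical names, the helper assumes the list holds only numbers, and the `export` parameter exists only so the helper can be tried without Spotfire installed.

```python
import numpy as np


def as_float_array(values):
    """Convert a plain sequence of numbers to a 1-D float64 NumPy array."""
    return np.asarray(values, dtype=np.float64)


def export_list(values, path, export=None):
    """Convert `values` to a NumPy array first, then export it.

    `export` defaults to spotfire.sbdf.export_data; it is imported lazily
    so this module can be loaded even where spotfire is unavailable.
    """
    if export is None:
        import spotfire.sbdf as sb
        export = sb.export_data
    export(as_float_array(values), path)
```

With this in place, `export_list(rand, "d:/tmp/slow.sbdf")` takes the numpy array path measured above instead of the slow plain-list path.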

It took me quite a while to track this down, and the problem is hard to notice because sb.export_data() does accept a plain Python list as its first argument.
Please improve the performance of exporting plain lists.
