Hi, what is the best way to keep the databases I use across different projects? I use a lot of CSVs that I have to prepare every time I work with them (I just copy and paste the code from other projects), but I would like to make a module that I can import and that contains all the processing for each database. For example, for this database I usually do columns = [(configuration of, my columns)], names = [names], dates = [list of date columns], dtypes = {column: type},

then database_1 = pd.read_fwf(**kwargs), database_2 = pd.read_fwf(**kwargs), database_3 = pd.read_fwf(**kwargs)…

Then database = pd.concat([database_1…])

But I would like to have a module that I could import, with all my databases and their ETL configuration in it, so I could just do something like ‘database = my_module.database’ to load the database without repeating that whole process every time.

Thanks for any help.

  • @gedhrel
    7 months ago

    If it is the first thing, just put the db setup code you’re using in one file and call it “database.py”.

    database.py

    # the code you commonly use, ending with
    database = ...
    

    From a second file in the same directory, write:

    main_program.py

    from database import database
    # The first "database" here is the module name.
    # The second "database" is a variable you set inside that module.
    # You can also write this as follows:
    # import database
    # ... and use `database.database` to refer to the same thing
    # but that involves "stuttering" throughout your code.
    
    # use `database` as you would before - it refers to the "database" object that was found in the "database.py" module
    

    Then run it with: python main_program.py

    The main thing to realise here is that there are two names involved. One’s the module, the other is the variable (or function name) you set inside that module that you want to get access to.
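
To make that concrete for the fixed-width files in the question, here is a minimal sketch of what database.py could contain. Everything specific in it is an assumption for illustration: the column positions, the column names, the `FWF_KWARGS` and `load_database` names, and the two in-memory samples that stand in for real file paths. Adjust the configuration to match your own files.

```python
# database.py (sketch; all column specs, names and sample data are hypothetical)
import io

import pandas as pd

# Shared pd.read_fwf configuration, written once instead of copy-pasted.
FWF_KWARGS = {
    "colspecs": [(0, 4), (4, 14), (14, 20)],  # (start, end) of each column
    "names": ["id", "date", "amount"],
    "header": None,                           # the files have no header row
    "dtype": {"id": "int64", "amount": "float64"},
    "parse_dates": ["date"],
}


def load_database(sources, **overrides):
    """Read every fixed-width source and concatenate the results."""
    kwargs = {**FWF_KWARGS, **overrides}
    frames = [pd.read_fwf(source, **kwargs) for source in sources]
    return pd.concat(frames, ignore_index=True)


# Stand-ins for real files so the sketch runs on its own.
# In a real module these would be paths to your files.
_PART_1 = "10012024-01-15  12.5\n10022024-01-16   7.0\n"
_PART_2 = "10032024-02-01 100.0\n"

# The module-level variable that other files import.
database = load_database([io.StringIO(_PART_1), io.StringIO(_PART_2)])
```

With this in place, `from database import database` in main_program.py gives you the concatenated DataFrame directly. One thing to be aware of with this design: the files are read when the module is first imported, so if they are large you may prefer to import `load_database` and call it only when you need the data.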