Training failure

#24
by myomeroz - opened

Hello, good afternoon. I have tried training several times today, but I keep getting this error. I restart it and it is eventually canceled, yet the system keeps charging me for it.

Exception: Training failed.
INFO | 2024-09-08 23:40:52 | autotrain.trainers.generic.utils:run_command:108 - Command finished.
INFO | 2024-09-08 23:40:52 | autotrain.trainers.common:pause_space:77 - Pausing space...

AttributeError: np.string_ was removed in the NumPy 2.0 release. Use np.bytes_ instead.. Did you mean: 'strings'?
Traceback (most recent call last):
File "/app/omrozvn/script.py", line 129, in <module>
main()
File "/app/omrozvn/script.py", line 123, in main
do_train(script_args)
File "/app/omrozvn/script.py", line 26, in do_train
raise Exception("Training failed.")
Exception: Training failed.
INFO | 2024-09-08 23:46:03 | autotrain.trainers.generic.utils:run_command:108 - Command finished.
INFO | 2024-09-08 23:46:03 | autotrain.trainers.common:pause_space:77 - Pausing space...

+1. I just got this same error right after latent caching. Logs:

Caching latents:  97%|█████████▋| 36/37 [00:22<00:00,  1.59it/s]
Caching latents: 100%|██████████| 37/37 [00:23<00:00,  1.50it/s]
Caching latents: 100%|██████████| 37/37 [00:23<00:00,  1.59it/s]
Traceback (most recent call last):
  File "/app/aerial-photography/trainer.py", line 2136, in <module>
    main(args)
  File "/app/aerial-photography/trainer.py", line 1673, in main
    accelerator.init_trackers("dreambooth-lora-sd-xl", config=vars(args))
  File "/app/env/lib/python3.10/site-packages/accelerate/accelerator.py", line 619, in _inner
    return PartialState().on_main_process(function)(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/accelerate/accelerator.py", line 2337, in init_trackers
    tracker.store_init_configuration(config)
  File "/app/env/lib/python3.10/site-packages/accelerate/tracking.py", line 79, in execute_on_main_process
    return PartialState().on_main_process(function)(self, *args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/accelerate/tracking.py", line 211, in store_init_configuration
    self.writer.add_hparams(values, metric_dict={})
  File "/app/env/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 330, in add_hparams
    exp, ssi, sei = hparams(hparam_dict, metric_dict, hparam_domain_discrete)
  File "/app/env/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 194, in hparams
    from tensorboard.plugins.hparams.metadata import (
  File "/app/env/lib/python3.10/site-packages/tensorboard/plugins/hparams/metadata.py", line 32, in <module>
    NULL_TENSOR = tensor_util.make_tensor_proto(
  File "/app/env/lib/python3.10/site-packages/tensorboard/util/tensor_util.py", line 405, in make_tensor_proto
    numpy_dtype = dtypes.as_dtype(nparray.dtype)
  File "/app/env/lib/python3.10/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py", line 677, in as_dtype
    if type_value.type == np.string_ or type_value.type == np.unicode_:
  File "/app/env/lib/python3.10/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(
AttributeError: `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.. Did you mean: 'strings'?
Traceback (most recent call last):
  File "/app/aerial-photography/script.py", line 129, in <module>
    main()
  File "/app/aerial-photography/script.py", line 123, in main
    do_train(script_args)
  File "/app/aerial-photography/script.py", line 26, in do_train
    raise Exception("Training failed.")
Exception: Training failed.
INFO     | 2024-09-10 14:14:31 | autotrain.trainers.generic.utils:run_command:108 - Command finished.
INFO     | 2024-09-10 14:14:31 | autotrain.trainers.common:pause_space:77 - Pausing space...
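For anyone reading the traceback: the failing line is in tensorboard's `dtypes.py`, which compares against `np.string_` and `np.unicode_`. NumPy 2.0 removed `np.string_`, so the comparison fails as soon as NumPy 2.x is installed. Below is a rough sketch of how the same kind of check can be written so it runs on both NumPy 1.x and 2.x. The helper names (`numpy_major_version`, `has_legacy_alias`, `is_string_like`) are made up for illustration and are not part of tensorboard or autotrain.

```python
import numpy as np


def numpy_major_version() -> int:
    """Return the major version of the installed NumPy (e.g. 1 or 2)."""
    return int(np.__version__.split(".")[0])


def has_legacy_alias(name: str = "string_") -> bool:
    """Report whether a legacy alias such as np.string_ still exists.

    On NumPy 2.x, accessing a removed alias raises AttributeError,
    so hasattr() returns False there.
    """
    return hasattr(np, name)


def is_string_like(dtype) -> bool:
    """Hypothetical version-tolerant replacement for the failing check.

    np.bytes_ and np.str_ exist in both NumPy 1.x and 2.x, so this
    avoids touching the removed np.string_ alias.
    """
    return np.dtype(dtype).type in (np.bytes_, np.str_)


if __name__ == "__main__":
    print("NumPy major version:", numpy_major_version())
    print("np.string_ available:", has_legacy_alias("string_"))
    print("bytes dtype is string-like:", is_string_like("S5"))
```

If you control the environment yourself, pinning `numpy<2` or moving to a tensorboard release that supports NumPy 2 are the usual ways around this. The thread does not say which change was made on the Space side.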

This should be fixed now

multimodalart changed discussion status to closed
